Difference between revisions of "Rlegaspi Week 6"
(Reformatted code command line to fit within page borders and inserted answers into the questions portion) |
(Saved answers to third question about number of records.) |
||
Line 65: | Line 65: | ||
#* DrugProduct: <code>cat Product.txt | sed "s/\*\*Federal Register determination that product was not discontinued or withdrawn for safety or efficacy reasons\*\*//g" | sed "s/\*\*Federal Register determination that product was not discontinued or withdrawn for safety or efficacy\*\*//g" | sed "s/\*\*Federal Register notice determination that product was not discontinued or withdrawn for safety or efficacy reasons\*\*//g" | sed "s/\*\*Federal Registration determination that product was not discontinued or withdrawn for safety or efficacy reason\*\*//g" | sed "s/See current Annual Edition, 1.8 Description of Special Situations, Levothyroxine Sodium//g" | sed "s/'//g" | sed "s/\t/#/g" | sed -r "s/( ){2}//g" | sed "s/ \#/#/g" | sed "s/\# /#/g" | sed "s/\#\#/#null#/g" | sed "s/\#\#/#null#/g" | sed "s/\#/,/1" | sed "s/\#/,'/1" | sed "s/\#/','/1" | sed "s/\#/',/1" | sed "s/\#/,'/1" | sed "s/\#/',/1" | sed "s/\#/,'/1" | sed "s/\#/','/1" | sed "s/\r$/'/g" | sed "s/^/insert into drugproduct (ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode,ReferenceDrug,drugname,activeingred) values(/g" | sed "s/$/);/g" | sed 1D > ~/public_html/drugproduct.sql.txt</code> | #* DrugProduct: <code>cat Product.txt | sed "s/\*\*Federal Register determination that product was not discontinued or withdrawn for safety or efficacy reasons\*\*//g" | sed "s/\*\*Federal Register determination that product was not discontinued or withdrawn for safety or efficacy\*\*//g" | sed "s/\*\*Federal Register notice determination that product was not discontinued or withdrawn for safety or efficacy reasons\*\*//g" | sed "s/\*\*Federal Registration determination that product was not discontinued or withdrawn for safety or efficacy reason\*\*//g" | sed "s/See current Annual Edition, 1.8 Description of Special Situations, Levothyroxine Sodium//g" | sed "s/'//g" | sed "s/\t/#/g" | sed -r "s/( ){2}//g" | sed "s/ \#/#/g" | sed "s/\# /#/g" | sed "s/\#\#/#null#/g" | sed "s/\#\#/#null#/g" | sed "s/\#/,/1" | sed "s/\#/,'/1" | sed "s/\#/','/1" | sed "s/\#/',/1" | sed "s/\#/,'/1" | sed "s/\#/',/1" | sed "s/\#/,'/1" | sed "s/\#/','/1" | sed "s/\r$/'/g" | sed "s/^/insert into drugproduct (ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode,ReferenceDrug,drugname,activeingred) values(/g" | sed "s/$/);/g" | sed 1D > ~/public_html/drugproduct.sql.txt</code> | ||
# Using the command line, how can you determine the number of records in each file? Provide the command. | # Using the command line, how can you determine the number of records in each file? Provide the command. | ||
− | #* | + | #* ''application.txt'' - <code>wc application.txt</code>: 19747 lines of data; however, 1st line is column headings, so 19747 - 1 = 19746 records in ''application.txt'' file. |
+ | #* ''Product.txt'' - <code>wc Product.txt</code>: 32771 lines of data; however, 1st line is column headings, so 32771 - 1 = 32770 records in ''Product.txt'' file. | ||
# Using SQL, how can you determine the number of records in the table corresponding to the file? Provide the SQL <code>select</code> statement. | # Using SQL, how can you determine the number of records in the table corresponding to the file? Provide the SQL <code>select</code> statement. | ||
#* | #* |
Revision as of 02:28, 15 October 2015
The FDA Drug Database
The United States Food and Drug Administration (FDA) provides, as a matter of public record, the full data set for its approved drugs at this website: http://www.fda.gov/Drugs/InformationOnDrugs/ucm079750.htm
The site includes both a link to the downloadable files (compressed in .zip format) and the schema (“entity relationship diagram”) for those files.
Using these files, what you have learned about sed
and SQL thus far, and additional information found in this wiki and on the aforementioned FDA website, do the following:
- Download and uncompress the files.
- Define appropriate tables for the Application and Product entities.
- Process the data files for these entities then load them into those tables.
- Answer the questions below.
Electronic Lab Notebook
Downloading and Unzipping FDA Drug Information File
- Like any other assignment using Putty, I logged through the Putty application on one of the SEA 120 computers.
- I bypassed the website and downloaded the file directly to my.cs.lmu.edu using the following command provided on the Week 6 Assignment Page:
curl -O http://www.fda.gov/downloads/Drugs/InformationOnDrugs/UCM054599.zip
- I unzipped the file, using this command (also provided on the Week 6 Assignment Page):
unzip UCM054599.zip
- I was given the results that were expected from unzipping and now have the files application.txt and Product.txt in my home folder, which are the files needed for the rest of this assignment.
Defining appropriate tables for the Application and Product entities
Application
In order to analyze the data in application.txt, I ran the command more application.txt
to discover what the columns of data stood for and recognized what kind of table values would be present in each column (int, varchar, boolean, etc.). The following SQL command created the table I needed for the data within the application.txt file:
create table drugapplication (ApplNo int primary key, ApplType varchar, SponsorApplicant varchar, MostRecentLabelAvailableFlag boolean, CurrentPatentFlag boolean, ActionType varchar, Chemical_Type int, Ther_Potential varchar, Orphan_Code varchar);
Product
Similarly to the application.txt file, I ran the command more Product.txt
in order to see the first line of data that provided the description of each column within the file. Therefore, I ran the following SQL command keeping in mind the names of the columns and the types of values present within the columns:
create table Product (ApplNo int, ProductNo int, Form varchar, Dosage varchar, ProductMktStatus varchar, TECode varchar, ReferenceDrug int, drugname varchar, activeingred varchar);
Processing the data files for Application and Product entities to load into tables
Explanation here.
Application.txt sed
command pipeline to prepare for insertion into DrugApplication table:
cat application.txt | sed "s/\t/#/g" | sed -r "s/( ){2}//g" | sed "s/ \#/#/g" | sed "s/\# /#/g" | sed "s/##/#null#/g" |
sed "s/##/#null#/g" | sed "s/#\r$/#null/g" | sed "s/#/,'/1" | sed "s/#/','/1" | sed "s/#/',/1" | sed "s/#/,/1" | sed "s/#/,'/1" |
sed "s/#/',/1" | sed "s/#/,'/1" | sed "s/#/','/1" | sed "s/#/,'/1" | sed "s/\r$/'/g" | sed "s/'null'/null/g" | sed "s/'null/null/g" |
sed "s/^/insert in drugapplication (ApplNo,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag,CurrentPatentFlag,ActionType,Chemical_Type,Ther_Potential,Orphan_Code) values(/g" |
sed "s/$/);/g" | sed 1D > ~/public_html/drugapplication.sql.txt
Product.txt sed
command pipeline to prepare for insertion into DrugProduct table:
cat Product.txt | sed "s/\*\*Federal Register determination that product was not discontinued or withdrawn for safety or efficacy reasons\*\*//g" |
sed "s/\*\*Federal Register determination that product was not discontinued or withdrawn for safety or efficacy\*\*//g" |
sed "s/\*\*Federal Register notice determination that product was not discontinued or withdrawn for safety or efficacy reasons\*\*//g" |
sed "s/\*\*Federal Registration determination that product was not discontinued or withdrawn for safety or efficacy reason\*\*//g" |
sed "s/See current Annual Edition, 1.8 Description of Special Situations, Levothyroxine Sodium//g" |
sed "s/'//g" | sed "s/\t/#/g" | sed -r "s/( ){2}//g" | sed "s/ \#/#/g" | sed "s/\# /#/g" | sed "s/\#\#/#null#/g" | sed "s/\#\#/#null#/g" |
sed "s/\#/,/1" | sed "s/\#/,'/1" | sed "s/\#/','/1" | sed "s/\#/',/1" | sed "s/\#/,'/1" | sed "s/\#/',/1" | sed "s/\#/,'/1" | sed "s/\#/','/1" |
sed "s/\r$/'/g" | sed "s/^/insert into drugproduct (ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode,ReferenceDrug,drugname,activeingred) values(/g" |
sed "s/$/);/g" | sed 1D > ~/public_html/drugproduct.sql.txt
Questions to Answer
- Provide the DDL (
create table
) statements that you used for your application and product tables.- DrugApplication Table:
create table drugapplication (ApplNo int primary key, ApplType varchar, SponsorApplicant varchar, MostRecentLabelAvailableFlag boolean, CurrentPatentFlag boolean, ActionType varchar, Chemical_Type int, Ther_Potential varchar, Orphan_Code varchar);
- DrugProduct Table:
create table Product (ApplNo int, ProductNo int, Form varchar, Dosage varchar, ProductMktStatus varchar, TECode varchar, ReferenceDrug int, drugname varchar, activeingred varchar);
- DrugApplication Table:
- Provide the
sed
command sequences that you used to convert the raw text files into sequences of SQLinsert
statements.- DrugApplication:
cat application.txt | sed "s/\t/#/g" | sed -r "s/( ){2}//g" | sed "s/ \#/#/g" | sed "s/\# /#/g" | sed "s/##/#null#/g" | sed "s/##/#null#/g" | sed "s/#\r$/#null/g" | sed "s/#/,'/1" | sed "s/#/','/1" | sed "s/#/',/1" | sed "s/#/,/1" | sed "s/#/,'/1" | sed "s/#/',/1" | sed "s/#/,'/1" | sed "s/#/','/1" | sed "s/#/,'/1" | sed "s/\r$/'/g" | sed "s/'null'/null/g" | sed "s/'null/null/g" | sed "s/^/insert in drugapplication (ApplNo,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag,CurrentPatentFlag,ActionType,Chemical_Type,Ther_Potential,Orphan_Code) values(/g" | sed "s/$/);/g" | sed 1D > ~/public_html/drugapplication.sql.txt
- DrugProduct:
cat Product.txt | sed "s/\*\*Federal Register determination that product was not discontinued or withdrawn for safety or efficacy reasons\*\*//g" | sed "s/\*\*Federal Register determination that product was not discontinued or withdrawn for safety or efficacy\*\*//g" | sed "s/\*\*Federal Register notice determination that product was not discontinued or withdrawn for safety or efficacy reasons\*\*//g" | sed "s/\*\*Federal Registration determination that product was not discontinued or withdrawn for safety or efficacy reason\*\*//g" | sed "s/See current Annual Edition, 1.8 Description of Special Situations, Levothyroxine Sodium//g" | sed "s/'//g" | sed "s/\t/#/g" | sed -r "s/( ){2}//g" | sed "s/ \#/#/g" | sed "s/\# /#/g" | sed "s/\#\#/#null#/g" | sed "s/\#\#/#null#/g" | sed "s/\#/,/1" | sed "s/\#/,'/1" | sed "s/\#/','/1" | sed "s/\#/',/1" | sed "s/\#/,'/1" | sed "s/\#/',/1" | sed "s/\#/,'/1" | sed "s/\#/','/1" | sed "s/\r$/'/g" | sed "s/^/insert into drugproduct (ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode,ReferenceDrug,drugname,activeingred) values(/g" | sed "s/$/);/g" | sed 1D > ~/public_html/drugproduct.sql.txt
- DrugApplication:
- Using the command line, how can you determine the number of records in each file? Provide the command.
- application.txt -
wc application.txt
: 19747 lines of data; however, 1st line is column headings, so 19747 - 1 = 19746 records in application.txt file. - Product.txt -
wc Product.txt
: 32771 lines of data; however, 1st line is column headings, so 32771 - 1 = 32770 records in Product.txt file.
- application.txt -
- Using SQL, how can you determine the number of records in the table corresponding to the file? Provide the SQL
select
statement. - In your database, are these numbers the same or different? Explain why you think so.
- For the remaining questions, supply (a) the answer to the question and (b) the SQL statement that you used to answer the question.
- What are the names of the drug products that are administered in the form
INJECTABLE;INTRAVENOUS, SUBCUTANEOUS
? - What are the names of the drug products whose active ingredient (activeingred) is
ATROPINE
? - In what forms and dosages can the drug product named
BENADRYL
be administered? - Which drug products have a name ending in
ESTROL
? - Produce a table listing all of the known values for the therapeutic_potential column in the application table and how many application records there are of each. (Side note: The therapeutic_potential codes are explained in the ReviewClass_Lookup.txt file, in case you’re interested.)
- Produce a table listing all of the known values for the chemical_type column in the application table and how many application records there are of each. (Side note: The chemical_type codes are explained in the ChemTypeLookup.txt file, in case you’re interested.)
- What are the names of the drug products that are sponsored (sponsor applicant column) by
MERCK
? - Which sponsor applicant companies have the text
LABS
in their names and have products whose active ingredients (activeingred) include bothASPIRIN
andCAFFEINE
?
Links to User Page and Journal Pages
Ron Legaspi
BIOL 367, Fall 2015
Assignment Links
- Week 1 Assignment
- Week 2 Assignment
- Week 3 Assignment
- Week 4 Assignment
- Week 5 Assignment
- Week 6 Assignment
- Week 7 Assignment
- Week 8 Assignment
- Week 9 Assignment
- Week 10 Assignment
- Week 11 Assignment
- Week 12 Assignment
- Week 14 Assignment
- Week 15 Assignment
Individual Weekly Journals
- Individual Journal Week 1 - This is my User Page
- Individual Journal Week 2
- Individual Journal Week 3
- Individual Journal Week 4
- Individual Journal Week 5
- Individual Journal Week 6
- Individual Journal Week 7
- Individual Journal Week 8
- Individual Journal Week 9
- Individual Journal Week 10
- Individual Journal Week 11
- Individual Journal Week 12
- Individual Journal Week 14
- Individual Journal Week 15
- Shared Journal Week 1
- Shared Journal Week 2
- Shared Journal Week 3
- Shared Journal Week 4
- Shared Journal Week 5
- Shared Journal Week 6
- Shared Journal Week 7
- Shared Journal Week 8
- Shared Journal Week 9
- Heavy Metal HaterZ Team Page - Week 10-15 Shared Journal
Homework Partner: Jake Woodlee