Difference between revisions of "Bklein7 Week 6"
From LMU BioDB 2015
(Added Links Subheading) |
(Added notes on downloading and editing the zip file) |
||
Line 1: | Line 1: | ||
+ | ==The FDA Drug Database== | ||
+ | ===Acquiring Data From the Drugs@FDA ZIP File=== | ||
+ | ====Downloading and Viewing==== | ||
+ | *To download the zip file directly from the FDA website, I typed the following into the command line: | ||
+ | curl -O http://www.fda.gov/downloads/Drugs/InformationOnDrugs/UCM054599.zip | ||
+ | *Next, I unziped the FDA file | ||
+ | unzip UCM054599.zip | ||
+ | *The resulting output included the 'application.txt' and 'Product.txt' files. To view the format of these files, I used more commands. The output I received appeared to have listed columns which were not clearly defined. Therefore, I visualized the data using 'Notepad'. This made the columns easier to read. | ||
+ | **In 'Notepad', I got a better idea of how I needed to edit the txt files to import them into a table in 'pgAdminIII'. | ||
+ | ====Editing the txt files with sed==== | ||
+ | *application.txt | ||
+ | **To begin, I wanted to determine if there were any special characters in the text file such as ';' or ''' that needed to be worked around. I did two searches in notepad for these characters and found that this was not the case. | ||
+ | **Next, I began the pipe with a cat command and then wanted to clean up the file to make it easier to read. | ||
+ | ***I started with a sed command to remove excess spaces. Although this cleaned up the file, there were still spaces separating separate values. I deduced that these values were tabs. | ||
+ | ***To figure out how to replace the tabs (from a tab delimited file) with commas, I did some google searching. I found a way to tackle this problem on [http://www.electrictoolbox.com/linux-tab-command-line-bash/ electrictoolbox]. Using this piece of information, I wrote a sed command to replace tabs with commas. Immediately the output looked more like what I wanted. | ||
+ | **Having separated | ||
+ | |||
+ | **I started the pipe with a cat command and a sed command to delete excess spaces and make the file easier to read. | ||
+ | **Next, I added a sed command to include the proper insert command at the beginning of each line. | ||
+ | ** | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | ===Questions to Answer=== | ||
+ | # Provide the DDL (<code>create table</code>) statements that you used for your ''application'' and ''product'' tables. | ||
+ | # Provide the <code>sed</code> command sequences that you used to convert the raw text files into sequences of SQL <code>insert</code> statements. | ||
+ | # Using the command line, how can you determine the number of records in each file? Provide the command. | ||
+ | # Using SQL, how can you determine the number of records in the table corresponding to the file? Provide the SQL <code>select</code> statement. | ||
+ | # In your database, are these numbers the same or different? Explain why you think so. | ||
+ | #: ''For the remaining questions, supply (a) the answer to the question and (b) the SQL statement that you used to answer the question.'' | ||
+ | # What are the names of the drug products that are administered in the ''form'' <code>INJECTABLE;INTRAVENOUS, SUBCUTANEOUS</code>? | ||
+ | # What are the names of the drug products whose active ingredient (''activeingred'') is <code>ATROPINE</code>? | ||
+ | # In what ''form''s and ''dosage''s can the drug product named <code>BENADRYL</code> be administered? | ||
+ | # Which drug products have a name ''ending'' in <code>ESTROL</code>? | ||
+ | # Produce a table listing all of the known values for the ''therapeutic_potential'' column in the ''application'' table and how many ''application'' records there are of each. (''Side note:'' The ''therapeutic_potential'' codes are explained in the ''ReviewClass_Lookup.txt'' file, in case you’re interested.) | ||
+ | # Produce a table listing all of the known values for the ''chemical_type'' column in the ''application'' table and how many ''application'' records there are of each. (''Side note:'' The ''chemical_type'' codes are explained in the ''ChemTypeLookup.txt'' file, in case you’re interested.) | ||
+ | # What are the names of the drug products that are sponsored (''sponsor applicant'' column) by <code>MERCK</code>? | ||
+ | # Which ''sponsor applicant'' companies have the text <code>LABS</code> in their names and have products whose active ingredients (''activeingred'') include ''both'' <code>ASPIRIN</code> and <code>CAFFEINE</code>? | ||
==Links== | ==Links== | ||
{{Template:Bklein7}} | {{Template:Bklein7}} | ||
[[Category:Journal Entry]] | [[Category:Journal Entry]] |
Revision as of 02:06, 12 October 2015
Contents
The FDA Drug Database
Acquiring Data From the Drugs@FDA ZIP File
Downloading and Viewing
- To download the zip file directly from the FDA website, I typed the following into the command line:
curl -O http://www.fda.gov/downloads/Drugs/InformationOnDrugs/UCM054599.zip
- Next, I unziped the FDA file
unzip UCM054599.zip
- The resulting output included the 'application.txt' and 'Product.txt' files. To view the format of these files, I used more commands. The output I received appeared to have listed columns which were not clearly defined. Therefore, I visualized the data using 'Notepad'. This made the columns easier to read.
- In 'Notepad', I got a better idea of how I needed to edit the txt files to import them into a table in 'pgAdminIII'.
Editing the txt files with sed
- application.txt
- To begin, I wanted to determine if there were any special characters in the text file such as ';' or that needed to be worked around. I did two searches in notepad for these characters and found that this was not the case.
- Next, I began the pipe with a cat command and then wanted to clean up the file to make it easier to read.
- I started with a sed command to remove excess spaces. Although this cleaned up the file, there were still spaces separating separate values. I deduced that these values were tabs.
- To figure out how to replace the tabs (from a tab delimited file) with commas, I did some google searching. I found a way to tackle this problem on electrictoolbox. Using this piece of information, I wrote a sed command to replace tabs with commas. Immediately the output looked more like what I wanted.
- Having separated
- I started the pipe with a cat command and a sed command to delete excess spaces and make the file easier to read.
- Next, I added a sed command to include the proper insert command at the beginning of each line.
Questions to Answer
- Provide the DDL (
create table
) statements that you used for your application and product tables. - Provide the
sed
command sequences that you used to convert the raw text files into sequences of SQLinsert
statements. - Using the command line, how can you determine the number of records in each file? Provide the command.
- Using SQL, how can you determine the number of records in the table corresponding to the file? Provide the SQL
select
statement. - In your database, are these numbers the same or different? Explain why you think so.
- For the remaining questions, supply (a) the answer to the question and (b) the SQL statement that you used to answer the question.
- What are the names of the drug products that are administered in the form
INJECTABLE;INTRAVENOUS, SUBCUTANEOUS
? - What are the names of the drug products whose active ingredient (activeingred) is
ATROPINE
? - In what forms and dosages can the drug product named
BENADRYL
be administered? - Which drug products have a name ending in
ESTROL
? - Produce a table listing all of the known values for the therapeutic_potential column in the application table and how many application records there are of each. (Side note: The therapeutic_potential codes are explained in the ReviewClass_Lookup.txt file, in case you’re interested.)
- Produce a table listing all of the known values for the chemical_type column in the application table and how many application records there are of each. (Side note: The chemical_type codes are explained in the ChemTypeLookup.txt file, in case you’re interested.)
- What are the names of the drug products that are sponsored (sponsor applicant column) by
MERCK
? - Which sponsor applicant companies have the text
LABS
in their names and have products whose active ingredients (activeingred) include bothASPIRIN
andCAFFEINE
?
Links
- User Page: Brandon Klein
- Team Page: The Class Whoopers
Assignments Pages
- Week 1 Assignment
- Week 2 Assignment
- Week 3 Assignment
- Week 4 Assignment
- Week 5 Assignment
- Week 6 Assignment
- Week 7 Assignment
- Week 8 Assignment
- Week 9 Assignment
- Week 10 Assignment
- Week 11 Assignment
- Week 12 Assignment
- No Week 13 Assignment
- Week 14 Assignment
- Week 15 Assignment
Individual Journal Entries
- Week 1 Individual Journal
- Week 2 Individual Journal
- Week 3 Individual Journal
- Week 4 Individual Journal
- Week 5 Individual Journal
- Week 6 Individual Journal
- Week 7 Individual Journal
- Week 8 Individual Journal
- Week 9 Individual Journal
- Week 10 Individual Journal
- Week 11 Individual Journal
- Week 12 Individual Journal
- No Week 13 Journal
- Week 14 Individual Journal
- Week 15 Individual Journal
- Week 1 Class Journal
- Week 2 Class Journal
- Week 3 Class Journal
- Week 4 Class Journal
- Week 5 Class Journal
- Week 6 Class Journal
- Week 7 Class Journal
- Week 8 Class Journal
- Week 9 Class Journal
- Week 10 Team Journal
- Week 11 Team Journal
- Week 12 Team Journal
- No Week 13 Journal
- Week 14 Team Journal
- Week 15 Team Journal