Difference between revisions of "Bklein7 Week 6"

Revision as of 02:06, 12 October 2015

The FDA Drug Database

Acquiring Data From the Drugs@FDA ZIP File

Downloading and Viewing

To download the zip file directly from the FDA website, I typed the following into the command line:

curl -O http://www.fda.gov/downloads/Drugs/InformationOnDrugs/UCM054599.zip

Next, I unziped the FDA file

unzip UCM054599.zip

The resulting output included the 'application.txt' and 'Product.txt' files. To view the format of these files, I used more commands. The output I received appeared to have listed columns which were not clearly defined. Therefore, I visualized the data using 'Notepad'. This made the columns easier to read.
- In 'Notepad', I got a better idea of how I needed to edit the txt files to import them into a table in 'pgAdminIII'.

Editing the txt files with sed

application.txt
- To begin, I wanted to determine if there were any special characters in the text file such as ';' or that needed to be worked around. I did two searches in notepad for these characters and found that this was not the case.
- Next, I began the pipe with a cat command and then wanted to clean up the file to make it easier to read.
  - I started with a sed command to remove excess spaces. Although this cleaned up the file, there were still spaces separating separate values. I deduced that these values were tabs.
  - To figure out how to replace the tabs (from a tab delimited file) with commas, I did some google searching. I found a way to tackle this problem on electrictoolbox. Using this piece of information, I wrote a sed command to replace tabs with commas. Immediately the output looked more like what I wanted.
- Having separated

- I started the pipe with a cat command and a sed command to delete excess spaces and make the file easier to read.
- Next, I added a sed command to include the proper insert command at the beginning of each line.

Questions to Answer

Provide the DDL (create table) statements that you used for your application and product tables.
Provide the sed command sequences that you used to convert the raw text files into sequences of SQL insert statements.
Using the command line, how can you determine the number of records in each file? Provide the command.
Using SQL, how can you determine the number of records in the table corresponding to the file? Provide the SQL select statement.
In your database, are these numbers the same or different? Explain why you think so.
For the remaining questions, supply (a) the answer to the question and (b) the SQL statement that you used to answer the question.
What are the names of the drug products that are administered in the form INJECTABLE;INTRAVENOUS, SUBCUTANEOUS?
What are the names of the drug products whose active ingredient (activeingred) is ATROPINE?
In what forms and dosages can the drug product named BENADRYL be administered?
Which drug products have a name ending in ESTROL?
Produce a table listing all of the known values for the therapeutic_potential column in the application table and how many application records there are of each. (Side note: The therapeutic_potential codes are explained in the ReviewClass_Lookup.txt file, in case you’re interested.)
Produce a table listing all of the known values for the chemical_type column in the application table and how many application records there are of each. (Side note: The chemical_type codes are explained in the ChemTypeLookup.txt file, in case you’re interested.)
What are the names of the drug products that are sponsored (sponsor applicant column) by MERCK?
Which sponsor applicant companies have the text LABS in their names and have products whose active ingredients (activeingred) include both ASPIRIN and CAFFEINE?

Links

User Page: Brandon Klein
Team Page: The Class Whoopers

Assignments Pages

Individual Journal Entries

Shared Journal Entries

@@ Line 1: / Line 1: @@
+==The FDA Drug Database==
+===Acquiring Data From the Drugs@FDA ZIP File===
+====Downloading and Viewing====
+*To download the zip file directly from the FDA website, I typed the following into the command line:
+ curl -O http://www.fda.gov/downloads/Drugs/InformationOnDrugs/UCM054599.zip
+*Next, I unziped the FDA file
+ unzip UCM054599.zip
+*The resulting output included the 'application.txt' and 'Product.txt' files. To view the format of these files, I used more commands. The output I received appeared to have listed columns which were not clearly defined. Therefore, I visualized the data using 'Notepad'. This made the columns easier to read.
+**In 'Notepad', I got a better idea of how I needed to edit the txt files to import them into a table in 'pgAdminIII'.
+====Editing the txt files with sed====
+*application.txt
+**To begin, I wanted to determine if there were any special characters in the text file such as ';' or ''' that needed to be worked around. I did two searches in notepad for these characters and found that this was not the case.
+**Next, I began the pipe with a cat command and then wanted to clean up the file to make it easier to read.
+***I started with a sed command to remove excess spaces. Although this cleaned up the file, there were still spaces separating separate values. I deduced that these values were tabs.
+***To figure out how to replace the tabs (from a tab delimited file) with commas, I did some google searching. I found a way to tackle this problem on [http://www.electrictoolbox.com/linux-tab-command-line-bash/ electrictoolbox]. Using this piece of information, I wrote a sed command to replace tabs with commas. Immediately the output looked more like what I wanted.
+**Having separated
+**I started the pipe with a cat command and a sed command to delete excess spaces and make the file easier to read.
+**Next, I added a sed command to include the proper insert command at the beginning of each line.
+**
+===Questions to Answer===
+# Provide the DDL (<code>create table</code>) statements that you used for your ''application'' and ''product'' tables.
+# Provide the <code>sed</code> command sequences that you used to convert the raw text files into sequences of SQL <code>insert</code> statements.
+# Using the command line, how can you determine the number of records in each file? Provide the command.
+# Using SQL, how can you determine the number of records in the table corresponding to the file? Provide the SQL <code>select</code> statement.
+# In your database, are these numbers the same or different? Explain why you think so.
+#: ''For the remaining questions, supply (a) the answer to the question and (b) the SQL statement that you used to answer the question.''
+# What are the names of the drug products that are administered in the ''form'' <code>INJECTABLE;INTRAVENOUS, SUBCUTANEOUS</code>?
+# What are the names of the drug products whose active ingredient (''activeingred'') is <code>ATROPINE</code>?
+# In what ''form''s and ''dosage''s can the drug product named <code>BENADRYL</code> be administered?
+# Which drug products have a name ''ending'' in <code>ESTROL</code>?
+# Produce a table listing all of the known values for the ''therapeutic_potential'' column in the ''application'' table and how many ''application'' records there are of each. (''Side note:'' The ''therapeutic_potential'' codes are explained in the ''ReviewClass_Lookup.txt'' file, in case you’re interested.)
+# Produce a table listing all of the known values for the ''chemical_type'' column in the ''application'' table and how many ''application'' records there are of each. (''Side note:'' The ''chemical_type'' codes are explained in the ''ChemTypeLookup.txt'' file, in case you’re interested.)
+# What are the names of the drug products that are sponsored (''sponsor applicant'' column) by <code>MERCK</code>?
+# Which ''sponsor applicant'' companies have the text <code>LABS</code> in their names and have products whose active ingredients (''activeingred'') include ''both'' <code>ASPIRIN</code> and <code>CAFFEINE</code>?
 ==Links==
 {{Template:Bklein7}}
 [[Category:Journal Entry]]

Difference between revisions of "Bklein7 Week 6"

Revision as of 02:06, 12 October 2015

Contents

The FDA Drug Database

Acquiring Data From the Drugs@FDA ZIP File

Downloading and Viewing

Editing the txt files with sed

Questions to Answer

Links

Assignments Pages

Individual Journal Entries

Shared Journal Entries

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools