Difference between revisions of "Nanguiano Week 6"

From LMU BioDB 2015
Jump to: navigation, search
(The FDA Drug Database: added some SQL commands and answered the first question)
(The FDA Drug Database: Began parsing the original strings so that I could have them insert properly)
Line 32: Line 32:
 
  TECode varchar, ReferenceDrug int, drugname varchar, activeingred varchar);
 
  TECode varchar, ReferenceDrug int, drugname varchar, activeingred varchar);
 
* After creating the tables, I needed to change the structure of the txt files so that I could insert them into the tables. I knew that I needed to add the insert command at the start of each line. I began with the application.txt file. I first ran the command to see whether or not I had the correct command to add the insert statements at the beginning of each line.
 
* After creating the tables, I needed to change the structure of the txt files so that I could insert them into the tables. I knew that I needed to add the insert command at the start of each line. I began with the application.txt file. I first ran the command to see whether or not I had the correct command to add the insert statements at the beginning of each line.
  cat application.txt | sed "s/^/insert into application(ApplNo,ApplType,SponsorApplicant,MostRecentLabelAvailable,
+
  cat application.txt | sed "s/^/insert into application(ApplNo,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag,
  CurrentPatentFlag,ActionType,ChemicalType,Ther_Potential,Orphan_char) values(/g"
+
  CurrentPatentFlag,ActionType,ChemicalType,Ther_Potential,Orphan_Code) values(/g"
 
* The command seemed to be correct, so next I needed to remove the first line, which contained the column headers.  
 
* The command seemed to be correct, so next I needed to remove the first line, which contained the column headers.  
  cat application.txt | sed "1D" | sed "s/^/insert into application(ApplNo,ApplType,SponsorApplicant,MostRecentLabelAvailable,
+
  cat application.txt | sed "1D" | sed "s/^/insert into application(ApplNo,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag,
  CurrentPatentFlag,ActionType,ChemicalType,Ther_Potential,Orphan_char) values(/g"
+
  CurrentPatentFlag,ActionType,ChemicalType,Ther_Potential,Orphan_Code) values(/g"
<!--* After confirming the first line was correctly removed, I needed to replace the tabs between the -->
+
* With the first line removed and the insert added, I turned my attention towards formatting the varchar's to be surrounded with single quotes, and replacing the tabs between each piece of data with commas. First, to confirm that the tabs were being replaced properly, I created a random sed command to replace sed with the word "BIODB".
 +
cat application.txt | sed "1D" | sed "s/\t/BIODB/g"
 +
* After confirming this command worked, I decided to replace the word "BIODB" with a comma and confirm that the results were the same.
 +
cat application.txt | sed "1D" | sed "s/\t/,/g"
 +
* Once I saw that this worked, I knew I needed to surround the strings (varchars) and chars with commas. The strings/chars were the second, third, sixth, eigth, and ninth columns.
 +
 
 
=== Questions to Answer ===
 
=== Questions to Answer ===
  

Revision as of 00:25, 14 October 2015

The FDA Drug Database

Direct Download/Unzipping Commands

  • First, to begin the assignment, I ssh'd into my server.
ssh nanguiano@lion.lmu.edu
  • Next, I entered my folder for this class and created a directory for this week's assignment, then entered the directory.
cd biodb
mkdir week6
cd week6
  • In the directory, I downloaded the file from the FDA.
curl -O http://www.fda.gov/downloads/Drugs/InformationOnDrugs/UCM054599.zip
  • After the download completed, I unzipped the file.
unzip UCM054599.zip
  • Next, I needed to move the application.txt and Product.txt files into my public_html folder. However, I first wanted to create a folder for the files. I entered my public_html folder and created a folder for this class, biodb.
cd ~/public_html
mkdir biodb
  • I noticed after making the folder that the "movie.sql.txt" file from a few classes ago was still there. I moved that file into the biodb folder, then returned to my week6 folder to begin the transfer of the files I needed to move.
mv movie.sql.txt biodb
cd ~/biodb/week6
  • I moved the files into the prepared folder in the public_html directory.
mv application.txt ~/public_html/biodb
mv Product.txt ~/public_html/biodb
  • Next, I needed to process the files to prepare them for entry into PostgreSQL. The files needed to be prepared with the insert commands so that they could be added into my database. Before doing this, i needed to ensure I knew the structure of the tables that I was going to enter the files into. To begin this process, I downloaded the two files and opened them in Excel so I could see what the header of each column was, and what type of data it contained.
    • Opening the "application.txt" file showed me that it contained 9 columns: ApplNo (int), ApplType (char), SponsorApplicant (varchar), MostRecentLabelAvailableFlag (bool), CurrentPatentFlag (bool), ActionType (varchar), ChemicalType (int), Ther_Potential (char), and Orphan_Code (char).
    • Opening the "Product.txt" file showed me that it also contained 9 columns: ApplNo (int), ProductNo (int), Form (varchar), Dosage (varchar), ProductMktStatus (int), TECode (varchar), ReferenceDrug (int), drugname (varchar), and activeingred (varchar).
  • I began by creating the database that I would be using for this week's assignment. I opened pgAdmin on my own computer, and created a new database, called "FDADrugDB". I opened the SQL editor and typed in the following commands to create my two tables:
create table application (ApplNo int primary key, ApplType char, SponsorApplicant varchar, MostRecentLabelAvailableFlag boolean, 
CurrentPatentFlag boolean, ActionType varchar, ChemicalType int, Ther_Potential char, Orphan_Code char);

create table product (ApplNo int references application, ProductNo int primary key, Form varchar, Dosage varchar, ProductMktStatus int, 
TECode varchar, ReferenceDrug int, drugname varchar, activeingred varchar);
  • After creating the tables, I needed to change the structure of the txt files so that I could insert them into the tables. I knew that I needed to add the insert command at the start of each line. I began with the application.txt file. I first ran the command to see whether or not I had the correct command to add the insert statements at the beginning of each line.
cat application.txt | sed "s/^/insert into application(ApplNo,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag,
CurrentPatentFlag,ActionType,ChemicalType,Ther_Potential,Orphan_Code) values(/g"
  • The command seemed to be correct, so next I needed to remove the first line, which contained the column headers.
cat application.txt | sed "1D" | sed "s/^/insert into application(ApplNo,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag,
CurrentPatentFlag,ActionType,ChemicalType,Ther_Potential,Orphan_Code) values(/g"
  • With the first line removed and the insert added, I turned my attention towards formatting the varchar's to be surrounded with single quotes, and replacing the tabs between each piece of data with commas. First, to confirm that the tabs were being replaced properly, I created a random sed command to replace sed with the word "BIODB".
cat application.txt | sed "1D" | sed "s/\t/BIODB/g"
  • After confirming this command worked, I decided to replace the word "BIODB" with a comma and confirm that the results were the same.
cat application.txt | sed "1D" | sed "s/\t/,/g"
  • Once I saw that this worked, I knew I needed to surround the strings (varchars) and chars with commas. The strings/chars were the second, third, sixth, eigth, and ninth columns.

Questions to Answer

  • Provide the DDL (create table) statements that you used for your application and product tables.
create table application (ApplNo int primary key, ApplType char, SponsorApplicant varchar, MostRecentLabelAvailableFlag boolean, 
CurrentPatentFlag boolean, ActionType varchar, ChemicalType int, Ther_Potential char, Orphan_Code char);

create table product (ApplNo int references application, ProductNo int primary key, Form varchar, Dosage varchar, ProductMktStatus int, 
TECode varchar, ReferenceDrug int, drugname varchar, activeingred varchar);
  1. Provide the sed command sequences that you used to convert the raw text files into sequences of SQL insert statements.
  2. Using the command line, how can you determine the number of records in each file? Provide the command.
  3. Using SQL, how can you determine the number of records in the table corresponding to the file? Provide the SQL select statement.
  4. In your database, are these numbers the same or different? Explain why you think so.
    For the remaining questions, supply (a) the answer to the question and (b) the SQL statement that you used to answer the question.
  5. What are the names of the drug products that are administered in the form INJECTABLE;INTRAVENOUS, SUBCUTANEOUS?
  6. What are the names of the drug products whose active ingredient (activeingred) is ATROPINE?
  7. In what forms and dosages can the drug product named BENADRYL be administered?
  8. Which drug products have a name ending in ESTROL?
  9. Produce a table listing all of the known values for the therapeutic_potential column in the application table and how many application records there are of each. (Side note: The therapeutic_potential codes are explained in the ReviewClass_Lookup.txt file, in case you’re interested.)
  10. Produce a table listing all of the known values for the chemical_type column in the application table and how many application records there are of each. (Side note: The chemical_type codes are explained in the ChemTypeLookup.txt file, in case you’re interested.)
  11. What are the names of the drug products that are sponsored (sponsor applicant column) by MERCK?
  12. Which sponsor applicant companies have the text LABS in their names and have products whose active ingredients (activeingred) include both ASPIRIN and CAFFEINE?

Links

Nicole Anguiano
BIOL 367, Fall 2015

Assignment Links
Individual Journals
Shared Journals