Difference between revisions of "Rlegaspi Week 6"

From LMU BioDB 2015
Jump to: navigation, search
(Assignment information and added sections to my electronic lab notebook.)
(Fixed some formatting errors and finished unzipping section of electronic lab notebook (easiest section since steps are given))
Line 15: Line 15:
 
== Downloading and Unzipping FDA Drug Information File ==
 
== Downloading and Unzipping FDA Drug Information File ==
  
* Like any other assignment using Putty, I logged through the Putty application on one of the SEA 120 computers; therefore, I did not need to use the command <code>ssh rlegaspi@my.cs.lmu.edu<\/code> and type in my password.
+
* Like any other assignment using Putty, I logged through the Putty application on one of the SEA 120 computers; therefore, I did not need to use the command <code>ssh rlegaspi@my.cs.lmu.edu</\code> and type in my password.
  
 
* I bypassed the website and downloaded the file directly to ''my.cs.lmu.edu'' using the following command provided on the Week 6 Assignment Page:
 
* I bypassed the website and downloaded the file directly to ''my.cs.lmu.edu'' using the following command provided on the Week 6 Assignment Page:
Line 22: Line 22:
 
* I unzipped the file, using this command (also provided on the Week 6 Assignment Page):
 
* I unzipped the file, using this command (also provided on the Week 6 Assignment Page):
 
  unzip UCM054599.zip
 
  unzip UCM054599.zip
 +
 +
* I was given the results that were expected from unzipping and now have the files '''application.txt''' and '''Product.txt''' in my home folder, which are the files needed for the rest of this assignment.
  
 
== Defining appropriate tables for the ''Application'' and ''Product'' entities ==
 
== Defining appropriate tables for the ''Application'' and ''Product'' entities ==

Revision as of 22:19, 12 October 2015

The FDA Drug Database

The United States Food and Drug Administration (FDA) provides, as a matter of public record, the full data set for its approved drugs at this website: http://www.fda.gov/Drugs/InformationOnDrugs/ucm079750.htm

The site includes both a link to the downloadable files (compressed in .zip format) and the schema (“entity relationship diagram”) for those files.

Using these files, what you have learned about sed and SQL thus far, and additional information found in this wiki and on the aforementioned FDA website, do the following:

  1. Download and uncompress the files.
  2. Define appropriate tables for the Application and Product entities.
  3. Process the data files for these entities then load them into those tables.
  4. Answer the questions below.

Electronic Lab Notebook

Downloading and Unzipping FDA Drug Information File

  • Like any other assignment using Putty, I logged through the Putty application on one of the SEA 120 computers; therefore, I did not need to use the command ssh rlegaspi@my.cs.lmu.edu</\code> and type in my password.
  • I bypassed the website and downloaded the file directly to my.cs.lmu.edu using the following command provided on the Week 6 Assignment Page:
curl -O http://www.fda.gov/downloads/Drugs/InformationOnDrugs/UCM054599.zip
  • I unzipped the file, using this command (also provided on the Week 6 Assignment Page):
unzip UCM054599.zip
  • I was given the results that were expected from unzipping and now have the files application.txt and Product.txt in my home folder, which are the files needed for the rest of this assignment.

Defining appropriate tables for the Application and Product entities

create table drugapplication (ApplNo int primary key, ApplType varchar, SponsorApplicant varchar, MostRecentLabelAvailableFlag boolean, CurrentPatentFlag boolean, ActionType varchar, Chemical_Type int, Ther_Potential varchar, Orphan_Code varchar);

Processing the data files for Application and Product entities to load into tables

Still working on this part...

Questions to Answer

  1. Provide the DDL (<code>create table) statements that you used for your application and product tables.
create table drugapplication (ApplNo int primary key, ApplType varchar, SponsorApplicant varchar, MostRecentLabelAvailableFlag boolean, CurrentPatentFlag boolean, ActionType varchar, Chemical_Type int, Ther_Potential varchar, Orphan_Code varchar);
  1. Provide the sed command sequences that you used to convert the raw text files into sequences of SQL insert statements.
  2. Using the command line, how can you determine the number of records in each file? Provide the command.
  3. Using SQL, how can you determine the number of records in the table corresponding to the file? Provide the SQL select statement.
  4. In your database, are these numbers the same or different? Explain why you think so.
    For the remaining questions, supply (a) the answer to the question and (b) the SQL statement that you used to answer the question.
  5. What are the names of the drug products that are administered in the form INJECTABLE;INTRAVENOUS, SUBCUTANEOUS?
  6. What are the names of the drug products whose active ingredient (activeingred) is ATROPINE?
  7. In what forms and dosages can the drug product named BENADRYL be administered?
  8. Which drug products have a name ending in ESTROL?
  9. Produce a table listing all of the known values for the therapeutic_potential column in the application table and how many application records there are of each. (Side note: The therapeutic_potential codes are explained in the ReviewClass_Lookup.txt file, in case you’re interested.)
  10. Produce a table listing all of the known values for the chemical_type column in the application table and how many application records there are of each. (Side note: The chemical_type codes are explained in the ChemTypeLookup.txt file, in case you’re interested.)
  11. What are the names of the drug products that are sponsored (sponsor applicant column) by MERCK?
  12. Which sponsor applicant companies have the text LABS in their names and have products whose active ingredients (activeingred) include both ASPIRIN and CAFFEINE?

Links to User Page and Journal Pages

Ron Legaspi
BIOL 367, Fall 2015

Assignment Links
Individual Weekly Journals
Shared Weekly Journals

Homework Partner: Jake Woodlee