Difference between revisions of "Nanguiano Week 6"

From LMU BioDB 2015
Jump to: navigation, search
(created page, added links, and added headers)
 
(The FDA Drug Database: added initial steps)
Line 1: Line 1:
 
== The FDA Drug Database ==
 
== The FDA Drug Database ==
 
<!-- The United States Food and Drug Administration (FDA) provides, as a matter of public record, the full data set for its approved drugs at this website: http://www.fda.gov/Drugs/InformationOnDrugs/ucm079750.htm
 
 
The site includes both a link to the downloadable files (compressed in .zip format) and the schema (“entity relationship diagram”) for those files.
 
 
Using these files, what you have learned about <code>sed</code> and SQL thus far, and additional information found in this wiki and on the aforementioned FDA website, do the following:
 
 
# Download and uncompress the files.
 
# Define appropriate tables for the ''Application'' and ''Product'' entities.
 
# Process the data files for these entities then load them into those tables.
 
# Answer the questions below.-->
 
  
 
=== Direct Download/Unzipping Commands ===
 
=== Direct Download/Unzipping Commands ===
  
<!--To mirror the procedures performed with the ''movie_titles.txt'' file, this section shows you how to (a) download and process the FDA files on the ''my.cs.lmu.edu'' server via PuTTY/<code>ssh</code> then (b) make your processed files available to the Seaver 120 workstations via the built-in ''my.cs.lmu.edu'' server. To do so, we introduce some additional commands that you may use verbatim at the ''my.cs.lmu.edu'' command line.
+
*First, to begin the assignment, I ssh'd into my server.
 
+
ssh nanguiano@lion.lmu.edu
* For these activities, do ''not'' go into the ''~dondi/xmlpipedb/data'' folder—that’s mine! ''':)''' Instead, do your work on your home folder (i.e., where you are when you first login to ''my.cs.lmu.edu'').
+
*Next, I entered my folder for this class and created a directory for this week's assignment, then entered the directory.
 
+
cd biodb
* You can bypass the website and download the file directly to ''my.cs.lmu.edu'' with this command:
+
mkdir week6
 +
cd week6
 +
* In the directory, I downloaded the file from the FDA.
 +
curl -O <nowiki>http://www.fda.gov/downloads/Drugs/InformationOnDrugs/UCM054599.zip</nowiki>
 +
* After the download completed, I unzipped the file.
 +
unzip UCM054599.zip
  
    curl -O <nowiki>http://www.fda.gov/downloads/Drugs/InformationOnDrugs/UCM054599.zip</nowiki>
+
<!--
  
 
: ''(if the link above does not work, the FDA may have renamed the file; in that case, visit the website at the beginning of this section with a web browser, right click on the ''Drugs@FDA Download File'' link, and find a way to copy the linked address into your <code>ssh</code> session)''
 
: ''(if the link above does not work, the FDA may have renamed the file; in that case, visit the website at the beginning of this section with a web browser, right click on the ''Drugs@FDA Download File'' link, and find a way to copy the linked address into your <code>ssh</code> session)''
Line 26: Line 20:
 
* To unzip this file, use this command:
 
* To unzip this file, use this command:
  
     unzip UCM054599.zip
+
      
  
 
: ''(if the downloaded filename is different from the one shown, use that filename instead)''
 
: ''(if the downloaded filename is different from the one shown, use that filename instead)''
Line 78: Line 72:
 
# What are the names of the drug products that are sponsored (''sponsor applicant'' column) by <code>MERCK</code>?
 
# What are the names of the drug products that are sponsored (''sponsor applicant'' column) by <code>MERCK</code>?
 
# Which ''sponsor applicant'' companies have the text <code>LABS</code> in their names and have products whose active ingredients (''activeingred'') include ''both'' <code>ASPIRIN</code> and <code>CAFFEINE</code>?
 
# Which ''sponsor applicant'' companies have the text <code>LABS</code> in their names and have products whose active ingredients (''activeingred'') include ''both'' <code>ASPIRIN</code> and <code>CAFFEINE</code>?
 
  
 
== Links ==
 
== Links ==
 
{{Template:Nanguiano}}
 
{{Template:Nanguiano}}

Revision as of 22:11, 13 October 2015

The FDA Drug Database

Direct Download/Unzipping Commands

  • First, to begin the assignment, I ssh'd into my server.
ssh nanguiano@lion.lmu.edu
  • Next, I entered my folder for this class and created a directory for this week's assignment, then entered the directory.
cd biodb
mkdir week6
cd week6
  • In the directory, I downloaded the file from the FDA.
curl -O http://www.fda.gov/downloads/Drugs/InformationOnDrugs/UCM054599.zip
  • After the download completed, I unzipped the file.
unzip UCM054599.zip


Supplementary Information

Questions to Answer

  1. Provide the DDL (create table) statements that you used for your application and product tables.
  2. Provide the sed command sequences that you used to convert the raw text files into sequences of SQL insert statements.
  3. Using the command line, how can you determine the number of records in each file? Provide the command.
  4. Using SQL, how can you determine the number of records in the table corresponding to the file? Provide the SQL select statement.
  5. In your database, are these numbers the same or different? Explain why you think so.
    For the remaining questions, supply (a) the answer to the question and (b) the SQL statement that you used to answer the question.
  6. What are the names of the drug products that are administered in the form INJECTABLE;INTRAVENOUS, SUBCUTANEOUS?
  7. What are the names of the drug products whose active ingredient (activeingred) is ATROPINE?
  8. In what forms and dosages can the drug product named BENADRYL be administered?
  9. Which drug products have a name ending in ESTROL?
  10. Produce a table listing all of the known values for the therapeutic_potential column in the application table and how many application records there are of each. (Side note: The therapeutic_potential codes are explained in the ReviewClass_Lookup.txt file, in case you’re interested.)
  11. Produce a table listing all of the known values for the chemical_type column in the application table and how many application records there are of each. (Side note: The chemical_type codes are explained in the ChemTypeLookup.txt file, in case you’re interested.)
  12. What are the names of the drug products that are sponsored (sponsor applicant column) by MERCK?
  13. Which sponsor applicant companies have the text LABS in their names and have products whose active ingredients (activeingred) include both ASPIRIN and CAFFEINE?

Links

Nicole Anguiano
BIOL 367, Fall 2015

Assignment Links
Individual Journals
Shared Journals