Difference between revisions of "Rlegaspi Week 6"

From LMU BioDB 2015
Jump to: navigation, search
(Creating Week 6 page for myself and copying assignment onto page as a reference.)
 
(Finished questions to answer question, just need to proofread and finish electronic lab notebook.)
 
(8 intermediate revisions by the same user not shown)
Line 1: Line 1:
=== The FDA Drug Database ===
+
= The FDA Drug Database =
  
 
The United States Food and Drug Administration (FDA) provides, as a matter of public record, the full data set for its approved drugs at this website: http://www.fda.gov/Drugs/InformationOnDrugs/ucm079750.htm
 
The United States Food and Drug Administration (FDA) provides, as a matter of public record, the full data set for its approved drugs at this website: http://www.fda.gov/Drugs/InformationOnDrugs/ucm079750.htm
Line 12: Line 12:
 
# Answer the questions below.
 
# Answer the questions below.
  
==== Direct Download/Unzipping Commands ====
+
= Electronic Lab Notebook =
 +
== Downloading and Unzipping FDA Drug Information File ==
  
To mirror the procedures performed with the ''movie_titles.txt'' file, this section shows you how to (a) download and process the FDA files on the ''my.cs.lmu.edu'' server via PuTTY/<code>ssh</code> then (b) make your processed files available to the Seaver 120 workstations via the built-in ''my.cs.lmu.edu'' server. To do so, we introduce some additional commands that you may use verbatim at the ''my.cs.lmu.edu'' command line.
+
* Like any other assignment using Putty, I logged through the Putty application on one of the SEA 120 computers.
  
* For these activities, do ''not'' go into the ''~dondi/xmlpipedb/data'' folder—that’s mine! ''':)''' Instead, do your work on your home folder (i.e., where you are when you first login to ''my.cs.lmu.edu'').
+
* I bypassed the website and downloaded the file directly to ''my.cs.lmu.edu'' using the following command provided on the Week 6 Assignment Page:
 +
curl -O <nowiki>http://www.fda.gov/downloads/Drugs/InformationOnDrugs/UCM054599.zip</nowiki>
  
* You can bypass the website and download the file directly to ''my.cs.lmu.edu'' with this command:
+
* I unzipped the file, using this command (also provided on the Week 6 Assignment Page):
 +
unzip UCM054599.zip
  
    curl -O <nowiki>http://www.fda.gov/downloads/Drugs/InformationOnDrugs/UCM054599.zip</nowiki>
+
* I was given the results that were expected from unzipping and now have the files '''application.txt''' and '''Product.txt''' in my home folder, which are the files needed for the rest of this assignment.
  
: ''(if the link above does not work, the FDA may have renamed the file; in that case, visit the website at the beginning of this section with a web browser, right click on the ''Drugs@FDA Download File'' link, and find a way to copy the linked address into your <code>ssh</code> session)''
+
== Defining appropriate tables for the ''Application'' and ''Product'' entities ==
 +
=== Application ===
 +
In order to analyze the data in ''application.txt'', I ran the command <code>more application.txt</code> to discover what the columns of data stood for and recognized what kind of table values would be present in each column (''int'', ''varchar'', ''boolean'', etc.). The following SQL command created the table I needed for the data within the ''application.txt'' file:
 +
create table drugapplication (ApplNo int primary key, ApplType varchar, SponsorApplicant varchar, MostRecentLabelAvailableFlag boolean, CurrentPatentFlag boolean, ActionType varchar, Chemical_Type int, Ther_Potential varchar, Orphan_Code varchar);
  
* To unzip this file, use this command:
+
=== Product ===
 +
Similarly to the ''application.txt'' file, I ran the command <code> more Product.txt</code> in order to see the first line of data that provided the description of each column within the file. Therefore, I ran the following SQL command keeping in mind the names of the columns and the types of values present within the columns:
 +
create table Product (ApplNo int, ProductNo int, Form varchar, Dosage varchar, ProductMktStatus varchar, TECode varchar, ReferenceDrug int, drugname varchar, activeingred varchar);
  
    unzip UCM054599.zip
+
== Processing the data files for ''Application'' and ''Product'' entities to load into tables ==
  
: ''(if the downloaded filename is different from the one shown, use that filename instead)''
+
Explanation here.
  
* Upon successful unzipping, you should see the following files:
+
Application.txt <code>sed</code> command pipeline to prepare for insertion into DrugApplication table:
 +
cat application.txt | sed "s/\t/#/g" | sed -r "s/( ){2}//g" | sed "s/ \#/#/g" | sed "s/\# /#/g" | sed "s/##/#null#/g" |
 +
sed "s/##/#null#/g" | sed "s/#\r$/#null/g" | sed "s/#/,'/1" | sed "s/#/','/1" | sed "s/#/',/1" | sed "s/#/,/1" | sed "s/#/,'/1" |
 +
sed "s/#/',/1" | sed "s/#/,'/1" | sed "s/#/','/1" | sed "s/#/,'/1" | sed "s/\r$/'/g" | sed "s/'null'/null/g" | sed "s/'null/null/g" |
 +
sed "s/^/insert in drugapplication (ApplNo,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag,CurrentPatentFlag,ActionType,Chemical_Type,Ther_Potential,Orphan_Code) values(/g" |
 +
sed "s/$/);/g" | sed 1D > ~/public_html/drugapplication.sql.txt
  
    $ ls
+
  Product.txt <code>sed</code> command pipeline to prepare for insertion into DrugProduct table:
    AppDoc.txt            application.txt    DocType_lookup.txt Product.txt       ReviewClass_Lookup.txt
+
  cat Product.txt | sed "s/\*\*Federal Register determination that product was not discontinued or withdrawn for safety or efficacy reasons\*\*//g" |
    AppDocType_Lookup.txt ChemTypeLookup.txt  Product_tecode.txt RegActionDate.txt UCM054599.zip
+
  sed "s/\*\*Federal Register determination that product was not discontinued or withdrawn for safety or efficacy\*\*//g" |
 +
sed "s/\*\*Federal Register notice determination that product was not discontinued or withdrawn for safety or efficacy reasons\*\*//g" |
 +
sed "s/\*\*Federal Registration determination that product was not discontinued or withdrawn for safety or efficacy reason\*\*//g" |
 +
  sed "s/See current Annual Edition, 1.8 Description of Special Situations, Levothyroxine Sodium//g" |
 +
  sed "s/'//g" | sed "s/\t/#/g" | sed -r "s/( ){2}//g" | sed "s/ \#/#/g" | sed "s/\# /#/g" | sed "s/\#\#/#null#/g" | sed "s/\#\#/#null#/g" |
 +
sed "s/\#/,/1" | sed "s/\#/,'/1" | sed "s/\#/','/1" | sed "s/\#/',/1" | sed "s/\#/,'/1" | sed "s/\#/',/1" | sed "s/\#/,'/1" | sed "s/\#/','/1" |
 +
sed "s/\r$/'/g" | sed "s/^/insert into drugproduct (ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode,ReferenceDrug,drugname,activeingred) values(/g" |
 +
sed "s/$/);/g" | sed 1D > ~/public_html/drugproduct.sql.txt
  
: As stated in the tasks above, you will only need to work with the ''application.txt'' and ''Product.txt'' files.
+
== Questions to Answer ==
 
+
Part of the assignment, of course, is to determine how to rework these files so that their data can be loaded into PostgreSQL via the pgAdmin III desktop application on the Seaver 120 workstations. Just as with the ''movie_titles.txt'' file, you do this by “redirecting” the processed data to your ''~/public_html'' folder:
+
 
+
    cat application.txt | ''your command sequence here'' > ~/public_html/application.sql.txt
+
    cat Product.txt | ''your command sequence here'' > ~/public_html/Product.sql.txt
+
 
+
You can then download these files to the Seaver 120 desktop environment by visiting these sites with a web browser:
+
 
+
* <nowiki>http://my.cs.lmu.edu/~</nowiki>''username''/application.sql.txt
+
* <nowiki>http://my.cs.lmu.edu/~</nowiki>''username''/Product.sql.txt
+
 
+
Again, note that these latter steps exactly mirror the tutorial for the ''movie_titles.txt'' file, so please refer to [[PostgreSQL_Tutorial|that wiki page]] for additional details.
+
 
+
==== Supplementary Information ====
+
 
+
Just as with [[Week 4]], real-world data have their share of variations and exceptions. For these files:
+
* To visualize the raw data more easily, you can load the files into Microsoft Excel. ''But do '''not''' use Excel to process them into SQL statements.''
+
* A <code>bit</code> column can either be a <code>boolean</code> or <code>int</code> data type in PostgreSQL—examine the data to see what values are in there in order to make the right choice.
+
* If you see “nulls” in the website schema, that simply means that the value can be empty.
+
* For our purposes, you don’t need to define a primary key for the ''Product'' table. It does, however, have a foreign key.
+
* The first lines of all of these files contain the column names. You don’t need them because the SQL <code>insert</code> statement, as shown in class and in the [[PostgreSQL Tutorial]], specifies these already.
+
* Instead of commas as in the ''movie_titles.txt'' file, the columns in these files are separated by ''tabs''. You can indicate a ''tab'' in <code>sed</code> using the characters <code>\t</code>.
+
* Due to the way these files are formatted, the “end of the line” should be designated using the pattern <code>\r$</code> and not the dollar sign by itself.
+
* PostgreSQL can handle numbers and booleans with or without single quotes, so both <code>5</code> and <code>'5'</code> are valid. In addition, boolean (true/false) values are case-insensitive, so <code>'True'</code> and <code>'False'</code> will be acceptable to the system.
+
 
+
=== Questions to Answer ===
+
  
 
# Provide the DDL (<code>create table</code>) statements that you used for your ''application'' and ''product'' tables.
 
# Provide the DDL (<code>create table</code>) statements that you used for your ''application'' and ''product'' tables.
 +
#* DrugApplication Table: <code>create table drugapplication (ApplNo int primary key, ApplType varchar, SponsorApplicant varchar, MostRecentLabelAvailableFlag boolean, CurrentPatentFlag boolean, ActionType varchar, Chemical_Type int, Ther_Potential varchar, Orphan_Code varchar);</code>
 +
#* DrugProduct Table: <code>create table Product (ApplNo int, ProductNo int, Form varchar, Dosage varchar, ProductMktStatus varchar, TECode varchar, ReferenceDrug int, drugname varchar, activeingred varchar);</code>
 
# Provide the <code>sed</code> command sequences that you used to convert the raw text files into sequences of SQL <code>insert</code> statements.
 
# Provide the <code>sed</code> command sequences that you used to convert the raw text files into sequences of SQL <code>insert</code> statements.
 +
#* DrugApplication: <code>cat application.txt | sed "s/\t/#/g" | sed -r "s/( ){2}//g" | sed "s/ \#/#/g" | sed "s/\# /#/g" | sed "s/##/#null#/g" | sed "s/##/#null#/g" | sed "s/#\r$/#null/g" | sed "s/#/,'/1" | sed "s/#/','/1" | sed "s/#/',/1" | sed "s/#/,/1" | sed "s/#/,'/1" | sed "s/#/',/1" | sed "s/#/,'/1" | sed "s/#/','/1" | sed "s/#/,'/1" | sed "s/\r$/'/g" | sed "s/'null'/null/g" | sed "s/'null/null/g" | sed "s/^/insert in drugapplication (ApplNo,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag,CurrentPatentFlag,ActionType,Chemical_Type,Ther_Potential,Orphan_Code) values(/g" | sed "s/$/);/g" | sed 1D > ~/public_html/drugapplication.sql.txt</code>
 +
#* DrugProduct: <code>cat Product.txt | sed "s/\*\*Federal Register determination that product was not discontinued or withdrawn for safety or efficacy reasons\*\*//g" | sed "s/\*\*Federal Register determination that product was not discontinued or withdrawn for safety or efficacy\*\*//g" | sed "s/\*\*Federal Register notice determination that product was not discontinued or withdrawn for safety or efficacy reasons\*\*//g" | sed "s/\*\*Federal Registration determination that product was not discontinued or withdrawn for safety or efficacy reason\*\*//g" | sed "s/See current Annual Edition, 1.8 Description of Special Situations, Levothyroxine Sodium//g" | sed "s/'//g" | sed "s/\t/#/g" | sed -r "s/( ){2}//g" | sed "s/ \#/#/g" | sed "s/\# /#/g" | sed "s/\#\#/#null#/g" | sed "s/\#\#/#null#/g" | sed "s/\#/,/1" | sed "s/\#/,'/1" | sed "s/\#/','/1" | sed "s/\#/',/1" | sed "s/\#/,'/1" | sed "s/\#/',/1" | sed "s/\#/,'/1" | sed "s/\#/','/1" | sed "s/\r$/'/g" | sed "s/^/insert into drugproduct (ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode,ReferenceDrug,drugname,activeingred) values(/g" | sed "s/$/);/g" | sed 1D > ~/public_html/drugproduct.sql.txt</code>
 
# Using the command line, how can you determine the number of records in each file? Provide the command.
 
# Using the command line, how can you determine the number of records in each file? Provide the command.
 +
#* ''application.txt'' - <code>wc application.txt</code>: '''19747''' lines of data; however, 1st line is column headings, so 19747 - 1 = 19746 records in ''application.txt'' file.
 +
#* ''Product.txt'' - <code>wc Product.txt</code>: '''32771''' lines of data; however, 1st line is column headings, so 32771 - 1 = 32770 records in ''Product.txt'' file.
 
# Using SQL, how can you determine the number of records in the table corresponding to the file? Provide the SQL <code>select</code> statement.
 
# Using SQL, how can you determine the number of records in the table corresponding to the file? Provide the SQL <code>select</code> statement.
 +
#* ''application.txt'' - <code>select count(*) from drugapplication</code>: '''19746''' records
 +
#* ''Product.txt'' - <code>select count(*) from drugproduct</code>: '''32770''' records
 +
#* '''Note:''' Similar results compared to using command line, just as expected.
 
# In your database, are these numbers the same or different? Explain why you think so.
 
# In your database, are these numbers the same or different? Explain why you think so.
 +
#* The numbers are the same because all of the lines of data from the text files was transferred into the SQL tables (except for the first line of each text file, which contained the column headings). The record numbers should equal each other.
 
#: ''For the remaining questions, supply (a) the answer to the question and (b) the SQL statement that you used to answer the question.''
 
#: ''For the remaining questions, supply (a) the answer to the question and (b) the SQL statement that you used to answer the question.''
 
# What are the names of the drug products that are administered in the ''form'' <code>INJECTABLE;INTRAVENOUS, SUBCUTANEOUS</code>?
 
# What are the names of the drug products that are administered in the ''form'' <code>INJECTABLE;INTRAVENOUS, SUBCUTANEOUS</code>?
 +
#* a)  '''ACTEMRA''', '''AZACITIDINE''', '''ENOXAPARIN SODIUM''', '''LOVENOX''', '''VELCADE''', '''VIDAZA'''
 +
#* b) <code>select drugname from drugproduct where form = 'INJECTABLE;INTRAVENOUS, SUBCUTANEOUS';</code>
 
# What are the names of the drug products whose active ingredient (''activeingred'') is <code>ATROPINE</code>?
 
# What are the names of the drug products whose active ingredient (''activeingred'') is <code>ATROPINE</code>?
 +
#* a) '''ATROPEN''', '''ATROPINE'''
 +
#* b) <code>select drugname from drugproduct where activeingred = 'ATROPINE';</code>
 
# In what ''form''s and ''dosage''s can the drug product named <code>BENADRYL</code> be administered?
 
# In what ''form''s and ''dosage''s can the drug product named <code>BENADRYL</code> be administered?
 +
#* a)"CAPSULE;ORAL";"50MG", "ELIXIR;ORAL";"12.5MG/5ML", "CAPSULE;ORAL";"25MG", "INJECTABLE;INJECTION";"10MG/ML", "INJECTABLE;INJECTION";"50MG/ML"
 +
#* b)<code>select form, dosage from drugproduct where drugname = 'BENADRYL';</code>
 
# Which drug products have a name ''ending'' in <code>ESTROL</code>?
 
# Which drug products have a name ''ending'' in <code>ESTROL</code>?
 +
#* a)"DIETHYLSTILBESTROL", "STILBESTROL", "DIENESTROL"
 +
#* b)<code>select drugname from drugproduct where drugname like '%ESTROL';</code>
 
# Produce a table listing all of the known values for the ''therapeutic_potential'' column in the ''application'' table and how many ''application'' records there are of each. (''Side note:'' The ''therapeutic_potential'' codes are explained in the ''ReviewClass_Lookup.txt'' file, in case you’re interested.)
 
# Produce a table listing all of the known values for the ''therapeutic_potential'' column in the ''application'' table and how many ''application'' records there are of each. (''Side note:'' The ''therapeutic_potential'' codes are explained in the ''ReviewClass_Lookup.txt'' file, in case you’re interested.)
 +
#* a) [[Image:Therapeutic Potential.png]]
 +
#* b)<code>select ther_potential, count(*) from drugapplication group by ther_potential order by count desc</code>
 
# Produce a table listing all of the known values for the ''chemical_type'' column in the ''application'' table and how many ''application'' records there are of each. (''Side note:'' The ''chemical_type'' codes are explained in the ''ChemTypeLookup.txt'' file, in case you’re interested.)
 
# Produce a table listing all of the known values for the ''chemical_type'' column in the ''application'' table and how many ''application'' records there are of each. (''Side note:'' The ''chemical_type'' codes are explained in the ''ChemTypeLookup.txt'' file, in case you’re interested.)
 +
#* a) [[Image: Chemical Type.png]]
 +
#* b)<code>select chemical_type, count(*) from drugapplication group by chemical_type order by count desc</code>
 
# What are the names of the drug products that are sponsored (''sponsor applicant'' column) by <code>MERCK</code>?
 
# What are the names of the drug products that are sponsored (''sponsor applicant'' column) by <code>MERCK</code>?
 +
#* a) '''DECADRON, HUMORSOL, NEO-HYDELTRASOL, PRINIVIL, MAXALT-MLT, HYDROCORTONE, PERIACTIN, PROPECIA, PROSCAR, CLINORIL, PRINZIDE, ELSPAR, ALDOMET, ALDORIL D30, EMEND, DIUPRES-500, NEODECADRON, ALDOCLOR-250, MEFOXIN IN DEXTROSE 5% IN PLASTIC CONTAINER, ARAMINE, TIAMATE, BLOCADREN, CANCIDAS, CHIBROXIN, CORTONE, PEPCID, TRUSOPT, REDISOL, VIOXX, FLOROPRYL, ALPHAREDISOL, DECADRON-LA, MEFOXIN IN SODIUM CHLORIDE 0.9% IN PLASTIC CONTAINER, PEPCID RPD, NOROXIN, ZOCOR, LERITINE, DECADERM, DOLOBID, MANNITOL 25%, DECADRON W/ XYLOCAINE, ALDORIL D50, TIMOLIDE 10-25, CYCLAINE, HYDROPRES 25, AMINOHIPPURATE SODIUM, MEVACOR, MODURETIC 5-50, ALDORIL 25, SINGULAIR, COLBENEMID, DIUPRES-250, HYDELTRA-TBA, PRIMAXIN, BENEMID, MAXALT, FOSAMAX PLUS D, HYDRODIURIL, HYDELTRASOL, HYDROPRES 50, ZOLINZA, ALDORIL 15, FOSAMAX, DECASPRAY, COGENTIN, ALDOCLOR-150, PEPCID PRESERVATIVE FREE'''
 +
#* b) <code>select drugname from product inner join application on (product.applno = application.applno) where sponsorapplicant = 'MERCK' group by drugname</code>
 
# Which ''sponsor applicant'' companies have the text <code>LABS</code> in their names and have products whose active ingredients (''activeingred'') include ''both'' <code>ASPIRIN</code> and <code>CAFFEINE</code>?
 
# Which ''sponsor applicant'' companies have the text <code>LABS</code> in their names and have products whose active ingredients (''activeingred'') include ''both'' <code>ASPIRIN</code> and <code>CAFFEINE</code>?
 +
#* a) '''ACTAVIS LABS UT INC''' and '''WATSON LABS'''
 +
#* b) <code>select sponsorapplicant from application inner join product on (application.applno = product.applno) where sponsorapplicant like '%LABS%' and activeingred like '%ASPIRIN%' and activeingred like '%CAFFEINE%' group by sponsorapplicant;</code>
  
== Shared Journal Assignment ==
+
= Links to User Page and Journal Pages =
 
+
{{Template:Rlegaspi}}
* Store your journal entry in the shared [[Class Journal Week 6]] page.  If this page does not exist yet, go ahead and create it (congratulations on getting in first&nbsp;''':)'''&nbsp;)
+
'''Homework Partner:''' [[User:Jwoodlee | Jake Woodlee]]
* Link to your journal entry from your user page.
+
* Link back from the journal entry to your user page.
+
**'''''NOTE: you can easily fulfill the links part of these instructions by adding them to your template and using the template on your user page.'''''
+
* Sign your portion of the journal with the standard wiki signature shortcut (<code><nowiki>~~~~</nowiki></code>).
+
* Add the "Journal Entry" and "Shared" categories to the end of the wiki page (if someone has not already done so).
+
 
+
=== Read ===
+
 
+
* This week we return to [http://www.bloomberg.com/graphics/2015-paul-ford-what-is-code/ “What is Code?”], now looking at these sections within Section 5, “The Time You Attended the E-mail Address Validation Meeting:”
+
:5.1 What is the Relationship Between Code and Data?
+
:5.2 Where Does Data Live?
+
:5.3 The Language of White Collars
+
 
+
(feel free to read the other parts of Section 5; we focus on these three because these relate most directly to this assignment, with the third looking to the future...of this semester)
+
 
+
=== Reflect ===
+
 
+
# Based on what you have seen of the FDA database, do you feel that you have a better understanding of how the data mentioned in section 5.1—Spotify music, Fitbit exercise tracking, Twitter tweets, IRS tax returns, etc.—might look when stored on a computer?
+
# Section 5.2 half-jokingly says that, by building a bookstore, you actually built the death of bookstores. You just built a drug database—have you actually built the death of pharmacies? What do you think of this analogy?
+
# Section 5.3 says that the Java language can “talk to a database.” But ''you'' just “talked” to a database in this assignment. Why do you think you would need a programming language to do the talking?
+

Latest revision as of 03:22, 15 October 2015

The FDA Drug Database

The United States Food and Drug Administration (FDA) provides, as a matter of public record, the full data set for its approved drugs at this website: http://www.fda.gov/Drugs/InformationOnDrugs/ucm079750.htm

The site includes both a link to the downloadable files (compressed in .zip format) and the schema (“entity relationship diagram”) for those files.

Using these files, what you have learned about sed and SQL thus far, and additional information found in this wiki and on the aforementioned FDA website, do the following:

  1. Download and uncompress the files.
  2. Define appropriate tables for the Application and Product entities.
  3. Process the data files for these entities then load them into those tables.
  4. Answer the questions below.

Electronic Lab Notebook

Downloading and Unzipping FDA Drug Information File

  • Like any other assignment using Putty, I logged through the Putty application on one of the SEA 120 computers.
  • I bypassed the website and downloaded the file directly to my.cs.lmu.edu using the following command provided on the Week 6 Assignment Page:
curl -O http://www.fda.gov/downloads/Drugs/InformationOnDrugs/UCM054599.zip
  • I unzipped the file, using this command (also provided on the Week 6 Assignment Page):
unzip UCM054599.zip
  • I was given the results that were expected from unzipping and now have the files application.txt and Product.txt in my home folder, which are the files needed for the rest of this assignment.

Defining appropriate tables for the Application and Product entities

Application

In order to analyze the data in application.txt, I ran the command more application.txt to discover what the columns of data stood for and recognized what kind of table values would be present in each column (int, varchar, boolean, etc.). The following SQL command created the table I needed for the data within the application.txt file:

create table drugapplication (ApplNo int primary key, ApplType varchar, SponsorApplicant varchar, MostRecentLabelAvailableFlag boolean, CurrentPatentFlag boolean, ActionType varchar, Chemical_Type int, Ther_Potential varchar, Orphan_Code varchar);

Product

Similarly to the application.txt file, I ran the command more Product.txt in order to see the first line of data that provided the description of each column within the file. Therefore, I ran the following SQL command keeping in mind the names of the columns and the types of values present within the columns:

create table Product (ApplNo int, ProductNo int, Form varchar, Dosage varchar, ProductMktStatus varchar, TECode varchar, ReferenceDrug int, drugname varchar, activeingred varchar);

Processing the data files for Application and Product entities to load into tables

Explanation here.

Application.txt sed command pipeline to prepare for insertion into DrugApplication table:
cat application.txt | sed "s/\t/#/g" | sed -r "s/( ){2}//g" | sed "s/ \#/#/g" | sed "s/\# /#/g" | sed "s/##/#null#/g" | 
sed "s/##/#null#/g" | sed "s/#\r$/#null/g" | sed "s/#/,'/1" | sed "s/#/','/1" | sed "s/#/',/1" | sed "s/#/,/1" | sed "s/#/,'/1" | 
sed "s/#/',/1" | sed "s/#/,'/1" | sed "s/#/','/1" | sed "s/#/,'/1" | sed "s/\r$/'/g" | sed "s/'null'/null/g" | sed "s/'null/null/g" | 
sed "s/^/insert in drugapplication (ApplNo,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag,CurrentPatentFlag,ActionType,Chemical_Type,Ther_Potential,Orphan_Code) values(/g" | 
sed "s/$/);/g" | sed 1D > ~/public_html/drugapplication.sql.txt
Product.txt sed command pipeline to prepare for insertion into DrugProduct table:
cat Product.txt | sed "s/\*\*Federal Register determination that product was not discontinued or withdrawn for safety or efficacy reasons\*\*//g" | 
sed "s/\*\*Federal Register determination that product was not discontinued or withdrawn for safety or efficacy\*\*//g" | 
sed "s/\*\*Federal Register notice determination that product was not discontinued or withdrawn for safety or efficacy reasons\*\*//g" | 
sed "s/\*\*Federal Registration determination that product was not discontinued or withdrawn for safety or efficacy reason\*\*//g" | 
sed "s/See current Annual Edition, 1.8 Description of Special Situations, Levothyroxine Sodium//g" | 
sed "s/'//g" | sed "s/\t/#/g" | sed -r "s/( ){2}//g" | sed "s/ \#/#/g" | sed "s/\# /#/g" | sed "s/\#\#/#null#/g" | sed "s/\#\#/#null#/g" | 
sed "s/\#/,/1" | sed "s/\#/,'/1" | sed "s/\#/','/1" | sed "s/\#/',/1" | sed "s/\#/,'/1" | sed "s/\#/',/1" | sed "s/\#/,'/1" | sed "s/\#/','/1" | 
sed "s/\r$/'/g" | sed "s/^/insert into drugproduct (ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode,ReferenceDrug,drugname,activeingred) values(/g" | 
sed "s/$/);/g" | sed 1D > ~/public_html/drugproduct.sql.txt

Questions to Answer

  1. Provide the DDL (create table) statements that you used for your application and product tables.
    • DrugApplication Table: create table drugapplication (ApplNo int primary key, ApplType varchar, SponsorApplicant varchar, MostRecentLabelAvailableFlag boolean, CurrentPatentFlag boolean, ActionType varchar, Chemical_Type int, Ther_Potential varchar, Orphan_Code varchar);
    • DrugProduct Table: create table Product (ApplNo int, ProductNo int, Form varchar, Dosage varchar, ProductMktStatus varchar, TECode varchar, ReferenceDrug int, drugname varchar, activeingred varchar);
  2. Provide the sed command sequences that you used to convert the raw text files into sequences of SQL insert statements.
    • DrugApplication: cat application.txt | sed "s/\t/#/g" | sed -r "s/( ){2}//g" | sed "s/ \#/#/g" | sed "s/\# /#/g" | sed "s/##/#null#/g" | sed "s/##/#null#/g" | sed "s/#\r$/#null/g" | sed "s/#/,'/1" | sed "s/#/','/1" | sed "s/#/',/1" | sed "s/#/,/1" | sed "s/#/,'/1" | sed "s/#/',/1" | sed "s/#/,'/1" | sed "s/#/','/1" | sed "s/#/,'/1" | sed "s/\r$/'/g" | sed "s/'null'/null/g" | sed "s/'null/null/g" | sed "s/^/insert in drugapplication (ApplNo,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag,CurrentPatentFlag,ActionType,Chemical_Type,Ther_Potential,Orphan_Code) values(/g" | sed "s/$/);/g" | sed 1D > ~/public_html/drugapplication.sql.txt
    • DrugProduct: cat Product.txt | sed "s/\*\*Federal Register determination that product was not discontinued or withdrawn for safety or efficacy reasons\*\*//g" | sed "s/\*\*Federal Register determination that product was not discontinued or withdrawn for safety or efficacy\*\*//g" | sed "s/\*\*Federal Register notice determination that product was not discontinued or withdrawn for safety or efficacy reasons\*\*//g" | sed "s/\*\*Federal Registration determination that product was not discontinued or withdrawn for safety or efficacy reason\*\*//g" | sed "s/See current Annual Edition, 1.8 Description of Special Situations, Levothyroxine Sodium//g" | sed "s/'//g" | sed "s/\t/#/g" | sed -r "s/( ){2}//g" | sed "s/ \#/#/g" | sed "s/\# /#/g" | sed "s/\#\#/#null#/g" | sed "s/\#\#/#null#/g" | sed "s/\#/,/1" | sed "s/\#/,'/1" | sed "s/\#/','/1" | sed "s/\#/',/1" | sed "s/\#/,'/1" | sed "s/\#/',/1" | sed "s/\#/,'/1" | sed "s/\#/','/1" | sed "s/\r$/'/g" | sed "s/^/insert into drugproduct (ApplNo,ProductNo,Form,Dosage,ProductMktStatus,TECode,ReferenceDrug,drugname,activeingred) values(/g" | sed "s/$/);/g" | sed 1D > ~/public_html/drugproduct.sql.txt
  3. Using the command line, how can you determine the number of records in each file? Provide the command.
    • application.txt - wc application.txt: 19747 lines of data; however, 1st line is column headings, so 19747 - 1 = 19746 records in application.txt file.
    • Product.txt - wc Product.txt: 32771 lines of data; however, 1st line is column headings, so 32771 - 1 = 32770 records in Product.txt file.
  4. Using SQL, how can you determine the number of records in the table corresponding to the file? Provide the SQL select statement.
    • application.txt - select count(*) from drugapplication: 19746 records
    • Product.txt - select count(*) from drugproduct: 32770 records
    • Note: Similar results compared to using command line, just as expected.
  5. In your database, are these numbers the same or different? Explain why you think so.
    • The numbers are the same because all of the lines of data from the text files was transferred into the SQL tables (except for the first line of each text file, which contained the column headings). The record numbers should equal each other.
    For the remaining questions, supply (a) the answer to the question and (b) the SQL statement that you used to answer the question.
  6. What are the names of the drug products that are administered in the form INJECTABLE;INTRAVENOUS, SUBCUTANEOUS?
    • a) ACTEMRA, AZACITIDINE, ENOXAPARIN SODIUM, LOVENOX, VELCADE, VIDAZA
    • b) select drugname from drugproduct where form = 'INJECTABLE;INTRAVENOUS, SUBCUTANEOUS';
  7. What are the names of the drug products whose active ingredient (activeingred) is ATROPINE?
    • a) ATROPEN, ATROPINE
    • b) select drugname from drugproduct where activeingred = 'ATROPINE';
  8. In what forms and dosages can the drug product named BENADRYL be administered?
    • a)"CAPSULE;ORAL";"50MG", "ELIXIR;ORAL";"12.5MG/5ML", "CAPSULE;ORAL";"25MG", "INJECTABLE;INJECTION";"10MG/ML", "INJECTABLE;INJECTION";"50MG/ML"
    • b)select form, dosage from drugproduct where drugname = 'BENADRYL';
  9. Which drug products have a name ending in ESTROL?
    • a)"DIETHYLSTILBESTROL", "STILBESTROL", "DIENESTROL"
    • b)select drugname from drugproduct where drugname like '%ESTROL';
  10. Produce a table listing all of the known values for the therapeutic_potential column in the application table and how many application records there are of each. (Side note: The therapeutic_potential codes are explained in the ReviewClass_Lookup.txt file, in case you’re interested.)
    • a) Therapeutic Potential.png
    • b)select ther_potential, count(*) from drugapplication group by ther_potential order by count desc
  11. Produce a table listing all of the known values for the chemical_type column in the application table and how many application records there are of each. (Side note: The chemical_type codes are explained in the ChemTypeLookup.txt file, in case you’re interested.)
    • a) Chemical Type.png
    • b)select chemical_type, count(*) from drugapplication group by chemical_type order by count desc
  12. What are the names of the drug products that are sponsored (sponsor applicant column) by MERCK?
    • a) DECADRON, HUMORSOL, NEO-HYDELTRASOL, PRINIVIL, MAXALT-MLT, HYDROCORTONE, PERIACTIN, PROPECIA, PROSCAR, CLINORIL, PRINZIDE, ELSPAR, ALDOMET, ALDORIL D30, EMEND, DIUPRES-500, NEODECADRON, ALDOCLOR-250, MEFOXIN IN DEXTROSE 5% IN PLASTIC CONTAINER, ARAMINE, TIAMATE, BLOCADREN, CANCIDAS, CHIBROXIN, CORTONE, PEPCID, TRUSOPT, REDISOL, VIOXX, FLOROPRYL, ALPHAREDISOL, DECADRON-LA, MEFOXIN IN SODIUM CHLORIDE 0.9% IN PLASTIC CONTAINER, PEPCID RPD, NOROXIN, ZOCOR, LERITINE, DECADERM, DOLOBID, MANNITOL 25%, DECADRON W/ XYLOCAINE, ALDORIL D50, TIMOLIDE 10-25, CYCLAINE, HYDROPRES 25, AMINOHIPPURATE SODIUM, MEVACOR, MODURETIC 5-50, ALDORIL 25, SINGULAIR, COLBENEMID, DIUPRES-250, HYDELTRA-TBA, PRIMAXIN, BENEMID, MAXALT, FOSAMAX PLUS D, HYDRODIURIL, HYDELTRASOL, HYDROPRES 50, ZOLINZA, ALDORIL 15, FOSAMAX, DECASPRAY, COGENTIN, ALDOCLOR-150, PEPCID PRESERVATIVE FREE
    • b) select drugname from product inner join application on (product.applno = application.applno) where sponsorapplicant = 'MERCK' group by drugname
  13. Which sponsor applicant companies have the text LABS in their names and have products whose active ingredients (activeingred) include both ASPIRIN and CAFFEINE?
    • a) ACTAVIS LABS UT INC and WATSON LABS
    • b) select sponsorapplicant from application inner join product on (application.applno = product.applno) where sponsorapplicant like '%LABS%' and activeingred like '%ASPIRIN%' and activeingred like '%CAFFEINE%' group by sponsorapplicant;

Links to User Page and Journal Pages

Ron Legaspi
BIOL 367, Fall 2015

Assignment Links
Individual Weekly Journals
Shared Weekly Journals

Homework Partner: Jake Woodlee