Difference between revisions of "Jwoodlee Week 6"
(→Questions to Answer: added answers) |
(→Questions to Answer: added answers) |
||
Line 44: | Line 44: | ||
*In your database, are these numbers the same or different? Explain why you think so. | *In your database, are these numbers the same or different? Explain why you think so. | ||
** When I ran the commands I got exactly the same numbers, so I would say they are the same. This is the case because assuming we formatted the text files correctly, each line holds one insert command which adds a record to the table, therefore the number of lines represents the number of records. | ** When I ran the commands I got exactly the same numbers, so I would say they are the same. This is the case because assuming we formatted the text files correctly, each line holds one insert command which adds a record to the table, therefore the number of lines represents the number of records. | ||
− | *For the remaining questions, supply (a) the answer to the question and (b) the SQL statement that you used to answer the question. | + | *For the remaining questions, supply (a) the answer to the question and (b) the SQL statement that you used to answer the question. All questions can are answered from wiki. |
*What are the names of the drug products that are administered in the form INJECTABLE;INTRAVENOUS, SUBCUTANEOUS? | *What are the names of the drug products that are administered in the form INJECTABLE;INTRAVENOUS, SUBCUTANEOUS? | ||
*# "LOVENOX", "VELCADE", "VIDAZA", "ENOXAPARIN SODIUM", "ACTEMRA", "AZACITIDINE" | *# "LOVENOX", "VELCADE", "VIDAZA", "ENOXAPARIN SODIUM", "ACTEMRA", "AZACITIDINE" | ||
*# <code>select drugname from product where Form = 'INJECTABLE;INTRAVENOUS, SUBCUTANEOUS';</code> | *# <code>select drugname from product where Form = 'INJECTABLE;INTRAVENOUS, SUBCUTANEOUS';</code> | ||
*What are the names of the drug products whose active ingredient (activeingred) is ATROPINE? | *What are the names of the drug products whose active ingredient (activeingred) is ATROPINE? | ||
− | *# "ATROPEN", "ATROPINE" | + | *# "ATROPEN" (x4), "ATROPINE" |
*# <code> select drugname from product where activeingred = 'ATROPINE'; </code> | *# <code> select drugname from product where activeingred = 'ATROPINE'; </code> | ||
*In what forms and dosages can the drug product named BENADRYL be administered? | *In what forms and dosages can the drug product named BENADRYL be administered? | ||
Line 55: | Line 55: | ||
*#<code> select Form from product where drugname = 'BENADRYL'; </code> <code> select dosage from product where drugname = 'BENADRYL'; </code> | *#<code> select Form from product where drugname = 'BENADRYL'; </code> <code> select dosage from product where drugname = 'BENADRYL'; </code> | ||
*Which drug products have a name ending in ESTROL? | *Which drug products have a name ending in ESTROL? | ||
− | *# | + | *#"DIETHYLSTILBESTROL"(x11), "STILBESTROL"(x6), "DIENESTROL"(x1), "STILBESTROL"(x12) |
− | *# | + | *#<code>select drugname from Product where drugname ~'ESTROL$'</code> |
*Produce a table listing all of the known values for the therapeutic_potential column in the application table and how many application records there are of each. (Side note: The therapeutic_potential codes are explained in the ReviewClass_Lookup.txt file, in case you’re interested.) | *Produce a table listing all of the known values for the therapeutic_potential column in the application table and how many application records there are of each. (Side note: The therapeutic_potential codes are explained in the ReviewClass_Lookup.txt file, in case you’re interested.) | ||
*# | *# | ||
Line 64: | Line 64: | ||
*# | *# | ||
*What are the names of the drug products that are sponsored (sponsor applicant column) by MERCK? | *What are the names of the drug products that are sponsored (sponsor applicant column) by MERCK? | ||
− | *# | + | *#"DECADRON" |
− | *# | + | "HUMORSOL" |
+ | "NEO-HYDELTRASOL" | ||
+ | "PRINIVIL" | ||
+ | "MAXALT-MLT" | ||
+ | "HYDROCORTONE" | ||
+ | "PERIACTIN" | ||
+ | "PROPECIA" | ||
+ | "PROSCAR" | ||
+ | "CLINORIL" | ||
+ | "PRINZIDE" | ||
+ | "ELSPAR" | ||
+ | "ALDOMET" | ||
+ | "ALDORIL D30" | ||
+ | "EMEND" | ||
+ | "DIUPRES-500" | ||
+ | "NEODECADRON" | ||
+ | "ALDOCLOR-250" | ||
+ | "MEFOXIN IN DEXTROSE 5% IN PLASTIC CONTAINER" | ||
+ | "ARAMINE" | ||
+ | "TIAMATE" | ||
+ | "BLOCADREN" | ||
+ | "CANCIDAS" | ||
+ | "CHIBROXIN" | ||
+ | "CORTONE" | ||
+ | "PEPCID" | ||
+ | "TRUSOPT" | ||
+ | "REDISOL" | ||
+ | "VIOXX" | ||
+ | "FLOROPRYL" | ||
+ | "ALPHAREDISOL" | ||
+ | "DECADRON-LA" | ||
+ | "MEFOXIN IN SODIUM CHLORIDE 0.9% IN PLASTIC CONTAINER" | ||
+ | "PEPCID RPD" | ||
+ | "NOROXIN" | ||
+ | "ZOCOR" | ||
+ | "LERITINE" | ||
+ | "DECADERM" | ||
+ | "DOLOBID" | ||
+ | "MANNITOL 25%" | ||
+ | "DECADRON W/ XYLOCAINE" | ||
+ | "ALDORIL D50" | ||
+ | "TIMOLIDE 10-25" | ||
+ | "CYCLAINE" | ||
+ | "HYDROPRES 25" | ||
+ | "AMINOHIPPURATE SODIUM" | ||
+ | "MEVACOR" | ||
+ | "MODURETIC 5-50" | ||
+ | "ALDORIL 25" | ||
+ | "SINGULAIR" | ||
+ | "COLBENEMID" | ||
+ | "DIUPRES-250" | ||
+ | "HYDELTRA-TBA" | ||
+ | "PRIMAXIN" | ||
+ | "BENEMID" | ||
+ | "MAXALT" | ||
+ | "FOSAMAX PLUS D" | ||
+ | "HYDRODIURIL" | ||
+ | "HYDELTRASOL" | ||
+ | "HYDROPRES 50" | ||
+ | "ZOLINZA" | ||
+ | "ALDORIL 15" | ||
+ | "FOSAMAX" | ||
+ | "DECASPRAY" | ||
+ | "COGENTIN" | ||
+ | "ALDOCLOR-150" | ||
+ | "PEPCID PRESERVATIVE FREE" | ||
+ | *#select drugname from product,application where product.applno = application.applno and application.sponsorapplicant = 'MERCK' group by drugname | ||
*Which sponsor applicant companies have the text LABS in their names and have products whose active ingredients (activeingred) include both ASPIRIN and CAFFEINE? | *Which sponsor applicant companies have the text LABS in their names and have products whose active ingredients (activeingred) include both ASPIRIN and CAFFEINE? | ||
− | *# | + | *#"WATSONLABS" |
− | *# | + | "ACTAVISLABSUTINC" |
− | + | *#<code> select sponsorapplicant from product,application where application.applno = product.applno and sponsorapplicant ~ 'LABS' and activeingred ~ 'ASPIRIN' and activeingred ~ 'CAFFEINE' group by sponsorapplicant </code> | |
{{Template:Jwoodlee}} | {{Template:Jwoodlee}} |
Revision as of 03:56, 15 October 2015
Lab Notebook
I sshed into my home directory on my.cs.lmu.edu using the built in command. I downloaded the file from the FDA home page using terminal, and then unzipped the file using terminal as well. I then had all the appropriate files in my home directory.
At first I wasn't familiar with how the data was formatted in application.txt so I wasn't sure what variables to include in the tables, but after looking at the top of the text document, I found out the variables are: ApplNo, ApplType, SponsorApplicant, MostRecentLabelAvailableFlag, CurrentPatentFlag, ActionType, Chemical_Type, Ther_Potential, and Orphan_Code. It appears the data types are: int primary key, varchar, varchar, boolean, boolean, varchar, int, varchar, and varchar respectively. So with this information it is now pretty easy to make a table for application, see questions below. Next I needed to figure out the variables for product.txt. Opening the file it is easy to see the names for each column: ApplNo, ProductNo, Form, Dosage, ProductMktStatus, TECode, ReferenceDrug, drugname, activeingred. Based on the values of data in each individual column I would say that the data types are: int, int, varchar, varchar, varchar, varchar, int, varchar, and varchar respectively. Again, from this info it is easy to make a table for product.txt.
With the tables setup my goal was then to add the appropriate data, to do this I used the sed command. The first line of application.txt is useless so I removed that: cat application.txt | sed "1D".
Then, I wanted to remove all tabs because I thought there was a variable number of tabs between each piece of data. However, I was wrong and I discovered there were actually a bunch of extra spaces in the way of single tabs separating the data. What I thought was a variable number of tabs was really a variable number of spaces. Some of the data relies on spaces so I couldn't just delete all the spaces, instead I removed all spaces next to another space because no valuable spaces were ever right next to each other. I was then left with only the valuable spaces within the data. So I had: cat application.txt | sed "1D" | sed "s/ //g"
.
In class I heard the discussion about "\r" meaning the end of a line so I looked it up and it means "carriage return", which is a basically a character representing someone hitting the enter key. This became useful later.
I noticed many rows were short a few pieces of data so, assuming each piece of data is separated by a tab, I needed to insert a 'null' value in between all tabs that are right next to eachother. So I did this and noticed some rows were still short data which meant they had two pieces of data missing right next to each other, fortunately to remedy this I just ran the command again, so:
cat application.txt | sed "1D" | sed "s/ //g" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\t/\tnull\t/g"
.
Now I had what I wanted. I had every piece of data separated by a single tab with the nulls inserted into the correct place, time to add the insert command!
Whoops I realized that a lot of my rows only had 8 pieces of data when I really needed 9. I realized most data didn't have an Orphan_Code so I needed to tack null onto the end of a lot of rows. To do this I used the carriage return. Every line that ends with a tab and a carriage return needs to have a null added to the end, so I did this:
cat application.txt | sed "1D" | sed "s/ //g" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\r/\tnull\r/g"
Now that I have the data properly formatted and separated by tabs, I can insert the apostrophes for the varchars, and replace the tabs with commas.
With sed "s/<to be replaced>/<replacer>/1" one can replace the first instance of string in the <to be replaced> spot on the line. So using this it becomes pretty easy, albeit long, to insert commas and apostrophes. When all that is done there are a bunch of null values that look like: 'null' when they should have no apostrophes around them. To get rid of those simply replace 'null' with null. Then I used the carriage return to insert the final parenthesis, and I used the carrot to insert the beginning command. Final command is as follows:
cat application.txt | sed "1D" | sed "s/ //g" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\r/\tnull\r/g" | sed "s/\t/,'/1" | sed "s/\t/','/1" | sed "s/\t/','/1" | sed "s/\t/','/1" | sed "s/\t/','/1" | sed "s/\t/',/1" | sed "s/\t/,'/1" | sed "s/\t/','/g" | sed "s/\r/'\r/g" | sed "s/'null'/null/g" | sed "s/^/insert into application(ApplNo,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag,CurrentPatentFlag,ActionType,Chemical_Type,Ther_Potential,Orphan_Code) values(/g" | sed "s/\r/);\r/g" > ~/public_html/application.sql.txt
After reading the first line form Product.txt and gleaning the information from it, it becomes useless: cat Product.txt | sed "1D"
. Fortunately this one doesn't appear to have any weird extra spaces, so the procedure should be: insert nulls into correct spaces, add commas and apostrophes, add beginning and end of command. So.... cat Product.txt | sed "1D"
The added wrinkle for this one is that it has apostrophes within some of the data so to make sure those are included simple replace all apostrophes with a double apostrophe before the inserting of commas and apostrophes begins. Other than that it is the same basic format and was rather easy to come up with using what I did for application.txt.
cat Product.txt | sed "1D" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\r/\tnull\r/g" | sed "s/'/''/g" | sed "s/\t/,/1" | sed "s/\t/,'/1" | sed "s/\t/','/1" | sed "s/\t/',/1" | sed "s/\t/,'/1"| sed "s/\t/',/1" | sed "s/\t/,'/1" | sed "s/\t/','/1" | sed "s/\r/'\r/g" | sed "s/'null'/null/g" | sed "s/^/insert into product(ApplNo, ProductNo, Form, Dosage, ProductMktStatus, TECode, ReferenceDrug, drugname, activeingred) values(/g" | sed "s/\r/);\r/g" > ~/public_html/product.sql.txt
Questions to Answer
- Provide the DDL (create table) statements that you used for your application and product tables.
- create table application (ApplNo int primary key, ApplType varchar, SponsorApplicant varchar, MostRecentLabelAvailableFlag boolean, CurrentPatentFlag boolean, ActionType varchar, Chemical_Type int, Ther_Potential varchar, Orphan_Code varchar)
- create table Product (ApplNo int, ProductNo int, Form varchar, Dosage varchar, ProductMktStatus varchar, TECode varchar, ReferenceDrug int, drugname varchar, activeingred varchar)
- Provide the sed command sequences that you used to convert the raw text files into sequences of SQL insert statements.
-
cat application.txt | sed "1D" | sed "s/ //g" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\r/\tnull\r/g" | sed "s/\t/,'/1" | sed "s/\t/','/1" | sed "s/\t/','/1" | sed "s/\t/','/1" | sed "s/\t/','/1" | sed "s/\t/',/1" | sed "s/\t/,'/1" | sed "s/\t/','/g" | sed "s/\r/'\r/g" | sed "s/'null'/null/g" | sed "s/^/insert into application(AppleNo,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag,CurrentPatentFlag,ActionType,Chemical_Type,Ther_Potential,Orphan_Code) values(/g" | sed "s/\r/);\r/g" > ~/public_html/application.sql.txt
cat Product.txt | sed "1D" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\r/\tnull\r/g" | sed "s/'/''/g" | sed "s/\t/,/1" | sed "s/\t/,'/1" | sed "s/\t/','/1" | sed "s/\t/',/1" | sed "s/\t/,'/1" | sed "s/\t/',/1" | sed "s/\t/,'/1" | sed "s/\t/','/1" | sed "s/\r/'\r/g" | sed "s/'null'/null/g" | sed "s/^/insert into product(ApplNo, ProductNo, Form, Dosage, ProductMktStatus, TECode, ReferenceDrug, drugname, activeingred) values(/g" | sed "s/\r/);\r/g" > ~/public_html/product.sql.txt
-
- Using the command line, how can you determine the number of records in each file? Provide the command.
- The number or records is exactly the same as the number of lines in our modified file, therefore a simple piped
wc
command in place of the> ~/public_html/product.sql.txt
will yield the result on either file.
- The number or records is exactly the same as the number of lines in our modified file, therefore a simple piped
- Using SQL, how can you determine the number of records in the table corresponding to the file? Provide the SQL select statement.
- Using
Select count(*) from product
orSelect count(*) from application
will yield the number of records in the table.
- Using
- In your database, are these numbers the same or different? Explain why you think so.
- When I ran the commands I got exactly the same numbers, so I would say they are the same. This is the case because assuming we formatted the text files correctly, each line holds one insert command which adds a record to the table, therefore the number of lines represents the number of records.
- For the remaining questions, supply (a) the answer to the question and (b) the SQL statement that you used to answer the question. All questions can are answered from wiki.
- What are the names of the drug products that are administered in the form INJECTABLE;INTRAVENOUS, SUBCUTANEOUS?
- "LOVENOX", "VELCADE", "VIDAZA", "ENOXAPARIN SODIUM", "ACTEMRA", "AZACITIDINE"
-
select drugname from product where Form = 'INJECTABLE;INTRAVENOUS, SUBCUTANEOUS';
- What are the names of the drug products whose active ingredient (activeingred) is ATROPINE?
- "ATROPEN" (x4), "ATROPINE"
-
select drugname from product where activeingred = 'ATROPINE';
- In what forms and dosages can the drug product named BENADRYL be administered?
- Forms: "CAPSULE;ORAL", "ELIXIR;ORAL", "CAPSULE;ORAL", "INJECTABLE;INJECTION", "INJECTABLE;INJECTION" dosage: "50MG","12.5MG/5ML","25MG","10MG/ML", "50MG/ML"
select Form from product where drugname = 'BENADRYL';
select dosage from product where drugname = 'BENADRYL';
- Which drug products have a name ending in ESTROL?
- "DIETHYLSTILBESTROL"(x11), "STILBESTROL"(x6), "DIENESTROL"(x1), "STILBESTROL"(x12)
select drugname from Product where drugname ~'ESTROL$'
- Produce a table listing all of the known values for the therapeutic_potential column in the application table and how many application records there are of each. (Side note: The therapeutic_potential codes are explained in the ReviewClass_Lookup.txt file, in case you’re interested.)
- Produce a table listing all of the known values for the chemical_type column in the application table and how many application records there are of each. (Side note: The chemical_type codes are explained in the ChemTypeLookup.txt file, in case you’re interested.)
- What are the names of the drug products that are sponsored (sponsor applicant column) by MERCK?
- "DECADRON"
"HUMORSOL" "NEO-HYDELTRASOL" "PRINIVIL" "MAXALT-MLT" "HYDROCORTONE" "PERIACTIN" "PROPECIA" "PROSCAR" "CLINORIL" "PRINZIDE" "ELSPAR" "ALDOMET" "ALDORIL D30" "EMEND" "DIUPRES-500" "NEODECADRON" "ALDOCLOR-250" "MEFOXIN IN DEXTROSE 5% IN PLASTIC CONTAINER" "ARAMINE" "TIAMATE" "BLOCADREN" "CANCIDAS" "CHIBROXIN" "CORTONE" "PEPCID" "TRUSOPT" "REDISOL" "VIOXX" "FLOROPRYL" "ALPHAREDISOL" "DECADRON-LA" "MEFOXIN IN SODIUM CHLORIDE 0.9% IN PLASTIC CONTAINER" "PEPCID RPD" "NOROXIN" "ZOCOR" "LERITINE" "DECADERM" "DOLOBID" "MANNITOL 25%" "DECADRON W/ XYLOCAINE" "ALDORIL D50" "TIMOLIDE 10-25" "CYCLAINE" "HYDROPRES 25" "AMINOHIPPURATE SODIUM" "MEVACOR" "MODURETIC 5-50" "ALDORIL 25" "SINGULAIR" "COLBENEMID" "DIUPRES-250" "HYDELTRA-TBA" "PRIMAXIN" "BENEMID" "MAXALT" "FOSAMAX PLUS D" "HYDRODIURIL" "HYDELTRASOL" "HYDROPRES 50" "ZOLINZA" "ALDORIL 15" "FOSAMAX" "DECASPRAY" "COGENTIN" "ALDOCLOR-150" "PEPCID PRESERVATIVE FREE"
- select drugname from product,application where product.applno = application.applno and application.sponsorapplicant = 'MERCK' group by drugname
- Which sponsor applicant companies have the text LABS in their names and have products whose active ingredients (activeingred) include both ASPIRIN and CAFFEINE?
- "WATSONLABS"
"ACTAVISLABSUTINC"
select sponsorapplicant from product,application where application.applno = product.applno and sponsorapplicant ~ 'LABS' and activeingred ~ 'ASPIRIN' and activeingred ~ 'CAFFEINE' group by sponsorapplicant
BIOL 367, Fall 2015, User Page, Team Page
Weekly Assignments | Individual Journal Pages | Shared Journal Pages |
---|---|---|
|
|
|