Malverso Week 6
From LMU BioDB 2015
								
												
				Contents
FDA Drug Database
Direct Download/Unzipping Commands
- I logged onto PuTTy and also brought up my public folder on the web browser at the web address my.cs.lmu.edu/~malverso/
 - I downloaded and unzipped my files through the command line, using the curl and unzip commands.
 - I put the untouched file onto my public web page so I could visually see my progress.
 - My database is saved under the student account of PgAdminIII under the name FDA-malverso.
 
#1
application.txt
- I viewed the application.txt file using the more command to see the column titles, which are:
 
ApplNo ApplType SponsorApplicant MostRecentLabelAvailableFlag CurrentPatentFlag ActionType Chemical_Type Ther_Potential Orphan_Code
- I referenced the PostgreSQL Tutorial for how to create a table and http://www.fda.gov/Drugs/InformationOnDrugs/ucm079750.htm for primary key information. I then created a table using this command:
 
create table application (ApplNo varchar primary key, ApplType varchar, SponsorApplicant varchar, MostRecentLabelAvailableFlag boolean, CurrentPatentFlag boolean, ActionType varchar, Chemical_Type varchar, Therapeutic_Potential varchar, Orphan_Code varchar)
Product.txt
- I viewed the Product.txt file using the more command to see the column titles, which are:
 
ApplNo ProductNo Form Dosage ProductMktStatus TECode ReferenceDrug drugname activeingred
- I created a table for Product.txt using this command:
 
create table Product(ApplNo varchar, productNo varchar, Form varchar, Dosage varchar, ProductMktStatus int, TECode varchar, ReferenceDrug int, Drugname varchar, Activeingred varchar)
- I couldn't remeber how to declare a foreign key when I was making the table, and ended up adding it afterwards in with the help of the tutorial found on http://www.w3schools.com/sql/sql_foreignkey.asp .
 
Here are the additional SQL commands I used:
ALTER TABLE Product ADD FOREIGN KEY (ApplNo) REFERENCES application(ApplNo)
#2
application.txt
- I began by removing the first line, since that is just the column headers, using the sed command "1D".
 - I then aimed to put single quotes around all of the values and get rid of the spaces and replace the tabs with commas, which I did with only two sed commands, "s/^/'/g" and "s/\t/','/g".
 - I saw next that there was a lot of extra spaces after the values that were unnecessary. I tried to remove the extra spaces by using the sed command "s/( ){*}'/'/g", but that did nothing. I looked back at my assignment 4 to see that it was actually "s/ *'/'/g" that would remove all the extra spaces.
 - I also saw that the end of every line was misisng a ' character. I tried a bunch of sed commands to try and add this on, such as "s/$/'/g" but the only command that ended up working was "s/,'/,/8" , which I found buy counting out the 8 commas that separated the 9 columns. I reread the supplementary information section to realize my previous sed commands did not work because the end of lines were formatted differently...so I decided to change my sed command to "s/\r$/'/g"
 - Next I saw all of the empty fields and replaced the empty single quotes with the word null. Veronica informed me while I was doing this that null could not be surrounded by single quotes in order to be recognized accurately by SQL, so I used the sed command "s//null/g".
 - I then added the appropriate SQL commands to the beginning of my line so that the text would insert into the table.
 - I changed the previous sed command that added the ' to the end of the line to add '); to finish off the SQL command.
 
Here is my final command sequence (with line breaks for readability):
cat application.txt | sed "1D" | sed "s/\t/','/g" | sed "s/ *'/'/g" | sed "s/^/'/g" | sed "s/\r$/');/g" | sed "s/''/null/g" | sed "s/^/insert into application(ApplNo, ApplType, SponsorApplicant, MostRecentLabelAvailableFlag, CurrentPatentFlag, ActionType, Chemical_Type, Therapeutic_Potential, Orphan_Code) values (/g" > ~/public_html/application.sql.txt
The final output SQL command sequence can be found at http://my.cs.lmu.edu/~malverso/application.sql.txt . I copy and pasted this into the SQL editor to fill up my table.
Product.txt
- First I removed the first line, using sed command "1D".
 - Since there are single quotes in this .txt file, I decided to replace them with double quotes using the sed command "s/'/"/g", which didn't work for reasons which now seem obvious, so I changed my sed command to "s/'/\"/g" which worked.
 - Noting that there weren't any extra spaces, I got rid of the tabs by using the sed command "s/\t/','". I Then added the extra characters to the end of the line, using the sed command "s/\r$/');/g" which includes the ending of the SQL code.
 - I used the same sed command as before to insert the null values : "s//null/g".
 - Next I added the SQL command to the beginning.
 
Here is the final command sequence (with line breaks for readability):
cat Product.txt | sed "1D" | sed "s/'/\"/g" | sed "s/\t/','/g"| sed "s/\r$/');/g" 
| sed "s/''/null/g" | sed "s/^/insert into Product(ApplNo, ProductNo, Form, Dosage, ProductMktStatus,
 TECode, ReferenceDrug, drugname, activeingred) values ('/g" > ~/public_html/Product.sql.txt
The final output SQL command sequence can be found at http://my.cs.lmu.edu/~malverso/Product.sql.txt . I copy and pasted this into the SQL editor to fill up my table.
#3
Here are the commands that produce the number of records in each file:
select count(*) from Product select count(*) from application
Team Page
Assignments
- Week 1
 - Week 2
 - Week 3
 - Week 4
 - Week 5
 - Week 6
 - Week 7
 - Week 8
 - Week 9
 - Week 10
 - Week 11
 - Week 12
 - Week 13
 - Week 14
 - Week 15
 
Individual Journal Entries
- Malverso User Page (Week 1)
 - Week 2
 - Week 3
 - Week 4
 - Week 5
 - Week 6
 - Week 7
 - Week 8
 - Week 9
 - Week 10
 - Week 11
 - Week 12
 - Week 13
 - Week 14
 - Week 15