Difference between revisions of "Malverso Week 6"
From LMU BioDB 2015
(answered #2 for application.txt) |
(fixed error in column name) |
||
Line 38: | Line 38: | ||
cat application.txt | sed "1D" | sed "s/\t/','/g" | sed "s/ *'/'/g" | sed "s/^/'/g" | sed "s/\r$/');/g" | cat application.txt | sed "1D" | sed "s/\t/','/g" | sed "s/ *'/'/g" | sed "s/^/'/g" | sed "s/\r$/');/g" | ||
| sed "s/''/null/g" | sed "s/^/insert into application(ApplNo, ApplType, SponsorApplicant, | | sed "s/''/null/g" | sed "s/^/insert into application(ApplNo, ApplType, SponsorApplicant, | ||
− | + | MostRecentLabelAvailableFlag, CurrentPatentFlag, ActionType, Chemical_Type, Therapeutic_Potential, Orphan_Code) | |
values (/g" > ~/public_html/application.sql.txt | values (/g" > ~/public_html/application.sql.txt | ||
Revision as of 00:40, 15 October 2015
Contents
FDA Drug Database
Direct Download/Unzipping Commands
- I logged onto PuTTy and also brought up my public folder on the web browser at the web address my.cs.lmu.edu/~malverso/
- I downloaded and unzipped my files through the command line, using the curl and unzip commands.
- I put the untouched file onto my public web page so I could visually see my progress.
- My database is saved under the student account of PgAdminIII under the name FDA-malverso.
#1
application.txt
- I viewed the application.txt file using the more command to see the column titles, which are:
ApplNo ApplType SponsorApplicant MostRecentLabelAvailableFlag CurrentPatentFlag ActionType Chemical_Type Ther_Potential Orphan_Code
- I referenced the PostgreSQL Tutorial for how to create a table and http://www.fda.gov/Drugs/InformationOnDrugs/ucm079750.htm for primary key information. I then created a table using this command:
create table application (ApplNo varchar primary key, ApplType varchar, SponsorApplicant varchar, MostRecentLabelAvailableFlag boolean, CurrentPatentFlag boolean, ActionType varchar, Chemical_Type varchar, Therapeutic_Potential varchar, Orphan_Code varchar)
Product.txt
- I viewed the Product.txt file using the more command to see the column titles, which are:
ApplNo ProductNo Form Dosage ProductMktStatus TECode ReferenceDrug drugname activeingred
- I created a table for Product.txt using this command:
create table Product(ApplNo varchar, productNo varchar, Form varchar, Dosage varchar, ProductMktStatus int, TECode varchar, ReferenceDrug boolean, Drugname varchar, Activeingred varchar)
#2
application.txt
- I began by removing the first line, since that is just the column headers, using the sed command "1D".
- I then aimed to put single quotes around all of the values and get rid of the spaces and replace the tabs with commas, which I did with only two sed commands, "s/^/'/g" and "s/\t/','/g".
- I saw next that there was a lot of extra spaces after the values that were unnecessary. I tried to remove the extra spaces by using the sed command "s/( ){*}'/'/g", but that did nothing. I looked back at my assignment 4 to see that it was actually "s/ *'/'/g" that would remove all the extra spaces.
- I also saw that the end of every line was misisng a ' character. I tried a bunch of sed commands to try and add this on, such as "s/$/'/g" but the only command that ended up working was "s/,'/,/8" , which I found buy counting out the 8 commas that separated the 9 columns. I reread the supplementary information section to realize my previous sed commands did not work because the end of lines were formatted differently...so I decided to change my sed command to "s/\r$/'/g"
- Next I saw all of the empty fields and replaced the empty single quotes with the word null. Veronica informed me while I was doing this that null could not be surrounded by single quotes in order to be recognized accurately by SQL, so I used the sed command "s//null/g".
- I then added the appropriate SQL commands to the beginning of my line so that the text would insert into the table.
- I changed the previous sed command that added the ' to the end of the line to add '); to finish off the SQL command.
Here is my final sed command (with line breaks for readability):
cat application.txt | sed "1D" | sed "s/\t/','/g" | sed "s/ *'/'/g" | sed "s/^/'/g" | sed "s/\r$/');/g" | sed "s//null/g" | sed "s/^/insert into application(ApplNo, ApplType, SponsorApplicant, MostRecentLabelAvailableFlag, CurrentPatentFlag, ActionType, Chemical_Type, Therapeutic_Potential, Orphan_Code) values (/g" > ~/public_html/application.sql.txt
Team Page
Assignments
- Week 1
- Week 2
- Week 3
- Week 4
- Week 5
- Week 6
- Week 7
- Week 8
- Week 9
- Week 10
- Week 11
- Week 12
- Week 13
- Week 14
- Week 15
Individual Journal Entries
- Malverso User Page (Week 1)
- Week 2
- Week 3
- Week 4
- Week 5
- Week 6
- Week 7
- Week 8
- Week 9
- Week 10
- Week 11
- Week 12
- Week 13
- Week 14
- Week 15