Difference between revisions of "Malverso Week 6"
(fixed answer to #3, answered #4) |
(answered #5) |
||
Line 70: | Line 70: | ||
wc | wc | ||
+ | |||
+ | The correct number is the first one, because it is the number of lines in the file which is equal to the number of records. | ||
===#4=== | ===#4=== | ||
Line 77: | Line 79: | ||
select count(*) from Product | select count(*) from Product | ||
select count(*) from application | select count(*) from application | ||
+ | |||
+ | ===#5=== | ||
+ | |||
+ | These numbers are the same because the number of lines is equal to the number of insert statements, each statement inserting one record into the table. | ||
+ | |||
Revision as of 01:47, 15 October 2015
Contents
FDA Drug Database
Direct Download/Unzipping Commands
- I logged onto PuTTy and also brought up my public folder on the web browser at the web address my.cs.lmu.edu/~malverso/
- I downloaded and unzipped my files through the command line, using the curl and unzip commands.
- I put the untouched file onto my public web page so I could visually see my progress.
- My database is saved under the student account of PgAdminIII under the name FDA-malverso.
#1
application.txt
- I viewed the application.txt file using the more command to see the column titles, which are:
ApplNo ApplType SponsorApplicant MostRecentLabelAvailableFlag CurrentPatentFlag ActionType Chemical_Type Ther_Potential Orphan_Code
- I referenced the PostgreSQL Tutorial for how to create a table and http://www.fda.gov/Drugs/InformationOnDrugs/ucm079750.htm for primary key information. I then created a table using this command:
create table application (ApplNo varchar primary key, ApplType varchar, SponsorApplicant varchar, MostRecentLabelAvailableFlag boolean, CurrentPatentFlag boolean, ActionType varchar, Chemical_Type varchar, Therapeutic_Potential varchar, Orphan_Code varchar)
Product.txt
- I viewed the Product.txt file using the more command to see the column titles, which are:
ApplNo ProductNo Form Dosage ProductMktStatus TECode ReferenceDrug drugname activeingred
- I created a table for Product.txt using this command:
create table Product(ApplNo varchar, productNo varchar, Form varchar, Dosage varchar, ProductMktStatus int, TECode varchar, ReferenceDrug int, Drugname varchar, Activeingred varchar)
- I couldn't remeber how to declare a foreign key when I was making the table, and ended up adding it afterwards in with the help of the tutorial found on http://www.w3schools.com/sql/sql_foreignkey.asp .
Here are the additional SQL commands I used:
ALTER TABLE Product ADD FOREIGN KEY (ApplNo) REFERENCES application(ApplNo)
#2
application.txt
- I began by removing the first line, since that is just the column headers, using the sed command "1D".
- I then aimed to put single quotes around all of the values and get rid of the spaces and replace the tabs with commas, which I did with only two sed commands, "s/^/'/g" and "s/\t/','/g".
- I saw next that there was a lot of extra spaces after the values that were unnecessary. I tried to remove the extra spaces by using the sed command "s/( ){*}'/'/g", but that did nothing. I looked back at my assignment 4 to see that it was actually "s/ *'/'/g" that would remove all the extra spaces.
- I also saw that the end of every line was misisng a ' character. I tried a bunch of sed commands to try and add this on, such as "s/$/'/g" but the only command that ended up working was "s/,'/,/8" , which I found buy counting out the 8 commas that separated the 9 columns. I reread the supplementary information section to realize my previous sed commands did not work because the end of lines were formatted differently...so I decided to change my sed command to "s/\r$/'/g"
- Next I saw all of the empty fields and replaced the empty single quotes with the word null. Veronica informed me while I was doing this that null could not be surrounded by single quotes in order to be recognized accurately by SQL, so I used the sed command "s//null/g".
- I then added the appropriate SQL commands to the beginning of my line so that the text would insert into the table.
- I changed the previous sed command that added the ' to the end of the line to add '); to finish off the SQL command.
Here is my final command sequence (with line breaks for readability):
cat application.txt | sed "1D" | sed "s/\t/','/g" | sed "s/ *'/'/g" | sed "s/^/'/g" | sed "s/\r$/');/g" | sed "s/''/null/g" | sed "s/^/insert into application(ApplNo, ApplType, SponsorApplicant, MostRecentLabelAvailableFlag, CurrentPatentFlag, ActionType, Chemical_Type, Therapeutic_Potential, Orphan_Code) values (/g" > ~/public_html/application.sql.txt
The final output SQL command sequence can be found at http://my.cs.lmu.edu/~malverso/application.sql.txt . I copy and pasted this into the SQL editor to fill up my table.
Product.txt
- First I removed the first line, using sed command "1D".
- Since there are single quotes in this .txt file, I decided to replace them with double quotes using the sed command "s/'/"/g", which didn't work for reasons which now seem obvious, so I changed my sed command to "s/'/\"/g" which worked.
- Noting that there weren't any extra spaces, I got rid of the tabs by using the sed command "s/\t/','". I Then added the extra characters to the end of the line, using the sed command "s/\r$/');/g" which includes the ending of the SQL code.
- I used the same sed command as before to insert the null values : "s//null/g".
- Next I added the SQL command to the beginning.
Here is the final command sequence (with line breaks for readability):
cat Product.txt | sed "1D" | sed "s/'/\"/g" | sed "s/\t/','/g"| sed "s/\r$/');/g" | sed "s/''/null/g" | sed "s/^/insert into Product(ApplNo, ProductNo, Form, Dosage, ProductMktStatus, TECode, ReferenceDrug, drugname, activeingred) values ('/g" > ~/public_html/Product.sql.txt
The final output SQL command sequence can be found at http://my.cs.lmu.edu/~malverso/Product.sql.txt . I copy and pasted this into the SQL editor to fill up my table.
#3
Here is the command that produces the number of records in each file on the command line (if added to the end of my command sequence found in the answer to #2, after removing the > ~/public_html/Product.sql.txt or > ~/public_html/application.sql.txt):
wc
The correct number is the first one, because it is the number of lines in the file which is equal to the number of records.
#4
Here are the SQL commands that produce the number of records in each file:
select count(*) from Product select count(*) from application
#5
These numbers are the same because the number of lines is equal to the number of insert statements, each statement inserting one record into the table.
Team Page
Assignments
- Week 1
- Week 2
- Week 3
- Week 4
- Week 5
- Week 6
- Week 7
- Week 8
- Week 9
- Week 10
- Week 11
- Week 12
- Week 13
- Week 14
- Week 15
Individual Journal Entries
- Malverso User Page (Week 1)
- Week 2
- Week 3
- Week 4
- Week 5
- Week 6
- Week 7
- Week 8
- Week 9
- Week 10
- Week 11
- Week 12
- Week 13
- Week 14
- Week 15