Difference between revisions of "Lenaolufson Week 6"

From LMU BioDB 2015
Jump to: navigation, search
(first part of the assignment)
 
(finished creating the commands for the application and product)
Line 29: Line 29:
 
*Now that my tables were set up, I needed to add in the appropriate data and I used sed command to do this, but I first realized that the beginning of the application.txt is not needed so I removed the first line. So, my command was this:
 
*Now that my tables were set up, I needed to add in the appropriate data and I used sed command to do this, but I first realized that the beginning of the application.txt is not needed so I removed the first line. So, my command was this:
 
  cat application.txt | sed "1D"
 
  cat application.txt | sed "1D"
*After being stuck in a state of confusion as to what my next move was because the screen I was viewing had a lot of what appeared to be unnecessary spaces in between randomly.
+
*After being stuck in a state of confusion as to what my next move was because the screen I was viewing had a lot of what appeared to be unnecessary spaces in between randomly. After talking it over with a classmate, I was able to see that some of these spaces were used with the data. So, I took out the spaces that were next to another space as those were not needed using:
 +
cat application.txt | sed "1D" | sed "s/  / /g"
 +
*I then needed to insert a null value into the tabs that were next to each other by:
 +
cat application.txt | sed "1D" | sed "s/  / /g" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\t/\tnull\t/g"
 +
*Upon doing this, I soon realized that some of my rows lacked the 9th column they needed. I corrected this by doing a similar command, but this time using the carriage return. From asking a classmate as well as the discussion in class, I learned that it is a character that is basically invisible at the end of every line and it is recorded whenever the enter key is hit. The command was:
 +
cat application.txt | sed "1D" | sed "s/  / /g" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\r/\tnull\r/g"
 +
*My next step was to replace the tabs with commas and insert apostrophes for the varchars using the long command line of:
 +
cat application.txt | sed "1D" | sed "s/ //g" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\r/\tnull\r/g" | sed "s/\t/,'/1" | sed "s/\t/','/1" | sed "s/\t/','/1" | sed "s/\t/','/1" |
 +
sed  "s/\t/','/1" | sed "s/\t/',/1" | sed "s/\t/,'/1" | sed "s/\t/','/g" | sed "s/\r/'\r/g" | sed "s/'null'/null/g" | sed "s/^/insert into
 +
application(ApplNo,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag,CurrentPatentFlag,ActionType,Chemical_Type,Ther_Potential,Orphan_Code) values(/g" | sed "s/\r/);\r/g"
 +
> ~/public_html/application.sql.txt
 +
 
 +
*I was able to then use the information I had learned from creating the application.txt to create the product.txt command line. It became apparent that the first line of the product.txt was useless as well, just like the application.txt, so I removed it. The final command I came up with for the product.txt is:
 +
cat Product.txt | sed "1D" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\r/\tnull\r/g" | sed "s/'/''/g" | sed "s/\t/,/1" | sed "s/\t/,'/1" | sed "s/\t/','/1" | sed "s/\t/',/1" |
 +
sed  "s/\t/,'/1"| sed "s/\t/',/1" | sed "s/\t/,'/1" | sed "s/\t/','/1" | sed "s/\r/'\r/g" | sed "s/'null'/null/g" | sed "s/^/insert into product(ApplNo, ProductNo, Form, Dosage, ProductMktStatus,
 +
TECode,  ReferenceDrug, drugname, activeingred) values(/g" | sed "s/\r/);\r/g" > ~/public_html/product.sql.txt

Revision as of 04:09, 15 October 2015

The FDA Drug Database

Direct Download/Unzipping Commands

  • First, I logged on with my username and password and then entered the class folder and created a directory for this assignment, and then entered the directory:
ssh eolufson@my.cs.lmu
cdbiodb
mdir week6
cd week6
  • Next, I downloaded the file from the website using:
curl -O http://www.fda.gov/downloads/Drugs/InformationOnDrugs/UCM054599.zip
  • Then I unzipped the file (using my class resources):
unzip UCM054599.zip
  • I then moved the product.txt and the application.txt files into my public folder, but first created a new folder for them that would help me stay organized:
cd ~/public_html
mkdir biodb
mv application.txt ~/public_html/biodb
mv product.txt ~/public_html/biodb
  • Next I was able to determine the variable for the tables by looking at the top columns:
    • application.txt contained 9 variables: ApplNo, ApplType, SponsorApplicant, MostRecentLabelAvailableFlag, CurrentPatentFlag, ActionType, Chemical_Type, Ther_Potential, and Orphan_Code; and the data types are as follows (respectively): int primary key, varchar, varchar, boolean, boolean, varchar, int, varchar, and varchar.
    • product.txt contained 9 variables: ApplNo, ProductNo, Form, Dosage, ProductMktStatus, TECode, ReferenceDrug, drugname, activeingred; and the data types are as follows (respectively): int, int, varchar, varchar, varchar, varchar, int, varchar, and varchar.

Creating the Database

  • Using pgAdmin as well as the SQL editor, I was able to create the tables needed for the data:
create table application (ApplNo int primary key, ApplType char, SponsorApplicant varchar, MostRecentLabelAvailableFlag boolean, 
CurrentPatentFlag boolean, ActionType varchar, ChemicalType int, Ther_Potential char, Orphan_Code char)
create table product (ApplNo int references application, ProductNo int, Form varchar, Dosage varchar, ProductMktStatus int, 
TECode varchar, ReferenceDrug int, drugname varchar, activeingred varchar)
  • Now that my tables were set up, I needed to add in the appropriate data and I used sed command to do this, but I first realized that the beginning of the application.txt is not needed so I removed the first line. So, my command was this:
cat application.txt | sed "1D"
  • After being stuck in a state of confusion as to what my next move was because the screen I was viewing had a lot of what appeared to be unnecessary spaces in between randomly. After talking it over with a classmate, I was able to see that some of these spaces were used with the data. So, I took out the spaces that were next to another space as those were not needed using:
cat application.txt | sed "1D" | sed "s/  / /g"
  • I then needed to insert a null value into the tabs that were next to each other by:
cat application.txt | sed "1D" | sed "s/  / /g" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\t/\tnull\t/g"
  • Upon doing this, I soon realized that some of my rows lacked the 9th column they needed. I corrected this by doing a similar command, but this time using the carriage return. From asking a classmate as well as the discussion in class, I learned that it is a character that is basically invisible at the end of every line and it is recorded whenever the enter key is hit. The command was:
cat application.txt | sed "1D" | sed "s/  / /g" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\r/\tnull\r/g"
  • My next step was to replace the tabs with commas and insert apostrophes for the varchars using the long command line of:
cat application.txt | sed "1D" | sed "s/ //g" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\r/\tnull\r/g" | sed "s/\t/,'/1" | sed "s/\t/','/1" | sed "s/\t/','/1" | sed "s/\t/','/1" | 
sed   "s/\t/','/1" | sed "s/\t/',/1" | sed "s/\t/,'/1" | sed "s/\t/','/g" | sed "s/\r/'\r/g" | sed "s/'null'/null/g" | sed "s/^/insert into
application(ApplNo,ApplType,SponsorApplicant,MostRecentLabelAvailableFlag,CurrentPatentFlag,ActionType,Chemical_Type,Ther_Potential,Orphan_Code) values(/g" | sed "s/\r/);\r/g" 
> ~/public_html/application.sql.txt
  • I was able to then use the information I had learned from creating the application.txt to create the product.txt command line. It became apparent that the first line of the product.txt was useless as well, just like the application.txt, so I removed it. The final command I came up with for the product.txt is:
cat Product.txt | sed "1D" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\t/\tnull\t/g" | sed "s/\t\r/\tnull\r/g" | sed "s/'//g" | sed "s/\t/,/1" | sed "s/\t/,'/1" | sed "s/\t/','/1" | sed "s/\t/',/1" | 
sed  "s/\t/,'/1"| sed "s/\t/',/1" | sed "s/\t/,'/1" | sed "s/\t/','/1" | sed "s/\r/'\r/g" | sed "s/'null'/null/g" | sed "s/^/insert into product(ApplNo, ProductNo, Form, Dosage, ProductMktStatus,
TECode,  ReferenceDrug, drugname, activeingred) values(/g" | sed "s/\r/);\r/g" > ~/public_html/product.sql.txt