Blitvak Week 6
From LMU BioDB 2015
Revision as of 20:42, 11 October 2015 by Blitvak (some more details into section involving application.txt)
Contents
- 1 Individual Journal Assignment Week 6
 - 1.1 Downloading and Decompressing Data Files, Other Assignment Preparation
 - 1.2 Working with application.txt
 - 1.3 Defining the appropriate tables for the Application and Product entities
 - 1.4 Process the data files for these entities then load them into those tables
 - 1.5 Questions Regarding Database Creation
 
 
Individual Journal Assignment Week 6
Downloading and Decompressing Data Files, Other Assignment Preparation
- Looking over the Week 6 Assignment Page, I found that
curl -O http://www.fda.gov/downloads/Drugs/InformationOnDrugs/UCM054599.zip
can be run from the /nfs/home/blitvak directory to place a .zip file containing the data files directly into my personal folder. I found that
unzip UCM054599.zip
extracts the files into the same folder.
- I also downloaded and installed pgAdmin III from http://www.pgadmin.org/
- I spent some time reviewing the PostgreSQL Tutorial
- I booted up PuTTY and unzipped the files into my home folder
 
Working with application.txt
- I opened and reviewed the application.txt and Product.txt files using
more <filename.txt>
and found that the column labels for application.txt are:
ApplNo ApplType SponsorApplicant MostRecentLabelAvailableFlag CurrentPatentFlag ActionType Chemical_Type Ther_Potential Orphan_Code
- After reviewing the actual data, and with PostgreSQL in mind, I decided that the data type for each column should be:
 
ApplNo: int (primary key)
ApplType: varchar
SponsorApplicant: varchar
MostRecentLabelAvailableFlag: boolean
CurrentPatentFlag: boolean
ActionType: varchar
Chemical_Type: int
Ther_Potential: varchar
Orphan_Code: varchar
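The column types chosen above could be written out as a PostgreSQL table definition. This is a minimal sketch, not necessarily the table used for the assignment: the table name application and the output file create_application.sql are assumptions, and the script only generates the SQL without connecting to a database. PostgreSQL folds unquoted identifiers to lowercase, so ApplNo would show up as applno in the table.

```shell
#!/bin/sh
# Write a CREATE TABLE statement matching the column types listed above.
# The table name "application" and the output file name are assumptions.
cat > create_application.sql <<'EOF'
CREATE TABLE application (
    ApplNo                       int PRIMARY KEY,
    ApplType                     varchar,
    SponsorApplicant             varchar,
    MostRecentLabelAvailableFlag boolean,
    CurrentPatentFlag            boolean,
    ActionType                   varchar,
    Chemical_Type                int,
    Ther_Potential               varchar,
    Orphan_Code                  varchar
);
EOF

# Print the generated file so it can be checked before running it.
cat create_application.sql
```

The generated file could then be run with psql -f create_application.sql followed by a database name, or pasted into the pgAdmin III query tool.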
- I realized that any empty fields in application.txt will have to be converted to null
- I also realized that
sed "1D"
will have to be executed to remove the first row (which contains the column labels)
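The two cleanup steps above (dropping the label row and filling empty fields) could be chained into one pipeline. This is a sketch under assumptions: it assumes the fields are tab-separated, uses the literal string NULL as the placeholder, and runs on a small made-up sample file (sample_rows.txt, hypothetical) rather than the real application.txt. The tab is stored in a shell variable because \t inside a sed regular expression is not supported by every sed.

```shell
#!/bin/sh
# A literal tab character; "\t" in sed regexes is a GNU extension, so the
# tab is built with printf for portability.
TAB=$(printf '\t')

# Hypothetical sample (not real FDA data): a label row, then one record
# whose 2nd, 3rd and 5th fields are empty. Tab-delimited layout is assumed.
printf 'c1\tc2\tc3\tc4\tc5\n1\t\t\tx\t\n' > sample_rows.txt

# 1d drops the label row (same effect as "1D" here, since each cycle holds
# one line). The tab-tab substitution runs twice so that runs of consecutive
# empty fields are all filled; the last sed handles an empty first or last field.
sed '1d' sample_rows.txt \
  | sed "s/${TAB}${TAB}/${TAB}NULL${TAB}/g" \
  | sed "s/${TAB}${TAB}/${TAB}NULL${TAB}/g" \
  | sed -e "s/^${TAB}/NULL${TAB}/" -e "s/${TAB}\$/${TAB}NULL/" \
  > sample_clean.txt

cat sample_clean.txt
```

A single global pass is not enough because, with three tabs in a row, sed consumes the first two in one match and skips the empty field between the second and third. Also note that PostgreSQL's COPY in text format expects tab-separated fields but reads nulls as \N by default, so loading this output would need the NULL 'NULL' option.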
For Product.txt, the column labels were: