Difference between revisions of "Blitvak Week 6"
From LMU BioDB 2015
Revision as of 20:42, 11 October 2015
Contents
- 1 Individual Journal Assignment Week 6
- 1.1 Downloading and Decompressing Data Files, Other Assignment Preparation
- 1.2 Working with application.txt
- 1.3 Defining the appropriate tables for the Application and Product entities
- 1.4 Process the data files for these entities then load them into those tables
- 1.5 Questions Regarding Database Creation
Individual Journal Assignment Week 6
Downloading and Decompressing Data Files, Other Assignment Preparation
- Looking over the Week 6 Assignment Page, I found that
curl -O http://www.fda.gov/downloads/Drugs/InformationOnDrugs/UCM054599.zip
can be used, while I am in the /nfs/home/blitvak directory, to place a .zip containing the data files directly into my personal folder. I found that
unzip UCM054599.zip
unzips the files into the personal folder.
- I also downloaded and installed pgAdmin III from http://www.pgadmin.org/
- I spent some time reviewing the PostgreSQL Tutorial
- I started PuTTY and unzipped the files into my home folder
Working with application.txt
- I opened up and reviewed the application.txt and Product.txt files using
more <filename.txt>
; I found that the data column labels for application.txt are:
ApplNo ApplType SponsorApplicant MostRecentLabelAvailableFlag CurrentPatentFlag ActionType Chemical_Type Ther_Potential Orphan_Code
- Reviewing the actual data, and with PostgreSQL in mind, I found that the variable type for each column should be:
ApplNo: int (primary key) ApplType: varchar SponsorApplicant: varchar MostRecentLabelAvailableFlag: boolean CurrentPatentFlag: boolean ActionType: varchar Chemical_Type: int
Ther_Potential: varchar Orphan_Code: varchar
- I realized that any empty data spaces in application.txt will have to be turned into
null
- I realized that
sed "1D"
will have to be executed in order to remove the first row (which is column labeling)
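The two preprocessing steps noted above (dropping the header row and turning empty fields into null) could be sketched roughly as follows. This is only an illustration under stated assumptions: the file sample.txt, its three columns, and its values are made up for the example, the fields are assumed to be tab-delimited, and awk is brought in for the null substitution even though the journal itself only mentions sed. The sketch uses the lowercase form 1d, which is the standard sed delete command.

```shell
# Hypothetical stand-in for application.txt: a header row plus two data rows.
# The tab delimiter and the sample values are assumptions, not the real FDA data.
printf 'ApplNo\tApplType\tChemical_Type\n000004\tN\t\n000159\tN\t1\n' > sample.txt

# Step 1: sed '1d' deletes the first line (the column labels).
# Step 2: awk splits each remaining line on tabs and replaces every
#         empty field with the literal text null before printing it.
sed '1d' sample.txt |
  awk 'BEGIN { FS = OFS = "\t" }
       { for (i = 1; i <= NF; i++) if ($i == "") $i = "null"; print }' > cleaned.txt

# Show the result.
cat cleaned.txt
```

Setting FS to a single tab matters here: with awk's default whitespace splitting, consecutive tabs would be merged and the empty fields would disappear instead of being filled in.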
For Product.txt, the column labels were: