Blitvak Week 6

From LMU BioDB 2015
Revision as of 20:42, 11 October 2015 by Blitvak (Talk | contribs) (some more details into section involving application.txt)

Jump to: navigation, search

Individual Journal Assignment Week 6

Downloading and Decompressing Data Files, Other Assignment Preparation

Working with application.txt

  • I opened up and reviewed the application.txt and Product.txt files using more <filename.txt>; I found that the data column labels for application.txt are:
ApplNo  ApplType  SponsorApplicant  MostRecentLabelAvailableFlag  CurrentPatentFlag  ActionType  Chemical_Type  Ther_Potential  Orphan_Code
  • Reviewing the actual data, and with PostgreSQL in mind, I found that the variable type for each column should be:
ApplNo: int(primary key)  ApplType: varchar  SponsorApplicant: varchar  MostRecentLabelAvailableFlag: boolean  CurrentPatentFlag: boolean  ActionType: varchar  Chemical_Type: int 
Ther_Potential: varchar  Orphan_Code: varchar          
  • I realized that any empty data spaces in application.txt will have to be turned into null
  • I realized that sed "1D" will have to be executed in order to remove the first row (which is column labeling)


For Product.txt, the column labels were:


Defining the appropriate tables for the Application and Product entities

Process the data files for these entities then load them into those tables

Questions Regarding Database Creation