Difference between revisions of "Blitvak Week 6"

From LMU BioDB 2015
(another edit for first section)
(some more details into section involving application.txt)
Line 6:
 *I spent some time reviewing [[PostgreSQL Tutorial | the PostgreSQL Tutorial]]
 *I booted up PuTTy and unzipped the files into my home folder
+
+===Working with ''application.txt''===
 *I opened up and reviewed the ''application.txt'' and ''Product.txt'' files using <code>more <filename.txt></code>; I found that the data column labels for ''application.txt'' are:
   ApplNo  ApplType  SponsorApplicant  MostRecentLabelAvailableFlag  CurrentPatentFlag  ActionType  Chemical_Type  Ther_Potential  Orphan_Code
-For ''Product.txt'', the column labels were:
+*Reviewing the actual data, and with PostgreSQL in mind, I found that the variable type for each column should be:
+ApplNo: ''int(primary key)''  ApplType: ''varchar''  SponsorApplicant: ''varchar''  MostRecentLabelAvailableFlag: ''boolean''  CurrentPatentFlag: ''boolean''  ActionType: ''varchar''  Chemical_Type: ''int''
+Ther_Potential: ''varchar''  Orphan_Code: ''varchar''
+*I realized that any empty data spaces in ''application.txt'' will have to be turned into <code>null</code>
+*I realized that <code>sed "1D"</code> will have to be executed in order to remove the first row (which is column labeling)
+
+For ''Product.txt'', the column labels were:

Revision as of 20:42, 11 October 2015

Individual Journal Assignment Week 6

Downloading and Decompressing Data Files, Other Assignment Preparation

Working with application.txt

  • I opened up and reviewed the application.txt and Product.txt files using more <filename.txt>; I found that the data column labels for application.txt are:
ApplNo  ApplType  SponsorApplicant  MostRecentLabelAvailableFlag  CurrentPatentFlag  ActionType  Chemical_Type  Ther_Potential  Orphan_Code
  • Reviewing the actual data, and with PostgreSQL in mind, I determined that the data type for each column should be:
ApplNo: int (primary key)  ApplType: varchar  SponsorApplicant: varchar  MostRecentLabelAvailableFlag: boolean  CurrentPatentFlag: boolean  ActionType: varchar  Chemical_Type: int  Ther_Potential: varchar  Orphan_Code: varchar
  • I realized that any empty fields in application.txt will have to be converted to null
  • I realized that sed "1D" will have to be run to remove the first row (the column labels)
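
The two cleanup steps noted above (dropping the header row and marking empty fields as null) could be sketched roughly as follows. This is only one possible approach, not necessarily the one used for the assignment. It assumes the file is tab-delimited, uses the more common lowercase `1d` form of the sed command, and writes `\N`, which is the default null marker for PostgreSQL's text-format `COPY`. The helper name `clean_tsv` and the sample rows are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: drop the header row, then replace every empty
# tab-separated field with \N (PostgreSQL's default null marker for
# text-format COPY). Assumes the input file is tab-delimited.
clean_tsv() {
    sed '1d' "$1" |
        awk 'BEGIN { FS = OFS = "\t" }
             { for (i = 1; i <= NF; i++) if ($i == "") $i = "\\N"; print }'
}

# Tiny made-up sample (not real rows from application.txt).
sample=$(mktemp)
printf 'ApplNo\tApplType\tChemical_Type\n' >  "$sample"
printf '1\tN\t\n'                          >> "$sample"
printf '2\t\t5\n'                          >> "$sample"

# Prints the two data rows with the empty fields replaced by \N.
clean_tsv "$sample"
rm -f "$sample"
```

If the data is later loaded with INSERT statements instead of `COPY`, the marker would need to be the SQL keyword `null` rather than `\N`.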


For Product.txt, the column labels were:


Defining the appropriate tables for the Application and Product entities
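
As a rough starting point for the Application entity, the column types listed earlier in this journal could be turned into a table definition like the one below. This is a draft sketch only: the table name `application` and the file name `create_application.sql` are assumptions, and no length limits are given for the varchar columns because the journal does not state any. The Product table is left out because its columns are not listed here.

```shell
#!/bin/sh
# Write a draft schema for the Application entity to a file.
# Table name and file name are assumptions; the column names and types
# are the ones recorded in the journal for application.txt.
schema_file=create_application.sql
cat > "$schema_file" <<'EOF'
CREATE TABLE application (
    ApplNo                       int PRIMARY KEY,
    ApplType                     varchar,
    SponsorApplicant             varchar,
    MostRecentLabelAvailableFlag boolean,
    CurrentPatentFlag            boolean,
    ActionType                   varchar,
    Chemical_Type                int,
    Ther_Potential               varchar,
    Orphan_Code                  varchar
);
EOF

# The file could then be run against a database with something like:
#   psql -f create_application.sql
echo "wrote $schema_file"
```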

Process the data files for these entities then load them into those tables

Questions Regarding Database Creation