Difference between revisions of "Blitvak Week 6"

Revision as of 20:42, 11 October 2015

Individual Journal Assignment Week 6

Downloading and Decompressing Data Files, Other Assignment Preparation

Looking over the Week 6 Assignment page, I found that running curl -O http://www.fda.gov/downloads/Drugs/InformationOnDrugs/UCM054599.zip from the /nfs/home/blitvak directory places a .zip containing the data files directly into my personal folder. I also found that unzip UCM054599.zip extracts the files into that same folder.
I also downloaded and installed pgAdmin III from http://www.pgadmin.org/
I spent some time reviewing the PostgreSQL Tutorial
I booted up PuTTY and unzipped the files into my home folder

Working with application.txt

I opened up and reviewed the application.txt and Product.txt files using more <filename.txt>; I found that the data column labels for application.txt are:

ApplNo  ApplType  SponsorApplicant  MostRecentLabelAvailableFlag  CurrentPatentFlag  ActionType  Chemical_Type  Ther_Potential  Orphan_Code

Reviewing the actual data, and with PostgreSQL in mind, I found that the variable type for each column should be:

ApplNo: int (primary key)  ApplType: varchar  SponsorApplicant: varchar  MostRecentLabelAvailableFlag: boolean  CurrentPatentFlag: boolean  ActionType: varchar  Chemical_Type: int
Ther_Potential: varchar  Orphan_Code: varchar

I realized that any empty fields in application.txt will have to be turned into null
I realized that sed "1D" will have to be run in order to remove the first row (which holds the column labels)
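
The two cleanup steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes the file is tab-delimited, and the file names sample_application.txt and sample_application.clean.txt, the shortened column list, and the data rows are all made up so the example can run on its own. It uses the lowercase sed command 1d, which for this purpose behaves the same as 1D, and a small awk step (not mentioned in the journal) to fill in empty fields.

```shell
# Build a tiny stand-in for application.txt.
# Hypothetical rows and a shortened column list; tab-delimited by assumption.
printf 'ApplNo\tApplType\tSponsorApplicant\tChemical_Type\n' > sample_application.txt
printf '000001\tN\tEXAMPLE CO\t\n' >> sample_application.txt
printf '000002\tA\t\t5\n' >> sample_application.txt

# Step 1: sed 1d drops the first line (the column labels).
# Step 2: awk replaces every empty tab-separated field with the word null.
sed 1d sample_application.txt |
  awk 'BEGIN { FS = OFS = "\t" }
       { for (i = 1; i <= NF; i++) if ($i == "") $i = "null"; print }' \
  > sample_application.clean.txt

# Show the cleaned rows.
cat sample_application.clean.txt
```

Whether the word null ends up as a real SQL null depends on how the rows are later loaded into PostgreSQL, so this only covers the text-processing side.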

For Product.txt, the column labels were:

@@ Line 6: / Line 6: @@
 *I spent some time reviewing [[PostgreSQL Tutorial | the PostgreSQL Tutorial]]
 *I booted up PuTTY and unzipped the files into my home folder
+===Working with ''application.txt''===
 *I opened up and reviewed the ''application.txt'' and ''Product.txt'' files using <code>more <filename.txt></code>; I found that the data column labels for ''application.txt'' are:
   ApplNo  ApplType  SponsorApplicant  MostRecentLabelAvailableFlag  CurrentPatentFlag  ActionType  Chemical_Type  Ther_Potential  Orphan_Code
-For ''Product.txt'', the column labels were:
+*Reviewing the actual data, and with PostgreSQL in mind, I found that the variable type for each column should be:
+ ApplNo: ''int(primary key)''  ApplType: ''varchar''  SponsorApplicant: ''varchar''  MostRecentLabelAvailableFlag: ''boolean''  CurrentPatentFlag: ''boolean''  ActionType: ''varchar''  Chemical_Type: ''int''
+ Ther_Potential: ''varchar''  Orphan_Code: ''varchar''
+*I realized that any empty data spaces in ''application.txt'' will have to be turned into <code>null</code>
+*I realized that <code>sed "1D"</code> will have to be executed in order to remove the first row (which is column labeling)
+For ''Product.txt'', the column labels were:
