Difference between revisions of "Week 6"

From LMU BioDB 2015
Jump to: navigation, search
(Additional Download/Processing Commands: Rename the section.)
(The FDA Drug Database: Eliminate a table.)
Line 43: Line 43:
  
 
# Download and uncompress the files.
 
# Download and uncompress the files.
# Define appropriate tables for the ''Application'', ''Product'', and ''ChemicalType_Lookup'' entities.
+
# Define appropriate tables for the ''Application'' and ''Product'' entities.
 
# Process the data files for these entities then load them into those tables.
 
# Process the data files for these entities then load them into those tables.
 
# Answer the questions below.
 
# Answer the questions below.

Revision as of 06:17, 6 October 2015

This journal entry is due on Tuesday, October 13, at midnight PDT. (Monday night/Tuesday morning)

This page is under construction.

Overview

The purpose of this assignment is:

  • To apply the technical knowledge and skills learned so far to defining and loading a relational database from scratch
  • To experience a real-world example of how raw data sets can be distributed and formatted
  • To get some practice with SQL queries

Individual Journal Assignment

  • Store this journal entry as "username Week 6" (i.e., this is the text to place between the square brackets when you link to this page).
  • Link from your user page to this Assignment page.
  • Link to your journal entry from your user page.
  • Link back from your journal entry to your user page.
  • Don't forget to add the "Journal Entry" category to the end of your wiki page.
    • Note: you can easily fulfill all of these links by adding them to your template and then using your template on your journal entry.
  • For your assignment this week, you will keep an electronic laboratory notebook on your individual journal entry page for this week. An electronic laboratory notebook records all the manipulations you perform on the data and the answers to the questions throughout the protocol. Like a paper lab notebook found in a wet lab, it should contain enough information so that you or someone else could reproduce what you did using only the information from the notebook.

Homework Partners

For this week, we return to homework partner pairs. The assignments are:

  • Mary Alverson, Erich Yanoschik
  • Nicole Anguiano, Mahrad Saeedi
  • Brandon Klein, Lena Olufson
  • Ronald Legaspi, Jake Woodlee
  • Brandon Litvak, Josh Kuroda
  • Veronica Pacheco, Kevin Wyllie
  • Trixie Roque, Emily Simso
  • Anu Varshneya, Kristin Zebrowski

The FDA Drug Database

The United States Food and Drug Administration (FDA) provides, as a matter of public record, the full data set for its approved drugs at this website: http://www.fda.gov/Drugs/InformationOnDrugs/ucm079750.htm

The site includes both a link to the downloadable files (compressed in .zip format) and the schema (“entity relationship diagram”) for those files.

Using these files, what you have learned about sed and SQL thus far, and additional information found in this wiki and on the aforementioned FDA website, do the following:

  1. Download and uncompress the files.
  2. Define appropriate tables for the Application and Product entities.
  3. Process the data files for these entities then load them into those tables.
  4. Answer the questions below.

Direct Download/Unzipping Commands

To mirror the procedures performed with the movie_titles.txt file, this section shows you how to (a) download and process the FDA files on the my.cs.lmu.edu server via PuTTY/ssh then (b) make your processed files available to the Seaver 120 workstations via the built-in my.cs.lmu.edu server. To do so, we introduce some additional commands that you may use verbatim at the my.cs.lmu.edu command line.

  • For these activities, do not go into the ~dondi/xmlpipedb/data folder—that’s mine! :) Instead, do your work on your home folder (i.e., where you are when you first login to my.cs.lmu.edu).
  • You can bypass the website and download the file directly to my.cs.lmu.edu with this command:
   curl -O http://www.fda.gov/downloads/Drugs/InformationOnDrugs/UCM054599.zip
(if the link above does not work, the FDA may have renamed the file; in that case, visit the website at the beginning of this section with a web browser, right click on the Drugs@FDA Download File link, and find a way to copy the linked address into your ssh session)
  • To unzip this file, use this command:
   unzip UCM054599.zip
(if the downloaded filename is different from the one shown, use that filename instead)
  • Upon successful unzipping, you should see the following files:
   $ ls
   AppDoc.txt             application.txt     DocType_lookup.txt  Product.txt        ReviewClass_Lookup.txt
   AppDocType_Lookup.txt  ChemTypeLookup.txt  Product_tecode.txt  RegActionDate.txt  UCM054599.zip
As stated in the tasks above, you will only need to work with the application.txt, Product.txt, and ChemTypeLookup.txt files.

Part of the assignment, of course, is to determine how to rework these files so that their data can be loaded into PostgreSQL via the pgAdmin III desktop application on the Seaver 120 workstations. Just as with the movie_titles.txt file, you do this by “redirecting” the processed data to your ~/public_html folder:

   cat application.txt | your command sequence here > ~/public_html/application.sql.txt
   cat Product.txt | your command sequence here > ~/public_html/Product.sql.txt
   cat ChemTypeLookup.txt | your command sequence here > ~/public_html/ChemTypeLookup.sql.txt

You can then download these files to the Seaver 120 desktop environment by visiting these sites with a web browser:

  • http://my.cs.lmu.edu/~username/application.sql.txt
  • http://my.cs.lmu.edu/~username/Product.sql.txt
  • http://my.cs.lmu.edu/~username/ChemTypeLookup.sql.txt

Again, note that these latter steps exactly mirror the tutorial for the movie_titles.txt file, so please refer to that wiki page for additional details.

Questions to Answer

Shared Journal Assignment

  • Store your journal entry in the shared Class Journal Week 6 page. If this page does not exist yet, go ahead and create it (congratulations on getting in first :) )
  • Link to your journal entry from your user page.
  • Link back from the journal entry to your user page.
    • NOTE: you can easily fulfill the links part of these instructions by adding them to your template and using the template on your user page.
  • Sign your portion of the journal with the standard wiki signature shortcut (~~~~).
  • Add the "Journal Entry" and "Shared" categories to the end of the wiki page (if someone has not already done so).