Difference between revisions of "Week 6"

From LMU BioDB 2013
Jump to: navigation, search
(Movies from Text File to Tables: Reinstate (some) join questions.)
(Eliminate GOA and XML exercises.)
Line 38: Line 38:
 
#* Provide the SQL query "translation" of your plain English query.
 
#* Provide the SQL query "translation" of your plain English query.
 
#* Copy/paste the answer to your query, according to the sample movie database.
 
#* Copy/paste the answer to your query, according to the sample movie database.
 
=== Back to ''sed'' and Biology ===
 
 
Now, let’s apply all of this to biological data.  [http://www.ebi.ac.uk/integr8 Integr8] is a query interface for the UniProt biological database.  Available from this database is a download for ''GO (Gene Ontology) annotations'', or GOA for short, whose format is documented [http://www.geneontology.org/GO.format.annotation.shtml here].
 
 
The GOA file for ''V. cholerae'' has been downloaded to ''~xmlpipedb/data'' for you (its exact filename is ''46.V_cholerae_ATCC_39315.goa'').  Using the documentation in http://www.geneontology.org/GO.format.annotation.shtml, do the following:
 
# Specify the ''create table'' command that defines a table equivalent for GOA data, and create such a table in Kerfuffle.  Call the table '''''yourKeckUsername'''_goa''.  Some notes:
 
#* Define all fields as ''varchar''
 
#* Include all fields, including those described as “optional”
 
#* Make all field/column names completely lowercase
 
#* When a field name has a space or a colon, substitute them with an underscore ('''_''') (e.g., ''GO ID'' becomes ''go_id''; ''DB:Reference'' becomes ''db_reference'')
 
#* Treat the multivalued fields (i.e., the ones separated by '''|''' within the text) as a single value
 
#* Don’t worry about defining a primary key
 
# What series of '''sed''' commands will transform the ''46.V_cholerae_ATCC_39315.goa'' file into a sequence of ''insert'' commands that successfully load the contents of that file into the ''goa'' table in your private database?  Some notes:
 
#* Use '''\t''' in your '''sed''' commands to represent the ''tab'' characters that separate the fields of the GOA file.
 
#* When a field is described as “optional,” that simply means there may be no value for it.  However, there will still be ''tab'' characters before and after it.
 
#* Since a single GOA records has 15 fields, there will be a lot of typing involved.  However, this is solely due to the number of fields; conceptually, this exercise is the same as creating a table with just two fields.
 
 
=== An “Xploratory” Question ===
 
 
Finally, think about and answer this question (no actual commands involved, for now): Importing the ''.goa'' file (and some others) is fairly natural, since it is already tabular by nature.  Think about and describe an approach that you might take if asked to import an ''XML'' file, which, we have seen, is structured more like an outline, into a relational database.  Write your answer in terms of how the relational tables would be defined, what fields would comprise primary and foreign keys, and what might need to be done to the information in the XML file.
 
  
 
{{Shared Journal Instructions|week=6}}
 
{{Shared Journal Instructions|week=6}}

Revision as of 03:45, 26 September 2013

Under Construction

The content in this page has not been finalized and is still subject to change. Use the current information at your own risk.

This journal entry is due on Friday, October 4, at midnight PDT. (Thursday night/Friday morning)

Individual Journal Assignment

  • Store this journal entry as "username Week 6" (i.e., this is the text to place between the square brackets when you link to this page).
  • Link from your user page to this Assignment page.
  • Link to your journal entry from your user page.
  • Link back from your journal entry to your user page.
  • Don't forget to add the "Journal Entry" category to the end of your wiki page.
    • Note: you can easily fulfill all of these links by adding them to your template and then using your template on your journal entry.

Buddy System

For this exercise, you have been assigned to a “journal buddy;” you should work on the exercises together.

  • Plan at least one “live” work session, where you and your journal buddy can interactively talk, think, and work through the exercises.
  • While you may work with your buddy to figure out solutions to the exercises, you should still answer the reflection questions and write up your journal pages individually. In particular, any explanations, issues, or “think-aloud” comments should be in your own words.

This week’s groupings are:

  • Hilda—Lena
  • Kurt—Katrina
  • Viktoria—Mitchell
  • Gabriel—Stephen—Lauren
  • Miles—Kevin McGee
  • Dillon—Kevin Meilak
  • Tauras—Alina

Movies from Text File to Tables

Follow the instructions in the PostgreSQL Tutorial to load up your own copy of the sample movie database. To get "warmed up," complete the tutorial so that you get some query practice before working on the questions below. Note that the member and rating tables were deliberately kept small so that it would be possible to check your work by hand.

Provide both the answer and the SQL query used to determine that answer:

  1. What movies were released before 1915?
  2. What movies from the 1980s had 4-digit numbers in their titles?
  3. What movies have apostrophes in their titles?
  4. How many movies have titles that begin with the word “Star”?
  5. How many movies, on a year by year basis, were released in the 1940s (years 1940 to 1949)? (don’t answer this question by listing the movies then counting the results!)
  6. Which movies, in alphabetical order, did member number 42 rate as a “5”?
  7. Produce a table that lists every movie that has been rated, the rating that the movie got, and the name of the member that gave this rating.
  8. How many movies has member number 42 rated as 1, 2, 3, 4, and 5, respectively?
  9. What is the average rating of movies that begin with the word “Star”?
  10. Form your own query involving movies, ratings, and members, then
    • State your query in plain English (like the queries above).
    • Provide the SQL query "translation" of your plain English query.
    • Copy/paste the answer to your query, according to the sample movie database.

Shared Journal Assignment

  • Store your journal entry in the shared Class Journal Week 6 page. If this page does not exist yet, go ahead and create it (congratulations on getting in first :) )
  • Link to your journal entry from your user page.
  • Link back from the journal entry to your user page.
    • NOTE: you can easily fulfill the links part of these instructions by adding them to your template and using the template on your user page.
  • Sign your portion of the journal with the standard wiki signature shortcut (~~~~).
  • Add the "Journal Entry" and "Shared" categories to the end of the wiki page (if someone has not already done so).

Reflect

After completing the both exercises, answer the following questions on the shared Class Journal Week 6 page:

  1. Has your view of digital data changed since the beginning of the semester? Why or why not?
  2. How confident do you feel about exploring and manipulating data, in various forms, after the past few weeks’ assignments?
  3. Consider movie query #7: if you looked at all of the movies that a particular member has rated from 1 to 5, do you think you can get a general idea of that member’s likes and dislikes? Do you see any issues with having this kind of information available to companies?
  4. Do you think it would have been easier to answer the movie questions using the plain text files in the ~xmlpipedb/data folder coupled with grep and wc? Why or why not?
Personal tools
Namespaces

Variants
Actions
Navigation
Toolbox