Difference between revisions of "Week 8"

From LMU BioDB 2013
(Reflection: added space character)
(clarification about electronic lab notebook)
 
(15 intermediate revisions by one user not shown)
Line 1: Line 1:
{{Under Construction}}
+
'''This journal entry is due on Friday, October 18, at midnight PDT.''' ''(Thursday night/Friday morning)''  Note that there is an interim deadline for uploading your files from part 1 by midnight, Monday, October 14. ''(Sunday night/Monday morning)''
  
'''This journal entry is due on Friday, October 18, at midnight PDT.''' ''(Thursday night/Friday morning)''
+
For the next section of the course, you will be introduced to the process we will use for the final projects in the course in a series of in-class and journal assignments where we will first analyze microarray data from ''Vibrio cholerae'', and then learn how to create a Gene Database for this organism.
 
+
== Analysis of ''Vibrio cholerae'' Microarray Data Part 1 ==
+
 
+
* For the next section of the course, you will be introduced to the process we will use for the final projects in the course in a series of in-class and journal assignments where we will first analyze microarray data from ''Vibrio cholerae'', and then learn how to create a Gene Database for this organism.
+
* The detailed instructions for the microarray data analysis we will carry out can be found on the [http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae Sample Microarray Analysis for ''Vibrio cholerae'' page].
+
  
 
{{Individual Journal Instructions|week=8}}
 
{{Individual Journal Instructions|week=8}}
  
* Keep an "electronic lab notebook", containing your methods, results, and interpretations of this week's portion of the [http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae Sample Microarray Analysis for ''Vibrio cholerae'' page] in your "''username'' Week 8" journal page. Although you will have assigned partner(s), you will need to fill out your own individual journal page.
+
* Keep an "electronic lab notebook", containing your methods, results, and interpretations of the ''Vibrio cholerae'' microarray analysis [http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae part 1] and [http://www.openwetware.org/wiki/BIOL367/F10:GenMAPP_and_MAPPFinder_Protocols part 2] in your "''username'' Week 8" journal page. Although you will have assigned partner(s), you will need to fill out your own individual journal page.
 +
** Your electronic notebook should contain enough information such that you or someone else could reproduce what you did given only the information on your page.
 +
** You should use screenshots and hyperlinks as appropriate.
 
** Be sure to answer any questions embedded in the protocol in your journal page.
 
** Be sure to answer any questions embedded in the protocol in your journal page.
* Upload your completed spreadsheet (both the .xls and .txt versions) to this wiki and link to them on your individual journal page.
+
* Upload the requested files from [http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae part 1] and [http://www.openwetware.org/wiki/BIOL367/F10:GenMAPP_and_MAPPFinder_Protocols part 2] to this wiki and link to them on your individual journal page.
 +
** '''''IMPORTANT''''' upload your completed spreadsheet (both the .xls and .txt versions) from part 1 by the interim deadline of midnight, Monday, October 14 ''(Sunday night/Monday morning)'' so that Dr. Dahlquist can check them before moving on to part 2 of the exercise.  She will not be assigning grades at this point; you will have the chance to make corrections, if necessary, before completing part 2.
 +
 
 +
=== Reading ===
 +
 
 +
* [http://www.nature.com/nature/journal/v417/n6889/full/nature00778.html Merrell, D.S., Butler, S.M., Qadri, F., Dolganov, N.A., Alam, A., Cohen, M.B., Calderwood, S.B., Schoolnik, G.K., and Camilli, A. (2002) Host-induced epidemic spread of the cholera bacterium.  ''Nature'' 417: 642-645.]
 +
* [http://www.nature.com/ng/journal/v25/n1/full/ng0500_25.html Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., Harris, M.A., Hill, D.P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J.C., Richardson, J.E., Ringwald, M., Rubin, G.M., Sherlock, G.(2000) Gene Ontology: tool for the unification of biology.  ''Nature Genetics''  25: 25-29.]
 +
*  [http://genomebiology.com/content/4/1/R7 Doniger, S.W., Salomonis, N., Dahlquist, K.D., Vranizan, K., Lawlor, S.C., Conklin, B.R. (2003) MAPPFinder: using Gene Ontology and GenMAPP to create a global gene-expression profile from microarray data.  ''Genome Biology'' 4:R7.]
 +
 
 +
=== Overview of Microarray Data Analysis ===
 +
 
 +
This is a list of steps required to analyze DNA microarray data.
 +
 
 +
# Quantitate the fluorescence signal in each spot in the microarray image.
 +
#* Typically performed by the scanner software, although third party software packages do exist.
 +
#* The image of the microarray slide and this quantitation are considered the "raw-est" form of the data.
 +
#* Ideally, this type of raw data would be made publicly available upon publication. 
 +
#* In practice, the image data is usually not made available because the raw image file of one slide could be up to 100 MB in size.
 +
#* Also, some journals do not require data deposition as a requirement for publication, so often published data are not actually available anywhere for download.
 +
#* Microarray data is not centrally located on the web.  Some major sources are:
 +
#** [http://www.ncbi.nlm.nih.gov/geo/ NCBI GEO]
 +
#** [http://www.ebi.ac.uk/microarray-as/ae/ EBI ArrayExpress]
 +
#** [http://smd.princeton.edu/ Stanford Microarray Database (now hosted by Princeton)]
 +
#** [http://puma.princeton.edu/ PUMAdb (Princeton Microarray Database)]
 +
#** In addition, microarray data can sometimes be found as supplementary information with a journal article or on an investigator's own web site.
 +
# Calculate the ratio of red/green fluorescence
 +
# Log(base 2) transform the ratios
 +
# Normalize the log ratios on each microarray slide
 +
# Normalize the log ratios for a set of slides in an experiment
 +
# Perform statistical analysis on the log ratios
 +
# Compare individual genes with known data
 +
# Look for patterns (expression profiles) in the data (many programs are available to do this)
 +
# Perform Gene Ontology term enrichment analysis (we will use MAPPFinder for this)
 +
# Map onto biological pathways (we will use GenMAPP for this)
 +
 
 +
In this week's exercise, we will do steps 5-7 (part 1, using Microsoft Excel) and 9 (part 2, using GenMAPP & MAPPFinder).
 +
 
 +
=== Statistical Analysis of ''Vibrio cholerae'' Microarray Data (Part 1) ===
 +
 
 +
* We will begin this analysis in class on Thursday, October 10.
 +
* The detailed instructions for the microarray data analysis we will carry out can be found on the [http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae Sample Microarray Analysis for ''Vibrio cholerae'' page] hosted by [http://www.openwetware.org OpenWetWare.org].
 +
 
 +
=== MAPPFinder Analysis of ''Vibrio cholerae'' Microarray Data (Part 2) ===
 +
 
 +
* We will begin this analysis in class on Tuesday, October 15.
 +
* The detailed instructions can be found on the [http://www.openwetware.org/wiki/BIOL367/F10:GenMAPP_and_MAPPFinder_Protocols GenMAPP and MAPPFinder Protocols page] hosted by [http://www.openwetware.org OpenWetWare.org].
 +
 
 +
==== Downloading and installing the GenMAPP and MAPPFinder Software ====
 +
 
 +
* We will be using GenMAPP and MAPPFinder version 2.1 (http://genmapp.org).  This software is already installed on the Windows machines in the Keck lab annex and in the Seaver 120 computer lab.
 +
** This version is now called "GenMAPP Classic" and can be downloaded [http://www.genmapp.org/download_v2.1.php from this page].
 +
** Follow the instructions in the installer.
 +
** During installation, the installer will open a window called the GenMAPP Data Acquisition Tool.  It will not function because it cannot connect to the server.  This is OK, you will download your ''Vibrio cholerae'' Gene Database from the XMLPipeDB project at SourceForge.org.
 +
*** Half of the class will use the Vc-Std_External_20090622.gdb Gene Database that was created by the Fall 2008 Biological Databases class.
 +
**** To download this Gene Database, [http://sourceforge.net/projects/xmlpipedb/files/V.%20cholerae%20Gene%20Database/V.%20cholerae%2020090622/Vc-Std_External_20090622.zip/download follow '''''this link''''' to the XMLPipeDB SourceForge Download page].
 +
*** Half of the class will use a more recent Vc-Std_External_20101022.gdb Gene Database that was created by Drs. Dahlquist and Dionisio in 2010.
 +
**** To download this Gene Database, [http://sourceforge.net/projects/xmlpipedb/files/V.%20cholerae%20Gene%20Database/V.%20cholerae%2020101022/Vc-Std_External_20101022.zip/download follow '''''this link''''' to the XMLPipeDB SourceForge Download page].
 +
*** The members of a pair should each choose a different gene database.
 +
* Click on the link for the Gene Database to which you have been assigned, download the file, and save it into the folder C:\GenMAPP 2 Data\Gene Databases (if you accepted the default folders during the installation), and extract it.
  
 
=== Groups ===
 
=== Groups ===
Line 26: Line 81:
 
{{Shared Journal Instructions|week=8}}
 
{{Shared Journal Instructions|week=8}}
  
==== Reflection ====
+
=== View ===
  
 
Now that you've done your own microarray analysis, we will revisit the case [http://www.cbsnews.com/video/watch/?id=7398476n "Deception at Duke"].
 
Now that you've done your own microarray analysis, we will revisit the case [http://www.cbsnews.com/video/watch/?id=7398476n "Deception at Duke"].
 +
* View the video: [http://videolectures.net/cancerbioinformatics2010_baggerly_irrh/ The Importance of Reproducible Research in High-Throughput Biology: Case Studies in Forensic Bioinformatics].
 +
* View the slides from DataONE on [http://www.dataone.org/sites/all/documents/L04_DataEntryManipulation.pptx data entry and manipulation].
 +
* Optional: for more information on the Duke saga, see the web site put together by Baggerly and Coombes [http://bioinformatics.mdanderson.org/Supplements/ReproRsch-All/Modified/StarterSet/ here].
  
View the video: [http://videolectures.net/cancerbioinformatics2010_baggerly_irrh/ The Importance of Reproducible Research in High-Throughput Biology: Case Studies in Forensic Bioinformatics] and answer the following questions:
+
=== Reflection ===
 
+
* What were the main issues with the data and analysis identified by Baggerly and Coombs?
+
* What recommendations does Dr. Baggerly recommend for reproducible research?
+
* Is there anything else that '''''you''''' would recommend to prevent cases like this from happening in the future?
+
  
For more information on the Duke saga, see the web site put together by Baggerly and Coombes [http://bioinformatics.mdanderson.org/Supplements/ReproRsch-All/Modified/StarterSet/ here].
+
* What were the main issues with the data and analysis identified by Baggerly and Coombes?  What best practices enumerated by DataONE were violated?  Which of these did Dr. Baggerly claim were common issues?
 +
* What recommendations does Dr. Baggerly make for reproducible research?  How do these correspond to what DataONE recommends?
 +
* Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
 +
* Look at the methods and results described in the [http://www.nature.com/nature/journal/v417/n6889/full/nature00778.html Merrell et al. (2002)] paper. Do you think there is sufficient information there to reproduce their data analysis?  Why or why not?
  
 
<!--
 
<!--
Line 42: Line 99:
 
* What was your initial comfort level with Excel before performing this exercise?  Did your comfort level change?  Why or why not?
 
* What was your initial comfort level with Excel before performing this exercise?  Did your comfort level change?  Why or why not?
 
* What advice would you give next year's class when performing this assignment?
 
* What advice would you give next year's class when performing this assignment?
* Do you think the original authors of the Merrell et al. (2002) paper adequately discussed their methods and results of their data analysis in their ''Nature'' paper?  Why or why not?
 
 
-->
 
-->

Latest revision as of 22:32, 8 October 2013

This journal entry is due on Friday, October 18, at midnight PDT. (Thursday night/Friday morning) Note that there is an interim deadline for uploading your files from part 1 by midnight, Monday, October 14. (Sunday night/Monday morning)

For the next section of the course, you will be introduced to the process we will use for the final projects in the course in a series of in-class and journal assignments where we will first analyze microarray data from Vibrio cholerae, and then learn how to create a Gene Database for this organism.

Individual Journal Assignment

  • Store this journal entry as "username Week 8" (i.e., this is the text to place between the square brackets when you link to this page).
  • Link from your user page to this Assignment page.
  • Link to your journal entry from your user page.
  • Link back from your journal entry to your user page.
  • Don't forget to add the "Journal Entry" category to the end of your wiki page.
    • Note: you can easily fulfill all of these links by adding them to your template and then using your template on your journal entry.
  • Keep an "electronic lab notebook", containing your methods, results, and interpretations of the Vibrio cholerae microarray analysis part 1 and part 2 in your "username Week 8" journal page. Although you will have assigned partner(s), you will need to fill out your own individual journal page.
    • Your electronic notebook should contain enough information such that you or someone else could reproduce what you did given only the information on your page.
    • You should use screenshots and hyperlinks as appropriate.
    • Be sure to answer any questions embedded in the protocol in your journal page.
  • Upload the requested files from part 1 and part 2 to this wiki and link to them on your individual journal page.
    • IMPORTANT: Upload your completed spreadsheet (both the .xls and .txt versions) from part 1 by the interim deadline of midnight, Monday, October 14 (Sunday night/Monday morning) so that Dr. Dahlquist can check it before you move on to part 2 of the exercise. She will not be assigning grades at this point; you will have the chance to make corrections, if necessary, before completing part 2.

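Reading

  • Merrell, D.S., Butler, S.M., Qadri, F., Dolganov, N.A., Alam, A., Cohen, M.B., Calderwood, S.B., Schoolnik, G.K., and Camilli, A. (2002) Host-induced epidemic spread of the cholera bacterium. Nature 417: 642-645.
  • Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., Harris, M.A., Hill, D.P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J.C., Richardson, J.E., Ringwald, M., Rubin, G.M., Sherlock, G. (2000) Gene Ontology: tool for the unification of biology. Nature Genetics 25: 25-29.
  • Doniger, S.W., Salomonis, N., Dahlquist, K.D., Vranizan, K., Lawlor, S.C., Conklin, B.R. (2003) MAPPFinder: using Gene Ontology and GenMAPP to create a global gene-expression profile from microarray data. Genome Biology 4:R7.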

Overview of Microarray Data Analysis

This is a list of steps required to analyze DNA microarray data.

  1. Quantitate the fluorescence signal in each spot in the microarray image.
    • This is typically performed by the scanner software, although third-party software packages do exist.
    • The image of the microarray slide and this quantitation are considered the "raw-est" form of the data.
    • Ideally, this type of raw data would be made publicly available upon publication.
    • In practice, the image data is usually not made available because the raw image file of one slide could be up to 100 MB in size.
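    • Also, some journals do not require data deposition as a condition of publication, so published data are often not available for download anywhere.
    • Microarray data is not centrally located on the web. Some major sources are:
      • NCBI GEO
      • EBI ArrayExpress
      • Stanford Microarray Database (now hosted by Princeton)
      • PUMAdb (Princeton Microarray Database)
      • In addition, microarray data can sometimes be found as supplementary information with a journal article or on an investigator's own web site.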
  2. Calculate the ratio of red/green fluorescence
  3. Log(base 2) transform the ratios
  4. Normalize the log ratios on each microarray slide
  5. Normalize the log ratios for a set of slides in an experiment
  6. Perform statistical analysis on the log ratios
  7. Compare individual genes with known data
  8. Look for patterns (expression profiles) in the data (many programs are available to do this)
  9. Perform Gene Ontology term enrichment analysis (we will use MAPPFinder for this)
  10. Map onto biological pathways (we will use GenMAPP for this)

In this week's exercise, we will do steps 5-7 (part 1, using Microsoft Excel) and 9 (part 2, using GenMAPP & MAPPFinder).
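
The arithmetic behind steps 2, 3, 4, and 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course protocol (which uses Microsoft Excel): the intensity values are made up, "normalize" is taken to mean subtracting each slide's mean log ratio, and the statistical test is taken to be a one-sample t statistic against zero. The function names are hypothetical, and step 5 (normalizing across slides) is left out.

```python
import math
import statistics


def log2_ratios(red, green):
    """Steps 2-3: red/green ratio for each spot, then log base 2."""
    return [math.log2(r / g) for r, g in zip(red, green)]


def center(values):
    """Step 4 (assumed method): subtract the slide mean so the slide averages 0."""
    mean = statistics.fmean(values)
    return [v - mean for v in values]


def one_sample_t(values):
    """Step 6 (assumed test): t statistic for the mean log ratio versus 0."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return statistics.fmean(values) / standard_error


# Made-up intensities: 3 slides (rows) x 4 genes (columns).
red = [[400, 100, 200, 800], [300, 120, 210, 900], [500, 90, 190, 700]]
green = [[200, 200, 200, 200], [210, 190, 200, 220], [190, 210, 205, 180]]

# One centered list of log ratios per slide.
slides = [center(log2_ratios(r, g)) for r, g in zip(red, green)]

# Transpose so that each entry holds one gene's replicates across slides.
genes = list(zip(*slides))
t_stats = [one_sample_t(g) for g in genes]

for i, (g, t) in enumerate(zip(genes, t_stats), start=1):
    print(f"gene {i}: mean log2 ratio = {statistics.fmean(g):+.3f}, t = {t:+.2f}")
```

A log ratio of +1 means the red signal is twice the green signal and -1 means it is half, which is why the log transform makes increases and decreases symmetric around zero.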

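Statistical Analysis of Vibrio cholerae Microarray Data (Part 1)

  • We will begin this analysis in class on Thursday, October 10.
  • The detailed instructions for the microarray data analysis we will carry out can be found on the Sample Microarray Analysis for Vibrio cholerae page (http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae) hosted by OpenWetWare.org.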

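MAPPFinder Analysis of Vibrio cholerae Microarray Data (Part 2)

  • We will begin this analysis in class on Tuesday, October 15.
  • The detailed instructions can be found on the GenMAPP and MAPPFinder Protocols page (http://www.openwetware.org/wiki/BIOL367/F10:GenMAPP_and_MAPPFinder_Protocols) hosted by OpenWetWare.org.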
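
To give a feel for what "Gene Ontology term enrichment" asks, here is a minimal Python sketch. It does not reproduce MAPPFinder's own statistic, which is described in Doniger et al. (2003); it substitutes a one-sided hypergeometric test, a common stand-in for the same question. The function name and the counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(total_genes, changed_genes, term_genes, term_changed):
    """Probability of seeing at least `term_changed` changed genes in a GO term
    of `term_genes` genes, if the `changed_genes` changed genes were scattered
    at random among all `total_genes` measured genes (hypergeometric tail)."""
    upper = min(term_genes, changed_genes)
    ways_total = comb(total_genes, term_genes)
    ways_at_least = sum(
        comb(changed_genes, k) * comb(total_genes - changed_genes, term_genes - k)
        for k in range(term_changed, upper + 1)
    )
    return ways_at_least / ways_total


# Made-up counts: 4000 genes measured, 400 meet the criterion for "changed",
# and a GO term with 50 genes contains 15 of the changed genes.
p = enrichment_p_value(4000, 400, 50, 15)
print(f"p = {p:.3g}")
```

A small p-value suggests that the term contains more changed genes than chance alone would explain. When many terms are tested at once, some small p-values are expected by chance, so the result for any single term should be read with that in mind.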

Downloading and installing the GenMAPP and MAPPFinder Software

  • We will be using GenMAPP and MAPPFinder version 2.1 (http://genmapp.org). This software is already installed on the Windows machines in the Keck lab annex and in the Seaver 120 computer lab.
    • This version is now called "GenMAPP Classic" and can be downloaded from this page.
    • Follow the instructions in the installer.
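    • During installation, the installer will open a window called the GenMAPP Data Acquisition Tool. It will not function because it cannot connect to the server. This is OK; you will download your Vibrio cholerae Gene Database from the XMLPipeDB project at SourceForge.org.
      • Half of the class will use the Vc-Std_External_20090622.gdb Gene Database that was created by the Fall 2008 Biological Databases class. Download it from http://sourceforge.net/projects/xmlpipedb/files/V.%20cholerae%20Gene%20Database/V.%20cholerae%2020090622/Vc-Std_External_20090622.zip/download
      • Half of the class will use a more recent Vc-Std_External_20101022.gdb Gene Database that was created by Drs. Dahlquist and Dionisio in 2010. Download it from http://sourceforge.net/projects/xmlpipedb/files/V.%20cholerae%20Gene%20Database/V.%20cholerae%2020101022/Vc-Std_External_20101022.zip/download
      • The members of a pair should each choose a different gene database.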
  • Click on the link for the Gene Database to which you have been assigned, download the file, and save it into the folder C:\GenMAPP 2 Data\Gene Databases (if you accepted the default folders during the installation), and extract it.

Groups

  • Viktoria - Kevin Meilek
  • Hilda - Tauras
  • Dillon - Kevin McGee
  • Lena - Alina
  • Mitchell - Gabriel
  • Stephen - Miles
  • Katrina - Lauren

Shared Journal Assignment

  • Store your journal entry in the shared Class Journal Week 8 page. If this page does not exist yet, go ahead and create it (congratulations on getting in first :) )
  • Link to your journal entry from your user page.
  • Link back from the journal entry to your user page.
    • NOTE: you can easily fulfill the links part of these instructions by adding them to your template and using the template on your user page.
  • Sign your portion of the journal with the standard wiki signature shortcut (~~~~).
  • Add the "Journal Entry" and "Shared" categories to the end of the wiki page (if someone has not already done so).

View

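Now that you've done your own microarray analysis, we will revisit the case "Deception at Duke".

  • View the video: The Importance of Reproducible Research in High-Throughput Biology: Case Studies in Forensic Bioinformatics (http://videolectures.net/cancerbioinformatics2010_baggerly_irrh/).
  • View the slides from DataONE on data entry and manipulation (http://www.dataone.org/sites/all/documents/L04_DataEntryManipulation.pptx).
  • Optional: for more information on the Duke saga, see the web site put together by Baggerly and Coombes (http://bioinformatics.mdanderson.org/Supplements/ReproRsch-All/Modified/StarterSet/).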

Reflection

  • What were the main issues with the data and analysis identified by Baggerly and Coombes? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
  • What recommendations does Dr. Baggerly make for reproducible research? How do these correspond to what DataONE recommends?
  • Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
  • Look at the methods and results described in the Merrell et al. (2002) paper. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?

