Week 8
Latest revision as of 22:32, 8 October 2013
This journal entry is due on Friday, October 18, at midnight PDT. (Thursday night/Friday morning) Note that there is an interim deadline for uploading your files from part 1 by midnight, Monday, October 14. (Sunday night/Monday morning)
In the next section of the course, a series of in-class and journal assignments will introduce the process we will use for the final projects. We will first analyze microarray data from Vibrio cholerae, and then learn how to create a Gene Database for this organism.
Individual Journal Assignment
- Store this journal entry as "username Week 8" (i.e., this is the text to place between the square brackets when you link to this page).
- Link from your user page to this Assignment page.
- Link to your journal entry from your user page.
- Link back from your journal entry to your user page.
- Don't forget to add the "Journal Entry" category to the end of your wiki page.
- Note: you can easily fulfill all of these links by adding them to your template and then using your template on your journal entry.
- Keep an "electronic lab notebook", containing your methods, results, and interpretations of the Vibrio cholerae microarray analysis part 1 and part 2 in your "username Week 8" journal page. Although you will have assigned partner(s), you will need to fill out your own individual journal page.
- Your electronic notebook should contain enough information that you or someone else could reproduce what you did given only the information on your page.
- You should use screenshots and hyperlinks as appropriate.
- Be sure to answer any questions embedded in the protocol in your journal page.
- Upload the requested files from part 1 and part 2 to this wiki and link to them on your individual journal page.
- IMPORTANT: Upload your completed spreadsheet (both the .xls and .txt versions) from part 1 by the interim deadline of midnight, Monday, October 14 (Sunday night/Monday morning) so that Dr. Dahlquist can check them before you move on to part 2 of the exercise. She will not be assigning grades at this point; you will have the chance to make corrections, if necessary, before completing part 2.
Reading
- Merrell, D.S., Butler, S.M., Qadri, F., Dolganov, N.A., Alam, A., Cohen, M.B., Calderwood, S.B., Schoolnik, G.K., and Camilli, A. (2002) Host-induced epidemic spread of the cholera bacterium. Nature 417: 642-645.
- Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., Harris, M.A., Hill, D.P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J.C., Richardson, J.E., Ringwald, M., Rubin, G.M., Sherlock, G. (2000) Gene Ontology: tool for the unification of biology. Nature Genetics 25: 25-29.
- Doniger, S.W., Salomonis, N., Dahlquist, K.D., Vranizan, K., Lawlor, S.C., Conklin, B.R. (2003) MAPPFinder: using Gene Ontology and GenMAPP to create a global gene-expression profile from microarray data. Genome Biology 4:R7.
Overview of Microarray Data Analysis
This is a list of steps required to analyze DNA microarray data.
1. Quantitate the fluorescence signal in each spot in the microarray image.
   - Typically performed by the scanner software, although third-party software packages do exist.
   - The image of the microarray slide and this quantitation are considered the "raw-est" form of the data.
     - Ideally, this type of raw data would be made publicly available upon publication.
     - In practice, the image data is usually not made available because the raw image file of one slide could be up to 100 MB in size.
     - Also, some journals do not require data deposition as a condition of publication, so published data are often not available anywhere for download.
   - Microarray data is not centrally located on the web. Some major sources are:
     - NCBI GEO
     - EBI ArrayExpress
     - Stanford Microarray Database (now hosted by Princeton)
     - PUMAdb (Princeton Microarray Database)
   - In addition, microarray data can sometimes be found as supplementary information with a journal article or on an investigator's own web site.
2. Calculate the ratio of red/green fluorescence.
3. Log (base 2) transform the ratios.
4. Normalize the log ratios on each microarray slide.
5. Normalize the log ratios for a set of slides in an experiment.
6. Perform statistical analysis on the log ratios.
7. Compare individual genes with known data.
8. Look for patterns (expression profiles) in the data (many programs are available to do this).
9. Perform Gene Ontology term enrichment analysis (we will use MAPPFinder for this).
10. Map onto biological pathways (we will use GenMAPP for this).
In this week's exercise, we will do steps 5-7 (part 1, using Microsoft Excel) and 9 (part 2, using GenMAPP & MAPPFinder).
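The ratio, log transform, normalization, and statistics steps listed above can be sketched in a few lines of Python. This is only an illustration, not the Excel protocol used in class: the intensity values are made up, "scale and center" (subtract the slide mean, divide by the slide standard deviation) is assumed here as one possible normalization, and a one-sample t statistic against zero is assumed as the statistical test.

```python
import math
import statistics


def log2_ratios(red, green):
    """Ratio of red/green fluorescence for each spot, then log base 2."""
    return [math.log2(r / g) for r, g in zip(red, green)]


def scale_and_center(values):
    """Assumed normalization: subtract the slide mean, divide by the slide standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def one_sample_t(values):
    """t statistic testing whether the mean log ratio of one gene differs from 0."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))


# Hypothetical intensities: 3 replicate slides, 4 genes per slide.
red = [[1200, 300, 800, 1500], [1100, 280, 820, 1700], [1300, 310, 790, 1600]]
green = [[600, 310, 790, 400], [500, 300, 800, 450], [650, 290, 810, 380]]

# Normalize each slide separately so that the slides become comparable.
normalized = [scale_and_center(log2_ratios(r, g)) for r, g in zip(red, green)]

# Transpose so that each row holds one gene's values across the replicates.
per_gene = list(zip(*normalized))
for gene_index, replicates in enumerate(per_gene):
    print(gene_index, round(statistics.mean(replicates), 3), round(one_sample_t(replicates), 3))
```

Recording each transformation as a named function like this also serves the electronic lab notebook goal: someone else can see exactly which operation was applied at each step.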
Statistical Analysis of Vibrio cholerae Microarray Data (Part 1)
- We will begin this analysis in class on Thursday, October 10.
- The detailed instructions for the microarray data analysis we will carry out can be found on the Sample Microarray Analysis for Vibrio cholerae page hosted by OpenWetWare.org.
MAPPFinder Analysis of Vibrio cholerae Microarray Data (Part 2)
- We will begin this analysis in class on Tuesday, October 15.
- The detailed instructions can be found on the GenMAPP and MAPPFinder Protocols page hosted by OpenWetWare.org.
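To give a feel for what a Gene Ontology term enrichment calculation does, here is a minimal sketch of a hypergeometric-style z score of the kind commonly used for this purpose. Treat it as an illustration under stated assumptions rather than MAPPFinder's exact computation; the function name and the example counts are hypothetical.

```python
import math


def enrichment_z_score(N, R, n, r):
    """z score for one GO term.

    N: total genes measured
    R: total genes meeting the criterion (e.g. significantly changed)
    n: genes measured in this GO term
    r: genes in this GO term meeting the criterion
    """
    expected = n * R / N
    # Variance with the finite-population correction (sampling without replacement).
    variance = n * (R / N) * (1 - R / N) * (1 - (n - 1) / (N - 1))
    return (r - expected) / math.sqrt(variance)


# Hypothetical counts: 100 genes measured, 10 changed, a 10-gene term with 5 changed.
print(enrichment_z_score(N=100, R=10, n=10, r=5))
```

A positive z score means the term contains more changed genes than expected by chance; a score near zero means the term is unremarkable.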
Downloading and installing the GenMAPP and MAPPFinder Software
- We will be using GenMAPP and MAPPFinder version 2.1 (http://genmapp.org). This software is already installed on the Windows machines in the Keck lab annex and in the Seaver 120 computer lab.
- This version is now called "GenMAPP Classic" and can be downloaded from this page.
- Follow the instructions in the installer.
- During installation, the installer will open a window called the GenMAPP Data Acquisition Tool. It will not function because it cannot connect to the server. This is OK; you will instead download your Vibrio cholerae Gene Database from the XMLPipeDB project at SourceForge.org.
- Half of the class will use the Vc-Std_External_20090622.gdb Gene Database that was created by the Fall 2008 Biological Databases class.
- To download this Gene Database, follow this link to the XMLPipeDB SourceForge Download page.
- Half of the class will use a more recent Vc-Std_External_20101022.gdb Gene Database that was created by Drs. Dahlquist and Dionisio in 2010.
- To download this Gene Database, follow this link to the XMLPipeDB SourceForge Download page.
- The members of a pair should each choose a different gene database.
- Click on the link for the Gene Database to which you have been assigned, download the file, and save it into the folder C:\GenMAPP 2 Data\Gene Databases (if you accepted the default folders during the installation), and extract it.
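If you prefer to script the final extraction step, the sketch below shows one way to do it. It assumes the downloaded file is a .zip archive (the instructions above only say to "extract it"), and the function name and the archive file name in the usage comment are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_gene_database(archive_path, dest_dir):
    """Extract a downloaded archive into dest_dir and list the .gdb files found there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
    return sorted(p.name for p in dest.glob("*.gdb"))


# Example call (hypothetical archive name; default GenMAPP folder from the instructions):
# extract_gene_database(
#     r"C:\Users\you\Downloads\Vc-Std_External_20101022.zip",
#     r"C:\GenMAPP 2 Data\Gene Databases",
# )
```

Checking the returned list is a quick way to confirm that the Gene Database you were assigned ended up in the folder GenMAPP expects.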
Groups
- Viktoria - Kevin Meilek
- Hilda - Tauras
- Dillon - Kevin McGee
- Lena - Alina
- Mitchell - Gabriel
- Stephen - Miles
- Katrina - Lauren
Shared Journal Assignment
- Store your journal entry in the shared Class Journal Week 8 page. If this page does not exist yet, go ahead and create it (congratulations on getting in first :) )
- Link to your journal entry from your user page.
- Link back from the journal entry to your user page.
- NOTE: you can easily fulfill the links part of these instructions by adding them to your template and using the template on your user page.
- Sign your portion of the journal with the standard wiki signature shortcut (~~~~).
- Add the "Journal Entry" and "Shared" categories to the end of the wiki page (if someone has not already done so).
View
Now that you've done your own microarray analysis, we will revisit the case "Deception at Duke".
- View the video: The Importance of Reproducible Research in High-Throughput Biology: Case Studies in Forensic Bioinformatics.
- View the slides from DataONE on data entry and manipulation.
- Optional: for more information on the Duke saga, see the web site put together by Baggerly and Coombes here.
Reflection
- What were the main issues with the data and analysis identified by Baggerly and Coombes? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
- What recommendations does Dr. Baggerly make for reproducible research? How do these correspond to what DataONE recommends?
- Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
- Look at the methods and results described in the Merrell et al. (2002) paper. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?