Dramir36 Week 7

From LMU BioDB 2019
Revision as of 15:04, 15 October 2019 by Dramir36 (Notes/Methods/Results: added methods that will be done in Week 8)


Purpose

  • to conduct the "analyze" step of the data life cycle for a DNA microarray dataset.
  • to develop an intuition about what different p-value cut-offs mean.
  • to keep a detailed electronic laboratory notebook to facilitate reproducible research.
  • to revisit the "Deception at Duke" case with new insights because you have analyzed your own dataset.

Notes/Methods/Results

  • t test: is the change in expression of this gene significantly different from zero at a given time point?
    • The usual cut-off for significance is p < 0.05 (5%).
    • The p value is the probability that you would have seen a change at least this big by chance.
  • ANOVA: is the change in expression of this gene significantly different from zero at any time point?
  • A log2 fold change with a magnitude below 0.25 should be treated as no change in expression for that gene.
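
To build some intuition for the t test note above, here is a minimal sketch in Python of a one-sample t test against zero for the replicate log2 ratios of a single gene at one time point. The replicate values are hypothetical and do not come from the dataset, and the function names are made up for illustration. The p value is found by numerically integrating the Student t density, which stands in for a statistics package; it is not necessarily how the class spreadsheet computes it.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p value: 1 - 2 * (area under the density from 0 to |t|).

    The area is approximated with Simpson's rule (steps must be even).
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)


def one_sample_t(log_ratios):
    """Test whether the mean log2 ratio differs from zero.

    Needs at least two replicates that are not all identical.
    """
    n = len(log_ratios)
    t = mean(log_ratios) / (stdev(log_ratios) / math.sqrt(n))
    return t, two_tailed_p(t, n - 1)


# Hypothetical replicate log2 ratios for one gene at one time point.
replicates = [1.2, 0.9, 1.4, 1.1]
t_stat, p_value = one_sample_t(replicates)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}, significant at 0.05: {p_value < 0.05}")
```

Changing the cut-off in the last line (for example to 0.01 or 0.001) shows how a stricter threshold lets fewer genes count as significantly changed.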

Background

This is a list of steps required to analyze DNA microarray data.

  1. Quantitate the fluorescence signal in each spot
  2. Calculate the ratio of red/green fluorescence
  3. Log2 transform the ratios
    • Steps 1-3 have been performed for you by the GenePix Pro software (which runs the microarray scanner).
  4. Normalize the ratios on each microarray slide
  5. Normalize the ratios for a set of slides in an experiment
  6. Perform statistical analysis on the ratios
  7. Compare individual genes with known data
    • Steps 6-7 are performed in Microsoft Excel
  8. Pattern finding algorithms (clustering)
  9. Map onto biological pathways
    • We will use software called STEM for the clustering and mapping
  10. Identifying regulatory transcription factors responsible for observed changes in gene expression
  11. Dynamical systems modeling of the gene regulatory network (GRNmap)
  12. Viewing modeling results in GRNsight
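
Steps 2 and 3 can be sketched in a few lines of Python. The intensity values below are hypothetical and the function name is made up for illustration; in this exercise those steps were already done by the scanner software. The point of the log2 transform is that a fourfold increase and a fourfold decrease end up the same distance from zero.

```python
import math


def log2_ratio(red, green):
    """Log2 of the red/green fluorescence ratio for one spot."""
    return math.log2(red / green)


# Hypothetical spot intensities (red, green).
spots = {
    "fourfold up": (2000.0, 500.0),
    "no change": (800.0, 800.0),
    "fourfold down": (500.0, 2000.0),
}

for label, (red, green) in spots.items():
    print(f"{label}: ratio = {red / green:.2f}, log2 ratio = {log2_ratio(red, green):+.2f}")
```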

Experimental Design and Getting Ready

The data used in this exercise is publicly available at the NCBI GEO database in record GSE83656.

  • Begin by downloading the Excel file for your group's strain.
  • NOTE: before beginning any analysis, immediately change the filename (Save As...) so that it contains your initials to distinguish it from other students' work.
  • In the Excel spreadsheet, there is a worksheet labeled "Master_Sheet_<STRAIN>", where <STRAIN> is replaced by the strain designation, wt, dCIN5, dGLN3, or dHAP4.
    • In this worksheet, each row contains the data for one gene (one spot on the microarray).
    • The first column contains the "MasterIndex", which numbers all of the rows sequentially in the worksheet so that we can always use it to sort the genes into the order they were in when we started.
    • The second column (labeled "ID") contains the Systematic Name (gene identifier) from the Saccharomyces Genome Database.
    • The third column contains the Standard Name for each of the genes.
    • Each subsequent column contains the log2 ratio of the red/green fluorescence from one microarray hybridized in the experiment (steps 1-5 above have already been performed for you). The columns are grouped by strain, starting with wild type and proceeding in alphabetical order by deletion strain.
    • Each data column heading begins with the experiment name ("wt" for wild type S. cerevisiae data, "dCIN5" for the Δcin5 data, etc.). "LogFC" stands for "Log2 Fold Change", which is the log2 red/green ratio. The timepoints are designated as "t" followed by a number in minutes. Replicates are numbered as "-0", "-1", "-2", etc. after the timepoint.
      • The timepoints are t15, t30, t60 (cold shock at 13°C) and t90 and t120 (cold shock at 13°C followed by 30 or 60 minutes of recovery at 30°C).
  • Record in your wiki the strain that you will analyze, the filename, and the number of replicates for each strain and each time point in your data.
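
Counting replicates by hand is error-prone, so here is a rough Python sketch that tallies them from the column headings. It assumes the pieces of each heading are joined with underscores, as in "dHAP4_LogFC_t15-1"; that exact layout is an assumption based on the description above, so check it against the real worksheet. The example headings and the function name are hypothetical.

```python
import re
from collections import Counter

# Assumed heading layout: <strain>_LogFC_t<minutes>-<replicate>
HEADING = re.compile(r"^(?P<strain>[A-Za-z0-9]+)_LogFC_t(?P<minutes>\d+)-(?P<replicate>\d+)$")


def count_replicates(headings):
    """Return a Counter mapping (strain, minutes) to the number of replicate columns."""
    counts = Counter()
    for heading in headings:
        match = HEADING.match(heading.strip())
        if match is None:
            continue  # skip non-data columns such as MasterIndex or ID
        counts[(match["strain"], int(match["minutes"]))] += 1
    return counts


# Hypothetical headings, not copied from the actual spreadsheet.
example = [
    "MasterIndex", "ID", "Standard Name",
    "dHAP4_LogFC_t15-1", "dHAP4_LogFC_t15-2", "dHAP4_LogFC_t30-1",
]

for (strain, minutes), n in sorted(count_replicates(example).items()):
    print(f"{strain} t{minutes}: {n} replicate(s)")
```

The heading row could be copied out of Excel into the list, or read with a spreadsheet library; either way the counts should be compared against the worksheet before they go into the notebook.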


Data/Files

Conclusion

Acknowledgments

  • Copied the purpose, methods, and procedure from the Week 7 assignment page to my individual journal and modified the steps to relate to the dHAP4 data.

References