Difference between revisions of "Dramir36 Week 7"
Revision as of 16:21, 15 October 2019
User:Dramir36 template:Dramir36 Skinny Genes
Purpose
- to conduct the "analyze" step of the data life cycle for a DNA microarray dataset.
- to develop an intuition about what different p-value cut-offs mean.
- to keep a detailed electronic laboratory notebook to facilitate reproducible research.
- to revisit the "Deception at Duke" case with new insights because you have analyzed your own dataset.
Background
This is a list of steps required to analyze DNA microarray data.
1. Quantitate the fluorescence signal in each spot
2. Calculate the ratio of red/green fluorescence
3. Log2 transform the ratios
- Steps 1-3 have been performed for you by the GenePix Pro software (which runs the microarray scanner).
4. Normalize the ratios on each microarray slide
5. Normalize the ratios for a set of slides in an experiment
- Steps 4-5 were performed for you using a script in R, a statistics package (see: Microarray Data Analysis Workflow)
- You will perform the following steps:
6. Perform statistical analysis on the ratios
7. Compare individual genes with known data
- Steps 6-7 are performed in Microsoft Excel
8. Pattern finding algorithms (clustering)
9. Map onto biological pathways
- We will use software called STEM for the clustering and mapping
10. Identify regulatory transcription factors responsible for observed changes in gene expression
11. Dynamical systems modeling of the gene regulatory network (GRNmap)
12. View modeling results in GRNsight
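The ratio and log2 transform steps above come down to simple arithmetic. The following is a minimal sketch, not the GenePix Pro implementation; the function name `log2_ratio` and the intensity values are made up for illustration.

```python
import math

def log2_ratio(red, green):
    """Return log2(red/green) for a single microarray spot."""
    if red <= 0 or green <= 0:
        raise ValueError("fluorescence intensities must be positive")
    return math.log2(red / green)

# Hypothetical (red, green) intensities, for illustration only
examples = [(2000.0, 1000.0), (500.0, 1000.0), (1000.0, 1000.0)]
ratios = [log2_ratio(r, g) for r, g in examples]
print(ratios)  # [1.0, -1.0, 0.0]
```

After the transform, a twofold increase and a twofold decrease are symmetric around zero (+1 and -1), and a spot with no change has a value of 0.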
Notes/Methods/Results
- The strain that will be analyzed is dHAP4. There are four replicates for the 15, 30, and 60 minute time points, but only three replicates for the 90 and 120 minute time points.
- T test: is this gene expression change significantly different from zero at a time point?
- p < 0.05 (5%) is the cut-off for significance
- The p value is the probability that you would have seen a change at least this big by chance.
- ANOVA: is the gene expression significantly different from zero at any time point?
- A gene with values below 0.25 should be considered to have no change in expression
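The T test question in the notes above can be sketched in a few lines. This is an illustration only, not the Excel procedure used in the assignment: it assumes a one-sample t test against zero across the replicates of one time point, the replicate values are invented, and the function names (`t_statistic`, `t_pdf`, `two_sided_p`) are hypothetical. The p value is obtained by numerically integrating the Student's t density so that no statistics library is needed.

```python
import math

def t_statistic(values):
    """One-sample t statistic testing whether the mean differs from zero."""
    n = len(values)
    mean = sum(values) / n
    # Sample variance (n - 1 in the denominator)
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean / math.sqrt(var / n)

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=10000):
    """Two-sided p value: 1 - 2 * area under the t density on [0, |t|].

    The area is computed with Simpson's rule (steps must be even).
    """
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)

# Hypothetical log2 fold changes for one gene at one time point (four replicates)
replicates = [0.9, 1.1, 1.0, 1.2]
t = t_statistic(replicates)
p = two_sided_p(t, df=len(replicates) - 1)
print(f"t = {t:.2f}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

With only three or four replicates the degrees of freedom are small, so the same average change needs tighter agreement between replicates to fall under the 0.05 cut-off at the 90 and 120 minute time points than at the earlier ones.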
Experimental Design and Getting Ready
The data used in this exercise is publicly available at the NCBI GEO database in record GSE83656.
- Begin by downloading the Excel file for dHAP4, found in the "Data/Files" section of this page
- In the Excel spreadsheet, there is a worksheet labeled "Master_Sheet_dHAP4"
- In this worksheet, each row contains the data for one gene (one spot on the microarray).
- The first column contains the "MasterIndex", which numbers all of the rows sequentially in the worksheet so that we can always use it to sort the genes into the order they were in when we started.
- The second column (labeled "ID") contains the Systematic Name (gene identifier) from the Saccharomyces Genome Database.
- The third column contains the Standard Name for each of the genes.
- Each subsequent column contains the log2 ratio of the red/green fluorescence from each microarray hybridized in the experiment (steps 1-5 above having been performed for you already), for each strain starting with wild type and proceeding in alphabetical order by strain deletion.
- Each of the column headings from the data begins with the experiment name ("wt" for wild type S. cerevisiae data, "dCIN5" for the Δcin5 data, etc.). "LogFC" stands for "Log2 Fold Change" which is the Log2 red/green ratio. The timepoints are designated as "t" followed by a number in minutes. Replicates are numbered as "-0", "-1", "-2", etc. after the timepoint.
- The timepoints are t15, t30, t60 (cold shock at 13°C) and t90 and t120 (cold shock at 13°C followed by 30 or 60 minutes of recovery at 30°C).
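The column heading convention described above can be parsed mechanically, which is handy for checking how many replicates exist per time point. The sketch below assumes that the parts of a heading are joined with underscores (for example `dHAP4_LogFC_t15-1`); the exact separator is not stated here, so treat the pattern, the sample headings, and the function name `parse_header` as assumptions to adjust against the actual worksheet.

```python
import re
from collections import Counter

# ASSUMPTION: headings look like "<strain>_LogFC_t<minutes>-<replicate>".
# The underscore separator is a guess; adjust it to match the real worksheet.
HEADER = re.compile(r"^(?P<strain>[^_]+)_LogFC_t(?P<minutes>\d+)-(?P<rep>\d+)$")

def parse_header(header):
    """Return (strain, minutes, replicate) or None if the heading is not a data column."""
    m = HEADER.match(header)
    if m is None:
        return None
    return m["strain"], int(m["minutes"]), int(m["rep"])

# Hypothetical headings, for illustration only
headers = [
    "MasterIndex",
    "wt_LogFC_t15-1",
    "dHAP4_LogFC_t15-1",
    "dHAP4_LogFC_t15-2",
    "dHAP4_LogFC_t90-1",
]
parsed = [parse_header(h) for h in headers]

# Count replicates per (strain, time point), skipping non-data columns
counts = Counter((strain, minutes) for strain, minutes, _ in filter(None, parsed))
for (strain, minutes), n in sorted(counts.items()):
    print(f"{strain} t{minutes}: {n} replicate(s)")
```

Run against the real headings, a count like this gives a quick check of the replicate numbers recorded in the notes for each dHAP4 time point.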
Data/Files
- Startup File: File:BIOL367 F19 microarray-data dHAP4 DR.xlsx
Conclusion
Acknowledgments
- Copied purpose, methods, and procedure from Week 7 assignment page to individual journal and modified steps to relate to the dHAP4 data