Difference between revisions of "Kmill104 Week 9"
Revision as of 18:58, 20 March 2024
Purpose
Methods/Results
The data used in this exercise is publicly available at the NCBI GEO database in record GSE83656.
- Begin by downloading the Excel file for your group's strain.
  - Katie & Dean (wild type data)
- NOTE: before beginning any analysis, immediately change the filename (Save As...) so that it contains your initials to distinguish it from other students' work.
- In the Excel spreadsheet, there is a worksheet labeled "Master_Sheet_<STRAIN>", where <STRAIN> is replaced by the strain designation wt.
  - In this worksheet, each row contains the data for one gene (one spot on the microarray).
  - The first column contains the "MasterIndex", which numbers all of the rows sequentially in the worksheet so that we can always use it to sort the genes into the order they were in when we started.
  - The second column (labeled "ID") contains the Systematic Name (gene identifier) from the Saccharomyces Genome Database.
  - The third column contains the Standard Name for each of the genes.
  - Each subsequent column contains the log2 ratio of the red/green fluorescence from each microarray hybridized in the experiment (steps 1-5 above having been performed for you already), for each strain starting with wild type and proceeding in alphabetical order by strain deletion.
  - Each of the column headings from the data begins with the experiment name ("wt" for wild type S. cerevisiae data, "dCIN5" for the Δcin5 data, etc.). "LogFC" stands for "Log2 Fold Change", which is the log2 red/green ratio. The timepoints are designated as "t" followed by a number in minutes. Replicates are numbered as "-0", "-1", "-2", etc. after the timepoint.
    - The timepoints are t15, t30, t60 (cold shock at 13°C) and t90 and t120 (cold shock at 13°C followed by 30 or 60 minutes of recovery at 30°C).
- Begin by recording in your wiki the strain that you will analyze, the filename, and the number of replicates for each strain and each time point in your data.
The strain that Dean and I will analyze is the wild type strain. The name of the file is BIOL367_S24_microarray-data_wt.xlsx.
Data & Files
Conclusion
Acknowledgements
References
Begin by recording in your wiki the strain that you will analyze, the filename, and the number of replicates for each strain and each time point in your data.
- Strain analyzed: wild type
- Original filename:
- Updated filename: MillerSymondsSheet
- Replicates per timepoint: t15, 4; t30, 5; t60, 4; t90, 5; t120, 5
- Total: 23 replicates
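The replicate tally above can be reproduced from the column headings rather than counted by hand. The sketch below is only an illustration: the exact header pattern (`wt_LogFC_t15-0`, with underscores as separators) is an assumption based on the naming convention described in Methods/Results, and the header list is built artificially from the recorded counts instead of being read from the spreadsheet.

```python
import re
from collections import Counter

# Assumed header pattern: <strain>_LogFC_t<minutes>-<replicate>.
# The separator characters are a guess, not taken from the spreadsheet.
HEADER_RE = re.compile(r"^(?P<strain>\w+?)_LogFC_t(?P<minutes>\d+)-(?P<rep>\d+)$")


def count_replicates(headers):
    """Return {timepoint in minutes: number of replicate columns}."""
    counts = Counter()
    for header in headers:
        match = HEADER_RE.match(header)
        if match:  # non-data columns such as MasterIndex and ID are skipped
            counts[int(match.group("minutes"))] += 1
    return dict(counts)


# Hypothetical headers, generated to mirror the replicate counts recorded above.
recorded = {15: 4, 30: 5, 60: 4, 90: 5, 120: 5}
headers = ["MasterIndex", "ID", "Standard Name"] + [
    f"wt_LogFC_t{t}-{r}" for t, n in recorded.items() for r in range(n)
]

replicate_counts = count_replicates(headers)
total_columns = sum(replicate_counts.values())
print(replicate_counts, total_columns)
```

With the real worksheet, the `headers` list would be replaced by the first row of the Master_Sheet_wt worksheet, and the regular expression adjusted to whatever separators the file actually uses.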
- How many genes have p < 0.05, and what is the percentage (out of 6189)? 2528 (40.85%)
- How many genes have p < 0.01, and what is the percentage (out of 6189)? 1652 (26.69%)
- How many genes have p < 0.001, and what is the percentage (out of 6189)? 919 (14.85%)
- How many genes have p < 0.0001, and what is the percentage (out of 6189)? 496 (8.01%)
- How many genes have p < 0.05 for the Bonferroni-corrected p value, and what is the percentage (out of 6189)? 248 (4.01%)
- How many genes have p < 0.05 for the Benjamini and Hochberg-corrected p value, and what is the percentage (out of 6189)? 1822 (29.44%)
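Each question above asks for a percentage out of 6189 genes, which is the recorded count divided by 6189. A minimal sketch of that arithmetic, using only the counts recorded above (the dictionary labels and the function name are illustrative):

```python
TOTAL_GENES = 6189  # denominator stated in the questions above

# Counts recorded above, keyed by the threshold they refer to.
counts = {
    "p < 0.05": 2528,
    "p < 0.01": 1652,
    "p < 0.001": 919,
    "p < 0.0001": 496,
    "Bonferroni p < 0.05": 248,
    "B-H p < 0.05": 1822,
}


def percent_of_total(count, total=TOTAL_GENES):
    """Express a gene count as a percentage of all genes on the array."""
    return 100.0 * count / total


percentages = {label: round(percent_of_total(n), 2) for label, n in counts.items()}
for label, pct in percentages.items():
    print(f"{label}: {counts[label]} genes = {pct:.2f}%")
```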
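Find NSR1 in your dataset. What are its unadjusted, Bonferroni-corrected, and B-H-corrected p values? What is its average log fold change at each of the timepoints in the experiment?
- Unadjusted p value: 2.86939E-10
- Bonferroni-corrected p value: 1.77586E-06
- B-H-corrected p value: 8.87932E-07
- Average log fold change: t15, 3.279225; t30, 3.621; t60, 3.526525; t90, -2.04985; t120, -0.60622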
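What are its unadjusted, Bonferroni-corrected, and B-H-corrected p values? What is its average log fold change at each of the timepoints in the experiment?
- Unadjusted p value: 0.563798852
- Bonferroni-corrected p value: 3489.351093
- B-H-corrected p value: 0.679258535
- Average log fold change: t15, 0.1076; t30, -0.46192; t60, -0.47075; t90, 0.16805; t120, -0.18418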
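The corrected p values above can be related to the unadjusted ones with a short sketch. It uses the textbook definitions (Bonferroni: p times the number of tests; Benjamini-Hochberg: p times the number of tests divided by the rank, made monotone), which may differ in detail from the spreadsheet steps used in the assignment. The five-value `example` list is hypothetical. Only the last lines use numbers recorded above. By convention a Bonferroni-corrected p value is capped at 1, so a raw product such as 3489.351093 would normally be reported as 1. The NSR1 B-H value is half its Bonferroni value, as p × m / rank would give for rank 2.

```python
def bonferroni(p_values, cap=True):
    """Multiply each p value by the number of tests, optionally capping at 1."""
    m = len(p_values)
    adjusted = [p * m for p in p_values]
    return [min(a, 1.0) for a in adjusted] if cap else adjusted


def benjamini_hochberg(p_values):
    """Standard step-up B-H adjustment, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so each adjusted value is the
    # minimum of p * m / rank over its own rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p values, only to exercise the two functions.
example = [0.001, 0.008, 0.039, 0.041, 0.6]
bonf = bonferroni(example)
bh = benjamini_hochberg(example)

# Consistency check against the values recorded above (6189 tests).
M = 6189
nsr1_raw = 2.86939e-10 * M           # recorded Bonferroni value: 1.77586E-06
second_raw = 0.563798852 * M         # recorded Bonferroni value: 3489.351093
second_capped = min(second_raw, 1.0)
print(nsr1_raw, second_raw, second_capped)
```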