Ymesfin Week 9
Purpose
The purpose of this assignment was to create a detailed electronic lab notebook to statistically analyze a DNA microarray dataset and to demonstrate our understanding of p-value cut-offs.
Methods
- The data was first downloaded from the wiki
- The strain analyzed was Delta-Hap4 (dHAP4)
- The data was obtained from the following file: Ymesfin_BIOL367_F19_microarray-data_dHAP4(7846).xlsx
- The file had timepoints at 15, 30, 60, 90, and 120 minutes. The 15, 30, and 60 minute timepoints each had 4 replicates, while the 90 and 120 minute timepoints each had 3 replicates. There were a total of 6189 samples in this dataset.
- Performed ANOVA Statistical Analysis
- Calculated average results for each datapoint.
- Calculated the sum of squares of the entire dataset
- Created the column headers dHAP4_ss_(TIME) for each timepoint
- Calculated the sum of squares for each timepoint
- Calculated the F statistic (Fstat) for each data point
- Calculated the p value for each data point
- Performed the Bonferroni p value correction for each data point
- Performed the Benjamini & Hochberg p value correction for each data point
- Performed a sanity check
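The ANOVA steps above can be sketched in Python as a rough illustration. The notebook does not spell out the spreadsheet formulas, so the form of the F statistic here is an assumption: it compares a null model in which every timepoint mean is zero against a full model with one mean per timepoint. The function names are hypothetical, and the Benjamini & Hochberg step uses the standard monotone adjustment, which may differ slightly from a rank-based spreadsheet formula.

```python
def f_statistic(replicates_by_time):
    """F statistic for one gene; input is one list of log fold changes per timepoint.

    Assumed form: F = ((n - k) / k) * (SS_null - SS_full) / SS_full,
    with n data points and k timepoints. Missing values are not handled,
    and SS_full must be nonzero.
    """
    values = [x for group in replicates_by_time for x in group]
    n = len(values)
    k = len(replicates_by_time)
    # Null hypothesis: all timepoint means are zero, so SS is the sum of x^2.
    ss_null = sum(x * x for x in values)
    # Full model: squared deviations from each timepoint's own mean.
    ss_full = 0.0
    for group in replicates_by_time:
        mean = sum(group) / len(group)
        ss_full += sum((x - mean) ** 2 for x in group)
    return ((n - k) / k) * (ss_null - ss_full) / ss_full


def bonferroni(p_values):
    """Multiply each p value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Standard B&H adjusted p values: p * m / rank, made monotone, capped at 1."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so each adjusted value
    # never exceeds the one ranked above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

The unadjusted p value would then be the upper tail of the F distribution with k and n - k degrees of freedom (for example, `scipy.stats.f.sf(F, k, n - k)` if SciPy is available).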
- Clustering and GO Term Enrichment with STEM Software
- The data was copied onto a new Excel sheet
- Column A was renamed to "SPOT", Column B was renamed to "Gene Symbol", and Column C was deleted
- All the data entries with BH p-values > 0.05 were deleted
- All of the data columns except for the Average Log Fold change columns for each timepoint were deleted
- The data columns were renamed with just the time and units (for example, 15m, 30m, etc.)
- The data was saved as a tab-delimited text document
- The STEM Software, Gene Ontology, and yeast GO annotations were downloaded
- The STEM Software was run using the dHAP4 text data
- The Profile GO and Profile Gene Tables were saved from the STEM results
- Profile 48 from the STEM results was selected for further analysis
- Why did you select this profile? In other words, why was it interesting to you?
- I chose Profile 48 because it was the third most significant profile and the expression pattern of its genes appeared parabolic.
- How many genes belong to this profile?
- 256 genes are associated with this profile.
- How many genes were expected to belong to this profile?
- 32.6 genes were expected to be associated with this profile.
- What is the p-value for the enrichment of genes in this profile?
- The p-value of enrichment for this profile is 1.8E-141.
- How many GO terms are associated with this profile at p < 0.05?
- 35 terms associated with this profile have a p-value < 0.05.
- How many GO terms are associated with this profile with a corrected p value < 0.05?
- Only one term associated with this profile had a corrected p value < 0.05.
- The definitions of the top 6 terms with p-values < 0.05 were looked up at http://geneontology.org
- How many transcription factors are green or "significant"?
- 9 are significant.
- Are CIN5, GLN3, and/or HAP4 on the list? If so, what is their "% in user set", "% in YEASTRACT", and "p value"?
- CIN5: 16.47%, p value = 0.999999881503972
- GLN3: 35.29%, p value = 0.276481708725189
- HAP4: 14.51%, p value = 0.597297513212687
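The STEM input preparation described in the Methods (dropping entries with a B&H p value > 0.05, keeping only the average log fold change columns, renaming them by time, and saving as tab-delimited text) could also be scripted instead of done by hand in Excel. The sketch below is only an illustration: the dictionary keys `spot`, `gene`, `bh_p`, and `avg_log_fc` are hypothetical names, not the column headers of the actual workbook.

```python
import csv

# Timepoints in minutes, as listed in the Methods.
TIMES = ["15", "30", "60", "90", "120"]


def build_stem_table(rows, cutoff=0.05):
    """Return header plus the rows whose B&H p value is at or below the cutoff.

    Each row is a dict with hypothetical keys: 'spot', 'gene', 'bh_p',
    and 'avg_log_fc' (a list with one average log fold change per timepoint).
    """
    header = ["SPOT", "Gene Symbol"] + [t + "m" for t in TIMES]
    table = [header]
    for row in rows:
        if row["bh_p"] > cutoff:
            continue  # entries with B&H p > cutoff are deleted
        table.append([row["spot"], row["gene"]] + list(row["avg_log_fc"]))
    return table


def write_tab_delimited(table, handle):
    """Write the table to an open text handle as tab-delimited text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(table)
```

To produce a file, `write_tab_delimited` can be given a handle opened with `open(path, "w", newline="")`, where `path` is whatever file name is chosen for the STEM input.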
Results
Data and Files
dHAP4 P-Values and Stem Results
dHAP4 profile 48 Regulation Matrix
Conclusion
Acknowledgements
Dr. Kam Dahlquist; Professor
Naomi Tesfaiohannes; Homework Partner
David Ramirez; Homework Partner
Except for what is noted above, this individual journal entry was completed by me and not copied from another source.
References
LMU BioDB 2019. (2019). Week 7. Retrieved October 16, 2019, from https://xmlpipedb.cs.lmu.edu/biodb/fall2019/index.php/Week_7
LMU BioDB 2019. (2019). Week 8. Retrieved October 23, 2019, from https://xmlpipedb.cs.lmu.edu/biodb/fall2019/index.php/Week_8
LMU BioDB 2019. (2019). Week 9. Retrieved October 26, 2019, from https://xmlpipedb.cs.lmu.edu/biodb/fall2019/index.php/Week_9
The Gene Ontology Resource. (2019). Retrieved October 26, 2019, from http://geneontology.org