Ymesfin Week 9
Purpose
The purpose of this assignment was to statistically analyze a DNA microarray dataset, to document the analysis in a detailed electronic lab notebook, and to demonstrate our understanding of p-value cut-offs.
Methods
- The data was first downloaded from the wiki
- The strain analyzed was the HAP4 deletion strain (dHAP4)
- The data was obtained from the following file: Ymesfin_BIOL367_F19_microarray-data_dHAP4(7846).xlsx
- The file had timepoints at 15, 30, 60, 90, and 120 minutes. The 15, 30, and 60 minute timepoints had 4 replicates each, while the 90 and 120 minute timepoints had 3 replicates each. There were a total of 6189 genes in this dataset.
- Performed ANOVA statistical analysis
- Calculated the average log fold change for each gene at each timepoint
- Calculated the total sum of squares for each gene across all timepoints
- Created the column headers dHAP4_ss_(TIME) for each timepoint
- Calculated the sum of squares for each timepoint
- Calculated the F statistic for each gene
- Calculated the p-value for each gene
- Performed the Bonferroni p-value correction for each gene
- Performed the Benjamini & Hochberg p-value correction for each gene
- Performed a sanity check
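The ANOVA steps above can be reproduced outside the spreadsheet with the minimal Python sketch below. It assumes the null hypothesis is that the mean log fold change is zero at every timepoint, so for one gene with t timepoints and n replicate values the statistic is F = ((n - t)/t) * (SS_total - SS_full)/SS_full, where SS_total is the sum of squares of all values and SS_full is the sum of the per-timepoint sums of squares (the dHAP4_ss_(TIME) columns). The p-value is the right-tail F probability with t and n - t degrees of freedom, the same quantity Excel's FDIST returns. All function names are hypothetical, and the Benjamini & Hochberg function applies the standard monotonicity step, which a spreadsheet version that only computes p * m / rank may leave out.

```python
import math
from statistics import mean


def _nonzero(v, tiny=1e-300):
    # Keep the continued-fraction terms away from zero (modified Lentz method).
    return v if abs(v) > tiny else tiny


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction used by the regularized incomplete beta function."""
    c, d = 1.0, 1.0 / _nonzero(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction for this m.
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 / _nonzero(1.0 + aa * d)
            c = _nonzero(1.0 + aa / c)
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """Right-tail probability P(F > f) for an F distribution with (d1, d2) df."""
    if f <= 0.0:
        return 1.0
    return _betai(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def anova_f_test(groups):
    """One gene: groups holds one list of replicate log fold changes per timepoint."""
    values = [x for g in groups for x in g]
    n, t = len(values), len(groups)
    ss_total = sum(x * x for x in values)  # all timepoint means fixed at 0 (null)
    ss_full = 0.0                          # sum of the per-timepoint sums of squares
    for g in groups:
        g_mean = mean(g)
        ss_full += sum((x - g_mean) ** 2 for x in g)
    f_stat = ((n - t) / t) * (ss_total - ss_full) / ss_full
    return f_stat, f_sf(f_stat, t, n - t)


def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(p * m, 1.0) for p in pvals]


def benjamini_hochberg(pvals):
    """Step-up BH adjusted p-values, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_significant(pvals, cutoff=0.05):
    """Sanity check: how many genes fall below a p-value cut-off."""
    return sum(p < cutoff for p in pvals)
```

For dHAP4, each gene contributes 4, 4, 4, 3, and 3 replicates, so n = 18 and t = 5, giving 5 and 13 degrees of freedom; the Bonferroni and Benjamini & Hochberg functions are then applied to the list of all 6189 per-gene p-values at once.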
- Clustering and GO Term Enrichment with STEM Software
- The data was copied into a new Excel sheet
- Column A was renamed "SPOT", column B was renamed "Gene Symbol", and column C was deleted
- All data entries with BH p-values > 0.05 were deleted
- All data columns except the average log fold change column for each timepoint were deleted
- The data columns were renamed with just the time and units (for example, 15m, 30m, etc.)
- The data was saved as a tab-delimited text document
- The STEM Software, Gene Ontology, and yeast GO annotations were downloaded
- The STEM Software was run using the dHAP4 text data
- The Profile GO and Profile Gene Tables were saved from the STEM results
- Profile 48 from the STEM results was selected for further analysis
- Why did you select this profile? In other words, why was it interesting to you?
- I chose Profile 48 because it was the third most significant profile and its expression pattern over time appeared parabolic.
- How many genes belong to this profile?
- 256 genes are associated with this profile.
- How many genes were expected to belong to this profile?
- 32.6 genes were expected to be associated with this profile.
- What is the p-value for the enrichment of genes in this profile?
- The p-value of enrichment for this profile is 1.8E-141.
- How many GO terms are associated with this profile at p < 0.05?
- 35 terms associated with this profile have a p-value < 0.05.
- How many GO terms are associated with this profile with a corrected p value < 0.05?
- Only one term associated with this profile had a corrected p value < 0.05.
- The definitions of the top 6 GO terms with p-values < 0.05 were looked up on http://geneontology.org
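The spreadsheet preparation for STEM listed above (keeping only genes with a BH p-value of at most 0.05, keeping SPOT, Gene Symbol, and the average log fold change for each timepoint, renaming the timepoint columns, and saving as tab-delimited text) can be sketched in Python as below. The record layout, with hypothetical keys `bh_p` and `avg_logfc`, is an assumed stand-in for the spreadsheet columns, not the actual file format.

```python
import csv
import io

TIMEPOINTS = [15, 30, 60, 90, 120]  # minutes, as in the dHAP4 dataset


def stem_input_tsv(rows, bh_cutoff=0.05):
    """Build STEM input text from records with hypothetical keys
    'SPOT', 'Gene Symbol', 'bh_p', and 'avg_logfc' (one value per timepoint)."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["SPOT", "Gene Symbol"] + [f"{t}m" for t in TIMEPOINTS])
    for row in rows:
        if row["bh_p"] > bh_cutoff:
            continue  # mirrors deleting entries with BH p-value > 0.05
        if len(row["avg_logfc"]) != len(TIMEPOINTS):
            raise ValueError(f"expected {len(TIMEPOINTS)} averages for {row['SPOT']}")
        writer.writerow([row["SPOT"], row["Gene Symbol"]] + list(row["avg_logfc"]))
    return out.getvalue()
```

The returned string can be written to a `.txt` file and loaded into STEM in place of the file exported from Excel.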
- Using YEASTRACT to Infer which Transcription Factors Regulate a Cluster of Genes
- The gene list for Profile 48 was pasted into YEASTRACT and ranked by TF
- How many transcription factors are green or "significant"?
- 9 are significant.
- Are CIN5, GLN3, and/or HAP4 on the list? If so, what is their "% in user set", "% in YEASTRACT", and "p value".
- CIN5: 16.47%, p value= 0.999999881503972; GLN3: 35.29%, p value= 0.276481708725189; HAP4: 14.51%, p value= 0.597297513212687
- A regulation matrix was created with the YEASTRACT Gene Regulation Matrix tool from the top 15 transcription factors in the p-value ranking plus GLN3, HAP4, and CIN5
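For each transcription factor, YEASTRACT reports the share of the submitted genes that it lists as targets of that factor ("% in user set") along with a p-value for the ranking. The sketch below computes the percentage directly and uses a hypergeometric upper-tail test to illustrate an over-representation p-value. Whether YEASTRACT computes its p-value exactly this way is an assumption, and the function names and inputs are hypothetical.

```python
from math import comb


def percent_in_user_set(tf_targets, user_genes):
    """Percentage of the user's genes that are listed as targets of one TF."""
    user = set(user_genes)
    return 100.0 * len(user & set(tf_targets)) / len(user)


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: genes in the background set; successes: targets of the TF;
    draws: size of the user's gene list; k: TF targets found in that list.
    """
    total = comb(population, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, min(successes, draws) + 1))
    return hits / total  # exact integer arithmetic until this final division
```

A small p-value from this kind of test means the factor's targets appear in the profile more often than expected by chance, which is the intuition behind ranking transcription factors by p-value.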
- Visualizing Your Gene Regulatory Networks with GRNsight
- The regulation matrix was reformatted into an adjacency matrix
- The rows and columns of the adjacency matrix were sorted in alphabetical order
- The network was visualized by uploading the adjacency matrix to GRNsight and applying the grid layout
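The reformatting of the regulation matrix into an alphabetically sorted adjacency matrix can be sketched in Python as below. The hypothetical function takes, for each regulator, the set of its targets and returns a square 0/1 matrix over the network's genes. The layout (rows are the regulated genes, columns are the regulators) and the corner-cell label are assumptions and should be checked against GRNsight's documentation before uploading.

```python
def adjacency_matrix(targets_by_regulator, network_genes,
                     corner="rows genes affected/cols genes controlling"):
    """Square 0/1 matrix; rows = target genes, columns = regulators (assumed layout).

    targets_by_regulator: hypothetical dict mapping a regulator to its target genes.
    network_genes: the transcription factors kept in the network.
    """
    genes = sorted(set(network_genes))  # alphabetical rows and columns
    matrix = [[corner] + genes]
    for target in genes:
        row = [target]
        for regulator in genes:
            row.append(1 if target in targets_by_regulator.get(regulator, ()) else 0)
        matrix.append(row)
    return matrix
```

Restricting rows and columns to the same gene list keeps the matrix square, so every edge drawn in GRNsight connects two transcription factors in the network.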
Results
Data and Files
dHAP4 profile 48 Regulation Matrix
Conclusion
Acknowledgements
Dr. Kam Dahlquist; Professor
Naomi Tesfaiohannes; Homework Partner
David Ramirez; Homework Partner
Except for what is noted above, this individual journal entry was completed by me and not copied from another source.
Ymesfin (talk) 17:31, 30 October 2019 (PDT)
References
LMU BioDB 2019. (2019). Week 7. Retrieved October 16, 2019, from https://xmlpipedb.cs.lmu.edu/biodb/fall2019/index.php/Week_7
LMU BioDB 2019. (2019). Week 8. Retrieved October 23, 2019, from https://xmlpipedb.cs.lmu.edu/biodb/fall2019/index.php/Week_8
LMU BioDB 2019. (2019). Week 9. Retrieved October 26, 2019, from https://xmlpipedb.cs.lmu.edu/biodb/fall2019/index.php/Week_9
The Gene Ontology Resource. (2019). Retrieved October 26, 2019, from http://geneontology.org
Yeastract. (2019). Retrieved October 29, 2019, from http://www.yeastract.com/index.php
Yeastract Gene Regulation Matrix. (2019). Retrieved October 29, 2019, from http://www.yeastract.com/formregmatrix.php
GRNsight. (2019). Retrieved October 29, 2019, from https://dondi.github.io/GRNsight/