Ymesfin Week 9

From LMU BioDB 2019

Purpose

The purpose of this assignment was to document, in a detailed electronic lab notebook, the statistical analysis of a DNA microarray dataset, to demonstrate our understanding of p-value cutoffs, and to visualize the relationships among the transcription factors in a gene regulatory network of Saccharomyces cerevisiae.

Week 7 Assignment

Week 8 Assignment

Week 9 Assignment

Methods

  1. The data was first downloaded from the wiki
    • The strain analyzed was ΔHAP4 (dHAP4).
    • The data was obtained from the following file: Ymesfin_BIOL367_F19_microarray-data_dHAP4(7846).xlsx
    • The file had timepoints at 15, 30, 60, 90, and 120 minutes. The 15, 30, and 60 minute timepoints had 4 replicates each, while the 90 and 120 minute timepoints had 3 replicates each. The dataset contained a total of 6189 genes (rows).
  2. Performed ANOVA statistical analysis
    1. Calculated the average log fold change for each timepoint.
    2. Calculated the sum of squares of the entire dataset for each gene.
    3. Created the column headers dHAP4_ss_(TIME) for each timepoint.
    4. Calculated the sum of squares for each timepoint.
    5. Calculated the F statistic for each gene.
    6. Calculated the p value for each gene.
    7. Performed the Bonferroni p value correction for each gene.
    8. Performed the Benjamini & Hochberg p value correction for each gene.
    9. Performed a sanity check.
  3. Clustering and GO Term Enrichment with STEM Software
    1. The data was copied onto a new Excel sheet.
    2. Column A was renamed "SPOT", column B was renamed "Gene Symbol", and column C was deleted.
    3. All rows with a Benjamini & Hochberg (BH) corrected p value > 0.05 were deleted.
    4. All data columns except the average log fold change column for each timepoint were deleted.
    5. The remaining data columns were renamed with just the time and units (for example, 15m, 30m, etc.).
    6. The data was saved as a tab-delimited text file.
    7. The STEM software, the Gene Ontology, and the yeast GO annotations were downloaded.
    8. STEM was run on the dHAP4 tab-delimited text file.
    9. The Profile GO and Profile Gene Tables were saved from the STEM results
    10. Profile 48 from the STEM results was selected for further analysis
      • Why did you select this profile? In other words, why was it interesting to you?
        • I chose Profile 48 because it was the third most significant profile and the expression pattern of its genes appeared parabolic.
      • How many genes belong to this profile?
        • 256 genes are associated with this profile.
      • How many genes were expected to belong to this profile?
        • 32.6 genes were expected to be associated with this profile.
      • What is the p-value for the enrichment of genes in this profile?
        • The p-value of enrichment for this profile is 1.8E-141
      • How many GO terms are associated with this profile at p < 0.05?
        • 35 terms associated with this profile have a p-value < 0.05.
      • How many GO terms are associated with this profile with a corrected p value < 0.05?
        • Only one term associated with this profile had a corrected p value < 0.05.
    11. The definitions of the top 6 GO terms with p values < 0.05 were looked up on http://geneontology.org
      • Look up the definitions for each of the terms at http://geneontology.org. In your research presentation, you will discuss the biological interpretation of these GO terms. In other words, why does the cell react to cold shock by changing the expression of genes associated with these GO terms? Also, what does this have to do with the transcription factor being deleted (for the groups working with deletion strain data)?
        • GO:0005634
          • Name: Nucleus
          • Definition: A membrane-bounded organelle of eukaryotic cells in which chromosomes are housed and replicated. In most cells, the nucleus contains all of the cell's chromosomes except the organellar chromosomes, and is the site of RNA synthesis and processing. In some species, or in specialized cell types, RNA metabolism or DNA replication may be absent.
        • GO:0005730
          • Name: Nucleolus
          • Definition: A small, dense body one or more of which are present in the nucleus of eukaryotic cells. It is rich in RNA and protein, is not bounded by a limiting membrane, and is not seen during mitosis. Its prime function is the transcription of the nucleolar DNA into 45S ribosomal-precursor RNA, the processing of this RNA into 5.8S, 18S, and 28S components of ribosomal RNA, and the association of these components with 5S RNA and proteins synthesized outside the nucleolus. This association results in the formation of ribonucleoprotein precursors; these pass into the cytoplasm and mature into the 40S and 60S subunits of the ribosome.
        • GO:0006364
          • Name: rRNA processing
          • Definition: Any process involved in the conversion of a primary ribosomal RNA (rRNA) transcript into one or more mature rRNA molecules.
        • GO:0034476
          • Name: U5 snRNA 3'-end processing
          • Definition: Any process involved in forming the mature 3' end of a U5 snRNA molecule.
        • GO:0003723
          • Name: RNA binding
          • Definition: Interacting selectively and non-covalently with an RNA molecule or a portion thereof.
        • GO:0000178
          • Name: exosome (RNase complex)
          • Definition: A ribonuclease complex that has 3-prime to 5-prime exoribonuclease activity and possibly endoribonuclease activity, producing 5-prime-phosphomonoesters. Participates in a multitude of cellular RNA processing and degradation events preventing nuclear export and/or translation of aberrant RNAs. Restricted to processing linear and circular single-stranded RNAs (ssRNA) only. RNAs with complex secondary structures may have to be unwound or pre-processed by co-factors prior to entering the complex, esp if the 3-prime end is structured.
  4. Using YEASTRACT to Infer which Transcription Factors Regulate a Cluster of Genes
    1. The gene list for Profile 48 was pasted into YEASTRACT, and the results were ranked by transcription factor (TF).
      • How many transcription factors are green or "significant"?
        • 9 are significant.
      • Are CIN5, GLN3, and/or HAP4 on the list? If so, what is their "% in user set", "% in YEASTRACT", and "p value".
        • CIN5: 16.47%, p value = 0.999999881503972
        • GLN3: 35.29%, p value = 0.276481708725189
        • HAP4: 14.51%, p value = 0.597297513212687
    2. A regulation matrix was created with the YEASTRACT Gene Regulation Matrix tool from the top 15 transcription factors (those with the lowest p values) plus the transcription factors GLN3, HAP4, and CIN5.
  5. Visualizing the Gene Regulatory Network with GRNsight
    1. The regulation matrix was reformatted into an adjacency matrix.
    2. The rows and columns of the adjacency matrix were sorted alphabetically.
    3. The adjacency matrix was uploaded to GRNsight, and the network was visualized using the grid layout.
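The ANOVA and multiple-testing corrections in part 2 of the Methods were done in a spreadsheet. As a hedged illustration only, the Python sketch below performs the same kind of per-gene calculation, assuming this formulation: the null hypothesis is that the mean log fold change is 0 at every timepoint, the full model fits a separate mean per timepoint, and F = ((n - t) / t) × (SS_H0 - SS_full) / SS_full, where t is the number of timepoints and n is the total number of replicates (for the dHAP4 layout described in the Methods, t = 5 and n = 18). The function names are hypothetical, the p value is computed from the regularized incomplete beta function rather than a spreadsheet function, and the Benjamini & Hochberg routine is the standard step-up adjustment, which may differ in detail from the course spreadsheet.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz's method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction at step m.
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor is ~1, so the fraction has converged
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f_stat, d1, d2):
    """Upper-tail probability P(F > f_stat) for an F(d1, d2) distribution."""
    if f_stat <= 0.0:
        return 1.0
    return betainc_reg(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f_stat))


def anova_f(replicates_by_time):
    """F statistic and p value for ONE gene (hypothetical helper).

    replicates_by_time maps a timepoint label to that gene's replicate log fold
    changes. Assumed null hypothesis: mean log fold change is 0 at every timepoint.
    SS_full must be > 0 (replicates within a timepoint cannot all be identical).
    """
    values = [v for reps in replicates_by_time.values() for v in reps]
    n, t = len(values), len(replicates_by_time)
    ss_h0 = sum(v * v for v in values)  # squared deviations from the null mean of 0
    ss_full = 0.0
    for reps in replicates_by_time.values():
        mean = sum(reps) / len(reps)
        ss_full += sum((v - mean) ** 2 for v in reps)  # deviations from each timepoint mean
    f_stat = ((n - t) / t) * (ss_h0 - ss_full) / ss_full
    return f_stat, f_sf(f_stat, t, n - t)


def bonferroni(pvals):
    """Multiply each p value by the number of tests, capping at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Standard BH step-up adjustment, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    toy_gene = {"15m": [1.0, 3.0], "30m": [0.0, 2.0]}  # made-up values, not dHAP4 data
    print(anova_f(toy_gene))  # F = 2.5, p ≈ 0.2857
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

Running the p values for every gene through benjamini_hochberg and keeping genes at or below 0.05 would correspond to the filter applied before STEM in part 3 of the Methods.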

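Steps 5.1 and 5.2 of the Methods were done by hand in a spreadsheet. The Python sketch below shows one way to turn regulatory relationships into a square, alphabetically sorted 0/1 adjacency matrix. It assumes the regulation data has already been reduced to (regulator, target) pairs, keeps only edges where both genes are in the chosen transcription factor set, and uses the orientation rows = target genes, columns = regulators. That orientation, the helper names adjacency_matrix and to_tsv, and the example gene names and edges are all assumptions for illustration; check the expected layout in the GRNsight documentation before uploading.

```python
def adjacency_matrix(tfs, edges):
    """Square 0/1 matrix over the transcription factors in `tfs` (hypothetical helper).

    edges: iterable of (regulator, target) pairs.
    Assumed orientation: matrix[row][col] == 1 means the column gene
    (regulator) controls the row gene (target).
    """
    genes = sorted(set(tfs))  # one alphabetical list used for both rows and columns
    index = {gene: k for k, gene in enumerate(genes)}
    matrix = [[0] * len(genes) for _ in genes]
    for regulator, target in edges:
        if regulator in index and target in index:  # drop edges that leave the TF set
            matrix[index[target]][index[regulator]] = 1
    return genes, matrix


def to_tsv(genes, matrix):
    """Render the matrix as tab-separated text with gene names as row/column labels."""
    header = "\t".join([""] + genes)  # corner cell left blank in this sketch
    rows = ["\t".join([gene] + [str(v) for v in row]) for gene, row in zip(genes, matrix)]
    return "\n".join([header] + rows)


if __name__ == "__main__":
    # Placeholder gene names and made-up edges; these are NOT YEASTRACT results.
    example_edges = [("TFC", "TFA"), ("TFB", "TFC"), ("TFA", "TFA"), ("OTHER", "TFB")]
    genes, matrix = adjacency_matrix(["TFC", "TFA", "TFB"], example_edges)
    print(to_tsv(genes, matrix))
```

Using one sorted list for both axes guarantees that row k and column k name the same gene, which is the property step 5.2 relies on.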
Results

Data and Files

dHAP4 Data Sheet

dHAP4 Stem Data

dHAP4 Gene List

dHAP4 GO List

dHAP4 Slides

dHAP4 profile 48 Regulation Matrix

Conclusion

In this study, ΔHAP4 yeast cells (Saccharomyces cerevisiae) were exposed to cold shock to determine how their gene expression levels respond to a change in environmental temperature. The ANOVA indicated that approximately 40.1% of the genes had p values less than 0.05. However, because 6189 genes were tested, roughly 5% of them (about 309 genes) would be expected to fall below 0.05 by chance alone, so some of these significant results are likely false positives; this is why the Bonferroni and Benjamini & Hochberg corrections were applied. Nonetheless, 4.5% of the genes had p values less than 0.0001, far more than the fewer than one gene expected by chance at that cutoff, which supports the significance of at least some of the results. Thus, the data suggest that the expression levels of certain Saccharomyces cerevisiae genes change in response to cold shock. Of the most significant STEM profiles, Profile 48 was selected for further analysis. GRNsight was used to visualize the regulatory relationships among the transcription factors associated with Profile 48. Most of these transcription factors were connected to one another, forming a network.
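The point that some p < 0.05 results may be due to chance can be made concrete with a short calculation. If none of the 6189 genes truly changed, p values would be roughly uniform, so about 6189 × α genes would fall below a cutoff α by chance. The Python sketch below compares that expectation with the percentages reported in this Conclusion, assuming those percentages are fractions of all 6189 genes; the function name chance_summary is hypothetical.

```python
N_GENES = 6189  # genes in the dHAP4 dataset (from the Methods)
REPORTED = {0.05: 0.401, 0.0001: 0.045}  # cutoff -> fraction of genes below it (from this Conclusion)


def chance_summary(n_genes, reported):
    """Return (cutoff, observed gene count, expected false positives) for each cutoff."""
    rows = []
    for alpha, fraction in sorted(reported.items(), reverse=True):
        observed = round(n_genes * fraction)  # genes reported below the cutoff
        expected = n_genes * alpha  # expected count if every null hypothesis were true
        rows.append((alpha, observed, expected))
    return rows


if __name__ == "__main__":
    for alpha, observed, expected in chance_summary(N_GENES, REPORTED):
        print(f"p < {alpha}: about {observed} genes observed, {expected:.2f} expected by chance")
    # p < 0.05: about 2482 genes observed, 309.45 expected by chance
    # p < 0.0001: about 279 genes observed, 0.62 expected by chance
```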

Acknowledgements

Dr. Kam Dahlquist; Professor

Naomi Tesfaiohannes; Homework Partner

David Ramirez; Homework Partner

Except for what is noted above, this individual journal entry was completed by me and not copied from another source.

Ymesfin, 17:31, 30 October 2019 (PDT)

References

LMU BioDB 2019. (2019). Week 7. Retrieved October 16, 2019, from https://xmlpipedb.cs.lmu.edu/biodb/fall2019/index.php/Week_7

LMU BioDB 2019. (2019). Week 8. Retrieved October 23, 2019, from https://xmlpipedb.cs.lmu.edu/biodb/fall2019/index.php/Week_8

LMU BioDB 2019. (2019). Week 9. Retrieved October 26, 2019, from https://xmlpipedb.cs.lmu.edu/biodb/fall2019/index.php/Week_9

The Gene Ontology Resource. (2019). Retrieved October 26, 2019, from http://geneontology.org

Yeastract. (2019). Retrieved October 29, 2019, from http://www.yeastract.com/index.php

Yeastract Gene Regulation Matrix. (2019). Retrieved October 29, 2019, from http://www.yeastract.com/formregmatrix.php

GRNsight. (2019). Retrieved October 29, 2019, from https://dondi.github.io/GRNsight/