Imacarae Week 9

From LMU BioDB 2019
Revision as of 14:34, 29 October 2019 by Imacarae (talk | contribs) (Using YEASTRACT to Infer which Transcription Factors Regulate a Cluster of Genes: edited bullet)

Purpose

  • to conduct the "analyze" step of the data life cycle for a DNA microarray dataset.
  • to "think like a cell" to interpret the clusters and associated Gene Ontology terms.
  • to develop an intuition about gene regulatory networks.
  • to keep a detailed electronic laboratory notebook to facilitate reproducible research.

Methods and Results

  1. Analyzing and Interpreting STEM Results
    1. I selected Profile 45 for further interpretation of the data.
      • Why did you select this profile? In other words, why was it interesting to you?
        - I selected Profile 45 because it was the first profile on the STEM page and it has the most genes assigned to it of all the dCIN5 STEM profiles.
      • How many genes belong to this profile?
        - There are 455 genes assigned to this profile.
      • How many genes were expected to belong to this profile?
        - There are 36.5 genes expected in this profile.
      • What is the p value for the enrichment of genes in this profile? Bear in mind that we just finished computing p values to determine whether each individual gene had a significant change in gene expression at each time point. This p value determines whether the number of genes that show this particular expression profile across the time points is significantly more than expected.
        - STEM reports the p value for the enrichment of genes in this profile as p = 0.00, meaning it is smaller than the displayed precision.
      • Open the GO list file you saved for this profile in Excel. This list shows all of the Gene Ontology terms that are associated with genes that fit this profile. Select the third row and then choose from the menu Data > Filter > Autofilter. Filter on the "p-value" column to show only GO terms that have a p value of < 0.05. How many GO terms are associated with this profile at p < 0.05?
        - There are 65 GO terms that are associated with Profile 45 at p<0.05.
      • The GO list also has a column called "Corrected p-value". This correction is needed because the software has performed thousands of significance tests. Filter on the "Corrected p-value" column to show only GO terms that have a corrected p value of < 0.05. How many GO terms are associated with this profile with a corrected p value < 0.05?
        - There are 8 GO terms associated with Profile 45 with a corrected p value < 0.05.
      • Select the top 6 Gene Ontology terms from your filtered list (either p < 0.05 or corrected p < 0.05).
        • Note whether the same GO terms are showing up in multiple clusters.
        • Look up the definitions for each of the terms at http://geneontology.org. In your research presentation, you will discuss the biological interpretation of these GO terms. In other words, why does the cell react to cold shock by changing the expression of genes associated with these GO terms? Also, what does this have to do with the transcription factor being deleted (for the groups working with deletion strain data)?
        • To easily look up the definitions, go to http://geneontology.org.
        • Copy and paste the GO ID (e.g. GO:0044848) into the search field on the left of the page.
        • In the results page, click on the button that says "Link to detailed information about <term>" (in this case, "biological phase").
        • The definition will be on the next results page, e.g. here.
        - GO:0032040: "small-subunit processome" which is "a large ribonucleoprotein complex that is an early preribosomal complex"
        - GO:0000466: "maturation of 5.8S rRNA from tricistronic rRNA transcript" which is "any process involved in the maturation of an rRNA molecule originally produced as part of a tricistronic rRNA transcript that contained the Small SubUnit (SSU) rRNA, the 5.8S rRNA, and the Large SubUnit (LSU) rRNA, in that order, from 5' to 3' along the primary transcript"
        - GO:0030488: "tRNA methylation" which is "the posttranscriptional addition of methyl groups to specific residues in a tRNA molecule"
        - GO:0003729: "mRNA binding" which means it is "interacting selectively and non-covalently with messenger RNA (mRNA), [as] an intermediate molecule between DNA and protein. mRNA includes UTR and coding sequences, but does not contain introns"
        - GO:0000462: "maturation of SSU-rRNA from tricistronic rRNA transcript" which is "any process involved in the maturation of a precursor Small SubUnit (SSU) ribosomal RNA (rRNA) molecule into a mature SSU-rRNA molecule from the pre-rRNA molecule originally produced as a tricistronic rRNA transcript that contains the Small Subunit (SSU) rRNA, 5.8S rRNA, and the Large Subunit (LSU) in that order from 5' to 3' along the primary transcript." This term is closely related to GO:0000466, since both describe processing of the same tricistronic rRNA transcript, but it covers the SSU-rRNA rather than the 5.8S rRNA.
        - GO:0003899: "DNA-directed 5'-3' RNA polymerase activity" which is "Catalysis of the reaction: nucleoside triphosphate + RNA(n) = diphosphate + RNA(n+1). Utilizes a DNA template, i.e. the catalysis of DNA-template-directed extension of the 3'-end of an RNA strand by one nucleotide at a time. Can initiate a chain 'de novo'"
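
The two Excel filtering steps above (p value < 0.05, then corrected p value < 0.05) could also be reproduced in a script. The following is only a minimal sketch under several assumptions that are not confirmed by the assignment: that the GO list is tab-delimited text, that the column headers sit on the third row (as the Autofilter instruction suggests), and that very small values may be written like "<0.001". The sample rows and the `GO:EXAMPLE…` IDs are made up for illustration and are not my Profile 45 results.

```python
import csv

def parse_p(text):
    """Convert a p-value cell to a float.

    Assumption: cells such as '<0.001' are treated as 0.001.
    """
    return float(text.strip().lstrip("<"))

def filter_go_terms(lines, column, cutoff=0.05, header_row=3):
    """Return the rows (as dicts) whose value in `column` is below `cutoff`.

    `header_row` is 1-based; rows above it are skipped (assumed title lines).
    """
    table_lines = list(lines)[header_row - 1:]
    reader = csv.DictReader(table_lines, delimiter="\t")
    return [row for row in reader if parse_p(row[column]) < cutoff]

# Made-up miniature GO list (hypothetical IDs and values).
sample = (
    "Profile 45 GO results\n"
    "\n"
    "Category ID\tCategory Name\tp-value\tCorrected p-value\n"
    "GO:EXAMPLE1\texample term A\t1.0E-6\t<0.001\n"
    "GO:EXAMPLE2\texample term B\t0.01\t0.4\n"
    "GO:EXAMPLE3\texample term C\t0.2\t1\n"
).splitlines()

raw_hits = filter_go_terms(sample, "p-value")
corrected_hits = filter_go_terms(sample, "Corrected p-value")
print(len(raw_hits), "terms at p < 0.05;",
      len(corrected_hits), "terms at corrected p < 0.05")
```

Filtering on the corrected column always returns the shorter list in this sample, which mirrors the drop from 65 to 8 GO terms seen for Profile 45.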

Using YEASTRACT to Infer which Transcription Factors Regulate a Cluster of Genes

In the previous analysis using STEM, we found a number of gene expression profiles (aka clusters) which grouped genes based on similarity of gene expression changes over time. The implication is that these genes share the same expression pattern because they are regulated by the same transcription factor (or the same set of transcription factors). We will explore this using the YEASTRACT database.

  1. Open the gene list in Excel for one of the significant profiles from your STEM analysis. Choose a cluster with a clear cold shock/recovery up/down or down/up pattern. You should also choose one of the largest clusters.
    • Copy the list of gene IDs onto your clipboard.
  2. Launch a web browser and go to the YEASTRACT database.
    • On the left panel of the window, click on the link to Rank by TF.
    • Paste your list of genes from your cluster into the box labeled ORFs/Genes.
    • Check the box for Check for all TFs.
    • Accept the defaults for the Regulations Filter (Documented, DNA binding plus expression evidence)
    • Do not apply a filter for "Filter Documented Regulations by environmental condition".
    • Rank genes by TF using: The % of genes in the list and in YEASTRACT regulated by each TF.
    • Click the Search button.
  3. Answer the following questions:
    • In the results window that appears, the p values colored green are considered "significant", the ones colored yellow are considered "borderline significant" and the ones colored pink are considered "not significant". How many transcription factors are green or "significant"?
      - There are 17 transcription factors that are significant.
    • Copy the table of results from the web page and paste it into a new Excel workbook to preserve the results.
      • Upload the Excel file to the wiki and link to it in your electronic lab notebook.
      • Are CIN5, GLN3, and/or HAP4 on the list? If so, what is their "% in user set", "% in YEASTRACT", and "p value".
        - CIN5: 15.73% in user set, 3.21% in YEASTRACT, p value = 1.
        - GLN3: 39.78% in user set, 7.34% in YEASTRACT, p value = 0.002677.
        - HAP4: 16.85% in user set, 6.87% in YEASTRACT, p value = 0.1553.
  4. For the mathematical model that we will build, we need to define a gene regulatory network of transcription factors that regulate other transcription factors. We can use YEASTRACT to assist us with creating the network. We want to generate a network with approximately 15-20 transcription factors in it.
    • You need to select from this list of "significant" transcription factors, which ones you will use to run the model. You will use these transcription factors and add GLN3, HAP4, and ZAP1 if they are not in your list. Explain in your electronic notebook how you decided on which transcription factors to include. Record the list and your justification in your electronic lab notebook. Each group member will select a different network (they can have some overlapping transcription factors, but some should also be different).
    • Go back to the YEASTRACT database and follow the link to Generate Regulation Matrix.
    • Copy and paste the list of transcription factors you identified (plus HAP4, GLN3, and ZAP1) into both the "Transcription factors" field and the "Target ORF/Genes" field.
    • We are going to use the "Regulations Filter" options "Documented" and "Only DNA binding evidence".
      • Click the "Generate" button.
      • In the results window that appears, click on the link to the "Regulation matrix (Semicolon Separated Values (CSV) file)" that appears and save it to your Desktop. Rename this file with a meaningful name so that you can distinguish it from the other files you will generate.
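
Once the regulation matrix has been saved, it could be read into a list of regulator-to-target edges for the network. The following is a minimal sketch, assuming the file is semicolon-separated with target names in the first row, regulator names in the first column, and a 1 wherever a regulation is documented. That layout is an assumption and should be checked against the downloaded file. The matrix values below are made up and are not YEASTRACT results.

```python
import csv

def read_regulation_matrix(lines):
    """Return (regulator, target) pairs for every cell equal to 1.

    Assumed layout: first row = target names, first column = regulator names.
    """
    reader = csv.reader(lines, delimiter=";")
    header = next(reader)
    targets = header[1:]
    edges = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        regulator, cells = row[0], row[1:]
        for target, cell in zip(targets, cells):
            if cell.strip() == "1":
                edges.append((regulator, target))
    return edges

# Made-up 4 x 4 matrix for illustration only.
sample = """;CIN5;GLN3;HAP4;ZAP1
CIN5;0;0;1;0
GLN3;1;0;0;0
HAP4;0;0;0;0
ZAP1;0;0;0;1
""".splitlines()

edges = read_regulation_matrix(sample)
for regulator, target in edges:
    print(regulator, "->", target)
```

For a saved file, the same function could be called as `read_regulation_matrix(open("my_matrix.csv", newline=""))`, where `my_matrix.csv` is a hypothetical file name standing in for whatever meaningful name was chosen.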

Data and Files

Conclusion

Acknowledgments

  • To Dr. Dahlquist for walking us through analyzing the data with Excel.
    - Procedural steps were copied from the Week 9 assignment page and modified to fit the data.
  • To my group members, DeLisa, Mihir, and Emma. We met during class time to walk through the procedure and to decide which profile each of us would analyze individually.

References

http://geneontology.org/