Difference between revisions of "Knguye66 Week 9"

From LMU BioDB 2019
(add headers and format and fill in information)
 
(Data and Files: add file)
 
(29 intermediate revisions by 2 users not shown)
Line 6: Line 6:
  
 
==== Methods/Results ====
 
After finishing the steps on [[Week 8]] for Statistical Analysis Part I: ANOVA in Microsoft Excel, a quick sanity check was performed on the p-value dataset. Filtering the data for a p-value of less than 0.05 returned 2528 of 6189 records. At this threshold, about 5% of the 6189 genes (roughly 309) would be expected to pass by chance alone if no gene's expression truly changed, so 2528 is far more than chance would predict.
+
After completing the [[Week 9]] steps for Viewing and Saving STEM Results in stem.jar, screenshots of each profile were placed in a PowerPoint file (see Data and Files below), and the Profile Gene Table and Profile GO Table for each profile were compressed into zip files. Profile #45 was then chosen for analysis and interpretation.
  
 
---------------------------  
 
Sanity Check Questions:
+
Analyzing and Interpreting STEM Results:
  
-Unadjusted p-value-
+
- Profile #45 -
  
# How many genes have p<0.05? and what is the percentage (out of 6189)?
+
# Why did you select this profile? In other words, why was it interesting to you?
#* 2528, 40.8%
+
#* I chose this model expression profile because the average log fold change is above the x-axis before t=60 and below it after t=60. Of all the colored profiles, Profile #45 has the most significant p-value.
# How many genes have p<0.01? and what is the percentage (out of 6189)?
+
# How many genes belong to this profile?
#* 1652, 26.7%
+
#* 549.0 genes were assigned to this profile.
# How many genes have p<0.001? and what is the percentage (out of 6189)?
+
# How many genes were expected to belong to this profile?
#* 919, 14.8%
+
#* 47.1 genes were expected to belong to this profile.
# How many genes have p<0.0001? and what is the percentage (out of 6189)?
+
# What is the p value for the enrichment of genes in this profile?
#* 496, 8.0%
+
#* p-value = 0.00 (significant)
 +
# How many GO terms are associated with this profile at p < 0.05?
 +
#* 81 of 202 records found.  
 +
# How many GO terms are associated with this profile with a corrected p value < 0.05?
 +
#* 7 of 202 records found.
 +
# Look up the definitions for each of the terms at http://geneontology.org. In your research presentation, you will discuss the biological interpretation of these GO terms. In other words, why does the cell react to cold shock by changing the expression of genes associated with these GO terms? Also, what does this have to do with the transcription factor being deleted (for the groups working with deletion strain data)?
 +
#* Gene Ontology terms with a corrected p-value < 0.05: 7 of 202 records, all clustered in rows 28-34.
 +
#** '''GO:0000178''' (exosome, RNase complex), ontology: cellular component
 +
#*** A ribonuclease complex that has 3-prime to 5-prime exoribonuclease activity and possibly endoribonuclease activity, producing 5-prime-phosphomonoesters. Participates in a multitude of cellular RNA processing and degradation events preventing nuclear export and/or translation of aberrant RNAs. Restricted to processing linear and circular single-stranded RNAs (ssRNA) only. RNAs with complex secondary structures may have to be unwound or pre-processed by co-factors prior to entering the complex, esp if the 3-prime end is structured. (Source: PMID:26726035, PMID:17174896, PMID:20531386)
 +
#** '''GO:0070478''' (nuclear-transcribed mRNA catabolic process, 3'-5' exonucleolytic nonsense-mediated decay), ontology: biological process
 +
#*** The chemical reactions and pathways resulting in the breakdown of the nuclear-transcribed mRNA transcript body of an mRNA in which an amino-acid codon has changed to a nonsense codon; occurs when the 3' end is not protected by a 3'-poly(A) tail; degradation proceeds in the 3' to 5' direction. (Source: PMID:12769863)
 +
#** '''GO:0004004''': no definition found.
 +
#** '''GO:0030515''' (snoRNA binding), ontology: molecular function
 +
#*** Interacting selectively and non-covalently with small nucleolar RNA. (Source: GOC:mah)
 +
#** '''GO:0000055''' (ribosomal large subunit export from nucleus), ontology: biological process
 +
#*** The directed movement of a ribosomal large subunit from the nucleus into the cytoplasm. (Source: GOC:mah)
 +
#** '''GO:0032543''' (mitochondrial translation), ontology: biological process
 +
#*** The chemical reactions and pathways resulting in the formation of a protein in a mitochondrion. This is a ribosome-mediated process in which the information in messenger RNA (mRNA) is used to specify the sequence of amino acids in the protein; the mitochondrion has its own ribosomes and transfer RNAs, and uses a genetic code that differs from the nuclear code. (Source: GOC:go_curators)
 +
#** '''GO:0071035''' (nuclear polyadenylation-dependent rRNA catabolic process), ontology: biological process
 +
#*** The chemical reactions and pathways occurring in the nucleus and resulting in the breakdown of a ribosomal RNA (rRNA) molecule, including RNA fragments released as part of processing the primary transcript into multiple mature rRNA species, initiated by the enzymatic addition of a sequence of adenylyl residues (polyadenylation) at the 3' end of the target rRNA. (Source: PMID:17652137, GOC:krc, PMID:15173578, PMID:18591258, PMID:15935758, GOC:dgf, PMID:15572680)
  
-Bonferroni & Benjamini and Hochberg p-value-
+
Using YEASTRACT to Infer which Transcription Factors Regulate a Cluster of Genes:
  
# How many genes are p<0.05 for the Bonferroni-corrected p-value? and what is the percentage (out of 6189)?
+
- Profile #45 -  
#* 248, 4.0%
 
# How many genes are p <0.05 for the Benjamini and Hochberg-corrected p-value? and what is the percentage (out of 6189)?
 
#* 1822, 29.4%
 
  
-NSR1-
+
# How many transcription factors are green or "significant"?
# What are its unadjusted, Bonferroni-corrected, and B-H-corrected p-values?
+
#* 19 transcription factors are green.
#* Unadjusted: 2.86939E-10
+
# Are CIN5, GLN3, and/or HAP4 on the green list? If not, what are their "% in user set", "% in YEASTRACT", and "p value"?
#* Bonferroni: 1.77586E-06
+
#* CIN5, GLN3, and HAP4 are in the table but not on the green list; they appear on the pink list instead.
#* B-H: 8.87932E-07
+
#** '''CIN5:'''
#* Average Log fold:
+
#*** % in user set: 16.82%
#** t15: 3.279225
+
#*** % in YEASTRACT: 4.17%
#** t30: 3.621
+
#*** p-value: 0.999999999999975
#** t60: 3.526525
+
#** '''GLN3:'''
#** t90: -2.04985
+
#*** % in user set: 37.34%
#** t120: -0.60622
+
#*** % in YEASTRACT: 8.38%
 +
#*** p-value: 0.031507090452519
 +
#** '''HAP4:'''
 +
#*** % in user set: 15.71%
 +
#*** % in YEASTRACT: 7.79%
 +
#*** p-value: 0.361027186229152
  
-Favorite Gene (YDR090C)-
+
To build the mathematical model, the first 20 transcription factors on the Excel sheet exported from the YEASTRACT database table were chosen. Because the table is ranked from most to least significant, these are the most significant factors; the list also includes CIN5, GLN3, and HAP4. Choosing the most significant factors should give meaningful results to analyze later.
#* Unadjusted: 0.519632074
+
 
#* Bonferroni: 3216.002905
+
Visualizing Your Gene Regulatory Networks with GRNsight:
#* B-H: 0.641276751
 
#* Average Log fold:
 
#** t15: -0.067
 
#** t30: 0.51286
 
#** t60: 0.5417
 
#** t90: -0.27156
 
#** t120: -0.06704
 
  
 
==== Data and Files ====
 
  
* [[Media: Wt_genelist_kn.zip | Profile Genelist (zip file)]]
+
* [[Media:File:BIOL367 F19 KN slide.pptx | One-way ANOVA (wt), STEM Results, Profiles, & GRNsight (.pptx)]]
 +
* [[Media:Wt_profile-45_GOlist_kn.xlsx | Profile#45 GOlist, Genelist, Yeastract rank tf, & Gene regulatory network (Excel)]]
 +
* [[Media:Regulation_Matrix_profile45.xlsx | Regulation Matrix (adjacency matrix)]]
 +
* [[Media:RegulationMatrix_Documented_2019117_050_1154628891.xlsx | Kaitlyn's new matrix]]
  
 
==== Conclusion ====
 
 
+
The methods/results and the Data and Files listed above complete the tasks for the interim deadline of Tuesday, October 29, 2019 at 12:01am.
  
 
== Acknowledgements ==
 
Line 63: Line 80:
 
This section acknowledges my partner Christina Dominguez ([[User:Cdomin12]]), as well as Marcus Avila ([[User:Mavila9]]) and Jonar Cowan ([[User:Jcowan4]]). I would also like to acknowledge Dr. Dahlquist ([[User:KDahlquist]]) for introducing and teaching the topic and direction of this assignment.
 
  
"Except for what is noted above, this individual journal entry was completed by me and not copied from another source."  
+
"Except for what is noted above, this individual journal entry was completed by me and not copied from another source." [[User:Knguye66|Knguye66]] ([[User talk:Knguye66|talk]]) 21:07, 28 October 2019 (PDT)
  
 
{{Template:knguye66}}
 
Line 70: Line 87:
  
 
* GEO Accession viewer. (n.d.). Retrieved from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE83656.
 
* Dahlquist, K. (2019, October 17). Week 8. In Wikipedia, Biological Databases. Retrieved 11:14, October 21, 2019, from https://xmlpipedb.cs.lmu.edu/biodb/fall2019/index.php/Week_8
+
* Dahlquist, K. (2019, October 24). Week 9. In Wikipedia, Biological Databases. Retrieved 08:02, October 28, 2019, from https://xmlpipedb.cs.lmu.edu/biodb/fall2019/index.php/Week_9
 +
* Gene Ontology. (n.d.). Retrieved from https://lmu.app.box.com/s/t8i5s1z1munrcfxzzs7nv7q2edsktxgl
 +
* Gene Association. (n.d.) Retrieved from https://lmu.app.box.com/s/zlr1s8fjogfssa1wl59d5shyybtm1d49
 +
* The Short Time-series Expression Miner (STEM) version 1.3.12. (n.d.). Retrieved from http://www.cs.cmu.edu/~jernst/stem/
 +
* Geneontology (Unifying Biology) (n.d.). Retrieved from http://geneontology.org/

Latest revision as of 17:17, 6 November 2019

Microarray Data Analysis (wild type data)

Purpose

This week's assignment continues the Microarray Data Analysis for the wild type data. Steps 4 and 5 from Week 9 are featured: Viewing and Saving STEM Results, and Analyzing and Interpreting STEM Results. For reference and files, go to knguye66 Week 8.

Methods/Results

After completing the Week 9 steps for Viewing and Saving STEM Results in stem.jar, screenshots of each profile were placed in a PowerPoint file (see Data and Files below), and the Profile Gene Table and Profile GO Table for each profile were compressed into zip files. Profile #45 was then chosen for analysis and interpretation.


Analyzing and Interpreting STEM Results:

- Profile #45 -

  1. Why did you select this profile? In other words, why was it interesting to you?
    • I chose this model expression profile because the average log fold change is above the x-axis before t=60 and below it after t=60. Of all the colored profiles, Profile #45 has the most significant p-value.
  2. How many genes belong to this profile?
    • 549.0 genes were assigned to this profile.
  3. How many genes were expected to belong to this profile?
    • 47.1 genes were expected to belong to this profile.
  4. What is the p value for the enrichment of genes in this profile?
    • p-value = 0.00 (significant)
  5. How many GO terms are associated with this profile at p < 0.05?
    • 81 of 202 records found.
  6. How many GO terms are associated with this profile with a corrected p value < 0.05?
    • 7 of 202 records found.
  7. Look up the definitions for each of the terms at http://geneontology.org. In your research presentation, you will discuss the biological interpretation of these GO terms. In other words, why does the cell react to cold shock by changing the expression of genes associated with these GO terms? Also, what does this have to do with the transcription factor being deleted (for the groups working with deletion strain data)?
    • Gene Ontology terms with a corrected p-value < 0.05: 7 of 202 records, all clustered in rows 28-34.
      • GO:0000178 (exosome, RNase complex), ontology: cellular component
        • A ribonuclease complex that has 3-prime to 5-prime exoribonuclease activity and possibly endoribonuclease activity, producing 5-prime-phosphomonoesters. Participates in a multitude of cellular RNA processing and degradation events preventing nuclear export and/or translation of aberrant RNAs. Restricted to processing linear and circular single-stranded RNAs (ssRNA) only. RNAs with complex secondary structures may have to be unwound or pre-processed by co-factors prior to entering the complex, esp if the 3-prime end is structured. (Source: PMID:26726035, PMID:17174896, PMID:20531386)
      • GO:0070478 (nuclear-transcribed mRNA catabolic process, 3'-5' exonucleolytic nonsense-mediated decay), ontology: biological process
        • The chemical reactions and pathways resulting in the breakdown of the nuclear-transcribed mRNA transcript body of an mRNA in which an amino-acid codon has changed to a nonsense codon; occurs when the 3' end is not protected by a 3'-poly(A) tail; degradation proceeds in the 3' to 5' direction. (Source: PMID:12769863)
      • GO:0004004: no definition found.
      • GO:0030515 (snoRNA binding), ontology: molecular function
        • Interacting selectively and non-covalently with small nucleolar RNA. (Source: GOC:mah)
      • GO:0000055 (ribosomal large subunit export from nucleus), ontology: biological process
        • The directed movement of a ribosomal large subunit from the nucleus into the cytoplasm. (Source: GOC:mah)
      • GO:0032543 (mitochondrial translation), ontology: biological process
        • The chemical reactions and pathways resulting in the formation of a protein in a mitochondrion. This is a ribosome-mediated process in which the information in messenger RNA (mRNA) is used to specify the sequence of amino acids in the protein; the mitochondrion has its own ribosomes and transfer RNAs, and uses a genetic code that differs from the nuclear code. (Source: GOC:go_curators)
      • GO:0071035 (nuclear polyadenylation-dependent rRNA catabolic process), ontology: biological process
        • The chemical reactions and pathways occurring in the nucleus and resulting in the breakdown of a ribosomal RNA (rRNA) molecule, including RNA fragments released as part of processing the primary transcript into multiple mature rRNA species, initiated by the enzymatic addition of a sequence of adenylyl residues (polyadenylation) at the 3' end of the target rRNA. (Source: PMID:17652137, GOC:krc, PMID:15173578, PMID:18591258, PMID:15935758, GOC:dgf, PMID:15572680)
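
The gap between the assigned and expected gene counts for Profile #45 can be put into numbers. The sketch below is a minimal illustration in Python, not STEM's own code. It computes the fold enrichment from the two counts reported above and, assuming a simple binomial model, the log10 of the chance of seeing at least 549 genes in the profile. The total number of genes passed to STEM is not stated in this entry, so `total_genes` is a hypothetical placeholder, and the function names are made up for this example.

```python
import math

assigned = 549      # genes assigned to Profile #45 (reported above)
expected = 47.1     # genes expected in Profile #45 (reported above)
total_genes = 1822  # HYPOTHETICAL: total genes given to STEM, not stated in this entry

# How many times more genes were assigned than expected.
fold_enrichment = assigned / expected


def log_binom_pmf(k, n, p):
    """Natural log of the binomial probability of exactly k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def log10_upper_tail(k, n, p):
    """log10 of P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid underflow."""
    logs = [log_binom_pmf(i, n, p) for i in range(k, n + 1)]
    peak = max(logs)
    total = sum(math.exp(value - peak) for value in logs)
    return (peak + math.log(total)) / math.log(10)


# ASSUMPTION: each gene lands in the profile independently with this probability.
p_single = expected / total_genes
log10_p = log10_upper_tail(assigned, total_genes, p_single)

print(f"fold enrichment: {fold_enrichment:.2f}")
print(f"log10(p) under the assumed binomial model: {log10_p:.0f}")
```

Under these assumptions the probability is far too small to show at two decimal places, which is consistent with the p-value being displayed as 0.00.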

Using YEASTRACT to Infer which Transcription Factors Regulate a Cluster of Genes:

- Profile #45 -

  1. How many transcription factors are green or "significant"?
    • 19 transcription factors are green.
  2. Are CIN5, GLN3, and/or HAP4 on the green list? If not, what are their "% in user set", "% in YEASTRACT", and "p value"?
    • CIN5, GLN3, and HAP4 are in the table but not on the green list; they appear on the pink list instead.
      • CIN5:
        • % in user set: 16.82%
        • % in YEASTRACT: 4.17%
        • p-value: 0.999999999999975
      • GLN3:
        • % in user set: 37.34%
        • % in YEASTRACT: 8.38%
        • p-value: 0.031507090452519
      • HAP4:
        • % in user set: 15.71%
        • % in YEASTRACT: 7.79%
        • p-value: 0.361027186229152

To build the mathematical model, the first 20 transcription factors on the Excel sheet exported from the YEASTRACT database table were chosen. Because the table is ranked from most to least significant, these are the most significant factors; the list also includes CIN5, GLN3, and HAP4. Choosing the most significant factors should give meaningful results to analyze later.
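
The selection described above can be sketched in a few lines of Python. This is only an illustration of ranking by p-value and keeping the top entries. The actual selection was done by hand in Excel. The function name `select_factors` is made up for this example, and `TF_A` and `TF_B` with their p-values are hypothetical placeholders, since the full YEASTRACT table is not reproduced in this entry. Only the CIN5, GLN3, and HAP4 p-values come from the results above. Adding those three after ranking is an assumption about how they ended up in the list.

```python
def select_factors(p_values, top_n, always_include=()):
    """Return the top_n factors by ascending p-value, plus any required factors."""
    ranked = sorted(p_values, key=p_values.get)  # most significant first
    chosen = ranked[:top_n]
    for name in always_include:
        # ASSUMPTION: required factors are appended if the ranking left them out.
        if name in p_values and name not in chosen:
            chosen.append(name)
    return chosen


table = {
    "CIN5": 0.999999999999975,  # reported above
    "GLN3": 0.031507090452519,  # reported above
    "HAP4": 0.361027186229152,  # reported above
    "TF_A": 1e-12,              # HYPOTHETICAL placeholder entry
    "TF_B": 1e-9,               # HYPOTHETICAL placeholder entry
}

network_factors = select_factors(table, top_n=2,
                                 always_include=("CIN5", "GLN3", "HAP4"))
print(network_factors)
```

With the real table, `top_n` would be 20 and `table` would hold every transcription factor in the YEASTRACT output.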

Visualizing Your Gene Regulatory Networks with GRNsight:

Data and Files

Conclusion

The methods/results and the Data and Files listed above complete the tasks for the interim deadline of Tuesday, October 29, 2019 at 12:01am.

Acknowledgements

This section acknowledges my partner Christina Dominguez (User:Cdomin12), as well as Marcus Avila (User:Mavila9) and Jonar Cowan (User:Jcowan4). I would also like to acknowledge Dr. Dahlquist (User:KDahlquist) for introducing and teaching the topic and direction of this assignment.

"Except for what is noted above, this individual journal entry was completed by me and not copied from another source." Knguye66 (talk) 21:07, 28 October 2019 (PDT)

User Page

User:knguye66

Template Page

Template:knguye66


Table of all assignments and journal entries for BIO-367-01

Week        Individual Journal Entry        Shared Journal
Week 1      -                               Class Journal Week 1
Week 2      knguye66 Week 2                 Class Journal Week 2
Week 3      ILT1/YDR090C Week 3             Class Journal Week 3
Week 4      knguye66 Week 4                 Class Journal Week 4
Week 5      DrugCentral Week 5              Class Journal Week 5
Week 6      knguye66 Week 6                 Class Journal Week 6
Week 7      knguye66 Week 7                 Class Journal Week 7
Week 8      knguye66 Week 8                 Class Journal Week 8
Week 9      knguye66 Week 9                 Class Journal Week 9
Week 10     knguye66 Week 10                Class Journal Week 10
Week 11     knguye66 Week 11                FunGals
Week 12/13  knguye66 Eyoung20 Week 12/13    FunGals
Week 15     knguye66 Eyoung20 Week 15       Class Journal Week 15

References