Difference between revisions of "Eyoung20 journal week 9"
(→# Analyzing and Interpreting STEM Results: adding information) |
(→References) |
||
(17 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
==Purpose== | ==Purpose== | ||
+ | The purpose of this activity was to link cell clusters to the associated gene ontology terms, and to create a mapping of the gene regulatory networks. | ||
+ | |||
==Method== | ==Method== | ||
=== Clustering and GO Term Enrichment with stem (part 2)=== | === Clustering and GO Term Enrichment with stem (part 2)=== | ||
Line 32: | Line 34: | ||
## The STEM attempted to run the data for 2 hours with no success. | ## The STEM attempted to run the data for 2 hours with no success. | ||
##* '''''This was the stopping point for the Week 8 assignment.''''' | ##* '''''This was the stopping point for the Week 8 assignment.''''' | ||
− | + | # '''Viewing and Saving STEM Results''' | |
## A new window was opened and called "All STEM Profiles (1)". Each box corresponded to a model expression profile. Colored profiles had a statistically significant number of genes assigned; they were arranged in order from most to least significant p value. Profiles with the same color belonged to the same cluster of profiles. The number that was in each box was simply an ID number for the profile. | ## A new window was opened and called "All STEM Profiles (1)". Each box corresponded to a model expression profile. Colored profiles had a statistically significant number of genes assigned; they were arranged in order from most to least significant p value. Profiles with the same color belonged to the same cluster of profiles. The number that was in each box was simply an ID number for the profile. | ||
##* Clicked on the button that said "Interface Options...". At the bottom of the Interface Options window appeared below where it says "X-axis scale should be:", clicked on the radio button that said "Based on real time". Then the Interface Options window was closed. | ##* Clicked on the button that said "Interface Options...". At the bottom of the Interface Options window appeared below where it says "X-axis scale should be:", clicked on the radio button that said "Based on real time". Then the Interface Options window was closed. | ||
Line 64: | Line 66: | ||
##** The definition will be on the next results page, e.g. [http://amigo.geneontology.org/amigo/term/GO:0044848 here]. | ##** The definition will be on the next results page, e.g. [http://amigo.geneontology.org/amigo/term/GO:0044848 here]. | ||
##* '''''This is the stopping point for the [[Week 8]] assignment and the beginning point for Week 9''''' | ##* '''''This is the stopping point for the [[Week 8]] assignment and the beginning point for Week 9''''' | ||
− | + | # '''Viewing and Saving STEM Results''' | |
## A new window will open called "All STEM Profiles (1)". Each box corresponds to a model expression profile. Colored profiles have a statistically significant number of genes assigned; they are arranged in order from most to least significant p value. Profiles with the same color belong to the same cluster of profiles. The number in each box is simply an ID number for the profile. | ## A new window will open called "All STEM Profiles (1)". Each box corresponds to a model expression profile. Colored profiles have a statistically significant number of genes assigned; they are arranged in order from most to least significant p value. Profiles with the same color belong to the same cluster of profiles. The number in each box is simply an ID number for the profile. | ||
##* Click on the button that says "Interface Options...". At the bottom of the Interface Options window that appears below where it says "X-axis scale should be:", click on the radio button that says "Based on real time". Then close the Interface Options window. | ##* Click on the button that says "Interface Options...". At the bottom of the Interface Options window that appears below where it says "X-axis scale should be:", click on the radio button that says "Based on real time". Then close the Interface Options window. | ||
Line 71: | Line 73: | ||
##* Take a screenshot of each of the individual profile windows and save the images in your PowerPoint presentation. | ##* Take a screenshot of each of the individual profile windows and save the images in your PowerPoint presentation. | ||
##* At the bottom of each profile window, there are two yellow buttons "Profile Gene Table" and "Profile GO Table". For each of the profiles, click on the "Profile Gene Table" button to see the list of genes belonging to the profile. In the window that appears, click on the "Save Table" button and save the file to your desktop. Make your filename descriptive of the contents, e.g. "wt_profile#_genelist.txt", where you replace the number symbol with the actual profile number. | ##* At the bottom of each profile window, there are two yellow buttons "Profile Gene Table" and "Profile GO Table". For each of the profiles, click on the "Profile Gene Table" button to see the list of genes belonging to the profile. In the window that appears, click on the "Save Table" button and save the file to your desktop. Make your filename descriptive of the contents, e.g. "wt_profile#_genelist.txt", where you replace the number symbol with the actual profile number. | ||
− | |||
− | |||
##** Upload these files to the wiki and link to them on your individual journal page. (Note that it will be easier to [[Week_4#Compressing_and_Decompressing_Files_with_7-Zip | zip all the files together]] and upload them as one file). | ##** Upload these files to the wiki and link to them on your individual journal page. (Note that it will be easier to [[Week_4#Compressing_and_Decompressing_Files_with_7-Zip | zip all the files together]] and upload them as one file). | ||
##* For each of the significant profiles, click on the "Profile GO Table" to see the list of Gene Ontology terms belonging to the profile. In the window that appears, click on the "Save Table" button and save the file to your desktop. Make your filename descriptive of the contents, e.g. "wt_profile#_GOlist.txt", where you use "wt", "dGLN3", etc. to indicate the dataset and where you replace the number symbol with the actual profile number. At this point you have saved all of the primary data from the STEM software and it's time to interpret the results! | ##* For each of the significant profiles, click on the "Profile GO Table" to see the list of Gene Ontology terms belonging to the profile. In the window that appears, click on the "Save Table" button and save the file to your desktop. Make your filename descriptive of the contents, e.g. "wt_profile#_GOlist.txt", where you use "wt", "dGLN3", etc. to indicate the dataset and where you replace the number symbol with the actual profile number. At this point you have saved all of the primary data from the STEM software and it's time to interpret the results! | ||
##** Upload these files to the wiki and link to them on your individual journal page. (Note that it will be easier to [[Week_4#Compressing_and_Decompressing_Files_with_7-Zip | zip all the files together]] and upload them as one file). | ##** Upload these files to the wiki and link to them on your individual journal page. (Note that it will be easier to [[Week_4#Compressing_and_Decompressing_Files_with_7-Zip | zip all the files together]] and upload them as one file). | ||
− | + | ||
+ | # '''Analyzing and Interpreting STEM Results''' | ||
## Select '''''one''''' of the profiles you saved in the previous step for further interpretation of the data. I suggest that you choose one that has a pattern of up- or down-regulated genes at the cold shock timepoints. '''''Each member of your group should choose a different profile.''''' Answer the following: | ## Select '''''one''''' of the profiles you saved in the previous step for further interpretation of the data. I suggest that you choose one that has a pattern of up- or down-regulated genes at the cold shock timepoints. '''''Each member of your group should choose a different profile.''''' Answer the following: | ||
##* '''''Why did you select this profile? In other words, why was it interesting to you?''''' | ##* '''''Why did you select this profile? In other words, why was it interesting to you?''''' | ||
Line 95: | Line 96: | ||
##** In the [http://amigo.geneontology.org/amigo/medial_search?q=GO%3A0044848 results] page, click on the button that says "Link to detailed information about <term>, in this case "biological phase"". | ##** In the [http://amigo.geneontology.org/amigo/medial_search?q=GO%3A0044848 results] page, click on the button that says "Link to detailed information about <term>, in this case "biological phase"". | ||
##** The definition will be on the next results page, e.g. [http://amigo.geneontology.org/amigo/term/GO:0044848 here]. | ##** The definition will be on the next results page, e.g. [http://amigo.geneontology.org/amigo/term/GO:0044848 here]. | ||
− | GO:0008150 | + | ##***GO:0008150: biological_process - The root term of the biological process ontology; it covers any process carried out by living cells or organisms. |
− | GO:0003674 | + | ##***GO:0003674: molecular_function - The root term of the molecular function ontology; it covers the molecular-level activities of gene products, such as catalysis or binding. |
− | GO:0005575 | + | ##***GO:0005575: cellular_component - The root term of the cellular component ontology; it covers the locations in or around the cell where gene products act. |
− | GO:0007049 | + | ##***GO:0007049: Cell cycle - The progression of biochemical and morphological phases and events that occur in a cell during successive cell replication or nuclear replication events. Canonically, the cell cycle comprises the replication and segregation of genetic material followed by the division of the cell, but in endocycles or syncytial cells nuclear replication or nuclear division may not be followed by cell division. |
− | GO:0051301 | + | ##***GO:0051301: Cell Division - The process resulting in division and partitioning of components of a cell to form more cells; may or may not be accompanied by the physical separation of a cell into distinct, individually membrane-bounded daughter cells. |
− | GO:0016021 | + | ##***GO:0016021: integral component of membrane - The component of a membrane consisting of the gene products and protein complexes having at least some part of their peptide sequence embedded in the hydrophobic region of the membrane. |
Line 157: | Line 158: | ||
#* Select the menu item File > Open and select the regulation matrix .xlsx file that has the "network" worksheet in it that you formatted above. If the file has been formatted properly, GRNsight should automatically create a graph of your network. You can click the "Grid Layout" button to arrange the nodes in a grid, or you can click and drag the nodes (genes) around until you get a layout that you like and take a screenshot of the results. Paste it into your PowerPoint presentation. | #* Select the menu item File > Open and select the regulation matrix .xlsx file that has the "network" worksheet in it that you formatted above. If the file has been formatted properly, GRNsight should automatically create a graph of your network. You can click the "Grid Layout" button to arrange the nodes in a grid, or you can click and drag the nodes (genes) around until you get a layout that you like and take a screenshot of the results. Paste it into your PowerPoint presentation. | ||
#** If you have nodes (genes) floating around in the display that are not connected to any other nodes, we need to delete them from the network for the modeling to work properly. Go back to the Excel workbook and network sheet and delete both the row and column with the floating gene's name. Then re-upload the edited file to GRNsight to visualize it. Use this final version in your PowerPoint and subsequent modeling. | #** If you have nodes (genes) floating around in the display that are not connected to any other nodes, we need to delete them from the network for the modeling to work properly. Go back to the Excel workbook and network sheet and delete both the row and column with the floating gene's name. Then re-upload the edited file to GRNsight to visualize it. Use this final version in your PowerPoint and subsequent modeling. | ||
+ | #**There was a formatting error with the .xlsx sheet that prevented GRNsight from working with the data. No matter how many times the steps were redone, the same error occurred. Outside assistance is required to resolve this error. | ||
+ | <!--Dr. Dahlquist please help--> | ||
==== Creating the GRNmap Input Workbook ==== | ==== Creating the GRNmap Input Workbook ==== | ||
Line 288: | Line 291: | ||
==Conclusion== | ==Conclusion== | ||
+ | STEM profile #2 was selected from the significant profiles because it showed both downward and upward trends at different cold shock time points. There are 88 genes in this profile, whereas only 34 genes were expected to belong to it. The p-value for the profile was 1.0x10^-15, which means that the number of genes assigned to the profile was significantly greater than the number expected. In the GO list for this profile, only 6 terms had a p-value less than 0.05: GO:0008150, GO:0003674, GO:0005575, GO:0007049, GO:0051301, and GO:0016021. Only 3 of these had a corrected p-value less than 0.05: GO:0008150, GO:0003674, and GO:0005575. The Gene Ontology terms found were GO:0008150: biological_process, GO:0003674: molecular_function, GO:0005575: cellular_component, GO:0007049: cell cycle, GO:0051301: cell division, and GO:0016021: integral component of membrane. In the YEASTRACT results there were only 5 significant transcription factors. The data for CIN5, GLN3, and HAP4 were: | ||
+ | CIN5 22.99% in user set, 0.92% in YEASTRACT, p-value = 0.9253997 | ||
+ | GLN3 42.53% in user set, 1.54% in YEASTRACT, p-value = 0.0338494 | ||
+ | HAP4 11.49% in user set, 0.93% in YEASTRACT, p-value = 0.7968973 | ||
+ | From this sheet, the genes chosen for visualizing the gene network in GRNsight were Yap1, Pdr3, Pdr1, Rpn4, Gcn4, YGR067C, Nrg2, Crz1, Met31, Tec1, Sum1, Gzf3, Mss11, Xbp1, Zms1, GLN3, HAP4, and CIN5. These were the 5 significant transcription factors from the YEASTRACT results plus the next 10 results with the lowest p-values. The genes GLN3, HAP4, and CIN5 were then added, making a total of 18 genes. There was a formatting error with the .xlsx sheet that prevented GRNsight from working with the data. No matter how many times the steps were redone, the same error occurred, so the visualization of the gene network failed. Outside assistance is required to resolve this error. | ||
+ | |||
==Acknowledgements== | ==Acknowledgements== | ||
+ | I would like to acknowledge my homework partners Ivy, Delisa, and Mihir for their help with checking our work and answering questions about the procedure. I would like to acknowledge that the methods were taken from week 8 and then adapted to fit the actual lab methods performed. I would like to acknowledge Dr. Dahlquist for her instruction on the topic and the procedure. "Except for what is noted above, this individual journal entry was completed by me and not copied from another source." [[User:Eyoung20|Eyoung20]] ([[User talk:Eyoung20|talk]]) 22:23, 30 October 2019 (PDT) | ||
+ | |||
==References== | ==References== | ||
+ | LMU BioDB 2019. (2019). Week 9. Retrieved October 30, 2019, from https://xmlpipedb.cs.lmu.edu/biodb/fall2019/index.php/Week_9 | ||
+ | |||
{{Template:eyoung20}} | {{Template:eyoung20}} |
Latest revision as of 21:24, 30 October 2019
Purpose
The purpose of this activity was to link cell clusters to the associated gene ontology terms, and to create a mapping of the gene regulatory networks.
Method
Clustering and GO Term Enrichment with stem (part 2)
- Prepared the microarray data file for loading into STEM.
- Inserted a new worksheet into the Excel workbook, and it was named "dCIN5_stem".
- Selected all of the data from the "dCIN5_ANOVA" worksheet and pasted it into the "dCIN5_stem" worksheet with Paste Special > Paste Values.
- The leftmost column, which had the header "Master_Index", was renamed "SPOT". Column B, which was named "ID", was renamed "Gene Symbol". The column named "Standard_Name" was deleted.
- The data were filtered to show only the rows with a B-H corrected p value > 0.05.
- After the data were filtered, all of the visible rows (except for the header row) were selected and deleted by right-clicking and choosing "Delete Row" from the context menu. The filter was then removed. This ensured that only the genes with a "significant" change in expression were clustered and not the noise.
- All of the data columns EXCEPT for the Average Log Fold change columns for each timepoint were deleted (for example, dCIN5_AvgLogFC_t15, etc.).
- Renamed the data columns with just the time and units (for example, 15m, 30m, etc.).
- The progress was saved. Then Save As was used to save this spreadsheet as Text (Tab-delimited) (*.txt). The warnings were accepted and the file was closed.
- Then the STEM software was downloaded and extracted. Click here to go to the STEM web site.
- The download link was clicked and the file stem.zip was downloaded to the desktop.
- The file was unzipped. In Seaver 120, the file icon was right-clicked and the menu item 7-zip > Extract Here was selected.
- A folder called stem was created by these actions.
- The Gene Ontology and yeast GO annotations were downloaded and placed in the folder.
- The link "gene_ontology.obo" was clicked and the file was downloaded.
- The link "gene_association.sgd.gz" was clicked and the file was downloaded.
- Inside the folder, stem.jar was double-clicked to launch the STEM program.
- Running STEM
- In section 1 (Expression Data Info) of the main STEM interface window, the Browse... button was used to navigate to the file, and the file was selected.
- The radio button "No normalization/add 0" was clicked.
- The box next to "Spot IDs included in the data file" was checked.
- In section 2 (Gene Info) of the main STEM interface window, the default selection for the three drop-down menu selections for Gene Annotation Source , Cross Reference Source, and Gene Location Source were left as "User provided".
- The "Browse..." button to the right of the "Gene Annotation File" item was selected. The "stem" folder was opened and the file "gene_association.sgd.gz" was selected and Opened.
- In section 3 (Options) of the main STEM interface window, the Clustering Method was confirmed to say "STEM Clustering Method" and the defaults for Maximum Number of Model Profiles or Maximum Unit Change in Model Profiles between Time Points were not changed.
- In section 4 (Execute) the yellow Execute button was clicked to run STEM.
- An error for #DIV/0! appeared on the screen.
- The Excel file was reopened and the Find/Replace dialog was opened. #DIV/0! was entered in the search field and the replace field was left empty. "Replace all" was clicked and all of the #DIV/0! errors were removed. The file was re-saved and STEM was run again.
- STEM attempted to run the data for 2 hours with no success.
- This was the stopping point for the Week 8 assignment.
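The spreadsheet preparation described above (dropping rows with a B-H corrected p value > 0.05, keeping only the average log fold change columns, renaming them to times, blanking #DIV/0! cells, and writing tab-delimited text) can also be sketched in Python. This is only an illustration, not the procedure that was actually used: the function names are made up, the p value column name "dCIN5_B-H_p" is a hypothetical placeholder, and rows whose p value cannot be parsed are kept on the assumption that the Excel filter would not have matched them.

```python
import csv
import io


def prepare_stem_rows(rows, p_column, prefix="dCIN5_AvgLogFC_t", alpha=0.05):
    """Return (header, data_rows) laid out like the STEM input file.

    rows is a list of dicts keyed by the worksheet's column headers.
    p_column is the header of the B-H corrected p value column (assumed name).
    """
    # Keep only the average log fold change columns, in worksheet order.
    fc_columns = [name for name in rows[0] if name.startswith(prefix)]
    # "dCIN5_AvgLogFC_t15" becomes "15m", and so on.
    header = ["SPOT", "Gene Symbol"] + [name[len(prefix):] + "m" for name in fc_columns]
    kept = []
    for row in rows:
        try:
            p_value = float(row[p_column])
        except ValueError:
            p_value = None  # assumption: an unparseable p value is not filtered out
        if p_value is not None and p_value > alpha:
            continue  # not significant, so the row is deleted
        # Blank out Excel division errors, as done with Find/Replace.
        values = ["" if row[name] == "#DIV/0!" else row[name] for name in fc_columns]
        kept.append([row["Master_Index"], row["ID"]] + values)
    return header, kept


def to_tab_delimited(header, data_rows):
    """Serialize the header and rows as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(data_rows)
    return buffer.getvalue()


# Two made-up rows to show the shape of the input and output.
example_rows = [
    {"Master_Index": "1", "ID": "GENE_A", "Standard_Name": "A1",
     "dCIN5_B-H_p": "0.01", "dCIN5_AvgLogFC_t15": "0.5", "dCIN5_AvgLogFC_t30": "#DIV/0!"},
    {"Master_Index": "2", "ID": "GENE_B", "Standard_Name": "B1",
     "dCIN5_B-H_p": "0.20", "dCIN5_AvgLogFC_t15": "1.2", "dCIN5_AvgLogFC_t30": "0.3"},
]
example_header, example_kept = prepare_stem_rows(example_rows, "dCIN5_B-H_p")
example_text = to_tab_delimited(example_header, example_kept)
```

Removing the #DIV/0! cells before the text file is written avoids the error that STEM reported on the first run.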
- Viewing and Saving STEM Results
- A new window was opened and called "All STEM Profiles (1)". Each box corresponded to a model expression profile. Colored profiles had a statistically significant number of genes assigned; they were arranged in order from most to least significant p value. Profiles with the same color belonged to the same cluster of profiles. The number that was in each box was simply an ID number for the profile.
- Clicked on the button that said "Interface Options...". At the bottom of the Interface Options window that appeared, below where it said "X-axis scale should be:", the radio button that said "Based on real time" was clicked. Then the Interface Options window was closed.
- A screenshot of this window was taken and pasted into a PowerPoint presentation to save the figures.
- Each of the SIGNIFICANT profiles was clicked, which opened a window showing a more detailed plot containing all of the genes in that profile.
- A screenshot was taken of each of the individual profile windows, and the images were saved in the PowerPoint presentation.
- At the bottom of each profile window, there were two yellow buttons "Profile Gene Table" and "Profile GO Table". For each of the profiles, the "Profile Gene Table" button was clicked to access the list of genes that belonged to the profile. In the window that appeared, the "Save Table" button was clicked and the file was saved to the desktop. The file was named dCIN5_profile#2_genelist.txt
- These files were uploaded to the wiki and linked to on the individual journal page.
- For each of the significant profiles, the "Profile GO Table" button was clicked to see the list of Gene Ontology terms that belonged to the profile. In the window that appeared, the "Save Table" button was clicked and the file was saved to the computer desktop. The file was named dCIN5_profile#2_GOlist.txt.
- These files were uploaded to the wiki and linked to on the individual journal page.
- Analyzing and Interpreting STEM Results
- Selected one of the profiles that were saved in the previous step for further interpretation of the data. Each member of the group chose a different profile. Answered the following:
- Why did you select this profile? In other words, why was it interesting to you?
- This profile was selected due to the fact that it demonstrated both down and up trends at different cold shock time points.
- How many genes belong to this profile?
- There are 88 genes in this profile.
- How many genes were expected to belong to this profile?
- Only 34 genes were expected to belong to this profile.
- What is the p value for the enrichment of genes in this profile? Bear in mind that we just finished computing p values to determine whether each individual gene had a significant change in gene expression at each time point. This p value determines whether the number of genes that show this particular expression profile across the time points is significantly more than expected.
- The p-value was 1.0x10^-15, which means that the number of genes assigned to the profile was significantly greater than the number expected.
- The GO list file saved for this profile was opened in Excel. This list showed all of the Gene Ontology terms associated with genes that fit this profile. The third row was selected and Data > Filter > Autofilter was chosen from the menu. The "p-value" column was filtered to show only GO terms with a p value < 0.05. How many GO terms are associated with this profile at p < 0.05? The GO list also had a column called "Corrected p-value". This correction was needed because the software had performed thousands of significance tests. The "Corrected p-value" column was filtered to show only GO terms with a corrected p value < 0.05. How many GO terms are associated with this profile with a corrected p value < 0.05?
- 6 GO terms had a p-value less than 0.05.
- Selected 6 Gene Ontology terms from the filtered list (either p < 0.05 or corrected p < 0.05).
- GO:0008150, GO:0003674, GO:0005575, GO:0007049, GO:0051301, and GO:0016021
- Each member of the group reported on their own cluster in the research presentation. Care was taken to choose terms that were the most significant, but that were also not too redundant.
- Note whether the same GO terms are showing up in multiple clusters.
- Look up the definitions for each of the terms at http://geneontology.org. In your research presentation, you will discuss the biological interpretation of these GO terms. In other words, why does the cell react to cold shock by changing the expression of genes associated with these GO terms? Also, what does this have to do with the transcription factor being deleted (for the groups working with deletion strain data)?
- To easily look up the definitions, go to http://geneontology.org.
- Copy and paste the GO ID (e.g. GO:0044848) into the search field on the left of the page.
- In the results page, click on the button that says "Link to detailed information about <term>, in this case "biological phase"".
- The definition will be on the next results page, e.g. here.
- This is the stopping point for the Week 8 assignment and the beginning point for Week 9
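The two Excel filters described above (p value < 0.05, then corrected p value < 0.05) amount to a simple scan over the saved GO table. The following is a minimal sketch under several assumptions that should be checked against the real file: the column headers "Category ID", "p-value", and "Corrected p-value" are assumed, the header is assumed to sit on the third line (matching the instruction to select the third row), and a cell such as "<0.001" is treated as an upper bound. The sample table is made up for illustration.

```python
import csv


def parse_p(cell):
    """Parse a p value cell; a leading '<' (e.g. '<0.001') is read as an upper bound."""
    return float(cell.strip().lstrip("<"))


def significant_terms(table_text, column, alpha=0.05, header_row=3):
    """Return the IDs of the GO terms whose value in `column` is below alpha."""
    # Skip any lines above the header row, then read the rest as tab-delimited.
    lines = table_text.splitlines()[header_row - 1:]
    reader = csv.DictReader(lines, delimiter="\t")
    return [row["Category ID"] for row in reader if parse_p(row[column]) < alpha]


# Made-up table with two lines above the header, for illustration only.
sample_table = "\n".join([
    "Example GO table (hypothetical preamble)",
    "",
    "Category ID\tCategory Name\tp-value\tCorrected p-value",
    "GO:example1\texample term A\t0.0004\t<0.001",
    "GO:example2\texample term B\t0.03\t0.40",
    "GO:example3\texample term C\t0.20\t1.0",
])

uncorrected = significant_terms(sample_table, "p-value")
corrected = significant_terms(sample_table, "Corrected p-value")
```

In the sample, two terms pass the uncorrected cutoff but only one survives the correction, the same pattern as the 6 versus 3 terms reported for this profile.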
- Viewing and Saving STEM Results
- A new window will open called "All STEM Profiles (1)". Each box corresponds to a model expression profile. Colored profiles have a statistically significant number of genes assigned; they are arranged in order from most to least significant p value. Profiles with the same color belong to the same cluster of profiles. The number in each box is simply an ID number for the profile.
- Click on the button that says "Interface Options...". At the bottom of the Interface Options window that appears below where it says "X-axis scale should be:", click on the radio button that says "Based on real time". Then close the Interface Options window.
- Take a screenshot of this window (on a PC, simultaneously press the Alt and PrintScreen buttons to save the view in the active window to the clipboard) and paste it into a PowerPoint presentation to save your figures.
- Click on each of the SIGNIFICANT profiles (the colored ones) to open a window showing a more detailed plot containing all of the genes in that profile.
- Take a screenshot of each of the individual profile windows and save the images in your PowerPoint presentation.
- At the bottom of each profile window, there are two yellow buttons "Profile Gene Table" and "Profile GO Table". For each of the profiles, click on the "Profile Gene Table" button to see the list of genes belonging to the profile. In the window that appears, click on the "Save Table" button and save the file to your desktop. Make your filename descriptive of the contents, e.g. "wt_profile#_genelist.txt", where you replace the number symbol with the actual profile number.
- Upload these files to the wiki and link to them on your individual journal page. (Note that it will be easier to zip all the files together and upload them as one file).
- For each of the significant profiles, click on the "Profile GO Table" to see the list of Gene Ontology terms belonging to the profile. In the window that appears, click on the "Save Table" button and save the file to your desktop. Make your filename descriptive of the contents, e.g. "wt_profile#_GOlist.txt", where you use "wt", "dGLN3", etc. to indicate the dataset and where you replace the number symbol with the actual profile number. At this point you have saved all of the primary data from the STEM software and it's time to interpret the results!
- Upload these files to the wiki and link to them on your individual journal page. (Note that it will be easier to zip all the files together and upload them as one file).
- Analyzing and Interpreting STEM Results
- Select one of the profiles you saved in the previous step for further interpretation of the data. I suggest that you choose one that has a pattern of up- or down-regulated genes at the cold shock timepoints. Each member of your group should choose a different profile. Answer the following:
- Why did you select this profile? In other words, why was it interesting to you?
- How many genes belong to this profile?
- How many genes were expected to belong to this profile?
- What is the p value for the enrichment of genes in this profile? Bear in mind that we just finished computing p values to determine whether each individual gene had a significant change in gene expression at each time point. This p value determines whether the number of genes that show this particular expression profile across the time points is significantly more than expected.
- Open the GO list file you saved for this profile in Excel. This list shows all of the Gene Ontology terms that are associated with genes that fit this profile. Select the third row and then choose from the menu Data > Filter > Autofilter. Filter on the "p-value" column to show only GO terms that have a p value of < 0.05. How many GO terms are associated with this profile at p < 0.05?
- 6 GO terms had a p-value < 0.05: GO:0008150, GO:0003674, GO:0005575, GO:0007049, GO:0051301, and GO:0016021
- The GO list also has a column called "Corrected p-value". This correction is needed because the software has performed thousands of significance tests. Filter on the "Corrected p-value" column to show only GO terms that have a corrected p value of < 0.05. How many GO terms are associated with this profile with a corrected p value < 0.05?
- 3 GO terms had a corrected p-value < 0.05: GO:0008150, GO:0003674, and GO:0005575
- Select 6 Gene Ontology terms from your filtered list (either p < 0.05 or corrected p < 0.05).
- GO:0008150, GO:0003674, GO:0005575, GO:0007049, GO:0051301, and GO:0016021
- Each member of the group will be reporting on his or her own cluster in your research presentation. You should take care to choose terms that are the most significant, but that are also not too redundant. For example, "RNA metabolism" and "RNA biosynthesis" are redundant with each other because they mean almost the same thing.
- Note whether the same GO terms are showing up in multiple clusters.
- Look up the definitions for each of the terms at http://geneontology.org. In your research presentation, you will discuss the biological interpretation of these GO terms. In other words, why does the cell react to cold shock by changing the expression of genes associated with these GO terms? Also, what does this have to do with the transcription factor being deleted (for the groups working with deletion strain data)?
- To easily look up the definitions, go to http://geneontology.org.
- Copy and paste the GO ID (e.g. GO:0044848) into the search field on the left of the page.
- In the results page, click on the button that says "Link to detailed information about <term>, in this case "biological phase"".
- The definition will be on the next results page, e.g. here.
- GO:0008150: biological_process - The root term of the biological process ontology; it covers any process carried out by living cells or organisms.
- GO:0003674: molecular_function - The root term of the molecular function ontology; it covers the molecular-level activities of gene products, such as catalysis or binding.
- GO:0005575: cellular_component - The root term of the cellular component ontology; it covers the locations in or around the cell where gene products act.
- GO:0007049: Cell cycle - The progression of biochemical and morphological phases and events that occur in a cell during successive cell replication or nuclear replication events. Canonically, the cell cycle comprises the replication and segregation of genetic material followed by the division of the cell, but in endocycles or syncytial cells nuclear replication or nuclear division may not be followed by cell division.
- GO:0051301: Cell Division - The process resulting in division and partitioning of components of a cell to form more cells; may or may not be accompanied by the physical separation of a cell into distinct, individually membrane-bounded daughter cells.
- GO:0016021: integral component of membrane - The component of a membrane consisting of the gene products and protein complexes having at least some part of their peptide sequence embedded in the hydrophobic region of the membrane.
- Select one of the profiles you saved in the previous step for further interpretation of the data. I suggest that you choose one that has a pattern of up- or down-regulated genes at the cold shock timepoints. Each member of your group should choose a different profile. Answer the following:
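The two p-value filters applied to the GO list above can also be reproduced outside Excel. The following is a minimal sketch, assuming the GO table has been exported as rows with a GO ID, an uncorrected p-value, and a corrected p-value. The names `GoRow` and `filter_go_terms` are hypothetical, and the numeric p-values are placeholders chosen only so that the counts match those recorded above (6 and 3); they are not the actual STEM output.

```python
from typing import List, NamedTuple


class GoRow(NamedTuple):
    go_id: str
    p_value: float
    corrected_p: float


def filter_go_terms(rows: List[GoRow], alpha: float = 0.05,
                    use_corrected: bool = False) -> List[str]:
    """Return the GO IDs whose chosen p-value column is below alpha."""
    kept = []
    for row in rows:
        p = row.corrected_p if use_corrected else row.p_value
        if p < alpha:
            kept.append(row.go_id)
    return kept


# Placeholder p-values (NOT the real STEM output); only the GO IDs
# come from the journal entry.
rows = [
    GoRow("GO:0008150", 0.0001, 0.001),
    GoRow("GO:0003674", 0.0002, 0.002),
    GoRow("GO:0005575", 0.0003, 0.003),
    GoRow("GO:0007049", 0.01, 0.2),
    GoRow("GO:0051301", 0.02, 0.4),
    GoRow("GO:0016021", 0.03, 0.6),
    GoRow("GO:EXAMPLE", 0.30, 1.0),  # hypothetical non-significant row
]

significant = filter_go_terms(rows)
significant_corrected = filter_go_terms(rows, use_corrected=True)
print(len(significant), len(significant_corrected))
```

With real data, the rows would be read from the saved GO list file rather than typed in by hand.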
Using YEASTRACT to Infer which Transcription Factors Regulate a Cluster of Genes (Tuesday, Feb. 19)
In the previous analysis using STEM, we found a number of gene expression profiles (aka clusters) which grouped genes based on similarity of gene expression changes over time. The implication is that these genes share the same expression pattern because they are regulated by the same (or the same set) of transcription factors. We will explore this using the YEASTRACT database.
- Opened the gene list in Excel for profile 2 from the STEM analysis. Choose a cluster with a clear cold shock/recovery up/down or down/up pattern. You should also choose one of the largest clusters.
- Copied the list of gene IDs to the clipboard.
- Opened a web browser and went to the YEASTRACT database.
- On the left panel of the window, click on the link to Rank by TF.
- Paste your list of genes from your cluster into the box labeled ORFs/Genes.
- Check the box for Check for all TFs.
- Accept the defaults for the Regulations Filter (Documented, DNA binding plus expression evidence)
- Do not apply a filter for "Filter Documented Regulations by environmental condition".
- Rank genes by TF using: The % of genes in the list and in YEASTRACT regulated by each TF.
- Click the Search button.
- Answer the following questions:
- In the results window that appears, the p values colored green are considered "significant", the ones colored yellow are considered "borderline significant" and the ones colored pink are considered "not significant". How many transcription factors are green or "significant"?
- 5 are considered significant
- Copy the table of results from the web page and paste it into a new Excel workbook to preserve the results.
- Upload the Excel file to OWW or Box and link to it in your electronic lab notebook.
- Are CIN5, GLN3, and/or HAP4 on the list? If so, what is their "% in user set", "% in YEASTRACT", and "p value".
- CIN5: 22.99% in user set, 0.92% in YEASTRACT, p-value = 0.9253997
- GLN3: 42.53% in user set, 1.54% in YEASTRACT, p-value = 0.0338494
- HAP4: 11.49% in user set, 0.93% in YEASTRACT, p-value = 0.7968973
- For the mathematical model that we will build, we need to define a gene regulatory network of transcription factors that regulate other transcription factors. We can use YEASTRACT to assist us with creating the network. We want to generate a network with approximately 15-20 transcription factors in it.
- You need to select from this list of "significant" transcription factors, which ones you will use to run the model. You will use these transcription factors and add GLN3, HAP4, and ZAP1 if they are not in your list. Explain in your electronic notebook how you decided on which transcription factors to include. Record the list and your justification in your electronic lab notebook. Each group member will select a different network (they can have some overlapping transcription factors, but some should also be different).
- Transcription factors chosen: Yap1, Pdr3, Pdr1, Rpn4, Gcn4, YGR067C, Nrg2, Crz1, Met31, Tec1, Sum1, Gzf3, Mss11, Xbp1, Zms1, GLN3, HAP4, CIN5.
- These were chosen by taking the 5 significant transcription factors from the YEASTRACT results and then the 10 results with the next-lowest p-values. GLN3, HAP4, and CIN5 were then added, making a total of 18.
- Go back to the YEASTRACT database and follow the link to Generate Regulation Matrix.
- Copy and paste the list of transcription factors you identified (plus HAP4, GLN3, and ZAP1) into both the "Transcription factors" field and the "Target ORF/Genes" field.
- We are going to use the "Regulations Filter" options of "Documented", "Only DNA binding evidence"
- Click the "Generate" button.
- In the results window that appears, click on the link to the "Regulation matrix (Semicolon Separated Values (CSV) file)" that appears and save it to your Desktop. Rename this file with a meaningful name so that you can distinguish it from the other files you will generate.
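The selection rule recorded above (take the factors with the lowest p-values, then append the required factors if they are missing) can be sketched as follows. This is an illustration only: the function name `select_tfs` is hypothetical, and `TF_A` through `TF_D` are made-up names with made-up p-values. Only the CIN5, GLN3, and HAP4 p-values are the ones recorded in this notebook.

```python
def select_tfs(ranked, n_top, required):
    """Pick the n_top factors with the lowest p-values, then append
    any required factors that were not already chosen."""
    by_p = sorted(ranked, key=lambda item: item[1])
    chosen = [name for name, _ in by_p[:n_top]]
    for name in required:
        if name not in chosen:
            chosen.append(name)
    return chosen


# Toy input: TF_A..TF_D are hypothetical; the last three p-values
# are the ones recorded in the journal entry.
ranked = [
    ("TF_A", 0.001),
    ("TF_B", 0.2),
    ("TF_C", 0.01),
    ("TF_D", 0.5),
    ("CIN5", 0.9253997),
    ("GLN3", 0.0338494),
    ("HAP4", 0.7968973),
]

network = select_tfs(ranked, n_top=2, required=["GLN3", "HAP4", "CIN5"])
print(network)
```

In this notebook the equivalent choice was `n_top=15` (the 5 significant factors plus the next 10) with three required factors, which gives the 18 factors listed above.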
Visualizing Your Gene Regulatory Networks with GRNsight
We will analyze the regulatory matrix files you generated above in Microsoft Excel and visualize them using GRNsight to determine which one will be appropriate to pursue further in the modeling.
- First we need to properly format the output files from YEASTRACT.
- Open the file in Excel. It will not open properly in Excel because a semicolon was used as the column delimiter instead of a comma. To fix this, select the entire Column A. Then go to the "Data" tab and select "Text to columns". In the Wizard that appears, select "Delimited" and click "Next". In the next window, select "Semicolon", and click "Next". In the next window, leave the data format at "General", and click "Finish". This should now look like a table with the names of the transcription factors across the top and down the first column and all of the zeros and ones distributed throughout the rows and columns. This is called an "adjacency matrix." If there is a "1" in the cell, that means there is a connection between the transcription factor in that row and the one in that column.
- Save this file in Microsoft Excel workbook format (.xlsx).
- For this adjacency matrix to be usable in GRNmap (the modeling software) and GRNsight (the visualization software), we need to transpose the matrix. Insert a new worksheet into your Excel file and name it "network". Go back to the previous sheet and select the entire matrix and copy it. Go to your new worksheet and click on the A1 cell in the upper left. Select "Paste special" from the "Home" tab. In the window that appears, check the box for "Transpose". This will paste your data with the columns transposed to rows and vice versa. This is necessary because we want the transcription factors that are the "regulatORS" across the top and the "regulatEES" along the side.
- The labels for the genes in the columns and rows need to match. Thus, delete the "p" from each of the gene names in the columns. Adjust the case of the labels to make them all upper case.
- In cell A1, copy and paste the text "rows genes affected/cols genes controlling".
- Finally, for ease of working with the adjacency matrix in Excel, we want to alphabetize the gene labels both across the top and side.
- Select the area of the entire adjacency matrix.
- Click the Data tab and click the custom sort button.
- Sort Column A alphabetically, being sure to exclude the header row.
- Now sort row 1 from left to right, excluding cell A1. In the Custom Sort window, click on the options button and select sort left to right, excluding column 1.
- Name the worksheet containing your organized adjacency matrix "network" and Save.
- Now we will visualize what these gene regulatory networks look like with the GRNsight software.
- Go to the GRNsight home page.
- Select the menu item File > Open and select the regulation matrix .xlsx file that has the "network" worksheet in it that you formatted above. If the file has been formatted properly, GRNsight should automatically create a graph of your network. You can click the "Grid Layout" button to arrange the nodes in a grid, or you can click and drag the nodes (genes) around until you get a layout that you like and take a screenshot of the results. Paste it into your PowerPoint presentation.
- If you have nodes (genes) floating around in the display that are not connected to any other nodes, we need to delete them from the network for the modeling to work properly. Go back to the Excel workbook and network sheet and delete both the row and column with the floating gene's name. Then re-upload the edited file to GRNsight to visualize it. Use this final version in your PowerPoint and subsequent modeling.
- There was a formatting error with the .xlsx sheet that prevented GRNsight from working with the data. The same error occurred no matter how many times the steps were redone. Outside assistance is needed to resolve this error.
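The Excel formatting steps above (split on semicolons, transpose, remove the trailing "p", convert to upper case, sort both axes, and look for unconnected genes) can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not a fix for the error reported above: it assumes the downloaded file has regulators with a trailing "p" down the first column and target genes across the first row, and the names `clean_label`, `format_network`, and `unconnected` are hypothetical. The toy regulation values are made up and are not YEASTRACT results.

```python
import csv
import io

CORNER = "rows genes affected/cols genes controlling"


def clean_label(label):
    """Drop a trailing lowercase 'p' (protein-style name) and upper-case."""
    label = label.strip()
    if label.endswith("p"):
        label = label[:-1]
    return label.upper()


def format_network(text):
    """Parse semicolon-delimited text, transpose it, and sort both axes."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=";") if r]
    transposed = [list(col) for col in zip(*rows)]  # regulators become columns
    regulators = [clean_label(x) for x in transposed[0][1:]]
    col_order = sorted(range(len(regulators)), key=lambda i: regulators[i])
    body = sorted(
        ([clean_label(r[0])] + [r[i + 1].strip() for i in col_order]
         for r in transposed[1:]),
        key=lambda r: r[0],
    )
    header = [CORNER] + [regulators[i] for i in col_order]
    return [header] + body


def unconnected(matrix):
    """Return genes that have no '1' in either their row or their column."""
    regulators = matrix[0][1:]
    has_edge = set()
    for row in matrix[1:]:
        for reg, cell in zip(regulators, row[1:]):
            if cell == "1":
                has_edge.update((reg, row[0]))
    names = set(regulators) | {row[0] for row in matrix[1:]}
    return sorted(names - has_edge)


# Toy input with made-up regulation values (hypothetical, for illustration).
raw = (
    ";GLN3;HAP4;CIN5;ZAP1\n"
    "Hap4p;0;0;1;0\n"
    "Gln3p;1;0;0;0\n"
    "Cin5p;0;1;0;0\n"
    "Zap1p;0;0;0;0\n"
)

matrix = format_network(raw)
for row in matrix:
    print(row)
print(unconnected(matrix))
```

A gene returned by `unconnected` corresponds to a floating node whose row and column would be deleted before re-uploading the file. Writing the result to an .xlsx worksheet named "network" would need a spreadsheet library and is left out of this sketch.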
Data and files
media:Gene_Stem_Run_1_screenshot_ERY.pptx
Conclusion
Profile #2 was selected from the STEM results because it showed both down and up trends at different cold shock time points. There are 88 genes in this profile, although only 34 genes were expected to belong to it. The p-value for the profile was 1.0x10^-15, which means that there was a significant difference between the number of genes expected and the number actually assigned. In the GO list for this profile, only 6 terms had a p-value less than 0.05: GO:0008150, GO:0003674, GO:0005575, GO:0007049, GO:0051301, and GO:0016021. Only 3 of these had a corrected p-value less than 0.05: GO:0008150, GO:0003674, and GO:0005575. The Gene Ontology terms found were GO:0008150: biological_process, GO:0003674: molecular_function, GO:0005575: cellular_component, GO:0007049: cell cycle, GO:0051301: cell division, and GO:0016021: integral component of membrane. In the YEASTRACT results there were only 5 significant transcription factors. The data for CIN5, GLN3, and HAP4 were: CIN5, 22.99% in user set, 0.92% in YEASTRACT, p-value = 0.9253997; GLN3, 42.53% in user set, 1.54% in YEASTRACT, p-value = 0.0338494; HAP4, 11.49% in user set, 0.93% in YEASTRACT, p-value = 0.7968973. From these results, the transcription factors chosen for visualizing the gene network in GRNsight were Yap1, Pdr3, Pdr1, Rpn4, Gcn4, YGR067C, Nrg2, Crz1, Met31, Tec1, Sum1, Gzf3, Mss11, Xbp1, Zms1, GLN3, HAP4, and CIN5. These were chosen by taking the 5 significant transcription factors from the YEASTRACT results and then the 10 results with the next-lowest p-values. GLN3, HAP4, and CIN5 were then added, making a total of 18. There was a formatting error with the .xlsx sheet that prevented GRNsight from working with the data. The same error occurred no matter how many times the steps were redone, so outside assistance is needed to resolve it.
As a result, the gene network could not be visualized.
Acknowledgements
I would like to acknowledge my homework partners Ivy, Delisa, and Mihir for their help with checking our work and answering questions about the procedure. I would like to acknowledge that the methods were taken from week 8 and then adapted to fit the actual lab methods performed. I would like to acknowledge Dr. Dahlquist for her instruction on the topic and the procedure. "Except for what is noted above, this individual journal entry was completed by me and not copied from another source." Eyoung20 (talk) 22:23, 30 October 2019 (PDT)
References
LMU BioDB 2019. (2019). Week 9. Retrieved October 30, 2019, from https://xmlpipedb.cs.lmu.edu/biodb/fall2019/index.php/Week_9