Cdomin12 Week 9

From LMU BioDB 2019

User Page

template: cdomin12

Assignment Page Individual Journal Entries Class Journal
Week 1 cdomin12 Week 1 Class Journal Week 1
Week 2 cdomin12 Week 2 Class Journal Week 2
Week 3 RAD53 / YPL153C Week 3 Class Journal Week 3
Week 4 cdomin12 Week 4 Class Journal Week 4
Week 5 IMG/VR Week 5 Class Journal Week 5
Week 6 cdomin12 Week 6 Class Journal Week 6
Week 7 cdomin12 Week 7 Class Journal Week 7
Week 8 cdomin12 Week 8 Class Journal Week 8
Week 9 cdomin12 Week 9 Class Journal Week 9
Week 10 cdomin12 Week 10 Class Journal Week 10
Week 11 cdomin12 Week 11 Skinny Genes
Week 12/13 Skinny Genes Quality Assurance Skinny Genes
Week 15 Skinny Genes Deliverables Skinny Genes

Purpose

To further analyze and understand the importance of significant transcription factors and their role in gene expression during and after cold shock.

Methods/Results

  1. Viewing and Saving STEM Results
    1. A new window opened called "All STEM Profiles (1)".
      • Clicked on the button that says "Interface Options...". At the bottom of the Interface Options window that appears below where it says "X-axis scale should be:", clicked on the radio button that says "Based on real time". Then closed the Interface Options window.
      • Took a screenshot of this window and pasted it into a PowerPoint presentation to save figures.
    2. Clicked on each of the SIGNIFICANT profiles (the colored ones) to open a window showing a more detailed plot containing all of the genes in that profile.
      • Took a screenshot of each of the individual profile windows and saved the images in PowerPoint presentation.
      • At the bottom of each profile window, there were two yellow buttons, "Profile Gene Table" and "Profile GO Table". For each of the profiles, clicked on the "Profile Gene Table" button to see the list of genes belonging to the profile. In the window that appeared, clicked on the "Save Table" button and saved the file to the desktop. Made the filename descriptive of the contents, e.g. "wt_profile48_genelist.txt".
        • Uploaded these files to the wiki and linked to them on my individual journal page. (Note that it is easier to zip all the files together and upload them as one file.)
      • For each of the significant profiles, clicked on the "Profile GO Table" button to see the list of Gene Ontology terms belonging to the profile. In the window that appeared, clicked on the "Save Table" button and saved the file to the desktop. Made the filename descriptive of the contents, e.g. "wt_profile48_GOlist.txt", using "wt", "dGLN3", etc. to indicate the dataset, followed by the actual profile number. At this point all of the primary data from the STEM software had been saved and it was time to interpret the results.
        • Uploaded these files to the wiki and linked to them on my individual journal page.
  2. Analyzing and Interpreting STEM Results
    1. Selected one of the profiles saved in the previous step for further interpretation of the data, choosing one with a pattern of up- or down-regulated genes at the cold shock timepoints. Each member of the group chose a different profile. Answered the following:

Why did you select this profile? In other words, why was it interesting to you?

It has an upward and then downward trend, and I thought it would be interesting to investigate how the expression at the different time points relates to cold shock.

How many genes belong to this profile?

77 genes

How many genes were expected to belong to this profile?

36.7 expected genes

What is the p value for the enrichment of genes in this profile?

2.7E-9
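
The observed and expected gene counts reported above can be compared as a simple ratio. The snippet below is only an arithmetic sketch using the two numbers from this profile; the variable names are my own and it does not reproduce how STEM computes its p value.

```python
# Values reported by STEM for the selected profile (from the answers above)
observed_genes = 77      # genes assigned to the profile
expected_genes = 36.7    # genes expected to be assigned by chance

# Ratio of observed to expected genes (fold enrichment)
fold_enrichment = observed_genes / expected_genes
print(f"Observed/expected ratio: {fold_enrichment:.2f}")
```

The profile contains roughly twice as many genes as expected, which is consistent with the small p value reported for it.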

      • Opened the GO list file saved for this profile in Excel. This list shows all of the Gene Ontology terms that are associated with genes that fit this profile. Selected the third row and then chose from the menu Data > Filter > Autofilter. Filtered on the "p-value" column to show only GO terms with a p value < 0.05. The GO list also has a column called "Corrected p-value"; this correction is needed because the software has performed thousands of significance tests. Filtered on the "Corrected p-value" column to show only GO terms with a corrected p value < 0.05.

How many GO terms are associated with this profile at p < 0.05?

4 GO terms

How many GO terms are associated with this profile with a corrected p value < 0.05?

2 GO terms
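
The same filtering can be sketched outside Excel. The example below is a minimal sketch, assuming the saved GO table is tab-delimited text with a header row containing the "p-value" and "Corrected p-value" columns named above. The function names, the `skip_lines` parameter, the "term" column, and the sample rows are all hypothetical and are not my actual STEM output.

```python
import csv

def parse_p(cell):
    """Convert a p-value cell to float; a leading '<' (e.g. '<0.001') is dropped (assumption)."""
    return float(cell.strip().lstrip("<"))

def count_significant(table_text, column, alpha=0.05, skip_lines=0):
    """Count rows whose value in `column` is below `alpha`.

    skip_lines: number of lines above the header row to ignore (hypothetical parameter).
    """
    lines = table_text.splitlines()[skip_lines:]
    reader = csv.DictReader(lines, delimiter="\t")
    return sum(1 for row in reader if parse_p(row[column]) < alpha)

# Made-up sample table for illustration only
sample = (
    "term\tp-value\tCorrected p-value\n"
    "term A\t1.0E-5\t<0.001\n"
    "term B\t0.003\t0.04\n"
    "term C\t0.02\t0.31\n"
    "term D\t0.2\t1\n"
)

n_raw = count_significant(sample, "p-value")
n_corrected = count_significant(sample, "Corrected p-value")
print(n_raw, n_corrected)
```

Fewer terms pass the corrected threshold than the uncorrected one, which is the same pattern seen in the counts above.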

      • Selected the top 6 Gene Ontology terms from the filtered list (either p < 0.05 or corrected p < 0.05).
        • Noted whether the same GO terms showed up in multiple clusters.
        • Looked up the definitions for each of the terms at http://geneontology.org. The research presentation will discuss the biological interpretation of these GO terms: why the cell reacts to cold shock by changing the expression of genes associated with these GO terms, and how this relates to the transcription factor being deleted (for the groups working with deletion strain data).
        • Copied and pasted the GO ID (e.g. GO:0044848) into the search field on the left of the page.
        • In the results page, clicked on the button that says "Link to detailed information about <term>" (in this case, "biological phase").
        • The definition is on the next results page.

Definitions of GO Terms

GO:0008380 (RNA splicing): "The process of removing sections of the primary RNA transcript to remove sequences not present in the mature form of the RNA and joining the remaining sections to form the mature form of the RNA."

GO:0006397 (mRNA processing): "Any process involved in the conversion of a primary mRNA transcript into one or more mature mRNA(s) prior to translation into polypeptide."

Using YEASTRACT to Infer which Transcription Factors Regulate a Cluster of Genes (Tuesday, October 29)

  1. Opened the gene list in Excel for one of the significant profiles from the STEM analysis. Chose a cluster with a clear cold shock/recovery up/down or down/up pattern.
    • Copied the list of gene IDs onto the clipboard.
  2. Launched a web browser and went to the YEASTRACT database.
    • On the left panel of the window, clicked on the link to Rank by TF.
    • Pasted list of genes from cluster into the box labeled ORFs/Genes.
    • Checked the box for Check for all TFs.
    • Accepted the defaults for the Regulations Filter (Documented, DNA binding plus expression evidence)
    • Did not apply a filter for "Filter Documented Regulations by environmental condition".
    • Ranked genes by TF using: The % of genes in the list and in YEASTRACT regulated by each TF.
    • Clicked the Search button.
  3. Answered the following questions:
    • In the results window that appears, the p values colored green are considered "significant", the ones colored yellow are considered "borderline significant" and the ones colored pink are considered "not significant". How many transcription factors are green or "significant"?
    • Copied the table of results from the web page and pasted it into a new Excel workbook to preserve the results.
      • Uploaded the Excel file to the wiki and linked to it in my electronic lab notebook.
      • Are CIN5, GLN3, and/or HAP4 on the list? If so, what is their "% in user set", "% in YEASTRACT", and "p value"?

How many transcription factors are green or "significant"?

1 is significant

Are CIN5, GLN3, and/or HAP4 on the list? If so, what is their "% in user set", "% in YEASTRACT", and "p value"?

Yes, they are all on this list:

HAP4

% in user set: 15.79

% in YEASTRACT: 1.10

p value: 0.376430967479230

CIN5

% in user set: 19.74

% in YEASTRACT: 0.69

p value: 0.976606065487304

GLN3

% in user set: 46.05

% in YEASTRACT: 1.45

p value: 0.009368736870331

    • Selected from this list of "significant" transcription factors the ones to use to run the model, and added GLN3, HAP4, and CIN5. Recorded the list and the justification for which transcription factors to include in the electronic lab notebook. Each group member selected a different network (the networks can have some overlapping transcription factors, but some should also be different).

I chose the top 13 transcription factors, which included GLN3. I then added HAP4 and CIN5 so that they would be included.

    • Went back to the YEASTRACT database and followed the link to Generate Regulation Matrix.
    • Copied and pasted the list of transcription factors identified above (plus HAP4, GLN3, and CIN5) into both the "Transcription factors" field and the "Target ORF/Genes" field.
      • Clicked the "Generate" button.
      • In the results window that appeared, clicked on the link to the "Regulation matrix (Semicolon Separated Values (CSV) file)" and saved it to the Desktop. Renamed this file with a meaningful name to distinguish it from the other files generated.
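
The semicolon-delimited regulation matrix saved above can also be read outside Excel. The following is a minimal sketch using Python's standard `csv` module, assuming the first row holds the column labels and the first cell of every later row holds the row label. The function name and the three-gene sample are hypothetical and are not my actual YEASTRACT output.

```python
import csv
import io

def read_semicolon_matrix(text):
    """Parse semicolon-delimited text into (column_labels, row_labels, rows of ints)."""
    parsed = []
    for record in csv.reader(io.StringIO(text), delimiter=";"):
        cells = [c.strip() for c in record]
        # Drop empty trailing cells left by a trailing semicolon (assumption)
        while cells and cells[-1] == "":
            cells.pop()
        if cells:
            parsed.append(cells)
    column_labels = parsed[0][1:]
    row_labels = [r[0] for r in parsed[1:]]
    rows = [[int(v) for v in r[1:]] for r in parsed[1:]]
    return column_labels, row_labels, rows

# Made-up example: regulators (with the "p" suffix) in rows, targets in columns
sample = """;CIN5;GLN3;HAP4
Cin5p;0;0;1
Gln3p;0;0;0
Hap4p;0;0;0
"""

targets, regulators, matrix = read_semicolon_matrix(sample)
print(targets)
print(regulators)
print(matrix)
```

Reading the file with an explicit `delimiter=";"` avoids the "Text to columns" step, because the delimiter is stated up front instead of being guessed.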

Visualizing Your Gene Regulatory Networks with GRNsight

    • Opened the file in Excel. It does not open properly because a semicolon is used as the column delimiter instead of a comma. To fix this, selected the entire Column A, then went to the "Data" tab and selected "Text to columns". In the Wizard that appears, selected "Delimited" and clicked "Next". In the next window, selected "Semicolon" and clicked "Next". In the next window, left the data format at "General" and clicked "Finish". The result should look like a table with the names of the transcription factors across the top and down the first column, and zeros and ones distributed throughout the rows and columns. This is called an "adjacency matrix." A "1" in a cell means there is a connection between the transcription factor in that row and the one in that column.
    • Saved this file in Microsoft Excel workbook format (.xlsx).
    • Inserted a new worksheet into Excel file and named it "network". Went back to the previous sheet and selected the entire matrix and copied it. Went to new worksheet and clicked on the A1 cell in the upper left. Selected "Paste special" from the "Home" tab. In the window that appears, checked the box for "Transpose". This will paste your data with the columns transposed to rows and vice versa.
    • The labels for the genes in the columns and rows need to match. Thus, deleted the "p" from each of the gene names in the columns. Adjusted the case of the labels to make them all upper case.
    • In cell A1, copied and pasted the text "rows genes affected/cols genes controlling".
      • Selected the area of the entire adjacency matrix.
      • Clicked the Data tab and clicked the custom sort button.
      • Sorted Column A alphabetically, being sure to exclude the header row.
      • Now sorted row 1 from left to right, excluding cell A1. In the Custom Sort window, clicked on the options button and selected sort left to right, excluding column 1.
    • Named the worksheet containing the organized adjacency matrix "network" and saved the file.
    • Went to the GRNsight home page.
    • Selected the menu item File > Open and selected the regulation matrix .xlsx file containing the "network" worksheet formatted above. If the file has been formatted properly, GRNsight automatically creates a graph of the network. The "Grid Layout" button arranges the nodes in a grid; the nodes (genes) can also be clicked and dragged into a preferred layout. Took a screenshot of the results and pasted it into the PowerPoint presentation.
      • Went back to the "network" sheet of the Excel workbook and deleted both the row and column with the floating gene's name. Then re-uploaded the edited file to GRNsight to visualize it. Used this final version in the PowerPoint and subsequent modeling.
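
The formatting steps above (transposing, removing the "p" suffix, upper-casing, sorting, and spotting floating genes) can be sketched in code. This is a minimal sketch under stated assumptions: the input has regulators in rows and targets in columns, a "floating" gene is taken to mean one with no "1" anywhere in its row or column, and all function names and sample values are hypothetical rather than my actual data.

```python
def clean_label(label):
    """Drop a trailing lowercase 'p' (protein suffix) and upper-case the name."""
    label = label.strip()
    if label.endswith("p"):
        label = label[:-1]
    return label.upper()

def build_network(regulators, targets, matrix):
    """Transpose to 'rows genes affected/cols genes controlling' and sort both axes."""
    regs = [clean_label(r) for r in regulators]
    tgts = [t.strip().upper() for t in targets]
    reg_order = sorted(range(len(regs)), key=lambda i: regs[i])
    tgt_order = sorted(range(len(tgts)), key=lambda j: tgts[j])
    header = [regs[i] for i in reg_order]
    # matrix[i][j] == 1 means regulator i controls target j (assumed orientation)
    rows = {tgts[j]: [matrix[i][j] for i in reg_order] for j in tgt_order}
    return header, rows

def floating_genes(header, rows):
    """Genes with no incoming and no outgoing edges (assumed meaning of 'floating')."""
    floating = []
    for gene, incoming in rows.items():
        col = header.index(gene) if gene in header else None
        has_outgoing = col is not None and any(r[col] for r in rows.values())
        if not any(incoming) and not has_outgoing:
            floating.append(gene)
    return floating

# Made-up example: only Cin5p controls HAP4
regulators = ["Hap4p", "Cin5p", "Gln3p"]
targets = ["HAP4", "CIN5", "GLN3"]
matrix = [
    [0, 0, 0],  # Hap4p
    [1, 0, 0],  # Cin5p -> HAP4
    [0, 0, 0],  # Gln3p
]

header, rows = build_network(regulators, targets, matrix)
floating = floating_genes(header, rows)
print(header)
print(rows)
print(floating)
```

Under this definition a gene that only regulates itself still has a "1" on the diagonal and is therefore not reported as floating.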

Data & Files

Excel Workbook

Week 9 Files

Week 9 Slide

text file for STEM analysis

Transcription Factors

File Uploaded to GRNsight

Conclusion

I had only 2 significant transcription factors when I ran the data through YEASTRACT. I chose the top 13 transcription factors to analyze with GRNsight, which included GLN3, and added CIN5 and HAP4 for a total of 15 transcription factors. The GRNsight graph showed only a single node, HMO1, connected to itself. This gene regulatory network is interesting because no other nodes were connected to one another. This suggests that HMO1 regulates itself and that none of the other transcription factors in my dataset regulate each other.

Acknowledgments

1. I worked with User:Knguye66, User:Jcowan4, and User:Mavila9 for this assignment.

2. "Except for what is noted above, this individual journal entry was completed by me and not copied from another source." Cdomin12 (talk) 19:25, 28 October 2019 (PDT)

References

Cdomin12 (talk) 13:16, 30 October 2019 (PDT)