Cdomin12 Week 9

From LMU BioDB 2019
Revision as of 12:52, 30 October 2019 by Cdomin12 (talk | contribs) (Using YEASTRACT to Infer which Transcription Factors Regulate a Cluster of Genes (Tuesday, October 29): answer)


Purpose

Methods/Results

  1. Viewing and Saving STEM Results
    1. A new window opened called "All STEM Profiles (1)".
      • Clicked on the "Interface Options..." button. At the bottom of the Interface Options window that appeared, under "X-axis scale should be:", selected the "Based on real time" radio button, then closed the Interface Options window.
      • Took a screenshot of this window and pasted it into a PowerPoint presentation to save figures.
    2. Clicked on each of the SIGNIFICANT profiles (the colored ones) to open a window showing a more detailed plot containing all of the genes in that profile.
      • Took a screenshot of each of the individual profile windows and saved the images in the PowerPoint presentation.
      • At the bottom of each profile window, there were two yellow buttons, "Profile Gene Table" and "Profile GO Table". For each of the profiles, clicked on the "Profile Gene Table" button to see the list of genes belonging to the profile. In the window that appeared, clicked on the "Save Table" button and saved the file to the desktop with a filename descriptive of its contents, e.g. "wt_profile48_genelist.txt".
        • Uploaded these files to the wiki and linked to them on the individual journal page. (Note that it is easier to zip all the files together and upload them as one file.)
      • For each of the significant profiles, clicked on the "Profile GO Table" button to see the list of Gene Ontology terms belonging to the profile. In the window that appeared, clicked on the "Save Table" button and saved the file to the desktop with a filename descriptive of its contents, e.g. "wt_profile48_GOlist.txt", where "wt", "dGLN3", etc. indicates the dataset and the number is the actual profile number. At this point all of the primary data from the STEM software had been saved, and the next step was to interpret the results.
        • Uploaded these files to the wiki and linked to them on the individual journal page.
  2. Analyzing and Interpreting STEM Results
    1. Selected one of the profiles saved in the previous step for further interpretation of the data. The assignment suggests choosing one that has a pattern of up- or down-regulated genes at the cold shock timepoints, with each member of the group choosing a different profile. Answered the following:

Why did you select this profile? In other words, why was it interesting to you?

Its expression trends upward and then downward, and I wanted to investigate how that pattern across the different time points relates to cold shock.

How many genes belong to this profile?

77 genes

How many genes were expected to belong to this profile?

36.7 expected genes

What is the p value for the enrichment of genes in this profile?

2.7E-9
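
As a quick check on the three answers above, the assigned and expected gene counts can be turned into a fold enrichment. This is only a minimal sketch using the two numbers reported by STEM. It does not reproduce the p value, which comes from STEM's own significance test, and the variable names are chosen here for illustration.

```python
# Values reported by STEM for the selected profile (taken from the answers above).
genes_assigned = 77
genes_expected = 36.7

# Ratio of observed to expected genes; a value above 1 means the profile
# holds more genes than expected by chance.
fold_enrichment = genes_assigned / genes_expected

# Absolute number of genes above the expectation.
excess_genes = genes_assigned - genes_expected

print(f"fold enrichment: {fold_enrichment:.2f}")
print(f"genes above expectation: {excess_genes:.1f}")
```

The profile holds roughly twice as many genes as expected, which is consistent with the small p value STEM reports for it.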

      • Opened the GO list file saved for this profile in Excel. This list shows all of the Gene Ontology terms that are associated with genes that fit this profile. Selected the third row and then chose Data > Filter > Autofilter from the menu. Filtered on the "p-value" column to show only GO terms that have a p value of < 0.05 and counted them. The GO list also has a column called "Corrected p-value". This correction is needed because the software has performed thousands of significance tests. Then filtered on the "Corrected p-value" column to show only GO terms that have a corrected p value of < 0.05 and counted those as well.

How many GO terms are associated with this profile at p < 0.05?

4 GO terms

How many GO terms are associated with this profile with a corrected p value < 0.05?

2 GO terms
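
The two Excel filters above could also be done in a short script. The sketch below assumes the saved GO table is tab-delimited text with its column names on the third line (suggested by the "Selected the third row" step) and that the columns are named "p-value" and "Corrected p-value" as described. The helper names and the sample rows are made up for illustration and are not the contents of the saved file.

```python
import csv

def load_go_table(text, header_row=3):
    """Parse a tab-delimited GO table whose column names sit on `header_row` (1-based).

    The header position and the tab delimiter are assumptions about the saved file.
    """
    lines = text.splitlines()[header_row - 1:]
    return list(csv.DictReader(lines, delimiter="\t"))

def count_below(rows, column, threshold=0.05):
    """Count the rows whose numeric value in `column` is strictly below `threshold`."""
    return sum(1 for row in rows if float(row[column]) < threshold)

# Made-up table for illustration only; NOT the values from the real GO list.
SAMPLE = "\n".join([
    "made-up title line",
    "",
    "Category ID\tCategory Name\tp-value\tCorrected p-value",
    "TERM:1\tplaceholder term one\t0.0002\t0.01",
    "TERM:2\tplaceholder term two\t0.004\t0.20",
    "TERM:3\tplaceholder term three\t0.03\t0.90",
    "TERM:4\tplaceholder term four\t0.40\t1.0",
])

rows = load_go_table(SAMPLE)
n_raw = count_below(rows, "p-value")
n_corrected = count_below(rows, "Corrected p-value")
print(n_raw, n_corrected)
```

With a real file, the text would be read from disk (for example from "wt_profile48_GOlist.txt") instead of from the sample string. Values that are not plain numbers would need extra handling before the call to `float`.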

      • Selected the top 6 Gene Ontology terms from the filtered list (either p < 0.05 or corrected p < 0.05).
          • Noted whether the same GO terms were showing up in multiple clusters.
        • Looked up the definitions for each of the terms at http://geneontology.org. The research presentation will discuss the biological interpretation of these GO terms. In other words, why does the cell react to cold shock by changing the expression of genes associated with these GO terms? Also, what does this have to do with the transcription factor being deleted (for the groups working with deletion strain data)?
        • Copied and pasted the GO ID (e.g. GO:0044848) into the search field on the left of the page.
        • On the results page, clicked on the link labeled "Link to detailed information about <term>" (for the example ID, the term is "biological phase").
        • The definition appeared on the next results page.

Definitions of GO Terms

GO:0008380 (RNA splicing): "The process of removing sections of the primary RNA transcript to remove sequences not present in the mature form of the RNA and joining the remaining sections to form the mature form of the RNA"

GO:0006397 (mRNA processing): "Any process involved in the conversion of a primary mRNA transcript into one or more mature mRNA(s) prior to translation into polypeptide"

Using YEASTRACT to Infer which Transcription Factors Regulate a Cluster of Genes (Tuesday, October 29)

  1. Opened the gene list in Excel for one of the significant profiles from the STEM analysis. Chose a cluster with a clear cold shock/recovery up/down or down/up pattern.
    • Copied the list of gene IDs onto the clipboard.
  2. Launched a web browser and went to the YEASTRACT database.
    • On the left panel of the window, clicked on the link to Rank by TF.
    • Pasted list of genes from cluster into the box labeled ORFs/Genes.
    • Checked the box for Check for all TFs.
    • Accepted the defaults for the Regulations Filter (Documented, DNA binding plus expression evidence)
    • Did not apply a filter for "Filter Documented Regulations by environmental condition".
    • Ranked genes by TF using: The % of genes in the list and in YEASTRACT regulated by each TF.
    • Clicked the Search button.
  3. Answered the following questions:
    • In the results window that appears, the p values colored green are considered "significant", the ones colored yellow are considered "borderline significant" and the ones colored pink are considered "not significant". How many transcription factors are green or "significant"?
    • Copied the table of results from the web page and pasted it into a new Excel workbook to preserve the results.
      • Uploaded the Excel file to the wiki and linked to it in the electronic lab notebook.
      • Are CIN5, GLN3, and/or HAP4 on the list? If so, what are their "% in user set", "% in YEASTRACT", and "p value"?

How many transcription factors are green or "significant"?

1 is significant

Are CIN5, GLN3, and/or HAP4 on the list? If so, what are their "% in user set", "% in YEASTRACT", and "p value"?

Yes, they are all on the list.

HAP4

% in user set: 15.79

% in YEASTRACT: 1.10

p value: 0.376430967479230

CIN5

% in user set: 19.74

% in YEASTRACT: 0.69

p value: 0.976606065487304

GLN3

% in user set: 46.05

% in YEASTRACT: 1.45

p value: 0.009368736870331
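
To compare the three required transcription factors side by side, the values recorded above can be put in a small table and ranked by p value. This is a minimal sketch. The 0.05 cutoff is a conventional threshold assumed here, not the cutoff YEASTRACT uses for its green/yellow/pink coloring, so it does not show which transcription factor the results page marked as "significant".

```python
# Values copied from the YEASTRACT "Rank by TF" answers recorded above.
results = {
    "HAP4": {"pct_user_set": 15.79, "pct_yeastract": 1.10, "p_value": 0.376430967479230},
    "CIN5": {"pct_user_set": 19.74, "pct_yeastract": 0.69, "p_value": 0.976606065487304},
    "GLN3": {"pct_user_set": 46.05, "pct_yeastract": 1.45, "p_value": 0.009368736870331},
}

# Assumed conventional cutoff; YEASTRACT's own color thresholds are not given here.
ALPHA = 0.05

# Rank the transcription factors from smallest to largest p value.
ranked = sorted(results, key=lambda tf: results[tf]["p_value"])

# Transcription factors whose p value falls below the assumed cutoff.
below_alpha = [tf for tf in ranked if results[tf]["p_value"] < ALPHA]

for tf in ranked:
    row = results[tf]
    print(f"{tf}: {row['pct_user_set']}% of user set, p = {row['p_value']:.4f}")
```

GLN3 covers the largest share of the user set and has the smallest p value of the three, while CIN5 has the largest p value.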

    • Selected from the list of "significant" transcription factors the ones to use in running the model, and added GLN3, HAP4, and CIN5. The assignment asks for the list, and the justification for which transcription factors were included, to be recorded in the electronic lab notebook. Each group member selects a different network (they can have some overlapping transcription factors, but some should also be different).
    • Went back to the YEASTRACT database and followed the link to Generate Regulation Matrix.
    • Copied and pasted the list of transcription factors identified (plus HAP4, GLN3, and CIN5) into both the "Transcription factors" field and the "Target ORF/Genes" field.
      • Clicked the "Generate" button.
      • In the results window that appeared, clicked on the link to the "Regulation matrix (Semicolon Separated Values (CSV) file)" and saved it to the Desktop. Renamed this file with a meaningful name to distinguish it from the other files to be generated.

Visualizing Your Gene Regulatory Networks with GRNsight

We will analyze the regulatory matrix files you generated above in Microsoft Excel and visualize them using GRNsight to determine which one will be appropriate to pursue further in the modeling.

  1. First we need to properly format the output files from YEASTRACT.
    • Open the file in Excel. It will not open properly in Excel because a semicolon was used as the column delimiter instead of a comma. To fix this, select the entire Column A. Then go to the "Data" tab and select "Text to columns". In the Wizard that appears, select "Delimited" and click "Next". In the next window, select "Semicolon", and click "Next". In the next window, leave the data format at "General", and click "Finish". This should now look like a table with the names of the transcription factors across the top and down the first column and all of the zeros and ones distributed throughout the rows and columns. This is called an "adjacency matrix." If there is a "1" in the cell, that means there is a connection between the transcription factor in that row and the gene in that column.
    • Save this file in Microsoft Excel workbook format (.xlsx).
    • For this adjacency matrix to be usable in GRNmap (the modeling software) and GRNsight (the visualization software), we need to transpose the matrix. Insert a new worksheet into your Excel file and name it "network". Go back to the previous sheet and select the entire matrix and copy it. Go to your new worksheet and click on the A1 cell in the upper left. Select "Paste special" from the "Home" tab. In the window that appears, check the box for "Transpose". This will paste your data with the columns transposed to rows and vice versa. This is necessary because we want the transcription factors that are the "regulatORS" across the top and the "regulatEES" along the side.
    • The labels for the genes in the columns and rows need to match. Thus, delete the trailing "p" from each of the gene names in the columns. Adjust the case of the labels to make them all upper case.
    • In cell A1, copy and paste the text "rows genes affected/cols genes controlling".
    • Finally, for ease of working with the adjacency matrix in Excel, we want to alphabetize the gene labels both across the top and side.
      • Select the area of the entire adjacency matrix.
      • Click the Data tab and click the custom sort button.
      • Sort Column A alphabetically, being sure to exclude the header row.
      • Now sort row 1 from left to right, excluding cell A1. In the Custom Sort window, click on the options button and select sort left to right, excluding column 1.
    • Name the worksheet containing your organized adjacency matrix "network" and Save.
  2. Now we will visualize what these gene regulatory networks look like with the GRNsight software.
    • Go to the GRNsight home page.
    • Select the menu item File > Open and select the regulation matrix .xlsx file that has the "network" worksheet in it that you formatted above. If the file has been formatted properly, GRNsight should automatically create a graph of your network. You can click the "Grid Layout" button to arrange the nodes in a grid, or you can click and drag the nodes (genes) around until you get a layout that you like and take a screenshot of the results. Paste it into your PowerPoint presentation.
      • If you have nodes (genes) floating around in the display that are not connected to any other nodes, we need to delete them from the network for the modeling to work properly. Go back to the Excel workbook and network sheet and delete both the row and column with the floating gene's name. Then re-upload the edited file to GRNsight to visualize it. Use this final version in your PowerPoint and subsequent modeling.
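
The formatting steps above (split on semicolons, transpose, strip the trailing "p", upper-case, alphabetize, then look for unconnected genes) can be sketched in a script as a cross-check on the manual Excel work. This is a rough sketch under stated assumptions, not the GRNsight or GRNmap implementation. The function names are hypothetical, the toy matrix uses made-up gene names, and ignoring self-regulation when looking for unconnected genes is one reading of "not connected to any other nodes". Writing the result to a "network" worksheet in an .xlsx file is left out.

```python
import csv
import io

HEADER_CELL = "rows genes affected/cols genes controlling"

def read_semicolon_matrix(text):
    """Parse semicolon-delimited text into (column labels, row labels, 0/1 rows)."""
    table = [row for row in csv.reader(io.StringIO(text), delimiter=";") if row]
    col_labels = table[0][1:]
    row_labels = [row[0] for row in table[1:]]
    values = [[int(cell) for cell in row[1:]] for row in table[1:]]
    return col_labels, row_labels, values

def clean_label(label):
    """Drop a trailing lowercase 'p' and upper-case the label (e.g. 'Gln3p' -> 'GLN3')."""
    label = label.strip()
    if label.endswith("p"):
        label = label[:-1]
    return label.upper()

def format_network(text):
    """Return the transposed, alphabetized matrix: regulators across the top, targets down the side."""
    targets, regulators, values = read_semicolon_matrix(text)
    regulators = [clean_label(name) for name in regulators]
    targets = [clean_label(name) for name in targets]
    # In the downloaded file, rows are regulators and columns are targets.
    edges = {(reg, tgt): values[i][j]
             for i, reg in enumerate(regulators)
             for j, tgt in enumerate(targets)}
    genes = sorted(set(regulators) | set(targets))
    header = [HEADER_CELL] + genes
    body = [[tgt] + [edges.get((reg, tgt), 0) for reg in genes] for tgt in genes]
    return [header] + body

def unconnected_genes(network):
    """List genes with no edge to or from any OTHER gene (self-regulation ignored)."""
    genes = network[0][1:]
    rows = {row[0]: row[1:] for row in network[1:]}
    isolated = []
    for k, gene in enumerate(genes):
        incoming = any(v for j, v in enumerate(rows[gene]) if j != k)
        outgoing = any(rows[other][k] for j, other in enumerate(genes) if j != k)
        if not (incoming or outgoing):
            isolated.append(gene)
    return isolated

# Toy input with made-up gene names; NOT the matrix downloaded from YEASTRACT.
RAW = "\n".join([
    ";XXB2;XXA1;XXC3",
    "Xxb2p;0;1;0",
    "Xxa1p;1;0;0",
    "Xxc3p;0;0;0",
])

network = format_network(RAW)
for row in network:
    print(row)
print("unconnected:", unconnected_genes(network))
```

In the toy example, XXC3 neither regulates nor is regulated by another gene, so it is the one that would be deleted (both its row and its column) before re-uploading the file to GRNsight.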







Data & Files

Excel Workbook

Week 9 Files

Week 9 Slide

text file for STEM analysis

Transcription Factors

File Uploaded to GRNsight

Conclusion

Acknowledgments

1. I worked with User:Knguye66, User:Jcowan4, and User:Mavila9 for this assignment.

2. "Except for what is noted above, this individual journal entry was completed by me and not copied from another source." Cdomin12 (talk) 19:25, 28 October 2019 (PDT)

References