Ntesfaio Week 9

Purpose

The purpose of this week's assignment is to continue the microarray data analysis begun in Week 8 and to build on the electronic notebook.

Methods

Viewing and Saving Stem Results

PowerPoint with all screenshots

dHAP4 gene list

dHAP4 GO list

"All STEM Profiles (1)" window opened.
Each box corresponded to a model expression profile. Colored profiles had a statistically significant number of genes assigned; they were arranged in order from most to least significant p value. Profiles with the same color belonged to the same cluster of profiles. The number in each box was simply an ID number for the profile.

I clicked on the button that said "Interface Options...". At the bottom of the Interface Options window that appeared, below where it said "X-axis scale should be:", I clicked on the radio button that said "Based on real time". Then I closed the Interface Options window.

I took a screenshot of this window (on a PC, I simultaneously pressed the Alt and PrintScreen buttons to save the view in the active window to the clipboard) and pasted it into a PowerPoint presentation to save the figures.

I clicked on each of the significant profiles (the colored ones) to open a window showing a more detailed plot containing all of the genes in that profile.

I took a screenshot of each of the individual profile windows and saved the images in my PowerPoint presentation.

At the bottom of each profile window, there were two yellow buttons, "Profile Gene Table" and "Profile GO Table". For each of the profiles, I clicked on the "Profile Gene Table" button to see the list of genes belonging to the profile. In the window that appeared, I clicked on the "Save Table" button and saved the file to my desktop. I made the filename descriptive of the contents, e.g. "wt_profile#_genelist.txt", where I replaced the number symbol with the actual profile number.

I uploaded these files to the wiki and linked to them on my individual journal page. (Note that it will be easier to zip all the files together and upload them as one file.)

For each of the significant profiles, I clicked on the "Profile GO Table" to see the list of Gene Ontology terms belonging to the profile. In the window that appeared, I clicked on the "Save Table" button and saved the file to my desktop. I made the filename descriptive of the contents, e.g. "wt_profile#_GOlist.txt", where I used "wt", "dGLN3", etc. to indicate the dataset and where I replaced the number symbol with the actual profile number. At this point I had saved all of the primary data from the STEM software and it was time to interpret the results!

I uploaded these GO list files to the wiki as well and linked to them on my individual journal page. (Note that it will be easier to zip all the files together and upload them as one file.)
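The note about zipping the files can be sketched in Python. This is only a minimal illustration and not part of the STEM workflow: the function name `zip_tables` and the archive name `stem_tables.zip` are hypothetical, and the file pattern assumes the "wt_profile#_genelist.txt" and "wt_profile#_GOlist.txt" naming scheme described above.

```python
import zipfile
from pathlib import Path


def zip_tables(folder, archive_name="stem_tables.zip"):
    """Bundle every saved gene list and GO list in `folder` into one zip file.

    Returns the names of the files that were added.
    """
    folder = Path(folder)
    # Matches names such as wt_profile45_genelist.txt and wt_profile45_GOlist.txt.
    tables = sorted(p for p in folder.glob("*_profile*_*list.txt") if p.is_file())
    with zipfile.ZipFile(folder / archive_name, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in tables:
            # arcname stores only the file name, not the full desktop path.
            zf.write(path, arcname=path.name)
    return [p.name for p in tables]
```

Calling `zip_tables` on the folder holding the saved tables would leave a single archive there, which could then be uploaded to the wiki in one step.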

Analyzing and Interpreting STEM results

I selected one of the profiles I saved in the previous step for further interpretation of the data. Each member of your group should choose a different profile. Answer the following:

I chose to analyze STEM profile 45.

Why did you select this profile? In other words, why was it interesting to you?

I selected this profile because it showed an incline, then stayed steady for a while before declining.

How many genes belong to this profile?

354.0

How many genes were expected to belong to this profile?

44.3 genes were expected

What is the p value for the enrichment of genes in this profile?

p value of 2.9E-201 (significant)

Bear in mind that we just finished computing p values to determine whether each individual gene had a significant change in gene expression at each time point. This p value determines whether the number of genes that show this particular expression profile across the time points is significantly more than expected.
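The comparison between the number of genes assigned and the number expected can be illustrated numerically. The sketch below assumes a binomial model for the enrichment p value, which is an assumption made here for illustration rather than something recorded in this notebook. The function names are hypothetical, and the total number of genes in the dataset is not stated in this entry, so `n` is left as a parameter.

```python
import math


def fold_enrichment(assigned, expected):
    """Ratio of genes assigned to a profile over the number expected by chance."""
    return assigned / expected


def log10_binomial_tail(k, n, p):
    """log10 of P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    The sum is done in log space to avoid underflow when the tail is tiny.
    """
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    peak = max(log_terms)
    # log-sum-exp: factor out the largest term before exponentiating.
    total = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return total / math.log(10)


# Values reported for profile 45 in this notebook.
print(round(fold_enrichment(354, 44.3), 2))  # about 8 times more genes than expected
```

Under this assumed model, `log10_binomial_tail(354, n, 44.3 / n)` would give the order of magnitude of the p value once the total gene count `n` is known.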

I opened the GO list file I saved for this profile in Excel. This list showed all of the Gene Ontology terms that are associated with genes that fit this profile. I selected the third row and then chose from the menu Data > Filter > Autofilter. I filtered on the "p-value" column to show only GO terms that have a p value of < 0.05.

How many GO terms are associated with this profile at p < 0.05?

64

The GO list also has a column called "Corrected p-value". This correction is needed because the software has performed thousands of significance tests. Filter on the "Corrected p-value" column to show only GO terms that have a corrected p value of < 0.05.
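Why a correction is needed can be shown with a short calculation. The sketch below uses the Bonferroni correction purely as an example of a multiple-testing correction; this notebook does not record which correction the software applies, so this is an assumption. The test count of 2000 is hypothetical.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Number of tests expected to pass `alpha` by chance if no effect is real."""
    return n_tests * alpha


def bonferroni(p_values):
    """Bonferroni-corrected p values: multiply by the number of tests, cap at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


# Hypothetical example: 2000 tests at alpha = 0.05 give roughly 100 false positives.
print(expected_false_positives(2000))
```

With thousands of tests, an uncorrected cutoff of 0.05 lets through many terms by chance alone, which is why the corrected column gives a shorter list.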

How many GO terms are associated with this profile with a corrected p value < 0.05?

30
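The two Excel filters can also be reproduced in a short script. This is a sketch that assumes the GO list is a tab-delimited text file whose header row contains the "p-value" and "Corrected p-value" columns named above. The function name, the "term" column, and the sample rows are made up for illustration and are not values from my saved file.

```python
import csv
import io


def filter_go_terms(text, column, cutoff=0.05, skip_lines=0):
    """Return rows of a tab-delimited GO table whose `column` value is below `cutoff`.

    `skip_lines` is the number of lines that precede the header row.
    """
    lines = text.splitlines()[skip_lines:]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    kept = []
    for row in reader:
        # Assumption: a value may be written like "<0.001"; drop the "<" before parsing.
        value = float(row[column].strip().lstrip("<"))
        if value < cutoff:
            kept.append(row)
    return kept


# Made-up rows for illustration only.
sample = (
    "term\tp-value\tCorrected p-value\n"
    "term_A\t0.001\t0.02\n"
    "term_B\t0.03\t0.40\n"
    "term_C\t0.20\t1.00\n"
)
print(len(filter_go_terms(sample, "p-value")))            # 2
print(len(filter_go_terms(sample, "Corrected p-value")))  # 1
```

Because I selected the third row before filtering in Excel, the header of the real file may sit below two other lines; in that case `skip_lines=2` would be the matching setting.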

Select the top 6 Gene Ontology terms from your filtered list (either p < 0.05 or corrected p < 0.05).

Top 6 for corrected p < 0.05 are:

GO:0005730

GO:0005355

GO:1904659

GO:0006351

GO:0015761

GO:0015755

Note whether the same GO terms are showing up in multiple clusters.
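Checking whether the same GO terms show up in more than one cluster amounts to comparing lists of GO IDs. A minimal sketch follows; the function name is hypothetical, the first list is the top six terms for profile 45 given above, and the second profile is invented only to show the comparison.

```python
def shared_go_terms(terms_by_profile):
    """Map each GO ID to the sorted list of profiles it appears in,
    keeping only IDs found in more than one profile."""
    seen = {}
    for profile, terms in terms_by_profile.items():
        for term in set(terms):
            seen.setdefault(term, []).append(profile)
    return {term: sorted(names) for term, names in seen.items() if len(names) > 1}


top_terms = {
    # Top six terms for profile 45, as listed above.
    "profile45": ["GO:0005730", "GO:0005355", "GO:1904659",
                  "GO:0006351", "GO:0015761", "GO:0015755"],
    # Hypothetical second profile, included only to show the comparison.
    "profile_other": ["GO:0005730", "GO:0006351"],
}
print(shared_go_terms(top_terms))
```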

I looked up the definitions for each of the terms at http://geneontology.org. In your research presentation, you will discuss the biological interpretation of these GO terms.

In other words, why does the cell react to cold shock by changing the expression of genes associated with these GO terms? Also, what does this have to do with the transcription factor being deleted (for the groups working with deletion strain data)?

To easily look up the definitions, go to http://geneontology.org.

I copied and pasted the GO ID (e.g. GO:0044848) into the search field on the left of the page.

In the results page, I clicked on the button that said "Link to detailed information about <term>", where <term> was, in this example, "biological phase".

The definition was on the next results page, e.g. here.

GO:0005730

Name: Nucleolus

Definition: A small, dense body one or more of which are present in the nucleus of eukaryotic cells. It is rich in RNA and protein, is not bounded by a limiting membrane, and is not seen during mitosis. Its prime function is the transcription of the nucleolar DNA into 45S ribosomal-precursor RNA, the processing of this RNA into 5.8S, 18S, and 28S components of ribosomal RNA, and the association of these components with 5S RNA and proteins synthesized outside the nucleolus. This association results in the formation of ribonucleoprotein precursors; these pass into the cytoplasm and mature into the 40S and 60S subunits of the ribosome.

GO:0005355

Name: Glucose Transmembrane Transporter Activity

Definition: Enables the transfer of the hexose monosaccharide glucose from one side of a membrane to the other.

GO:1904659

Name: glucose transmembrane transport

Definition: The process in which glucose is transported across a membrane.

GO:0006351

Name: transcription, DNA-templated

Definition: The cellular synthesis of RNA on a template of DNA.

GO:0015761

Name: mannose transmembrane transport

Definition: The process in which mannose is transported across a lipid bilayer, from one side of a membrane to the other. Mannose is the aldohexose manno-hexose, the C-2 epimer of glucose. The D-(+)-form is widely distributed in mannans and hemicelluloses and is of major importance in the core oligosaccharide of N-linked oligosaccharides of glycoproteins.

GO:0015755

Name: fructose transmembrane transport

Definition: The directed movement of fructose into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. Fructose exists in a open chain form or as a ring compound. D-fructose is the sweetest of the sugars and is found free in a large number of fruits and honey.

Using YEASTRACT to Infer which Transcription Factors Regulate a Cluster of Genes (Tuesday, October 29)

I opened the Gene List in Excel for profile 45

I opened the YEASTRACT database.
I clicked on the link to Rank by TF.
I pasted the genes from profile 45 into the box labeled ORFs/Genes.
I checked the box for Check for all TFs.
I clicked Search.

How many transcription factors are green or "significant"?

22

I selected the 15-20 most significant transcription factors to generate a gene regulatory network.

Going back to the YEASTRACT database, I copied the top 15-20 transcription factors, plus the genes stated below, into both the Transcription Factors field and the Target ORF/Genes field.

I selected the regulatory filter options "Documented" and "Only DNA binding evidence".

I clicked generate

I clicked on the link to the "Regulation matrix (Semicolon Separated Values (CSV) file)" and saved it to my desktop.

Are CIN5, GLN3, and/or HAP4 on the list? If so, what is their "% in user set", "% in YEASTRACT", and "p value".

Cin5p is 13.51% in user set, and 2.61% in scerevisiae. The p value is 9.74 x 10^-13

Gln3p is 38.79% in user set, and 5.60% in scerevisiae. The p value is 1.92 x 10^-13

Hap4p is 15.80% in user set, and 5.04% in scerevisiae. The p value is 5.58 x 10^-14
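The relationship between the percentages and the p value can be sketched as an enrichment test. The code below assumes a hypergeometric test, which is my assumption for illustration; this notebook does not record how YEASTRACT computes its p value. The function names are hypothetical and no counts from my results are used.

```python
from math import comb


def percent(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


def hypergeom_tail(k, user_n, targets_n, genome_n):
    """P(X >= k): chance of seeing at least k targets of a transcription factor
    in a list of user_n genes drawn from genome_n genes, targets_n of which
    are targets of that factor."""
    upper = min(user_n, targets_n)
    total = comb(genome_n, user_n)
    hits = sum(
        comb(targets_n, i) * comb(genome_n - targets_n, user_n - i)
        for i in range(k, upper + 1)
    )
    return hits / total
```

In this reading, "% in user set" corresponds to `percent(k, user_n)`, and a transcription factor is ranked as significant when its targets are far more common in the submitted list than in the genome as a whole.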

Visualizing Your Gene Regulatory Networks with GRNsight

I opened the CSV file in Excel.

To fix the semicolons, I selected column A, went to Data, and then chose Text to Columns.

In the window that appeared, I clicked "Delimited" and then Next.

In the next window, I selected "Semicolon" and then Next.

In the next window, I selected "General" and then Finish.

I saved this file in Excel workbook format.
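The Text to Columns step splits each line at the semicolons. The same split can be sketched with Python's csv module; the function name is hypothetical and the two-by-two matrix is made up to show the layout, not taken from my saved file.

```python
import csv
import io


def read_semicolon_matrix(text):
    """Split semicolon-separated text into a list of rows (lists of cell strings)."""
    return [row for row in csv.reader(io.StringIO(text), delimiter=";") if row]


# Made-up matrix in the same layout; not data from the saved file.
sample = ";GENE1;GENE2\nTf1p;1;0\nTf2p;0;1\n"
for row in read_semicolon_matrix(sample):
    print(row)
```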

I inserted a new Excel file called "Network".

I copied the matrix into the new workbook

I selected "Paste Special" from the "Home" tab.

In the next window I selected "Transpose"

I deleted the trailing "p" from each of the gene names.

In cell A1, I copied and pasted the text "rows genes affected/cols genes controlling".

I next selected the area of the entire matrix

I selected Data and clicked the Custom Sort button.

Sorted column A alphabetically

Sorted row 1 from left to right excluding cell A1

I named this "network"
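The Excel steps above (transpose, remove the "p", label cell A1, sort) can be sketched as one function. This is an illustration under assumptions: the matrix is a list of rows of strings, only a trailing lowercase "p" is removed, and the sorting ignores case. The function names are hypothetical.

```python
def strip_p(name):
    """Remove a trailing lowercase "p" (protein suffix), e.g. "Hap4p" -> "Hap4"."""
    return name[:-1] if name.endswith("p") else name


def prepare_network(matrix, label="rows genes affected/cols genes controlling"):
    """Transpose a regulation matrix, clean the names, label cell A1, and sort
    the rows and columns alphabetically (ignoring case)."""
    # Transpose: the first column becomes the first row and vice versa.
    transposed = [list(column) for column in zip(*matrix)]
    header = [strip_p(name) for name in transposed[0][1:]]
    body = [[strip_p(row[0])] + row[1:] for row in transposed[1:]]

    # Sort the rows by the name in column A.
    body.sort(key=lambda row: row[0].upper())

    # Sort the columns by the name in row 1, leaving cell A1 in place.
    order = sorted(range(len(header)), key=lambda i: header[i].upper())
    sorted_header = [label] + [header[i] for i in order]
    sorted_body = [[row[0]] + [row[1 + i] for i in order] for row in body]
    return [sorted_header] + sorted_body
```

Sorting both directions the same way keeps each gene's row and column in matching positions, so the matrix stays square and easy to read.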

Visualizing gene regulatory networks

I went to the GRNsight homepage

I selected File > Open and selected the regulation matrix .xlsx

I selected the "Grid Layout" and pasted the resulting network into my PowerPoint presentation.
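How the matrix translates into a network can be sketched by listing its edges. This follows the convention written in cell A1 ("rows genes affected/cols genes controlling") and reads any nonzero cell as a regulatory connection. It is not a description of how GRNsight itself is implemented, and the function name is hypothetical.

```python
def matrix_to_edges(network):
    """List (regulator, target) pairs from a matrix whose rows are the genes
    affected and whose columns are the genes controlling."""
    regulators = network[0][1:]
    edges = []
    for row in network[1:]:
        target = row[0]
        for regulator, cell in zip(regulators, row[1:]):
            # Any nonzero cell is read as "regulator controls target".
            if float(cell) != 0:
                edges.append((regulator, target))
    return edges
```

Each pair returned corresponds to one arrow in the network drawing, pointing from the controlling gene to the affected gene.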

Conclusion

The Week 9 assignment continued the analysis of the STEM results. The STEM profile I chose, 45, had the most significant number of genes assigned. I analyzed profile 45 further by looking at the number of genes it contained and its p value. I also saved the gene list and the Gene Ontology (GO) list. The p value indicated whether the number of genes assigned to the profile was significantly greater than expected. I looked up the definitions of the GO terms to determine what was present in the profile. This week also introduced YEASTRACT, which was used to infer whether genes with the same expression pattern were regulated by the same transcription factors. GRNsight was also introduced to visualize the regulation matrix files and to determine which network will be pursued in the future. These results will be built on in the coming week.