Difference between revisions of "Ckaplan Week 10"

From LMU BioDB 2024
(go terms)
(updating procedure)
 
===Procedure:===
 
  
Prepare Microarray Data for STEM:
I created a new worksheet named "dGln3_stem" in Excel.
 
I copied data from the "dGln3_ANOVA" worksheet and pasted values into "dGln3_stem".
 
I renamed columns: "Master_Index" to "SPOT", "ID" to "Gene Symbol", and deleted the column "Standard_Name".
 
I filtered the data on the B-H corrected p value (> 0.05), deleted the rows shown by the filter, and so retained only genes with a significant change in expression.
 
I deleted unnecessary columns, leaving only Average Log Fold Change columns for each time point and renamed them.
 
I removed #DIV/0! errors.
 
I saved the spreadsheet as Text (Tab-delimited) (*.txt) after turning on file extensions.
 
  
Setting up STEM:
I downloaded and extracted the STEM software.
 
I downloaded the Gene Ontology and yeast GO annotations files and placed them in the STEM folder.
 
I launched STEM by double-clicking on stem.jar.
 
In the main STEM interface, I configured settings in sections 1 to 4 as instructed.
 
I ran STEM by clicking the Execute button.
 
Viewing and Saving STEM Results:
 
I reviewed the generated STEM Profiles.
 
I adjusted the X-axis scale to "Based on real time".
 
I took screenshots of significant profiles and saved them in a PowerPoint presentation.
 
I saved gene lists and GO term lists for significant profiles as instructed.
 
Analyzing STEM Results:
 
I chose a significant profile with a clear cold shock/recovery pattern.
 
I examined the number of genes belonging to the profile and the p value for enrichment of genes.
 
I filtered GO terms based on p values and selected 6 significant terms for further analysis.
 
I looked up definitions of selected GO terms on the Gene Ontology website.
 
  
Using YEASTRACT:
I copied gene IDs from the chosen profile in Excel.
 
I visited the YEASTRACT database and pasted the gene list.
 
I ranked genes by TF and noted significant transcription factors.
 
Creating and Visualizing Gene Regulatory Network with GRNsight:
 
I selected transcription factors from YEASTRACT results, including GLN3.
 
I loaded the network in GRNsight, ensuring connectivity.
 
I recorded the number of genes and edges.
 
I exported the network image as a PNG and uploaded it to the wiki.
 
Creating GRNmap Input Workbook:
 
I exported data from GRNsight to Excel.
 
I checked the sheets for correctness, confirming that the adjacency matrix, log2 fold changes, and other parameters were present.
 
I inserted a new worksheet named "network_weights" and copied the network data.
 
I adjusted optimization parameters as instructed.
 
I saved and uploaded the Excel Workbook to the wiki.
 
  
Creating and Visualizing My Gene Regulatory Network with GRNsight:
From profile 45, I selected transcription factors from the list of significant ones in YEASTRACT to build a gene regulatory network using GRNsight. I ensured that the network had approximately 15-20 connected transcription factors and exported the network image as a PNG file, uploading it to the wiki and displaying it on my individual journal page.
 
  
Creating the GRNmap Input Workbook:
I exported data from GRNsight to Excel for generating the GRNmap input workbook. After exporting, I checked the workbook to ensure it included the required sheets with the specified content, such as the adjacency matrix in the "network" sheet and log2 fold changes in the expression sheets. Additionally, I adjusted the optimization parameters as instructed, created a new worksheet named "network_weights," and transferred the content of the "network" sheet to it. Finally, I saved the workbook, uploaded it to the wiki, and linked it to my individual journal page for further processing by Dr. Dahlquist in the GRNmap modeling software.
  
 
===Methods/Results:===
 

Revision as of 11:22, 3 April 2024

Purpose:

This assignment helps us learn microarray data analysis and gene analysis techniques. It also provides practice in determining p-values and organizing data effectively.

Procedure:

To prepare my microarray data file for loading into STEM, I first inserted a new worksheet into my Excel workbook and named it "dgln3_stem". Then, I copied all the data from my "dgln3_ANOVA" worksheet and pasted it into the new worksheet using Paste Special > Paste Values.

Next, I ensured that the leftmost column had the column header "Master_Index" and renamed it to "SPOT". Column B was named "ID", which I renamed to "Gene Symbol". I deleted the column named "Standard_Name".

I filtered the data on the B-H corrected p-value to show only rows with p > 0.05, that is, genes without a significant change in expression. I deleted those rows by selecting all of them except the header row, right-clicking, and choosing "Delete Row" from the context menu. Then I cleared the filter, leaving only genes with a significant change in expression.

Following that, I deleted all data columns except for the Average Log Fold Change columns for each timepoint, such as "dgln3_AvgLogFC_t15", etc. I renamed these data columns with just the time and units, like "15m", "30m", etc.

To address any remaining #DIV/0! errors, I opened the Find/Replace dialog, searched for #DIV/0!, and replaced it with nothing. This ensured that no errors were left in the data.

After saving my work, I used Save As to save the spreadsheet as Text (Tab-delimited) (*.txt) and clicked OK to the warnings before closing the file.
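The Excel steps above can also be sketched in code. This is only an illustration under stated assumptions, not the procedure actually used: the column names "Master_Index", "ID", "Standard_Name", and "dgln3_AvgLogFC_t15" come from the steps above, while the p-value column name "B-H_p", the helper names, and the example rows are hypothetical. It also assumes that a row without a usable p-value is dropped.

```python
import csv
import io

def prepare_stem_rows(rows, p_column="B-H_p", alpha=0.05):
    """Build the header and records for a STEM input file from ANOVA rows.

    `rows` is a list of dicts keyed by worksheet column name.
    `p_column` is a hypothetical name for the B-H corrected p-value column.
    """
    # Keep only the average log fold change columns, e.g. "dgln3_AvgLogFC_t15".
    fc_columns = [c for c in rows[0] if "AvgLogFC_t" in c]
    # Rename each one to time plus units, e.g. "15m".
    new_names = {c: c.rsplit("_t", 1)[1] + "m" for c in fc_columns}
    header = ["SPOT", "Gene Symbol"] + [new_names[c] for c in fc_columns]

    records = []
    for row in rows:
        try:
            p = float(row[p_column])
        except ValueError:
            continue  # assumption: rows without a usable p-value are dropped
        if p > alpha:
            continue  # not significant, so the row is deleted
        # "Master_Index" becomes SPOT, "ID" becomes Gene Symbol,
        # "Standard_Name" is left out, and #DIV/0! is replaced with nothing.
        values = ["" if row[c] == "#DIV/0!" else row[c] for c in fc_columns]
        records.append([row["Master_Index"], row["ID"]] + values)
    return header, records

def to_tab_delimited(header, records):
    """Serialize the table as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(records)
    return buffer.getvalue()

# Hypothetical example rows; the IDs and values are made up.
example_rows = [
    {"Master_Index": "1", "ID": "ID_A", "Standard_Name": "NAME_A",
     "dgln3_AvgLogFC_t15": "0.52", "dgln3_AvgLogFC_t30": "#DIV/0!",
     "B-H_p": "0.01"},
    {"Master_Index": "2", "ID": "ID_B", "Standard_Name": "NAME_B",
     "dgln3_AvgLogFC_t15": "0.10", "dgln3_AvgLogFC_t30": "0.05",
     "B-H_p": "0.40"},
]
header, records = prepare_stem_rows(example_rows)
stem_text = to_tab_delimited(header, records)
```

Writing `stem_text` to a file with a .txt extension would give roughly the same tab-delimited layout as the Save As step.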

Moving on to running STEM, I downloaded and extracted the STEM software, then launched the stem.jar file inside the stem folder.

In section 1 of the main STEM interface window (Expression Data Info), I browsed and selected my file, chose "No normalization/add 0", and checked the box for Spot IDs included in the data file.

For section 2 (Gene Info), I left the default selections for Gene Annotation Source, Cross Reference Source, and Gene Location Source as "User provided". I then browsed and selected the "gene_association.sgd.gz" file from the stem folder for the Gene Annotation File.

In section 3 (Options), I made sure the Clustering Method was set to "STEM Clustering Method" and left the defaults for Maximum Number of Model Profiles and Maximum Unit Change in Model Profiles between Time Points.

Finally, in section 4 (Execute), I clicked the Execute button to run STEM.

Upon completion, I viewed and saved the STEM results. I took screenshots of the "All STEM Profiles" window and individual profile windows, saving them in a PowerPoint presentation.

For each significant profile, I saved the gene list and GO terms list.

To analyze and interpret the STEM results, I selected a profile with a clear cold shock/recovery pattern for further investigation. I chose profile 45 because it seemed to reflect a significant response to the cold shock conditions and had the largest number of genes.

The number of genes belonging to this profile, as well as the expected number, and the p-value for the enrichment of genes in this profile, were noted.

I then opened the GO list file for this profile in Excel, filtered the terms based on p-values and corrected p-values < 0.05, and noted the number of associated GO terms meeting these criteria.
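The GO term filter described above can be sketched as follows. The key names "p-value" and "Corrected p-value" and the example terms are assumptions made for illustration; the real GO list file may label its columns differently.

```python
def filter_go_terms(terms, alpha=0.05):
    """Keep GO terms whose p-value and corrected p-value are both below alpha.

    Each term is a dict; the key names are hypothetical stand-ins for the
    columns in the GO list file.
    """
    return [
        term for term in terms
        if term["p-value"] < alpha and term["Corrected p-value"] < alpha
    ]

# Hypothetical example terms with made-up values.
example_terms = [
    {"name": "term A", "p-value": 0.001, "Corrected p-value": 0.01},
    {"name": "term B", "p-value": 0.03, "Corrected p-value": 0.20},
    {"name": "term C", "p-value": 0.30, "Corrected p-value": 1.00},
]
significant_terms = filter_go_terms(example_terms)
significant_count = len(significant_terms)
```

Only "term A" passes both cutoffs here; "term B" passes the uncorrected cutoff but fails the corrected one.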

Next, I selected six significant GO terms from the filtered list, ensuring they were meaningful in the context of the gene expression profile.

I also checked the YEASTRACT database to infer which transcription factors regulate the genes in my selected profile. After pasting my list of genes into the YEASTRACT database, I identified the significant transcription factors and recorded their details, including whether CIN5 or GLN3 were on the list.

Lastly, I used GRNsight to create and visualize a gene regulatory network with approximately 15-20 connected transcription factors, including GLN3 and CIN5 if necessary. I recorded the number of genes and edges in the network and exported the network image.
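Counting the genes and edges of a network can be sketched from its adjacency matrix. This example assumes that columns are regulators and rows are targets, and that any nonzero entry is an edge; the function name and the three-gene network are hypothetical ("TF_X" is a placeholder and the edges are made up, not taken from my results).

```python
def summarize_network(genes, adjacency):
    """Count genes and edges and list genes with no edges at all.

    Assumes adjacency[row][col] != 0 means that the gene in column `col`
    regulates the gene in row `row`.
    """
    edges = [
        (genes[col], genes[row])  # (regulator, target)
        for row in range(len(genes))
        for col in range(len(genes))
        if adjacency[row][col] != 0
    ]
    connected = {gene for edge in edges for gene in edge}
    isolated = [gene for gene in genes if gene not in connected]
    return len(genes), len(edges), isolated

# Hypothetical toy network; the edges are invented for illustration.
example_genes = ["CIN5", "GLN3", "TF_X"]
example_adjacency = [
    [1, 1, 0],  # CIN5 is regulated by CIN5 (itself) and GLN3
    [0, 0, 0],  # GLN3 has no regulators in this toy network
    [1, 0, 0],  # TF_X is regulated by CIN5
]
gene_count, edge_count, isolated_genes = summarize_network(
    example_genes, example_adjacency
)
```

An empty `isolated_genes` list means every gene has at least one edge, which is a weaker check than confirming that the whole network is connected.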

For creating the GRNmap input workbook, I exported the data from GRNsight to Excel, ensuring all necessary sheets and data were present and correctly formatted. I inserted a new worksheet named "network_weights" and copied the contents of the "network" sheet into it.

Finally, I saved and uploaded my Excel Workbook to the wiki, linking it to my individual journal page for further analysis.

Methods/Results:

Media: BIOL367_S24_microarray-data_dGLN3CKAS31211.xlsx

Media:Updated_Pvalues_ckaplan.pdf

Media:Andrew&Charlotte_Tables_Gene-GoData.zip

Media:AS&CK_BIOL367_S24_STEM_PHOTOS_dGLN3.pptx

Media:Yeastract_45_Gene_CK.xlsx

Media:GRN_(Yeastmine_-_SGD__2024-03-19;_13_genes,_21_edges)_weighted_(2)_ck_45.xlsx

Media:CKgenegraph.jpg

Media: Go45ckaplan.xlsx

How many GO terms have a p-value less than 0.05? I started with 160 GO terms; 72 remained after filtering.

  • Why did you select this profile? In other words, why was it interesting to you?

I selected profile 45 because, out of all of our profiles, it had the most genes.

  • How many genes belong to this profile?

406

  • How many genes were expected to belong to this profile?

29.9

  • What is the p value for the enrichment of genes in this profile?

0.00
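As a rough sanity check on the numbers above, the ratio of observed to expected genes can be computed directly. This is simple arithmetic and not STEM's own significance test; the variable names are mine, and the reported p value of 0.00 is presumably a rounded display of a very small number.

```python
observed_genes = 406    # genes assigned to profile 45
expected_genes = 29.9   # genes expected in profile 45

# How many times more genes the profile holds than expected.
fold_enrichment = observed_genes / expected_genes
rounded_fold = round(fold_enrichment, 1)  # about 13.6
```

A profile holding more than 13 times the expected number of genes is consistent with a p value small enough to display as 0.00.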

I have 44 green genes

Gln3p 46.65% Cin5p 31.27%

Acknowledgments

I worked with Andrew in and out of class throughout the week. Dr. Dahlquist assisted us in class.

References

Dahlquist, K. Master_sheet_dGLN3.

LMU BioDB 2024. (2024). Week 9. Retrieved Mar 20, 2024, from https://xmlpipedb.cs.lmu.edu/biodb/Spring2024/index.php/Week_9

LMU BioDB 2024. (2024). Week 10. Retrieved Mar 31, 2024, from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_10


