Difference between revisions of "Ckaplan Week 10"

From LMU BioDB 2024
Jump to navigation Jump to search
(creating question)
(defining go terms)
Line 96: Line 96:
  
 
*Select 6 Gene Ontology terms and give definitions:
 
*Select 6 Gene Ontology terms and give definitions:
 +
 +
GO:0000027 "The aggregation, arrangement and bonding together of constituent RNAs and proteins to form the large ribosomal subunit." Source: GOC:jl, PMID:30467428
 +
 +
GO:0034475 "Any process involved in forming the mature 3' end of a U4 snRNA molecule." Source: GOC:mah
 +
 +
GO:0003676 "Binding to a nucleic acid." Source: GOC:jl
 +
 +
GO:0019843 "Binding to a ribosomal RNA." Source: GOC:jl
 +
 +
GO:0005654 "That part of the nuclear content other than the chromosomes or the nucleolus." Source: GOC:ma, ISBN:0124325653
 +
 +
GO:0000055 "The directed movement of a ribosomal large subunit from the nucleus into the cytoplasm." Source: GOC:mah
  
 
===Acknowledgments===
 
===Acknowledgments===

Revision as of 18:51, 3 April 2024

Purpose:

This assignment helps us learn microarray data analysis and gene analysis techniques. It also provides practice in determining p-values and organizing data effectively.

Procedure:

To prepare my microarray data file for loading into STEM, I first inserted a new worksheet into my Excel workbook and named it "dgln3_stem". Then, I copied all the data from my "dgln3_ANOVA" worksheet and pasted special > paste values into my new worksheet.

Next, I ensured that the leftmost column had the column header "Master_Index" and renamed it to "SPOT". Column B was named "ID", which I renamed to "Gene Symbol". I deleted the column named "Standard_Name".

I filtered the data based on the B-H corrected p-value to be > 0.05, ensuring that only genes with a significant change in expression remained. After filtering, I deleted the rows with non-significant changes by selecting all rows except the header row and right-clicking to choose "Delete Row" from the context menu. Then, I undid the filter.

Following that, I deleted all data columns except for the Average Log Fold Change columns for each timepoint, such as "dgln3_AvgLogFC_t15", etc. I renamed these data columns with just the time and units, like "15m", "30m", etc.

To address any remaining #DIV/0! errors, I opened the Find/Replace dialog, searched for #DIV/0!, and replaced it with nothing. This ensured that no errors were left in the data.

After saving my work, I used Save As to save the spreadsheet as Text (Tab-delimited) (*.txt) and clicked OK to the warnings before closing the file.

Moving on to running STEM, I downloaded and extracted the STEM software, then launched the stem.jar file inside the stem folder.

In section 1 of the main STEM interface window (Expression Data Info), I browsed and selected my file, chose "No normalization/add 0", and checked the box for Spot IDs included in the data file.

For section 2 (Gene Info), I left the default selections for Gene Annotation Source, Cross Reference Source, and Gene Location Source as "User provided". I then browsed and selected the "gene_association.sgd.gz" file from the stem folder for the Gene Annotation File.

In section 3 (Options), I made sure the Clustering Method was set to "STEM Clustering Method" and left the defaults for Maximum Number of Model Profiles and Maximum Unit Change in Model Profiles between Time Points.

Finally, in section 4 (Execute), I clicked the Execute button to run STEM.

Upon completion, I viewed and saved the STEM results. I took screenshots of the "All STEM Profiles" window and individual profile windows, saving them in a PowerPoint presentation.

For each significant profile, I saved the gene list and GO terms list.

Now, onto analyzing and interpreting the STEM results, I selected a profile with a clear cold shock/recovery pattern for further investigation. I chose profile 45 because it seemed to reflect a significant response to the cold shock conditions and had the largest amount of genes.

The number of genes belonging to this profile, as well as the expected number, and the p-value for the enrichment of genes in this profile, were noted.

I then opened the GO list file for this profile in Excel, filtered the terms based on p-values and corrected p-values < 0.05, and noted the number of associated GO terms meeting these criteria.

Next, I selected six significant GO terms from the filtered list, ensuring they were meaningful in the context of the gene expression profile.

I also checked the YEASTRACT database to infer which transcription factors regulate the genes in my selected profile. After pasting my list of genes into the YEASTRACT database, I identified the significant transcription factors and recorded their details, including whether CIN5 or GLN3 were on the list.

Lastly, I used GRNsight to create and visualize a gene regulatory network with approximately 15-20 connected transcription factors, including GLN3 and CIN5 if necessary. I recorded the number of genes and edges in the network and exported the network image.

For creating the GRNmap input workbook, I exported the data from GRNsight to Excel, ensuring all necessary sheets and data were present and correctly formatted. I inserted a new worksheet named "network_weights" and copied the contents of the "network" sheet into it.

Finally, I saved and uploaded my Excel Workbook to the wiki, linking it to my individual journal page for further analysis.

Methods/Results:

Media: BIOL367_S24_microarray-data_dGLN3CKAS31211.xlsx

Media:Updated_Pvalues_ckaplan.pdf

Media:Andrew&Charlotte_Tables_Gene-GoData.zip

Media:AS&CK_BIOL367_S24_STEM_PHOTOS_dGLN3.pptx

Media:Yeastract_45_Gene_CK.xlsx

Media:GRN_(Yeastmine_-_SGD__2024-03-19;_13_genes,_21_edges)_weighted_(2)_ck_45.xlsx

Media:CKgenegraph.jpg


  • How many GO terms are less than 0.05?

-Started with 160- after filtering 72. Then after filtering corrected p values I got 12.


  • Why did you select this profile? In other words, why was it interesting to you?

I selected profile 45 because I thought it was interesting because out of all off our profiles, it had the most genes.


  • How many genes belong to this profile?

406


  • How many genes were expected to belong to this profile?

29.9


  • What is the p value for the enrichment of genes in this profile?

0.00


  • How many significant green genes?

I have 44 green genes


  • Are Gln3p and Cin5p present? What are the percentages?

Gln3p 46.65% Cin5p 31.27%


  • Genes and edges in my network:

13 genes and 14 edges

  • Select 6 Gene Ontology terms and give definitions:

GO:0000027 "The aggregation, arrangement and bonding together of constituent RNAs and proteins to form the large ribosomal subunit." Source: GOC:jl, PMID:30467428

GO:0034475 "Any process involved in forming the mature 3' end of a U4 snRNA molecule." Source: GOC:mah

GO:0003676 "Binding to a nucleic acid." Source: GOC:jl

GO:0019843 "Binding to a ribosomal RNA." Source: GOC:jl

GO:0005654 "That part of the nuclear content other than the chromosomes or the nucleolus." Source: GOC:ma, ISBN:0124325653

GO:0000055 "The directed movement of a ribosomal large subunit from the nucleus into the cytoplasm." Source: GOC:mah

Acknowledgments

I worked with Andrew in and out of class throughout the week. Dr. Dahlquist assisted us in class.

References

Dahlquist, K. Master_sheet_dGLN3.

LMU BioDB 2024. (2024). Week 9. Retrieved Mar 20, 2024, from https://xmlpipedb.cs.lmu.edu/biodb/Spring2024/index.php/Week_9

LMU BioDB 2024. (2024). Week 9. Retrieved Mar 31, 2024, from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_10



Assignment Pages

Individual Journal Entry Pages

Shared Journal Entry Pages