Difference between revisions of "Johnllopez Week 10"
From LMU BioDB 2017
(Explained how I prepared the document for STEM) |
(Put a lot of information into "analyzing and interpreting STEM results") |
Revision as of 19:33, 6 November 2017
Electronic Lab Notebook
Preparing My Microarray Data File for Loading into STEM
- I started this portion by downloading the following spreadsheets. I added a new worksheet named dSWI4_stem, selected the values from dSWI4_ANOVA, and copied them into the new worksheet.
- I modified this further by renaming the "Master_Index" column header to "SPOT" and the "ID" column header to "Gene Symbol", and by deleting the "Standard_Name" column.
- I filtered the B-H corrected p-value column to show values greater than 0.05, then deleted all of the filtered rows (everything except the header row). Once I removed the filter, every gene left in the data set had a B-H corrected p-value of < 0.05. This left 2794 genes.
- I then deleted all of the columns except for the Average Log Fold change columns at the timepoints. I renamed the columns with just time and units. This would be used for analyzing the timepoints in STEM later on.
- In addition, to avoid complications with the STEM software, I replaced any values with the error #DIV/0! with a blank string. There were 40 replacements made.
- I saved the spreadsheet as usual, then saved the dSWI4_stem worksheet as a .txt file, which you can see [insert here].
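The preparation steps above were done by hand in a spreadsheet, but they can also be sketched in code. The following is a minimal sketch only: the column names `B-H_p`, `Avg_LogFC_t30`, and `Avg_LogFC_t90`, and the two example rows with their values, are assumptions made for illustration and are not taken from the actual worksheet. It assumes the .txt file is tab-delimited.

```python
import csv
import io

# Hypothetical column names; the real worksheet headers may differ.
P_COLUMN = "B-H_p"
RENAMES = {"Master_Index": "SPOT", "ID": "Gene Symbol"}
TIME_COLUMNS = {"Avg_LogFC_t30": "30m", "Avg_LogFC_t90": "90m"}


def prepare_for_stem(rows, alpha=0.05):
    """Keep significant genes and reshape each row for STEM."""
    prepared = []
    for row in rows:
        try:
            p_value = float(row[P_COLUMN])
        except ValueError:
            continue  # skip rows whose p-value is not a number
        if p_value > alpha:
            continue  # drop genes with a corrected p-value above the cutoff
        # Rename the identifier columns; "Standard_Name" is simply not copied.
        out = {new: row[old] for old, new in RENAMES.items()}
        for old, new in TIME_COLUMNS.items():
            value = row[old]
            # Spreadsheet division errors become blank cells.
            out[new] = "" if value == "#DIV/0!" else value
        prepared.append(out)
    return prepared


def to_tab_delimited(prepared):
    """Serialize the prepared rows as tab-delimited text."""
    fieldnames = list(RENAMES.values()) + list(TIME_COLUMNS.values())
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(prepared)
    return buffer.getvalue()


# Made-up example rows, for illustration only.
example_rows = [
    {"Master_Index": "1", "ID": "YAL001C", "Standard_Name": "TFC3",
     "B-H_p": "0.01", "Avg_LogFC_t30": "0.5", "Avg_LogFC_t90": "#DIV/0!"},
    {"Master_Index": "2", "ID": "YAL002W", "Standard_Name": "VPS8",
     "B-H_p": "0.20", "Avg_LogFC_t30": "0.1", "Avg_LogFC_t90": "0.2"},
]
stem_text = to_tab_delimited(prepare_for_stem(example_rows))
```

Here only the first example row survives the filter, and its `#DIV/0!` entry is written as an empty cell.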
Downloading and Extracting STEM Software / Running STEM
- I downloaded the STEM software from the following link, extracted the downloaded file, and launched the .jar program inside it: http://www.cs.cmu.edu/~jernst/stem/
- Before I ran the software on the .txt file, I changed several settings. In the expression data info section, I loaded the .txt file and selected "no normalization" and "spot ID's included in file".
- In the gene info section, I selected "SGD" for the Gene Annotation Source, "no cross references", and "no gene locations". This ensured that the data would only come from SGD and be specialized for yeast.
- Finally, before executing the analysis, I made sure the clustering method was set to "STEM Clustering Method".
Viewing and Saving STEM Results
- After changing the Interface Options to "X-axis scale should be based on real time", I took a screenshot of the "All STEM Profiles(1)" window and placed it into the PowerPoint given in the next step.
- [insert here | The following] PowerPoint contains screenshots of each of the individual colored boxes. A colored box indicates that the profile has a statistically significant number of genes assigned to it.
- [insert here | This] .zip file contains the gene lists for each individual significant profile.
- [insert here | This] .zip file contains the gene ontology term lists for each individual significant profile.
Analyzing and Interpreting STEM Results
- I chose profile 36 to answer the following questions. I found profile 36 the most interesting because of its drastic expression changes at 30, 90, and 120 minutes, where the expression change goes from positive, to negative, then to positive again.
- 55 genes belong to this profile.
- 30.5 genes were expected to belong to this profile.
- The p-value for the enrichment of genes is 3.5E-5, or 0.000035.
- After filtering the Gene Ontology (GO) terms associated with profile 36 to have a p-value > 0.05, I found that 88 terms remained.
- After filtering the GO terms associated with profile 36 to have a corrected p-value > 0.05, I found that 133 terms remained.
- I then selected the following terms from my filtered list: "regulation of metabolic process", "catalytic activity, acting on RNA", "hydrolase activity, acting on ester bonds", "organelle part", "transcription, DNA-templated", and "response to chemical".
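As a rough sanity check on the numbers reported above for profile 36 (55 genes assigned, 30.5 expected), the enrichment can be approximated with a binomial upper-tail probability. This is only a sketch under stated assumptions: it assumes a simple binomial model and assumes that all 2794 genes in the input file were clustered, whereas STEM may filter genes before clustering and may compute its p-value differently. The function name `binomial_upper_tail` is introduced here for illustration.

```python
import math


def binomial_upper_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p), summing terms in log space."""
    log_p = math.log(p)
    log_q = math.log1p(-p)  # log(1 - p), accurate for small p
    terms = []
    for i in range(k, n + 1):
        # log of the binomial coefficient C(n, i) via the log-gamma function
        log_choose = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        )
        terms.append(math.exp(log_choose + i * log_p + (n - i) * log_q))
    return math.fsum(terms)


total_genes = 2794  # assumption: every gene in the input file was clustered
assigned = 55       # genes assigned to profile 36
expected = 30.5     # genes expected in profile 36

p_value = binomial_upper_tail(total_genes, assigned, expected / total_genes)
print(f"{p_value:.2e}")
```

The terms are summed in log space because the binomial coefficients for a few thousand genes are too large to convert to floating point directly.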