Data Analysts Week 14

From LMU BioDB 2024

Revision as of 10:45, 3 May 2024

Continuing Milestone 3

Hailey Ivanson helped Katie and me with the Bonferroni and B-H values.

  1. We used the formulas =IF(CHP_Bonferroni_p-value>1,1,CHP_Bonferroni_p-value) and =IF(Control_Bonferroni_p-value>1,1,Control_Bonferroni_p-value) to cap the Bonferroni-corrected p values at 1.
  2. We inserted two new worksheets named "CHP_ANOVA_B-H" and "Control_ANOVA_B-H".
  3. We copied the "MasterIndex", "ID", and "Standard Name" columns from our previous worksheet and pasted them into the first three columns (A to C) of each new worksheet.
  4. We copied our unadjusted p values from our ANOVA worksheet and used Paste special > Paste values to paste them into column D.
  5. We selected columns A, B, C, and D and sorted them by the p values in column D in ascending order (smallest to largest).
  6. We typed "Rank" in cell E1, "1" in cell E2, and "2" in cell E3. Then we selected cells E2 and E3 and double-clicked the plus sign (fill handle) to fill the column with a series of numbers from 1 to 4697.
  7. Hailey Ivanson assisted us in calculating the Benjamini and Hochberg p value correction. We typed "CHP_B-H_p-value" in cell F1, entered the equation =(D2*4697)/E2 in cell F2, and copied that equation to the entire column. We repeated this for the control.
  8. We then typed "CHP_B-H_p-value" into cell G1.
  9. In cell G2, we entered the equation =IF(F2>1,1,F2) and copied it to the entire column.
  10. We selected columns A through G and sorted them in ascending order.
  11. We copied column G and used Paste special > Paste values to paste it into the next column of our ANOVA sheet.
  12. We zipped and uploaded the .xlsx file.
  13. We performed a sanity check by selecting row 1, choosing Data > Filter > AutoFilter, and filtering for p values less than 0.05.
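The Bonferroni cap and the Benjamini and Hochberg steps above can be sketched in Python as a cross-check on the spreadsheet. This is a minimal, hypothetical sketch: the function names and example p values are made up, and it assumes the Bonferroni value is the raw p value times the number of tests (4697 in our worksheet). The "spreadsheet" function mirrors steps 5 to 9 (rank ascending, p × n / rank, cap at 1); for comparison, the "standard" function adds the running minimum used in the textbook B-H step-up procedure, which the worksheet formulas do not apply.

```python
"""Sketch of the Milestone 3 p value corrections, following the worksheet steps."""


def bonferroni(p_values):
    """Multiply each p value by the number of tests, then cap at 1 (like =IF(x>1,1,x))."""
    n = len(p_values)
    return [min(p * n, 1.0) for p in p_values]


def benjamini_hochberg_spreadsheet(p_values):
    """B-H as computed in the worksheet: rank ascending, p * n / rank, cap at 1.

    Results are returned in the original input order.
    """
    n = len(p_values)
    # Steps 5-6: order genes by p value (smallest first); ranks start at 1.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # Step 7: =(D2*n)/E2, then step 9: =IF(F2>1,1,F2).
        adjusted[i] = min(p_values[i] * n / rank, 1.0)
    return adjusted


def benjamini_hochberg_standard(p_values):
    """Textbook B-H step-up: same ratio plus a running minimum from the largest rank down."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    p = [0.01, 0.04, 0.03, 0.20]  # made-up p values for illustration
    print(bonferroni(p))
    print(benjamini_hochberg_spreadsheet(p))
    print(benjamini_hochberg_standard(p))
```

With these made-up values, the spreadsheet version gives 0.06 to the gene ranked 2 but about 0.053 to the gene ranked 3, so adjusted values can decrease as rank increases; the running minimum in the standard version removes that inversion.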

Sanity Check Results (number of genes below each Benjamini and Hochberg-corrected p value threshold, with the percentage out of 4697):

  • Control
      p < 0.05: 3699 (79%)
      p < 0.01: 3219 (68%)
      p < 0.001: 2558 (54%)
      p < 0.0001: 1921 (41%)
      p < 0.00001: 1325 (28%)

  • CHP
      p < 0.05: 2863 (61%)
      p < 0.01: 2403 (51%)
      p < 0.001: 1884 (40%)
      p < 0.0001: 1435 (31%)
      p < 0.00001: 1076 (22%)

We chose p < 0.00001 as our cut-off because it gave the percentage of significant genes closest to 25%.
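The cut-off choice can be reproduced from the sanity check counts with a short Python sketch. The counts below are copied from the results above; the function name and the 25% target as a parameter are illustrative assumptions. Because it prints percentages to one decimal place, a few values differ slightly from the whole-number percentages listed above (for example, 1076 of 4697 is about 22.9%).

```python
"""Pick the B-H cut-off whose fraction of significant genes is closest to 25%."""

TOTAL_GENES = 4697

# Genes below each B-H threshold, copied from the sanity check results.
SANITY_COUNTS = {
    "Control": {0.05: 3699, 0.01: 3219, 0.001: 2558, 0.0001: 1921, 0.00001: 1325},
    "CHP": {0.05: 2863, 0.01: 2403, 0.001: 1884, 0.0001: 1435, 0.00001: 1076},
}


def closest_cutoff(counts, total=TOTAL_GENES, target=0.25):
    """Return the threshold whose count / total is nearest the target fraction."""
    return min(counts, key=lambda t: abs(counts[t] / total - target))


if __name__ == "__main__":
    for name, counts in SANITY_COUNTS.items():
        for threshold, n in counts.items():
            print(f"{name} p < {threshold}: {n} genes ({100 * n / TOTAL_GENES:.1f}%)")
        print(f"{name} cut-off closest to 25%: {closest_cutoff(counts)}")
```

For both the control and CHP counts, this selects 0.00001.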

Milestone 4

  1. We inserted a new worksheet into our Excel workbook and named it "CHP_stem".
  2. We selected all of the data from the "CHP_ANOVA" worksheet and Paste special > paste values into the "CHP_stem" worksheet.
  3. We renamed "Master_Index" to "SPOT" and column B ("ID") to "Gene Symbol", and we deleted the "Standard_Name" column.
  4. We filtered the data on the B-H corrected p value to be > 0.0001.
  5. We selected all of the filtered rows except the header row, deleted them by right-clicking and choosing "Delete Row" from the context menu, and then removed the filter so that only the genes below the cut-off remained.
  6. We deleted all of the data columns except for the average log fold change columns for each timepoint.
  7. We renamed the data columns so that each header contained only the time and its units.
  8. We used Find and Replace ("Replace All") to remove the #DIV/0! errors, leaving those cells blank.
  9. We saved this spreadsheet as Text (Tab-delimited) (*.txt).
  10. We downloaded the stem.zip file and selected "Extract all" from the menu, creating a folder called stem.
  11. We clicked the "Profile GO Table" button to see the list of Gene Ontology terms. We saved these files to our desktop.
  12. We clicked on the "Profile Gene Table" button to see the list of genes belonging to each profile, and then downloaded these files to BOX.
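Steps 3 through 9 of Milestone 4 can be sketched in Python as an alternative way to build the STEM input file. This is a hedged sketch, not our actual workflow: the header names (for example "CHP_B-H_p-value" and the "CHP_AvgLogFC_" prefix) and the sample rows are hypothetical, and the cut-off is passed in as a parameter rather than fixed. Rows above the cut-off are dropped, the ID columns are renamed, only the average log fold change columns are kept (relabeled with just the time), #DIV/0! values become blanks, and the result is written as tab-delimited text.

```python
"""Sketch of Milestone 4 steps 3-9: build a tab-delimited STEM input table."""
import csv
import io

# Hypothetical header names; change them to match the real "CHP_ANOVA" columns.
ID_COLUMNS = {"Master_Index": "SPOT", "ID": "Gene Symbol"}  # step 3 renames
PVALUE_COLUMN = "CHP_B-H_p-value"
FOLD_CHANGE_PREFIX = "CHP_AvgLogFC_"  # e.g. "CHP_AvgLogFC_15m" -> "15m"


def build_stem_rows(rows, cutoff):
    """Keep genes with B-H p <= cutoff and only the ID and average log fold change columns."""
    kept = []
    for row in rows:
        # Steps 4-5: genes whose corrected p value is above the cut-off are deleted.
        if float(row[PVALUE_COLUMN]) > cutoff:
            continue
        # Step 3: renamed ID columns; "Standard_Name" is simply not copied.
        new_row = {new: row[old] for old, new in ID_COLUMNS.items()}
        # Step 6: every other data column (including the p values) is dropped.
        for column, value in row.items():
            if column.startswith(FOLD_CHANGE_PREFIX):
                # Step 7: keep only the time and units in the header.
                time_label = column[len(FOLD_CHANGE_PREFIX):]
                # Step 8: blank out #DIV/0! errors.
                new_row[time_label] = "" if value == "#DIV/0!" else value
        kept.append(new_row)
    return kept


def write_stem_tsv(rows, handle):
    """Step 9: write the rows as tab-delimited text with a header line."""
    if not rows:
        return
    writer = csv.DictWriter(handle, fieldnames=list(rows[0]), delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    sample = [  # made-up rows for illustration
        {"Master_Index": "1", "ID": "GENE1", "Standard_Name": "G1",
         "CHP_B-H_p-value": "0.00005", "CHP_AvgLogFC_15m": "0.5", "CHP_AvgLogFC_30m": "#DIV/0!"},
        {"Master_Index": "2", "ID": "GENE2", "Standard_Name": "G2",
         "CHP_B-H_p-value": "0.2", "CHP_AvgLogFC_15m": "1.0", "CHP_AvgLogFC_30m": "2.0"},
    ]
    buffer = io.StringIO()
    write_stem_tsv(build_stem_rows(sample, cutoff=0.0001), buffer)
    print(buffer.getvalue())
```

Writing to a file handle instead of a path keeps the function easy to test; passing `open("CHP_stem.txt", "w", newline="")` would save it to disk.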

We made adjustments to the GO term analysis using the GO enrichment tool at http://geneontology.org/, analyzing our clusters 41 and 7. For each cluster, we converted the gene list from its .txt file to an Excel file, copied the genes into the "GO Enrichment Analysis" box, chose "Saccharomyces cerevisiae" from the drop-down menu, and clicked "Launch". We then exported each results table and saved it as a .txt file in BOX, linked here: https://lmu.app.box.com/file/1512453796651, https://lmu.app.box.com/file/1512450115120
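As a hedged sketch, the gene list for each cluster could also be pulled straight from the saved .txt table without the Excel conversion. This assumes the table is tab-delimited with a header row containing a "Gene Symbol" column (a hypothetical header name here) and that the "GO Enrichment Analysis" box accepts one gene per line. It keeps each non-empty symbol once, in file order.

```python
"""Turn a tab-delimited gene table into a newline-separated list for pasting."""
import csv
import io


def gene_list_for_go(handle, column="Gene Symbol"):
    """Return unique, non-empty gene symbols from `column`, one per line, in file order."""
    seen = set()
    genes = []
    for row in csv.DictReader(handle, delimiter="\t"):
        symbol = (row.get(column) or "").strip()  # short rows give None; treat as blank
        if symbol and symbol not in seen:
            seen.add(symbol)
            genes.append(symbol)
    return "\n".join(genes)


if __name__ == "__main__":
    table = "SPOT\tGene Symbol\n1\tGENE1\n2\tGENE2\n3\tGENE1\n"  # made-up example table
    print(gene_list_for_go(io.StringIO(table)))
```

For a real file, `open("cluster_41.txt", newline="")` (a hypothetical filename) would replace the `io.StringIO` example.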


4/24/24, Milestone 4: with a p value cut-off of 0.00001, profiles 41 and 7 were significant.

Profile 41 (GRNsight): Rpn4, Gcn4, Pdr1, Xbp1, Met28, Mga2, Spt23, Bas1, Yap1, Sok2, Msn2, Crz1, Rlm1, Fhl1, Pdr3, Cbf1, Rph1, Met31, Stp1, Msn4, Tec1, Rgm1, Stp2

Acknowledgements

This procedure was adapted from the Data Analysis page's Milestone 3 and 4 protocol, linked here: Data Analysis. The procedure for Milestone 3 was also adapted from the steps outlined in the Week 9 assignment page, and the procedure for Milestone 4 from the steps outlined in the Week 10 assignment page. Our quality assurance, Hailey Ivanson, was a key part of completing this milestone, and her help was very valuable. Except for what is noted above, this individual journal entry was completed by Katie and Charlotte and not copied from another source.

Ckapla12 (talk) 14:28, 23 April 2024 (PDT)

Kmill104 (talk) 19:15, 2 May 2024 (PDT)

References

LMU BioDB 2024. (2024). Week 14. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_14

LMU BioDB 2024. (2024). Week 9. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_9

LMU BioDB 2024. (2024). Week 10. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_10

LMU BioDB 2024. (2024). Data Analysis. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Data_Analysis
