Data Analysts Week 14

From LMU BioDB 2024
Revision as of 11:12, 3 May 2024 by Kmill104 (talk | contribs) (Milestone 4)

Continuing Milestone 3

Hailey Ivanson helped Katie and me with the Bonferroni and B-H values.

  1. We used the formulas =IF(CHP_Bonferroni_p-value>1,1,CHP_Bonferroni_p-value) and =IF(Control_Bonferroni_p-value>1,1,Control_Bonferroni_p-value) to replace any Bonferroni-corrected p value greater than 1 with 1.
  2. We inserted two new worksheets named "CHP_ANOVA_B-H" and "Control_ANOVA_B-H".
  3. We copied and pasted the "MasterIndex", "ID", and "Standard Name" columns from our previous worksheet into the first three columns of each new worksheet.
  4. We copied our unadjusted p values from our ANOVA worksheet and used Paste special > Paste values to paste them into Column D.
  5. We selected all of columns A, B, C, and D and sorted by ascending values, smallest to largest.
  6. We typed "Rank" in cell E1, "1" into cell E2, and "2" into cell E3. Then we selected both cells E2 and E3 and double-clicked on the plus sign to fill the column with a series of numbers from 1 to 4697.
  7. Hailey Ivanson assisted us in calculating the Benjamini and Hochberg p value correction. We typed "CHP_B-H_p-value" as the header of column F, entered the equation =(D2*4697)/E2 in cell F2, and copied that equation to the entire column. We repeated this for the control.
  8. We then typed "CHP_B-H_p-value" into cell G1.
  9. In cell G2, we used the equation =IF(F2>1,1,F2) and copied that equation to the entire column.
  10. We selected columns A through G and sorted them in ascending order.
  11. We copied column G and used Paste special > Paste values to paste it into the next column of our ANOVA sheet.
  12. We zipped and uploaded the .xlsx file.
  13. We performed a sanity check by selecting row 1 and choosing Data > Filter > Autofilter, then filtering for p values less than 0.05.
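
As a cross-check on the spreadsheet steps above, the same two corrections can be sketched in Python. This is a minimal sketch and not the workbook we used: the function names and the toy p values are made up for illustration, and the Bonferroni multiplication by the number of genes is assumed from the standard definition, since only the capping formula is written out above. The B-H function mirrors the spreadsheet formula (p value times number of genes, divided by rank, capped at 1); it does not include the extra monotonicity adjustment that some statistics packages apply.

```python
def bonferroni(p_values):
    """Multiply each p value by the number of tests, capping the result at 1."""
    n = len(p_values)
    return [min(p * n, 1.0) for p in p_values]


def benjamini_hochberg(p_values):
    """Spreadsheet-style B-H correction: p * n / rank, capped at 1.

    Results come back in the original input order, mirroring the final
    re-sort of the worksheet.
    """
    n = len(p_values)
    # Rank the p values from smallest (rank 1) to largest (rank n).
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    for rank, i in enumerate(order, start=1):
        adjusted[i] = min(p_values[i] * n / rank, 1.0)
    return adjusted


toy_p_values = [0.04, 0.001, 0.5, 0.02]  # made-up values, not our data
print(bonferroni(toy_p_values))
print(benjamini_hochberg(toy_p_values))
```

In the workbook, n is 4697 and the rank comes from column E after sorting on the unadjusted p value.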

Sanity Check Results:

  • CONTROL

How many genes have p < 0.05 for the Benjamini and Hochberg-corrected p value, and what is the percentage (out of 4697)?

    • p < 0.05: 3699; 79%
    • p < 0.01: 3219; 68%
    • p < 0.001: 2558; 54%
    • p < 0.0001: 1921; 41%
    • p < 0.00001: 1325; 28%

  • CHP

How many genes have p < 0.05 for the Benjamini and Hochberg-corrected p value, and what is the percentage (out of 4697)?

    • p < 0.05: 2863; 61%
    • p < 0.01: 2403; 51%
    • p < 0.001: 1884; 40%
    • p < 0.0001: 1435; 31%
    • p < 0.00001: 1076; 22%

We chose < 0.00001 as our p-value cut-off, as it gave the percentage of genes closest to 25%.
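
The sanity check and cut-off choice can also be sketched in Python. This is only an illustration under stated assumptions: the function names and toy p values are hypothetical, and the default 25% target is taken from the sentence above. The first function counts genes below each cut-off, and the second picks the cut-off whose percentage is nearest the target.

```python
def count_below(p_values, cutoffs):
    """Return {cutoff: (count, percent)} for p values below each cutoff."""
    total = len(p_values)
    summary = {}
    for cutoff in cutoffs:
        count = sum(1 for p in p_values if p < cutoff)
        summary[cutoff] = (count, 100.0 * count / total)
    return summary


def closest_cutoff(summary, target_percent=25.0):
    """Pick the cutoff whose percentage of genes is nearest the target."""
    return min(summary, key=lambda c: abs(summary[c][1] - target_percent))


toy_p_values = [0.5, 0.03, 0.004, 0.0002]  # made-up values, not our data
summary = count_below(toy_p_values, [0.05, 0.01, 0.001])
print(summary)
print(closest_cutoff(summary))
```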

Milestone 4

  1. We inserted a new worksheet into our Excel workbook and named it "CHP_stem".
  2. We selected all of the data from the "CHP_ANOVA" worksheet and used Paste special > Paste values to paste it into the "CHP_stem" worksheet.
  3. "Master_Index" was renamed to "SPOT". Column B named "ID" was renamed to "Gene Symbol". We deleted the column named "Standard_Name".
  4. We filtered the data on the B-H corrected p value to show values > 0.00001, our chosen cut-off.
  5. We selected all of the rows except for the header row and deleted the rows by right-clicking and choosing "Delete Row" from the context menu. Then we undid this filter.
  6. We deleted all of the data columns except for the Average Log Fold change columns for each timepoint.
  7. We renamed the data columns with just the time and units.
  8. We clicked "Replace all" to remove the #DIV/0! errors.
  9. We saved this spreadsheet as Text (Tab-delimited) (*.txt).
  10. We downloaded the stem.zip file and selected "Extract all" from the menu, creating a folder called stem.
  11. We clicked on the "Profile GO Table" button to see the list of Gene Ontology terms. We saved these files to our desktop.
  12. We clicked on the "Profile Gene Table" button to see the list of genes belonging to each profile, and then downloaded these files to BOX.
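
The worksheet preparation in steps 3 through 9 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the procedure we ran: the function names, the fold-change column names, and the time labels in the usage below are hypothetical, and replacing #DIV/0! with an empty cell is assumed from the "Replace all" step.

```python
import csv
import io


def make_stem_rows(rows, p_column, cutoff, data_columns,
                   index_column="MasterIndex", id_column="ID"):
    """Keep significant genes and only the columns a STEM input file needs."""
    kept = []
    for row in rows:
        # Rows with a corrected p value above the cutoff are dropped.
        if float(row[p_column]) > cutoff:
            continue
        out = {"SPOT": row[index_column], "Gene Symbol": row[id_column]}
        for old_name, new_name in data_columns.items():
            value = row[old_name]
            # Blank out spreadsheet division errors (assumed replacement).
            out[new_name] = "" if value == "#DIV/0!" else value
        kept.append(out)
    return kept


def to_tab_delimited(rows):
    """Serialize the rows as tab-delimited text with a header line."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]),
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Toy rows with hypothetical column names; not our data.
toy_rows = [
    {"MasterIndex": "1", "ID": "YAL001C", "B-H": "0.000001",
     "avg_t15": "1.2", "avg_t30": "#DIV/0!"},
    {"MasterIndex": "2", "ID": "YAL002W", "B-H": "0.5",
     "avg_t15": "0.1", "avg_t30": "0.2"},
]
stem_rows = make_stem_rows(toy_rows, "B-H", 0.00001,
                           {"avg_t15": "15m", "avg_t30": "30m"})
print(to_tab_delimited(stem_rows))
```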

We made adjustments to the GO term analysis by using the GO enrichment tool at http://geneontology.org/ to analyze our clusters 41 and 7. For each cluster, we converted the gene list .txt file to an Excel file, copied the gene list into the "GO Enrichment Analysis" box, chose "Saccharomyces cerevisiae" from the drop-down menu, and then clicked "Launch". We then exported each table and saved it as a .txt file in BOX, linked here: GO Terms 41, GO Terms 7
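
Pulling a gene list out of a tab-delimited file for pasting into a web form can be sketched in Python. This is a hedged illustration only: the function name is hypothetical, and the column header passed in is an assumption about the file being read, not the documented layout of the STEM gene table.

```python
import csv
import io


def gene_column(tab_text, column):
    """Return the non-empty values of one named column from tab-delimited text."""
    reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
    return [row[column] for row in reader if row[column]]


# Toy text with a hypothetical header; not one of our files.
toy_text = "Gene Symbol\tSPOT\nYAL001C\t1\nYAL002W\t2\n"
print("\n".join(gene_column(toy_text, "Gene Symbol")))
```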

    • We ran YEASTRACT according to the protocol listed in Week 10 for clusters 41 and 7.
    • We opened the gene list file in Excel for cluster 41 and copied the list of gene IDs.
    • We went to the YEASTRACT database website and clicked on "Rank by TF" in the left panel of the window. We pasted our gene list for cluster 41 into the box called "ORFs/Genes".
    • We checked the box for "Check for all TFs".
    • We accepted the defaults for the Regulations Filter (Documented, DNA binding or expression evidence).
    • We didn't apply a filter for "Filter Documented Regulations by environmental condition".
    • We ranked genes by TF using the option "The % of genes in the list and in YEASTRACT regulated by each TF".
    • We clicked the Search button.
    • We copied the table of results from the web page and pasted it into a new Excel workbook to preserve the results, which we uploaded to BOX as a .txt file and linked here: YEASTRACT 41
    • We then repeated these steps for cluster 7 and uploaded the results to BOX, linked here: YEASTRACT 7
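
The ranking criterion we selected can be illustrated with a short Python sketch. This is not YEASTRACT's implementation: the function name, the regulation table, and the gene and TF names in the usage are all hypothetical, and the sketch assumes every gene in the pasted list counts toward the percentage.

```python
def rank_tfs(gene_list, regulations):
    """Rank TFs by the percentage of listed genes each one regulates.

    `regulations` maps a TF name to the set of genes it regulates.
    """
    genes = set(gene_list)
    ranked = []
    for tf, targets in regulations.items():
        percent = 100.0 * len(genes & set(targets)) / len(genes)
        ranked.append((tf, percent))
    # Highest percentage first; ties broken alphabetically by TF name.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Made-up genes and regulators for illustration only.
toy_genes = ["G1", "G2", "G3", "G4"]
toy_regulations = {"TfA": {"G1", "G2", "G9"}, "TfB": {"G1", "G2", "G3"}}
print(rank_tfs(toy_genes, toy_regulations))
```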


4/24/24, Milestone 4: p value > 0.00001; significant profiles 41 and 7

We then chose 23 transcription factors from profile 41, listed here: Rpn4, Gcn4, Pdr1, Xbp1, Met28, Mga2, Spt23, Bas1, Yap1, Sok2, Msn2, Crz1, Rlm1, Fhl1, Pdr3, Cbf1, Rph1, Met31, Stp1, Msn4, Tec1, Rgm1, Stp2

Acknowledgements

This procedure was adapted from the Data Analysis page Milestone 3 and 4 protocol, linked here: Data Analysis. The procedure for Milestone 3 was also adapted from the steps outlined in the Week 9 assignment page. The procedure for Milestone 4 was also adapted from the steps outlined in the Week 10 assignment page. Our quality assurance, Hailey Ivanson, was a key part of completing this milestone, and her help was very valuable. Except for what is noted above, this individual journal entry was completed by Katie and Charlotte and not copied from another source.

Ckapla12 (talk) 14:28, 23 April 2024 (PDT)

Kmill104 (talk) 19:15, 2 May 2024 (PDT)

References

LMU BioDB 2024. (2024). Week 14. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_14

LMU BioDB 2024. (2024). Week 9. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_9

LMU BioDB 2024. (2024). Week 10. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_10

LMU BioDB 2024. (2024). Data Analysis. Retrieved April 23, 2024 from https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Data_Analysis
