Sulfiknights DA Week 12/13

From LMU BioDB 2019
Sulfiknights members: Project Manager & Quality Assurance: Naomi Tesfaiohannes; Quality Assurance: Joey Nimmers-Minor; Data Analysis: Ivy-Quynh Macaraeg & Marcus Avila; Designer: DeLisa Madere


Purpose

The purpose of this portion of the investigation is to analyze the data from Thorsen et al., whose experiment profiled changes in Saccharomyces cerevisiae gene expression in response to arsenite exposure.

Methods & Results

Sample to data relationship table:

  • GSE6068_setA_family - swt vs nswt, 1 mM @ 1 hr (6 replicates); 15 min, 30 min, 18 hr (3 replicates each)
  • GSE6129_set0_family - swt vs nswt, 1 mM @ 1 hr; 6 replicates
  • GSE6129_set1_family - swt vs nswt, 0.2 mM @ 1 hr, 15 min, 30 min, 18 hr; 3 replicates each
  • GSE6129_set2_family - sYAP4 vs swt, 0.2 mM and 1 mM, each @ 1 hr; 3 replicates each
  • GSE6129_set3_family - sMET4 vs swt, 0.2 mM and 1 mM, each @ 1 hr; 3 replicates each

Organizing the Data

  1. Downloaded data from Thorsen et al.
  2. Changed "Name" to "Standard_ID" in column C.
  3. Changed column headers in each sheet:
    • swt = stressed wild type, nswt = nonstressed wild type
    • 1mM, 0.2mM = concentration of As(III) to which the cells were exposed
    • 1h, 15m, 30m, 18h = time point (h = hours, m = minutes)
    • rn = replicate number (n)
  4. Inserted MasterIndex in the "GSE6068_setA_family" sheet.
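
As an illustration of the header convention above, the sketch below builds one label per replicate for the setA design. The label layout (`swt_vs_nswt_<conc>_<time>_r<n>`) and the function name are assumptions made for this example and may not match the headers used in the actual workbook; only the time points and replicate counts come from the sample to data relationship table.

```python
# Hypothetical header builder. The label layout is an assumption;
# only the design (time points and replicate counts) comes from the page.
SET_A_DESIGN = {"15m": 3, "30m": 3, "1h": 6, "18h": 3}  # time point -> replicates


def build_headers(concentration="1mM", design=SET_A_DESIGN):
    """Return one column header per replicate, grouped by time point."""
    headers = []
    for time_point, n_reps in design.items():
        for rep in range(1, n_reps + 1):
            headers.append(f"swt_vs_nswt_{concentration}_{time_point}_r{rep}")
    return headers


headers = build_headers()
print(len(headers))  # number of data columns per gene
print(headers[0])    # first generated label
```

The design yields 15 data columns per gene, which is the 15 that appears in the F statistic and FDIST formulas of the ANOVA steps.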

Conducting the ANOVA

  • All data analysis was conducted on data on the "GSE6068_setA_family" sheet.
  • ANOVA procedure was based on the methods from Week 8.
  1. A new sheet called "swtVnwt_1mM_ANOVA" was created, and all data from columns A-R were copied and pasted into it.
  2. Columns S-V were named "swtVnwt_AvgLogFC_1mM_15m", with the last part changed to 15m, 30m, 60m, and 1080m respectively, and each contained the average of the replicate LogFC values at that time point.
  3. Column "swtVnwt_ss_HO" was created containing the equation =SUMSQ(all LogFC values in row).
  4. Columns X-AA were named "swtVnwt_ss_15m", with the last part changed to 15m, 30m, 60m, and 1080m respectively, and contained the equation =SUMSQ(replicate LogFC values at that time point)-COUNTA(replicate LogFC values at that time point)*(average LogFC at that time point)^2, where the average is the matching cell in columns S-V (for example, V2 for 1080m).
  5. Column AB was named "swtVnwt_SS_full" and contained the sum of columns X-AA.
  6. The next column was named "swtVnwt_Fstat" and contained the equation =((15-4)/4)*(<swtVnwt_ss_HO>-AB2)/AB2, where 15 is the number of data points per gene and 4 is the number of time points.
  7. Column "swtVnwt_p-value" was created and the equation =FDIST(<swtVnwt_Fstat>,4,15-4) was entered.
  8. A sanity check was run:
    • 1068/4785 p-values are less than 0.05
  9. Two "swtVnwt_Bonferroni_p-value" columns were created, one containing the equation =<swtVnwt_p-value>*4785 and the other containing the equation =IF(AE2>1,1,AE2), which caps the corrected value at 1.
  10. A new sheet called "swtVnwt_1mM_ANOVA_B-H" was created; columns A-C were copied and pasted into it, and column "swtVnwt_p-value" was pasted into column D.
  11. Genes were ranked by p-value and then sorted by MasterIndex.
  12. Column "swtVnwt_B-H_p-value" was created containing the equation =(D2*4785)/E2, where E2 holds the gene's p-value rank, and then replicated into column G. This column was then pasted into the previous sheet.
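
The per-gene calculation in steps 2-7 can be sketched in Python as a cross-check on the spreadsheet. This is a minimal sketch, not the workbook itself: the function names and the example log fold changes are made up for illustration, and missing values (which COUNTA would skip) are not handled. The right-tail probability uses a closed form that holds only when the numerator degrees of freedom are even, as they are here (4); where SciPy is available, `scipy.stats.f.sf(F, 4, 11)` should give the same quantity as FDIST.

```python
from statistics import mean


def f_right_tail_even_dfn(f_stat, dfn, dfd):
    """Right-tail F probability (like Excel FDIST), valid only for even dfn."""
    if dfn % 2 != 0:
        raise ValueError("closed form requires an even numerator df")
    x = dfd / (dfd + dfn * f_stat)
    a = dfd / 2
    term, total = 1.0, 1.0
    for j in range(1, dfn // 2):
        term *= (a + j - 1) / j * (1 - x)
        total += term
    return x ** a * total


def anova_row(groups):
    """groups maps each time point to the replicate LogFC values of one gene."""
    values = [v for reps in groups.values() for v in reps]
    n, k = len(values), len(groups)          # 15 data points, 4 time points
    ss_h0 = sum(v * v for v in values)       # SUMSQ over the whole row
    ss_full = sum(
        sum(v * v for v in reps) - len(reps) * mean(reps) ** 2
        for reps in groups.values()
    )                                        # sum of the per-time-point ss columns
    # Note: ss_full == 0 (identical replicates) would divide by zero here.
    f_stat = ((n - k) / k) * (ss_h0 - ss_full) / ss_full
    return f_stat, f_right_tail_even_dfn(f_stat, k, n - k)


# Made-up log fold changes for a single hypothetical gene.
example_gene = {
    "15m": [0.4, 0.6, 0.5],
    "30m": [1.1, 0.9, 1.3],
    "60m": [2.0, 1.8, 2.2, 1.9, 2.1, 2.0],
    "1080m": [0.2, -0.1, 0.1],
}
f_stat, p_value = anova_row(example_gene)
print(f_stat, p_value)
```

Because the null hypothesis in this procedure is that every time point has a mean log fold change of zero, the numerator degrees of freedom equal the number of time points (4) rather than one less.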

Sanity Check

  1. Selected row 1.
  2. Applied AutoFilter to all headers.
  3. Filtered to count how many unadjusted, B-H, and Bonferroni p-values are < 0.05.
  4. Filtered to count how many unadjusted p-values are < 0.01, < 0.001, and < 0.0001.
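
The two corrections and the threshold counts can also be reproduced outside the spreadsheet. The sketch below mirrors the spreadsheet formulas (p × number of genes for Bonferroni, p × number of genes / rank for B-H). Capping the B-H value at 1 is an assumption carried over from the Bonferroni step, and unlike library implementations this version does not enforce monotonicity across ranks. The function names and the p-values are made up for illustration.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests and cap at 1."""
    m = len(pvals)
    return [min(p * m, 1.0) for p in pvals]


def benjamini_hochberg(pvals):
    """Spreadsheet-style B-H: p * m / rank, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices sorted by p-value
    adjusted = [0.0] * m
    for rank, i in enumerate(order, start=1):
        adjusted[i] = min(pvals[i] * m / rank, 1.0)   # cap at 1 (assumption)
    return adjusted


def count_below(pvals, threshold):
    """Count p-values strictly below the threshold, like the filter step."""
    return sum(p < threshold for p in pvals)


# Made-up p-values, listed in MasterIndex order rather than sorted.
pvals = [0.3, 0.001, 0.8, 0.04, 0.008]
for label, column in [
    ("unadjusted", pvals),
    ("Bonferroni", bonferroni(pvals)),
    ("B-H", benjamini_hochberg(pvals)),
]:
    print(label, count_below(column, 0.05))
```

Returning the adjusted values in the input order plays the same role as re-sorting by MasterIndex before pasting the column back into the ANOVA sheet.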

Data & Files

Thorsen Data

Thorsen P-value Slide

Conclusion

We reorganized the data into the form best suited for further ANOVA, statistical, and database analyses. Of the 4785 genes, 1068 have unadjusted p-values less than 0.05, and B-H corrected p-values were computed as well. So far, the ANOVA has only been completed on the setA family dataset because the other datasets include varying conditions. The statistical significance found is in line with the results of Thorsen et al., who observed increased expression of genes associated with the sulfate assimilation and glutathione biosynthesis pathways.

Acknowledgments

We would like to thank Dr. Dahlquist for providing the instructions in Week 8 that were referenced to perform the ANOVA of the SetA family dataset and for reviewing the analysis. We would also like to thank Thorsen et al. for providing the datasets.

References