Sulfiknights DA Week 12/13
Sulfiknight Links
- BIOL Databases Main Page
- Sulfiknights: Project Overview Page
- Final Project Deliverables Requirements
- Sulfiknights: Final Project Deliverables
Members
- Project Manager & Quality Assurance: Naomi Tesfaiohannes
- Quality Assurance: Joey Nimmers-Minor
- Data Analysis: Ivy-Quynh Macaraeg & Marcus Avila
- Designer: DeLisa Madere
Assignment Pages
- Week 11
- Week 12/13
- Week 15
Purpose
The purpose of this portion of the investigation is to analyze the data provided by Thorsen et al. from their experiment profiling changes in Saccharomyces cerevisiae gene expression in response to arsenite exposure.
Methods & Results
Sample-to-data relationships:
- GSE6068_setA_family - swt vs nswt, 1 mM, at 1 hr (6 replicates) and at 15 min, 30 min, and 18 hr (3 replicates each)
- GSE6129_set0_family - swt vs nswt, 1 mM, at 1 hr (6 replicates)
- GSE6129_set1_family - swt vs nswt, 0.2 mM, at 15 min, 30 min, 1 hr, and 18 hr (3 replicates each)
- GSE6129_set2_family - sYAP4 vs swt, 0.2 mM and 1 mM, each at 1 hr (3 replicates each)
- GSE6129_set3_family - sMET4 vs swt, 0.2 mM and 1 mM, each at 1 hr (3 replicates each)
Organizing the Data
- Downloaded data from Thorsen et al.
- Changed "Name" to "Standard_ID" in column C.
- Names were changed to the standard names using the "ORF List <-> Gene List" tool from YEASTRACT.
- Changed column headers in each sheet:
- swt = stressed wild type, nswt = nonstressed wild type
- 1mM, 0.2mM = concentration of As(III) to which the cells were exposed
- 1h, 15m, 30m, 18h = time point (h = hours, m = minutes)
- rn = replicate number (n)
- Inserted MasterIndex in the "GSE6068_setA_family" sheet.
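The header convention above (strain, concentration, time point, replicate) can be checked programmatically. The following is a minimal sketch, assuming a hypothetical header layout such as `swt_1mM_18h_r2`; the exact header strings used in the workbook are not given here, and `parse_header` is an illustrative name only.

```python
import re

# Hypothetical header layout: <strain>_<conc>mM_<time><h|m>_r<replicate>
# (an assumption for illustration; the workbook's real headers may differ).
HEADER_RE = re.compile(
    r"^(?P<strain>[A-Za-z0-9]+)_"
    r"(?P<conc>\d*\.?\d+)mM_"
    r"(?P<amount>\d+)(?P<unit>[hm])_"
    r"r(?P<rep>\d+)$"
)


def parse_header(header):
    """Split a column header into strain, concentration, time, and replicate."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"Header does not follow the assumed layout: {header!r}")
    amount = int(match.group("amount"))
    # Convert hours to minutes so all time points share one unit.
    minutes = amount * 60 if match.group("unit") == "h" else amount
    return {
        "strain": match.group("strain"),
        "conc_mM": float(match.group("conc")),
        "minutes": minutes,
        "replicate": int(match.group("rep")),
    }


if __name__ == "__main__":
    for name in ["swt_1mM_15m_r1", "nswt_0.2mM_18h_r3"]:
        print(name, parse_header(name))
```

Converting every time point to minutes matches the 15m, 30m, 60m, and 1080m labels used for the ANOVA columns.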
Conducting the ANOVA
- All data analysis was conducted on data on the "GSE6068_setA_family" sheet.
- ANOVA procedure was based on the methods from Week 8.
- A new sheet called "swtVnwt_1mM_ANOVA" was created, and all data from columns A-R were copied and pasted into it.
- Columns S-V were named "swtVnwt_AvgLogFC_1mM_15m", with the last part changed for each time point (15m, 30m, 60m, and 1080m, respectively), and contained the average of the replicate LogFC values at that time point.
- Column "swtVnwt_ss_HO" was created containing the equation =SUMSQ(all LogFC values in row).
- Columns X-AA were named "swtVnwt_ss_15m", with the last part changed for each time point (15m, 30m, 60m, and 1080m, respectively), and contained the equation =SUMSQ(LogFC values at that time point)-COUNTA(LogFC values at that time point)*(average LogFC at that time point)^2. For the 1080m column, the average is in column V, so the last term is V2^2.
- Column AB was named "swtVnwt_SS_full" and contained the sum of columns X-AA.
- The next column was named "swtVnwt_Fstat" and contained the equation =((15-4)/4)*(<swtVnwt_ss_HO>-AB2)/AB2, where 15 is the number of data points per gene and 4 is the number of time points.
- Column "swtVnwt_p-value" was created, and the equation =FDIST(<swtVnwt_Fstat>,4,15-4) was entered.
- A sanity check was run:
- 1068/4785 p-values are less than 0.05
- Two "swtVnwt_Bonferroni_p-value" columns were created, one containing the equation <swtVnwt_p-value>*4785 and the other containing the equation =IF(AE2>1,1,AE2).
- A new sheet called "swtVnwt_1mM_ANOVA_B-H" was created; columns A-C were copied and pasted into it, and column "swtVnwt_p-value" was pasted into column D.
- Genes were ranked by p-value and then by MasterIndex.
- Column "swtVnwt_B-H_p-value" was created containing the equation =(D2*4785)/E2 and then replicated into column G. This column was then pasted into the previous sheet.
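The spreadsheet steps for the F statistic can be sketched in Python for a single gene. This is a minimal sketch under stated assumptions, not the workflow that was actually used (the analysis was done in a spreadsheet): the function names `sum_sq` and `f_statistic` are hypothetical, and the LogFC values in the example are made up for illustration.

```python
def sum_sq(values):
    """Equivalent of the spreadsheet SUMSQ function."""
    return sum(v * v for v in values)


def f_statistic(groups):
    """Compute the F statistic for one gene.

    groups maps a time point label to its list of replicate LogFC values.
    The null hypothesis is that the mean LogFC is 0 at every time point.
    """
    all_values = [v for reps in groups.values() for v in reps]
    n = len(all_values)         # 15 data points per gene in the SetA sheet
    k = len(groups)             # 4 time points in the SetA sheet
    ss_h0 = sum_sq(all_values)  # corresponds to column "swtVnwt_ss_HO"

    # Per time point: SUMSQ(replicates) - COUNTA(replicates) * mean^2,
    # then summed over time points (column "swtVnwt_SS_full").
    ss_full = 0.0
    for reps in groups.values():
        mean = sum(reps) / len(reps)
        ss_full += sum_sq(reps) - len(reps) * mean ** 2

    # Note: ss_full is 0 if all replicates within every time point are
    # identical, in which case this division fails.
    return ((n - k) / k) * (ss_h0 - ss_full) / ss_full


if __name__ == "__main__":
    # Made-up LogFC values with the SetA replicate structure (3, 3, 6, 3).
    example_gene = {
        "15m": [0.5, 0.7, 0.6],
        "30m": [1.0, 1.2, 1.1],
        "60m": [2.0, 1.8, 2.2, 1.9, 2.1, 2.0],
        "1080m": [0.1, -0.1, 0.0],
    }
    print("F statistic:", f_statistic(example_gene))
```

The p-value from =FDIST(F,4,15-4) is the right-tail probability of the F distribution with 4 and 11 degrees of freedom; if SciPy is available, `scipy.stats.f.sf(F, 4, 11)` should give the corresponding value.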
Sanity Check
- Selected row 1
- Applied AutoFilter to all headers
- Filtered to count how many p-values are <0.05 for the unadjusted, B-H, and Bonferroni p-values
- Filtered to count how many unadjusted p-values are <0.01, <0.001, and <0.0001
- All values are reflected in Thorsen P-value Slide
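The corrections and threshold counts in this sanity check can be reproduced outside the spreadsheet. The sketch below is an illustration only: the function names are hypothetical, the p-values are made up, and capping the B-H values at 1 is an assumption that mirrors the Bonferroni column. The B-H function follows the spreadsheet formula (p-value * number of genes / rank) and omits the cumulative-minimum step that many statistics packages apply.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests and cap the result at 1."""
    n = len(p_values)
    return [min(p * n, 1.0) for p in p_values]


def benjamini_hochberg(p_values):
    """Rank-based B-H adjustment as in the spreadsheet: p * n / rank.

    Values are returned in the original gene order. Capping at 1 is an
    assumption; no cumulative-minimum (step-up) correction is applied.
    """
    n = len(p_values)
    # Indices of the genes sorted by ascending p-value (rank 1 = smallest).
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    for rank, i in enumerate(order, start=1):
        adjusted[i] = min(p_values[i] * n / rank, 1.0)
    return adjusted


def count_below(p_values, threshold):
    """Count how many p-values are strictly below the threshold."""
    return sum(1 for p in p_values if p < threshold)


if __name__ == "__main__":
    # Made-up p-values for illustration only.
    p = [0.01, 0.04, 0.03, 0.5]
    print("unadjusted < 0.05:", count_below(p, 0.05))
    print("B-H < 0.05:", count_below(benjamini_hochberg(p), 0.05))
    print("Bonferroni < 0.05:", count_below(bonferroni(p), 0.05))
```

Returning the adjusted values in the original order plays the same role as re-sorting by MasterIndex before pasting the B-H column back into the ANOVA sheet.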
Data & Files
Conclusion
We reorganized the data into a layout suited for further ANOVA, statistical, and database analyses. Of the 4785 genes, 1068 have unadjusted p-values less than 0.05, and B-H and Bonferroni adjusted p-values were calculated as well. So far, the ANOVA has only been completed on the SetA family dataset because the other datasets include varying conditions. The statistical significance found is in line with the results of Thorsen et al., who observed an increase in the expression of genes associated with the sulfate assimilation and glutathione biosynthesis pathways.
Acknowledgments
We would like to thank Dr. Dahlquist for providing the instructions in Week 8 that were referenced to perform the ANOVA of the SetA family dataset and for reviewing the analysis. We would also like to thank Thorsen et al. for providing the datasets.
References
- LMU BioDB 2019. (2019). Week 8. Retrieved November 25, 2019, from https://xmlpipedb.cs.lmu.edu/biodb/fall2019/index.php/Week_8.
- Thorsen, M., Lagniel, G., Kristiansson, E., Junot, C., Nerman, O., Labarre, J., & Tamás, M. J. (2007). Quantitative transcriptome, proteome, and sulfur metabolite profiling of the Saccharomyces cerevisiae response to arsenite. Physiological Genomics, 30(1), 35-43. DOI: https://doi.org/10.1152/physiolgenomics.00236.2006