Sulfiknights DA Week 12/13
Sulfiknight Links | |||||||||
---|---|---|---|---|---|---|---|---|---|
BIOL Databases Main Page | Sulfiknights: Project Overview Page | Final Project Deliverables Requirements | Sulfiknights: Final Project Deliverables | Members | Project Manager & Quality Assurance: Naomi Tesfaiohannes | Quality Assurance: Joey Nimmers-Minor | Data Analysis: Ivy-Quynh Macaraeg & Marcus Avila | Designer: DeLisa Madere | |
Assignment Pages | Week 11 | Week 12/13 | Week 15 |
Purpose
Methods & Results
Sample to data relationship table:
- GSE6068_setA_family - swt vs nswt, 1 mM, at 1 hr (6 replicates) and at 15 min, 30 min, and 18 hr (3 replicates each)
- GSE6129_set0_family - swt vs nswt, 1 mM, at 1 hr (6 replicates)
- GSE6129_set1_family - swt vs nswt, 0.2 mM, at 1 hr, 15 min, 30 min, and 18 hr (3 replicates each)
- GSE6129_set2_family - sYAP4 vs swt, 0.2 mM and 1 mM, each at 1 hr (3 replicates each)
- GSE6129_set3_family - sMET4 vs swt, 0.2 mM and 1 mM, each at 1 hr (3 replicates each)
Organizing the Data
- Downloaded data from Thorsen et al.
- Changed "Name" to "Standard_ID" in column C.
- Names were changed to the standard names by the "ORF List <-> Gene List" tool from YEASTRACT.
- Changed column headers in each sheet:
  - swt = stressed wild type, nswt = nonstressed wild type
  - 1mM, 0.2mM = concentration of As(III) to which the cells were exposed
  - 1h, 15m, 30m, 18h = time point (h = hours, m = minutes)
  - rn = replicate number (n)
- Inserted a MasterIndex column in the "GSE6068_setA_family" sheet.
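The header convention above can be sketched in code. This is only an illustration: the exact header layout is not recorded on this page, so the underscore-joined form `swt_1mM_18h_r2` and the helper name `parse_header` are assumptions. The sketch also shows how 1h and 18h correspond to the 60m and 1080m labels used in the ANOVA column names.

```python
import re

# Hypothetical header layout (an assumption, not taken from the sheets):
#   <strain>_<concentration>_<time><unit>_r<replicate>, e.g. "swt_1mM_18h_r2"
HEADER_RE = re.compile(
    r"^(?P<strain>swt|nswt)_(?P<conc>[\d.]+mM)_(?P<time>\d+)(?P<unit>[hm])_r(?P<rep>\d+)$"
)


def parse_header(header):
    """Split a column header into strain, concentration, minutes, and replicate."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"unrecognized column header: {header!r}")
    # Convert hours to minutes so that 1h -> 60 and 18h -> 1080.
    minutes = int(match["time"]) * (60 if match["unit"] == "h" else 1)
    return {
        "strain": match["strain"],
        "concentration": match["conc"],
        "minutes": minutes,
        "replicate": int(match["rep"]),
    }


print(parse_header("swt_1mM_18h_r2"))
```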
Conducting the ANOVA
- All data analysis was conducted on data on the "GSE6068_setA_family" sheet.
- ANOVA procedure was based on the methods from Week 8.
- A new sheet called "swtVnwt_1mM_ANOVA" was created and all data from Columns A-R was copied and pasted into this sheet.
- Columns S-V were named "swtVnwt_AvgLogFC_1mM_15m", with the last part changed for each time point (15m, 30m, 60m, and 1080m, respectively), and contained the average of the replicate LogFC values at that time point.
- Column W, "swtVnwt_ss_HO", was created containing the equation =SUMSQ(<all LogFC values in the row>).
- Columns X-AA were named "swtVnwt_ss_15m", with the last part changed for each time point (15m, 30m, 60m, and 1080m, respectively), and contained the equation =SUMSQ(<LogFC replicates at that time point>)-COUNTA(<LogFC replicates at that time point>)*<AvgLogFC at that time point>^2 (for example, V2 for the 1080m column).
- Named column AB "swtVnwt_SS_full" and inserted the sum of columns X-AA.
- The next column was named "swtVnwt_Fstat" and contained the equation =((15-4)/4)*(<swtVnwt_ss_HO>-AB2)/AB2, where 15 is the number of data points per gene and 4 is the number of time points.
- Column "swtVnwt_p-value" was created and the equation =FDIST(<swtVnwt_Fstat>,4,15-4) was entered.
- A sanity check was run:
  - 1068 of 4785 p-values were less than 0.05
- Two "swtVnwt_Bonferroni_p-value" columns were created, one containing the equation =<swtVnwt_p-value>*4785 and the other containing the equation =IF(AE2>1,1,AE2), which caps the corrected value at 1.
- A new sheet called "swtVnwt_1mM_ANOVA_B-H" was created; Columns A-C were copied and pasted into it, and column "swtVnwt_p-value" was pasted into column D.
- Genes were ranked by p-value and then by MasterIndex.
- Column "swtVnwt_B-H_p-value" was created containing the equation =(D2*4785)/E2 and then replicated into column G. This column was then pasted into the previous sheet.
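The spreadsheet steps above can be sketched in Python for a single gene. This is a minimal illustration, not the workbook itself: the function names are hypothetical, missing values are assumed to be absent, and the right-tail p-value uses a closed form that holds only when the numerator degrees of freedom equal 4 (as in FDIST(F,4,11)). The B-H function follows the spreadsheet formula p*m/rank; capping it at 1 is an assumption carried over from the Bonferroni step, and no monotonicity adjustment is applied.

```python
def f_right_tail_df4(f_stat, df2):
    """Right-tail probability of an F(4, df2) distribution.

    Closed form of the regularized incomplete beta function I_x(df2/2, 2);
    valid only for numerator degrees of freedom equal to 4.
    """
    x = df2 / (df2 + 4.0 * f_stat)
    a = df2 / 2.0
    return x ** a * (1.0 + a * (1.0 - x))


def anova_row(groups):
    """Return (F statistic, p-value) for one gene.

    groups: four lists of LogFC replicates, one list per time point
    (3, 3, 6, and 3 values on the setA sheet, so n = 15).
    """
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    if k != 4:
        raise ValueError("this sketch assumes exactly 4 time points")
    ss_h0 = sum(v * v for v in values)           # SUMSQ over the whole row
    ss_full = 0.0
    for g in groups:
        mean = sum(g) / len(g)                   # AvgLogFC for this time point
        ss_full += sum(v * v for v in g) - len(g) * mean ** 2
    f_stat = ((n - k) / k) * (ss_h0 - ss_full) / ss_full
    return f_stat, f_right_tail_df4(f_stat, n - k)


def bonferroni(pvals):
    """Multiply each p-value by the number of tests and cap at 1."""
    m = len(pvals)
    return [min(p * m, 1.0) for p in pvals]


def benjamini_hochberg(pvals):
    """Spreadsheet-style B-H: p * m / rank, returned in the original row order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # rank 1 = smallest p-value
    adjusted = [0.0] * m
    for rank, i in enumerate(order, start=1):
        adjusted[i] = min(pvals[i] * m / rank, 1.0)
    return adjusted


# Made-up LogFC values for one gene (15m, 30m, 60m, 1080m), for illustration only.
example = [[1.0, 1.2, 0.8], [0.5, 0.4, 0.6],
           [0.1, -0.1, 0.0, 0.2, -0.2, 0.0], [0.0, 0.1, -0.1]]
f_stat, p_value = anova_row(example)
print(f_stat, p_value)
```

For other degrees of freedom, a general F-distribution routine such as `scipy.stats.f.sf` would be needed in place of `f_right_tail_df4`.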
Sanity Check
Data & Files
Conclusion
We were able to reorganize the data in the way best suited for further ANOVA, statistical, and database analyses. We determined that 1068 of the 4785 genes have unadjusted p-values less than 0.05, and Bonferroni- and B-H-corrected p-values were calculated as well. So far, ANOVA analysis has been completed only on the SetA family dataset.
Acknowledgments
References
- LMU BioDB 2019. (2019). Week 8. Retrieved November 25, 2019, from https://xmlpipedb.cs.lmu.edu/biodb/fall2019/index.php/Week_8.
- Thorsen, M., Lagniel, G., Kristiansson, E., Junot, C., Nerman, O., Labarre, J., & Tamás, M. J. (2007). Quantitative transcriptome, proteome, and sulfur metabolite profiling of the Saccharomyces cerevisiae response to arsenite. Physiological Genomics, 30(1), 35-43. DOI: https://doi.org/10.1152/physiolgenomics.00236.2006