Sulfiknights DA Week 12/13

From LMU BioDB 2019

Latest revision as of 23:50, 25 November 2019

Sulfiknight Links

  • BIOL Databases Main Page
  • Sulfiknights: Project Overview Page
  • Final Project Deliverables Requirements
  • Sulfiknights: Final Project Deliverables

Members

  • Project Manager & Quality Assurance: Naomi Tesfaiohannes
  • Quality Assurance: Joey Nimmers-Minor
  • Data Analysis: Ivy-Quynh Macaraeg & Marcus Avila
  • Designer: DeLisa Madere

Assignment Pages

  • Week 11
  • Week 12/13
  • Week 15


Purpose

The purpose of this portion of the investigation is to analyze the data from Thorsen et al. (2007), who profiled changes in Saccharomyces cerevisiae gene expression in response to arsenite exposure.

Methods & Results

Sample-to-data relationship table:

  • GSE6068_setA_family: swt vs. nswt, 1 mM; 1 hr (6 replicates); 15 min, 30 min, and 18 hr (3 replicates each)
  • GSE6129_set0_family: swt vs. nswt, 1 mM; 1 hr (6 replicates)
  • GSE6129_set1_family: swt vs. nswt, 0.2 mM; 1 hr, 15 min, 30 min, and 18 hr (3 replicates each)
  • GSE6129_set2_family: sYAP4 vs. swt, 0.2 mM and 1 mM; 1 hr (3 replicates each)
  • GSE6129_set3_family: sMET4 vs. swt, 0.2 mM and 1 mM; 1 hr (3 replicates each)
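For the GSE6068_setA_family data analyzed below, the replicate counts in this table determine the two constants in the F-statistic formula, "15" and "4". The worked version below assumes that each gene's row holds one LogFC value per replicate, across the time points 15, 30, 60, and 1080 min. In it, $y_{tr}$ is the LogFC of replicate $r$ at time point $t$, $n_t$ is the number of replicates at $t$, and $\bar{y}_t$ is their average. The null hypothesis $H_0$ is that the mean LogFC is zero at every time point. For that reason $SS_{H_0}$ is a plain sum of squares rather than a sum of squared deviations.

```latex
\begin{aligned}
n &= n_{15} + n_{30} + n_{60} + n_{1080} = 3 + 3 + 6 + 3 = 15, \qquad T = 4 \\
SS_{H_0} &= \sum_{t}\sum_{r} y_{tr}^{2}, \qquad
SS_{\text{full}} = \sum_{t}\Big(\sum_{r} y_{tr}^{2} - n_t\,\bar{y}_t^{2}\Big) \\
F &= \frac{n - T}{T}\cdot\frac{SS_{H_0} - SS_{\text{full}}}{SS_{\text{full}}},
\qquad p = P\big(F_{T,\,n-T} > F\big) = P\big(F_{4,\,11} > F\big)
\end{aligned}
```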

Organizing the Data

  1. Downloaded the data from Thorsen et al.
  2. Changed "Name" to "Standard_ID" in column C.
  3. Changed the column headers in each sheet using the following abbreviations:
    • swt = stressed wild type, nswt = nonstressed wild type
    • 1mM, 0.2mM = concentration of As(III) to which the cells were exposed
    • 1h, 15m, 30m, 18h = time point (h = hours, m = minutes)
    • rn = replicate number (n)
  4. Inserted a MasterIndex column in the "GSE6068_setA_family" sheet.

Conducting the ANOVA

  • All analyses were conducted on the data in the "GSE6068_setA_family" sheet.
  • The ANOVA procedure was based on the methods from Week 8.
  1. A new sheet called "swtVnwt_1mM_ANOVA" was created, and all data from columns A-R were copied and pasted into it.
  2. Columns S-V were named "swtVnwt_AvgLogFC_1mM_15m", "swtVnwt_AvgLogFC_1mM_30m", "swtVnwt_AvgLogFC_1mM_60m", and "swtVnwt_AvgLogFC_1mM_1080m". Each contains the average of the replicate LogFC values for that time point.
  3. Column W, "swtVnwt_ss_HO", contains =SUMSQ(D2:R2), the sum of squares of all LogFC values in the row.
  4. Columns X-AA were named "swtVnwt_ss_15m", "swtVnwt_ss_30m", "swtVnwt_ss_60m", and "swtVnwt_ss_1080m". Each contains =SUMSQ(<replicate LogFCs at that time point>)-COUNTA(<replicate LogFCs at that time point>)*<average LogFC at that time point>^2, where the average is S2, T2, U2, or V2, respectively.
  5. Column AB was named "swtVnwt_SS_full" and contains the sum of columns X-AA.
  6. The next column, "swtVnwt_Fstat", contains =((15-4)/4)*(<swtVnwt_ss_HO>-AB2)/AB2, where 15 is the number of LogFC values per gene and 4 is the number of time points.
  7. Column "swtVnwt_p-value" was created containing =FDIST(<swtVnwt_Fstat>,4,15-4).
  8. A sanity check was run:
    • 1068/4785 p-values are less than 0.05
  9. Two "swtVnwt_Bonferroni_p-value" columns were created. The first (column AE) contains =<swtVnwt_p-value>*4785, each p-value multiplied by the number of genes. The second contains =IF(AE2>1,1,AE2), which caps the corrected p-value at 1.
  10. A new sheet called "swtVnwt_1mM_ANOVA_B-H" was created. Columns A-C were copied into it, and the "swtVnwt_p-value" column was pasted into column D.
  11. Genes were ranked by p-value and then by MasterIndex. Each gene's rank is in column E, which is the E2 term in the formula of step 12.
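  12. Column "swtVnwt_B-H_p-value" was created containing =(D2*4785)/E2, which multiplies each p-value by the number of genes and divides it by the gene's rank. It was then replicated into column G, and this column was pasted into the previous sheet ("swtVnwt_1mM_ANOVA").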
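The per-gene calculation in steps 2-7 can be sketched in Python as below. This is a minimal illustration, not the workbook the team used. The function names and the dictionary-of-replicates input are hypothetical. The right-tail F probability, which is what Excel's FDIST returns, is computed with a closed form that only holds for an even numerator degree of freedom. That covers the 4 time points here.

```python
from __future__ import annotations

from statistics import fmean


def f_right_tail(f_stat: float, dfn: int, dfd: int) -> float:
    """P(F > f_stat) for an F(dfn, dfd) distribution, i.e. Excel's FDIST.

    Uses P(F > f) = I_x(dfd/2, dfn/2) with x = dfd / (dfd + dfn*f), where the
    regularized incomplete beta I_x(a, b) is a finite series when b is a
    whole number. Hence this sketch only accepts an even dfn.
    """
    if dfn % 2 != 0:
        raise ValueError("this sketch only supports an even numerator df")
    if f_stat <= 0:
        return 1.0
    a = dfd / 2
    x = dfd / (dfd + dfn * f_stat)
    term = 1.0   # j = 0 term of the series
    total = 1.0
    for j in range(1, dfn // 2):
        # term_j = a(a+1)...(a+j-1) / j! * (1 - x)^j
        term *= (a + j - 1) / j * (1 - x)
        total += term
    return x ** a * total


def gene_anova(logfc_by_time: dict[str, list[float]]) -> tuple[float, float]:
    """F statistic and p-value for one gene (one spreadsheet row).

    logfc_by_time (hypothetical input format) maps a time point label to
    that time point's replicate LogFC values.
    """
    values = [v for reps in logfc_by_time.values() for v in reps]
    n = len(values)            # 15 LogFC values per gene in setA
    t = len(logfc_by_time)     # 4 time points
    ss_h0 = sum(v * v for v in values)   # swtVnwt_ss_HO = SUMSQ(D2:R2)
    ss_full = 0.0                        # swtVnwt_SS_full
    for reps in logfc_by_time.values():
        avg = fmean(reps)                # swtVnwt_AvgLogFC_1mM_<time>
        # swtVnwt_ss_<time> = SUMSQ(reps) - COUNTA(reps) * avg^2
        ss_full += sum(v * v for v in reps) - len(reps) * avg ** 2
    # swtVnwt_Fstat; undefined when SS_full is 0 (every replicate group constant)
    f_stat = ((n - t) / t) * (ss_h0 - ss_full) / ss_full
    p_value = f_right_tail(f_stat, t, n - t)   # FDIST(F, 4, 15-4)
    return f_stat, p_value


if __name__ == "__main__":
    # Hypothetical LogFC values for one gene; not taken from the Thorsen data.
    example = {
        "15m": [0.4, 0.6, 0.5],
        "30m": [0.9, 1.1, 1.0],
        "60m": [1.5, 1.2, 1.4, 1.6, 1.3, 1.5],
        "1080m": [0.2, 0.1, 0.3],
    }
    f_stat, p_value = gene_anova(example)
    print(f"F = {f_stat:.3f}, p = {p_value:.3g}")
```

Each within-group sum of squares is written as SUMSQ minus COUNTA times the squared average, so every intermediate value lines up with a spreadsheet column. For odd degrees of freedom, a library routine such as SciPy's `scipy.stats.f.sf` would replace `f_right_tail`.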

Sanity Check

  1. Selected row 1.
  2. Applied AutoFilter to all headers.
  3. Filtered to count how many unadjusted, B-H, and Bonferroni p-values are < 0.05.
  4. Filtered to count how many unadjusted p-values are < 0.01, < 0.001, and < 0.0001.
  • All values are reflected in the Thorsen P-value Slide.
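The corrections from steps 9-12 and the filter counts in this sanity check can be mirrored in a few lines of Python. This is a hypothetical sketch, not the team's workbook. The function names are illustrative, the p-values are assumed to be a plain list in MasterIndex order, and the number of genes `m` (4785 in the spreadsheet) is taken as the list length. The `monotone` option is an addition. The spreadsheet formula (D2*4785)/E2 computes p·m/rank for each gene, whereas the textbook Benjamini-Hochberg procedure also takes a running minimum from the largest rank downward and caps values at 1.

```python
from __future__ import annotations

import math


def bonferroni(p_values: list[float]) -> list[float]:
    """p * m, capped at 1 (the two swtVnwt_Bonferroni_p-value columns)."""
    m = len(p_values)
    return [min(p * m, 1.0) for p in p_values]


def benjamini_hochberg(p_values: list[float], monotone: bool = False) -> list[float]:
    """p * m / rank for each gene, returned in the input order.

    With monotone=False this matches the spreadsheet formula =(D2*4785)/E2.
    With monotone=True it applies the standard step-up adjustment: a running
    minimum from the largest rank down, capped at 1.
    """
    m = len(p_values)
    # sorted() is stable, so tied p-values keep their input (MasterIndex) order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = math.inf
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        value = p_values[i] * m / rank
        if monotone:
            running_min = min(running_min, value)
            value = min(running_min, 1.0)
        adjusted[i] = value
    return adjusted


def count_below(values: list[float], threshold: float) -> int:
    """How many values pass a filter such as "less than 0.05"."""
    return sum(1 for v in values if v < threshold)


if __name__ == "__main__":
    # Hypothetical p-values; the real sheet has 4785 genes.
    p = [0.01, 0.04, 0.03, 0.5]
    for label, vals in [
        ("unadjusted", p),
        ("B-H", benjamini_hochberg(p)),
        ("Bonferroni", bonferroni(p)),
    ]:
        print(label, count_below(vals, 0.05))
    for cutoff in (0.01, 0.001, 0.0001):
        print("unadjusted <", cutoff, count_below(p, cutoff))
```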

Data & Files

Thorsen Data

Thorsen P-value Slide

Conclusion

We reorganized the data into the form best suited for further ANOVA, statistical, and database analyses. Of the 4785 genes, 1068 have unadjusted ANOVA p-values less than 0.05, and we also calculated Bonferroni- and Benjamini-Hochberg-corrected p-values. So far, the ANOVA has been completed only on the setA family dataset because the other datasets include varying conditions. These results are consistent with Thorsen et al., who observed increased expression of genes associated with the sulfate assimilation and glutathione biosynthesis pathways.

Acknowledgments

We would like to thank Dr. Dahlquist for providing the instructions in Week 8 that were referenced to perform the ANOVA of the SetA family dataset and for reviewing the analysis. We would also like to thank Thorsen et al. for providing the datasets.

References