Difference between revisions of "Sulfiknights DA Week 12/13"

Latest revision as of 00:50, 26 November 2019

Sulfiknight Links
BIOL Databases Main Page | Sulfiknights: Project Overview Page | Final Project Deliverables Requirements | Sulfiknights: Final Project Deliverables
Members: Project Manager & Quality Assurance: Naomi Tesfaiohannes | Quality Assurance: Joey Nimmers-Minor | Data Analysis: Ivy-Quynh Macaraeg & Marcus Avila | Designer: DeLisa Madere
Assignment Pages: Week 11 | Week 12/13 | Week 15

Purpose

The purpose of this portion of the investigation is to analyze the data provided by Thorsen et al. (2007), who profiled changes in Saccharomyces cerevisiae gene expression in response to arsenite exposure.

Methods & Results

Sample to data relationship table:

GSE6068_setA_family - swt vs nswt, 1 mM: 1 hr (6 replicates); 15 min, 30 min, 18 hr (3 replicates each)
GSE6129_set0_family - swt vs nswt, 1 mM: 1 hr (6 replicates)
GSE6129_set1_family - swt vs nswt, 0.2 mM: 1 hr, 15 min, 30 min, 18 hr (3 replicates each)
GSE6129_set2_family - sYAP4 vs swt, 0.2 mM and 1 mM: 1 hr each (3 replicates each)
GSE6129_set3_family - sMET4 vs swt, 0.2 mM and 1 mM: 1 hr each (3 replicates each)

Organizing the Data

Downloaded data from Thorsen et al.
Changed "Name" to "Standard_ID" in column C.
- Names were converted to the standard names using the "ORF List <-> Gene List" tool from YEASTRACT.
Changed column headers in each sheet:
- swt = stressed wild type, nswt = nonstressed wild type
- 1mM, 0.2mM = concentration of As(III) to which the cells were exposed
- 1h, 15m, 30m, 18h = time point (h = hours, m = minutes)
- rn = replicate number (n)
Inserted MasterIndex in the "GSE6068_setA_family" sheet.
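
The header convention above can be checked programmatically. The following is a minimal sketch, not the team's actual workflow: the exact header layout in the workbook is not shown on this page, so the underscore-separated form "swt_1mM_18h_r2" and the function name parse_header are assumptions made for illustration. It also converts time points to minutes, which matches the 60m and 1080m labels used for the 1 hr and 18 hr columns in the ANOVA sheet.

```python
import re

# Hypothetical header layout: <strain>_<concentration>_<time>_r<replicate>,
# e.g. "swt_1mM_18h_r2". The real workbook headers may be ordered differently.
HEADER = re.compile(
    r"^(?P<strain>[A-Za-z0-9]+)_(?P<conc>[0-9.]+mM)_(?P<time>\d+[hm])_r(?P<rep>\d+)$"
)


def parse_header(header):
    """Split a column header into strain, concentration, minutes, and replicate."""
    match = HEADER.match(header)
    if match is None:
        raise ValueError(f"header does not follow the assumed layout: {header!r}")
    time = match["time"]
    # "h" means hours and "m" means minutes, as defined in the list above.
    minutes = int(time[:-1]) * (60 if time.endswith("h") else 1)
    return {
        "strain": match["strain"],
        "concentration": match["conc"],
        "minutes": minutes,
        "replicate": int(match["rep"]),
    }


if __name__ == "__main__":
    print(parse_header("swt_1mM_18h_r2"))
```

Parsing the headers this way makes it easy to group replicate columns by time point before any statistics are computed.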

Conducting the ANOVA

All data analysis was conducted on the "GSE6068_setA_family" sheet.
The ANOVA procedure was based on the methods from Week 8.

A new sheet called "swtVnwt_1mM_ANOVA" was created, and all data from Columns A-R were copied and pasted into this sheet.
Columns S-V were named "swtVnwt_AvgLogFC_1mM_15m", with the last part changed for each time point (15m, 30m, 60m, and 1080m, respectively); each contained an equation averaging the replicate LogFC values for that time point.
Column "swtVnwt_ss_HO" was created containing the equation =SUMSQ(all LogFC values in row).
Columns X-AA were named "swtVnwt_ss_15m", with the last part changed for each time point (15m, 30m, 60m, and 1080m, respectively), and contained the equation =SUMSQ(LogFC replicates at that time point)-COUNTA(LogFC replicates at that time point)*(average LogFC at that time point)^2; for the 1080m column, for example, the average is V2.
Column AB was named "swtVnwt_SS_full" and contained the sum of columns X-AA.
The next column was named "swtVnwt_Fstat" and contained the equation =((15-4)/4)*(<swtVnwt_ss_HO>-AB2)/AB2, where 15 is the number of data points per gene and 4 is the number of time points.
Column "swtVnwt_p-value" was created, and the equation =FDIST(<swtVnwt_Fstat>,4,15-4) was entered.
A sanity check was run:
- 1068/4785 p-values are less than 0.05
Two "swtVnwt_Bonferroni_p-value" columns were created, one containing the equation =<swtVnwt_p-value>*4785 and the other containing the equation =IF(AE2>1,1,AE2), which caps the corrected value at 1.
A new sheet called "swtVnwt_1mM_ANOVA_B-H" was created; Columns A-C were copied and pasted into it, and column "swtVnwt_p-value" was pasted into column D.
Genes were ranked by p-value and then by MasterIndex.
Column "swtVnwt_B-H_p-value" was created containing the equation =(D2*4785)/E2, where column E holds the p-value rank, and then replicated into column G. This column was then pasted into the previous sheet.
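
The per-gene calculation in the steps above can be sketched in Python as a cross-check on the spreadsheet. This is an illustrative sketch, not the code used for the analysis: the function names and the example LogFC values are hypothetical, and missing values are assumed to be absent. The right-tail F probability (what FDIST returns) is computed with a closed form that holds only when the numerator degrees of freedom are even, which is the case here (4).

```python
from statistics import fmean


def f_right_tail_even_d1(f_stat, d1, d2):
    """P(F > f_stat) for an F(d1, d2) distribution; valid only for even d1.

    Uses the finite-sum form of the regularized incomplete beta function
    I_x(d2/2, d1/2), which is exact when d1/2 is an integer.
    """
    if d1 % 2 != 0:
        raise ValueError("this closed form needs an even numerator df")
    x = d2 / (d2 + d1 * f_stat)
    a = d2 / 2
    term, total = 1.0, 1.0
    for j in range(1, d1 // 2):
        term *= (a + j - 1) / j * (1 - x)
        total += term
    return x ** a * total


def anova_row(groups):
    """Return (F statistic, p-value) for one gene.

    groups maps a time-point label to its list of replicate LogFC values.
    """
    values = [v for reps in groups.values() for v in reps]
    n, t = len(values), len(groups)  # 15 data points and 4 time points on this sheet
    ss_h0 = sum(v * v for v in values)  # like SUMSQ over the whole row
    ss_full = sum(
        sum(v * v for v in reps) - len(reps) * fmean(reps) ** 2  # SUMSQ - COUNTA * avg^2
        for reps in groups.values()
    )
    f_stat = ((n - t) / t) * (ss_h0 - ss_full) / ss_full
    # Guard against a tiny negative F caused by floating-point rounding.
    return f_stat, f_right_tail_even_d1(max(f_stat, 0.0), t, n - t)


if __name__ == "__main__":
    # Hypothetical LogFC values for one gene: 3, 3, 6, and 3 replicates.
    example = {
        "15m": [2.0, 0.0, 1.0],
        "30m": [2.0, 0.0, 1.0],
        "60m": [2.0, 0.0, 1.0, 2.0, 0.0, 1.0],
        "1080m": [2.0, 0.0, 1.0],
    }
    f_stat, p_value = anova_row(example)
    print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```

Because the null hypothesis here is that every time-point mean LogFC is zero, the numerator has 4 degrees of freedom (one per time point) and not 3, which is why the spreadsheet formula uses 4 and 15-4.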

Sanity Check

Selected row 1.
Applied AutoFilter to all headers.
Filtered to count how many p-values are <0.05 for the unadjusted, B-H, and Bonferroni p-values.
Filtered to count how many unadjusted p-values are <0.01, <0.001, and <0.0001.

All values are reflected in Thorsen P-value Slide
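
The corrections and filter counts can also be reproduced from a plain list of unadjusted p-values. The sketch below mirrors the spreadsheet formulas described on this page and is not a reference implementation: the function names and example p-values are hypothetical, capping the B-H value at 1 is an assumption modeled on the Bonferroni step, and the monotonicity adjustment that some statistics packages apply to B-H values is deliberately left out because the spreadsheet does not perform it.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests and cap the result at 1."""
    m = len(p_values)
    return [min(p * m, 1.0) for p in p_values]


def benjamini_hochberg_sheet(p_values):
    """Spreadsheet-style B-H value: p * (number of tests) / rank, capped at 1.

    Ranks come from sorting by p-value; ties are broken by original position,
    which plays the role of MasterIndex here. Results are returned in the
    original order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: (p_values[i], i))
    adjusted = [0.0] * m
    for rank, i in enumerate(order, start=1):
        adjusted[i] = min(p_values[i] * m / rank, 1.0)
    return adjusted


def count_below(p_values, threshold):
    """Count p-values strictly below the threshold, like the AutoFilter step."""
    return sum(p < threshold for p in p_values)


if __name__ == "__main__":
    # Hypothetical p-values, not taken from the Thorsen data.
    unadjusted = [0.01, 0.04, 0.03, 0.5]
    for label, values in [
        ("unadjusted", unadjusted),
        ("Bonferroni", bonferroni(unadjusted)),
        ("B-H", benjamini_hochberg_sheet(unadjusted)),
    ]:
        print(label, count_below(values, 0.05))
```

Running the same counts at thresholds of 0.01, 0.001, and 0.0001 on the unadjusted list corresponds to the last filter step above.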

Data & Files

Thorsen Data

Thorsen P-value Slide

Conclusion

We reorganized the data in the way best suited for further ANOVA, statistical, and database analyses. We determined that 1068 of 4785 genes have unadjusted p-values less than 0.05, and Bonferroni and B-H corrected p-values were calculated as well. So far, ANOVA analysis has only been completed on the SetA family dataset because the other datasets include varying conditions. The statistically significant changes found are in line with the results of Thorsen et al., who observed increased expression of genes associated with the sulfate assimilation and glutathione biosynthesis pathways.

Acknowledgments

We would like to thank Dr. Dahlquist for providing the instructions in Week 8 that were referenced to perform the ANOVA of the SetA family dataset and for reviewing the analysis. We would also like to thank Thorsen et al. for providing the datasets.

References

LMU BioDB 2019. (2019). Week 8. Retrieved November 25, 2019, from https://xmlpipedb.cs.lmu.edu/biodb/fall2019/index.php/Week_8.
Thorsen, M., Lagniel, G., Kristiansson, E., Junot, C., Nerman, O., Labarre, J., & Tamás, M. J. (2007). Quantitative transcriptome, proteome, and sulfur metabolite profiling of the Saccharomyces cerevisiae response to arsenite. Physiological genomics, 30(1), 35-43. DOI: https://doi.org/10.1152/physiolgenomics.00236.2006

@@ Line 1: / Line 1: @@
 {{Sulfiknights}}
 ==Purpose==
+The purpose of this portion of the investigation is to analyze the data provided by Thorsen et al. in their experiment of profiling changes of ''Saccharomyces cerevisiae''  gene expression in response to exposure to arsenite.
 ==Methods & Results==
@@ Line 24: / Line 25: @@
 *ANOVA procedure was based on the methods from [[Week 8]].
 #A new sheet called "swtVnwt_1mM_ANOVA" was created and all data from Columns A-R was copied and pasted into this sheet.
-#Columns "swtVnwt_AvgLogFC_1mM_15m" - "swtVnwt_AvgLogFC_1mM_1080m" for each time point 15m, 30m, 60m, and 1080m were created and contain the equation the average of each replicate for each time point.
+#Columns S-V were called "swtVnwt_AvgLogFC_1mM_15m" changing the last part for each time point 15m, 30m, 60m, and 1080m respectively and contained the equation the average of each replicate for each time point.
-#Column "swtVnwt_ss_HO" was created containing the equation SUMSQ(D2:R2).
+#Column "swtVnwt_ss_HO" was created containing the equation SUMSQ(all LogFC values in row).
-#Sanity Check: 1068/4785 p-values are less than 0.05
+#Columns X-AA were named "swtVnwt_ss_15m" changing the last part for each time point 15m, 30m, 60m, and 1080m respectively and contained the equation =SUMSQ(average LogFC at that time point)-COUNTA(average LogFC at that time point)*V2^2.
+#Named column AB "swtVnwt_SS_full" and inserted the sum of columns X-AA.
+#The next column was named "swtVnwt_Fstat" and inserted the equation =((15-4)/4)*(<swtVnwt_ss_HO>-AB2)/AB2.
+#Column "swtVnwt_p-value" was created and the =FDIST(<swtVnwt_Fstat>,4,15-4) equation was inputed.
+#A sanity check was run:
+#*1068/4785 p-values are less than 0.05
+#Two "swtVnwt_Bonferroni_p-value" columns were created, one containing the equation <swtVnwt_p-value>*4785 and the other containing the equation =IF(AE2>1,1,AE2).
+#A new sheet called "swtVnwt_1mM_ANOVA_B-H" was created, and we copied and pasted Columns A-C in it and pasted column "swtVnwt_p-value" in column D.
+#Genes were ranked by p-value and then by MasterIndex.
+#Column "swtVnwt_B-H_p-value" was created containing the equation =(D2*4785)/E2 and then replicated into column G. This column was then pasted into the previous sheet.
+===Sanity Check===
+#Selected row 1
+#Autofilter all headers
+#Filtered for how many p-values are <0.05 for unadjusted p-values, BH p-values, and Bonferroni p-values
+#Filtered for how many unadjusted p-values for <0.01, <0.001, and 0.0001
+*All values are reflected in [[Media:Thorsendata_sulfiknights_pvalueslide.pptx|Thorsen P-value Slide]]
 ==Data & Files==
 [[Media:Thorsendata_sulfiknights.xlsx|Thorsen Data]]
+[[Media:Thorsendata_sulfiknights_pvalueslide.pptx|Thorsen P-value Slide]]
 ==Conclusion==
+We were able to reorganize the data in the way that was best suited for further ANOVA, statistical, and database analyses. We determined that a certain number of genes have p-values less than 0.05, and BH values were determined as well. So far, ANOVA analysis has only been completed on the SetA family dataset because the other datasets include varying conditions. The statistical significance found is in line with the results of Thorsen et al. that observed an increase in genes associated with sulfate assimilation and glutathione biosynthesis pathways.
 ==Acknowledgments==
+We would like to thank Dr. Dahlquist for providing the instructions in [[Week 8]] that were referenced to perform the ANOVA of the SetA family dataset and for reviewing the analysis. We would also like to thank Thorsen et al. for providing the datasets.
 ==References==
 *LMU BioDB 2019. (2019). Week 8. Retrieved November 25, 2019, from [https://xmlpipedb.cs.lmu.edu/biodb/fall2019/index.php/Week_8 https://xmlpipedb.cs.lmu.edu/biodb/fall2019/index.php/Week_8].
 *Thorsen, M., Lagniel, G., Kristiansson, E., Junot, C., Nerman, O., Labarre, J., & Tamás, M. J. (2007). Quantitative transcriptome, proteome, and sulfur metabolite profiling of the Saccharomyces cerevisiae response to arsenite. Physiological genomics, 30(1), 35-43. DOI: https://doi.org/10.1152/physiolgenomics.00236.2006
