Difference between revisions of "Sulfiknights DA Week 12/13"

From LMU BioDB 2019
(References: added reference)
(Methods & Results: added steps)

Revision as of 18:17, 25 November 2019

Sulfiknights Members

  • Project Manager & Quality Assurance: Naomi Tesfaiohannes
  • Quality Assurance: Joey Nimmers-Minor
  • Data Analysis: Ivy-Quynh Macaraeg & Marcus Avila
  • Designer: DeLisa Madere

Purpose

Methods & Results

Sample to data relationship table:

  • GSE6068_setA_family - swt vs nswt, 1 mM: 1 hr (6 replicates); 15 min, 30 min, 18 hr (3 replicates each)
  • GSE6129_set0_family - swt vs nswt, 1 mM at 1 hr; 6 replicates
  • GSE6129_set1_family - swt vs nswt, 0.2 mM at 1 hr, 15 min, 30 min, 18 hr; 3 replicates each
  • GSE6129_set2_family - sYAP4 vs swt, 0.2 mM and 1 mM, each at 1 hr; 3 replicates each
  • GSE6129_set3_family - sMET4 vs swt, 0.2 mM and 1 mM, each at 1 hr; 3 replicates each

Organizing the Data

  1. Downloaded data from Thorsen et al.
  2. Changed "Name" to "Standard_ID" in column C.
  3. Changed column headers in each sheet:
    • swt = stressed wild type, nswt= nonstressed wild type
    • 1mM, 0.2mM = concentration of As(III) to which the cells were exposed
    • 1h, 15m, 30m, 18h = time point (h = hours, m = minutes)
    • rn = replicate number (n)
  4. Inserted MasterIndex in the "GSE6068_setA_family" sheet.

Conducting the ANOVA

  • All data analysis was conducted on the data in the "GSE6068_setA_family" sheet.
  • ANOVA procedure was based on the methods from Week 8.
  1. A new sheet called "swtVnwt_1mM_ANOVA" was created, and all data from Columns A-R were copied and pasted into it.
  2. Columns S-V were named "swtVnwt_AvgLogFC_1mM_15m", with the last part changed for each time point (15m, 30m, 60m, and 1080m respectively), and contained a formula computing the average of the replicates at that time point.
  3. Column "swtVnwt_ss_HO" was created containing the formula =SUMSQ(<all LogFC values in the row>).
  4. Columns X-AA were named "swtVnwt_ss_15m", with the last part changed for each time point (15m, 30m, 60m, and 1080m respectively), and contained the formula =SUMSQ(<LogFC values at that time point>)-COUNTA(<LogFC values at that time point>)*<AvgLogFC at that time point>^2 (for example, V2^2 for the 1080m time point).
  5. Column AB was named "swtVnwt_SS_full" and contained the sum of columns X-AA.
  6. The next column was named "swtVnwt_Fstat" and contained the formula =((15-4)/4)*(<swtVnwt_ss_HO>-AB2)/AB2.
  7. Column "swtVnwt_p-value" was created, and the formula =FDIST(<swtVnwt_Fstat>,4,15-4) was entered.
  8. A sanity check was run:
    • 1068/4785 p-values are less than 0.05
  9. Two "swtVnwt_Bonferroni_p-value" columns were created: the first contained the formula =<swtVnwt_p-value>*4785, and the second contained =IF(AE2>1,1,AE2) to cap the corrected values at 1.
  10. A new sheet called "swtVnwt_1mM_ANOVA_B-H" was created. Columns A-C were copied and pasted into it, and column "swtVnwt_p-value" was pasted into column D.
  11. Genes were ranked by p-value and then by MasterIndex.
  12. Column "swtVnwt_B-H_p-value" was created containing the formula =(D2*4785)/E2 and was then replicated into column G. This column was then pasted into the "swtVnwt_1mM_ANOVA" sheet.
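
The spreadsheet steps above can be sketched in Python for a single gene. This is a minimal illustration, not the workbook itself. It makes several assumptions: the function names are hypothetical, the replicate values are made up, every replicate cell is filled (so COUNTA equals the replicate count), and the B-H value is capped at 1 in the same way as the Bonferroni value. The right-tail F probability uses a closed form that only holds for an even numerator degree of freedom, which is the case here (4).

```python
from math import fsum


def f_sf_even_dfn(f, dfn, dfd):
    """Right-tail probability P(F > f), like FDIST(f, dfn, dfd); even dfn only."""
    if dfn % 2 != 0:
        raise ValueError("this closed form needs an even numerator df")
    x = dfd / (dfd + dfn * f)
    a = dfd / 2.0
    total, coef = 0.0, 1.0
    for j in range(dfn // 2):
        total += coef * (1.0 - x) ** j
        coef *= (a + j) / (j + 1)
    return x ** a * total


def anova_row(groups):
    """Return (F statistic, p-value) for one gene.

    groups: one list of LogFC replicates per time point.
    """
    n = sum(len(g) for g in groups)  # 15 replicates on the setA sheet
    k = len(groups)                  # 4 time points
    # SUMSQ over the whole row (null hypothesis: mean LogFC is 0)
    ss_h0 = fsum(x * x for g in groups for x in g)
    # per time point: SUMSQ(range) - COUNTA(range) * average^2, then summed
    ss_full = fsum(
        fsum(x * x for x in g) - len(g) * (fsum(g) / len(g)) ** 2
        for g in groups
    )
    f_stat = ((n - k) / k) * (ss_h0 - ss_full) / ss_full
    return f_stat, f_sf_even_dfn(f_stat, k, n - k)


def bonferroni(pvalues):
    """p * (number of genes), capped at 1."""
    m = len(pvalues)
    return [min(p * m, 1.0) for p in pvalues]


def benjamini_hochberg(pvalues):
    """p * (number of genes) / rank, capped at 1, in the original gene order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # rank 1 = smallest p
    adjusted = [0.0] * m
    for rank, i in enumerate(order, start=1):
        adjusted[i] = min(pvalues[i] * m / rank, 1.0)
    return adjusted


# Made-up LogFC replicates for one gene: 15m, 30m, 60m, 1080m
gene = [
    [0.10, 0.20, 0.15],
    [0.50, 0.40, 0.60],
    [1.00, 1.10, 0.90, 1.20, 0.80, 1.00],
    [0.20, 0.10, 0.30],
]
f_stat, p_value = anova_row(gene)
print(f_stat, p_value)
```

Like the spreadsheet formula, `benjamini_hochberg` applies p * m / rank directly and does not enforce monotonicity across ranks, so its output can differ from library implementations that add that step.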

Data & Files

Thorsen Data

Conclusion

Acknowledgments

References