Sulfiknights DA Week 12/13

From LMU BioDB 2019

Revision as of 18:56, 25 November 2019

Members

  • Project Manager & Quality Assurance: Naomi Tesfaiohannes
  • Quality Assurance: Joey Nimmers-Minor
  • Data Analysis: Ivy-Quynh Macaraeg & Marcus Avila
  • Designer: DeLisa Madere

Purpose

Methods & Results

Sample to data relationship table:

  • GSE6068_setA_family - swt vs nswt, 1 mM, at 1 hr (6 replicates) and at 15 min, 30 min, and 18 hr (3 replicates each)
  • GSE6129_set0_family - swt vs nswt, 1 mM, at 1 hr (6 replicates)
  • GSE6129_set1_family - swt vs nswt, 0.2 mM, at 15 min, 30 min, 1 hr, and 18 hr (3 replicates each)
  • GSE6129_set2_family - sYAP4 vs swt, 0.2 mM and 1 mM, each at 1 hr (3 replicates each)
  • GSE6129_set3_family - sMET4 vs swt, 0.2 mM and 1 mM, each at 1 hr (3 replicates each)

Organizing the Data

  1. Downloaded data from Thorsen et al.
  2. Changed "Name" to "Standard_ID" in column C.
  3. Renamed the column headers in each sheet using these abbreviations:
    • swt = stressed wild type, nswt = nonstressed wild type
    • 1mM, 0.2mM = concentration of As(III) to which the cells were exposed
    • 1h, 15m, 30m, 18h = time point (h = hours, m = minutes)
    • rn = replicate number n
  4. Inserted MasterIndex in the "GSE6068_setA_family" sheet.
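
The header abbreviations above can be decoded programmatically. The sketch below is illustrative only: the page does not state the exact header layout, so the underscore-delimited form (for example "swt_1mM_18h_r2") and the names HEADER_RE and parse_header are assumptions. It converts hours to minutes, which lines up with the 60m and 1080m labels used in the ANOVA sheet.

```python
import re

# Hypothetical header layout: <condition>_<concentration>mM_<time><unit>_r<replicate>.
# The underscore delimiter and the token order are assumptions for illustration.
HEADER_RE = re.compile(
    r"^(?P<condition>n?swt)_(?P<conc>\d*\.?\d+)mM_"
    r"(?P<time>\d+)(?P<unit>[hm])_r(?P<rep>\d+)$"
)


def parse_header(header):
    """Split one column header into its experimental factors."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"unrecognized header: {header!r}")
    # Express every time point in minutes (1h -> 60, 18h -> 1080).
    minutes = int(match["time"]) * (60 if match["unit"] == "h" else 1)
    return {
        "stressed": match["condition"] == "swt",
        "concentration_mM": float(match["conc"]),
        "minutes": minutes,
        "replicate": int(match["rep"]),
    }
```

If the real headers use a different delimiter or token order, only the regular expression would need to change.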

Conducting the ANOVA

  • All data analysis was conducted on the "GSE6068_setA_family" sheet.
  • The ANOVA procedure was based on the methods from Week 8.
  1. A new sheet called "swtVnwt_1mM_ANOVA" was created, and all data from columns A-R were copied and pasted into it.
  2. Columns S-V were named "swtVnwt_AvgLogFC_1mM_15m", with the suffix changed for each time point (15m, 30m, 60m, and 1080m, respectively), and contained the average of the replicate LogFC values for that time point.
  3. Column "swtVnwt_ss_HO" was created containing the equation =SUMSQ(all LogFC values in row).
  4. Columns X-AA were named "swtVnwt_ss_15m", with the suffix changed for each time point (15m, 30m, 60m, and 1080m, respectively), and contained the equation =SUMSQ(LogFC replicates at that time point)-COUNTA(LogFC replicates at that time point)*(average LogFC at that time point)^2, where the average is the matching cell in columns S-V (for example, V2 for 1080m).
  5. Column AB was named "swtVnwt_SS_full" and contained the sum of columns X-AA.
  6. The next column was named "swtVnwt_Fstat" and contained the equation =((15-4)/4)*(<swtVnwt_ss_HO>-AB2)/AB2, where 15 is the total number of replicates and 4 is the number of time points.
  7. Column "swtVnwt_p-value" was created, and the equation =FDIST(<swtVnwt_Fstat>,4,15-4) was entered.
  8. A sanity check was run:
    • 1068 of 4785 p-values were less than 0.05
  9. Two "swtVnwt_Bonferroni_p-value" columns were created: the first contained the equation =<swtVnwt_p-value>*4785, and the second contained =IF(AE2>1,1,AE2), which caps the corrected value at 1.
  10. A new sheet called "swtVnwt_1mM_ANOVA_B-H" was created; columns A-C were copied and pasted into it, and the "swtVnwt_p-value" column was pasted into column D.
  11. Genes were ranked by p-value and then sorted by MasterIndex.
  12. Column "swtVnwt_B-H_p-value" was created containing the equation =(D2*4785)/E2, where D2 is the unadjusted p-value and E2 is its rank, and then replicated into column G. This column was then pasted into the previous sheet.
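
For readers who want to check a single gene outside the spreadsheet, the following is a minimal sketch of the per-row F statistic and p-value described in steps 2-7. It assumes no missing values in the row, and the function names (within_group_ss, anova_row, f_sf_4df) are hypothetical, not part of the Week 8 protocol. The p-value uses a closed form of the F-distribution upper tail that holds only when the numerator degrees of freedom equal 4, as in =FDIST(F,4,15-4); for other designs a library routine such as scipy.stats.f.sf would be the more general choice.

```python
def within_group_ss(values):
    """Mirror of =SUMSQ(range)-COUNTA(range)*mean^2 for one time point."""
    n = len(values)
    mean = sum(values) / n
    return sum(v * v for v in values) - n * mean ** 2


def anova_row(groups):
    """Return (F statistic, denominator df) for one gene.

    groups holds one list of LogFC replicates per time point.
    """
    all_values = [v for group in groups for v in group]
    n = len(all_values)                       # 15 in the sheet
    k = len(groups)                           # 4 time points
    ss_h0 = sum(v * v for v in all_values)    # =SUMSQ(all LogFC values in row)
    ss_full = sum(within_group_ss(g) for g in groups)
    f_stat = ((n - k) / k) * (ss_h0 - ss_full) / ss_full
    return f_stat, n - k


def f_sf_4df(f_stat, d2):
    """Upper-tail F probability, valid only for 4 numerator df."""
    x = d2 / (d2 + 4.0 * f_stat)
    a = d2 / 2.0
    return x ** a * (1.0 + a * (1.0 - x))


# Made-up LogFC values ordered as 15m, 30m, 60m (6 replicates), 1080m.
example = [
    [1.0, 2.0, 3.0],
    [0.0, 1.0, -1.0],
    [0.5, -0.5, 0.0, 1.0, -1.0, 0.0],
    [2.0, 2.0, 5.0],
]
f_stat, d2 = anova_row(example)
p_value = f_sf_4df(f_stat, d2)
```

The example values are invented for illustration and are not taken from the Thorsen data.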

Sanity Check

  1. Selected row 1.
  2. Applied AutoFilter to all headers.
  3. Filtered to count how many unadjusted, B-H, and Bonferroni p-values were < 0.05.
  4. Filtered to count how many unadjusted p-values were < 0.01, < 0.001, and < 0.0001.
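
The filter counts can also be reproduced in code. The sketch below mirrors the spreadsheet's Bonferroni formula (p times the number of genes, capped at 1) and its B-H formula (p times the number of genes divided by rank), then counts values under a threshold. The function names are hypothetical, and capping the B-H values at 1 is an assumption made by analogy with the Bonferroni step. This rank-based formula does not add the monotonicity adjustment that some statistics packages apply, so results may differ slightly from those tools.

```python
def bonferroni(pvalues):
    """p * m, capped at 1, as in the two Bonferroni columns."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg_sheet(pvalues):
    """p * m / rank, with rank 1 assigned to the smallest p-value.

    Results are returned in the original (MasterIndex) order.
    The cap at 1 is an assumption, not stated in the procedure.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    for rank, i in enumerate(order, start=1):
        adjusted[i] = min(1.0, pvalues[i] * m / rank)
    return adjusted


def count_below(pvalues, threshold):
    """Number of p-values strictly below the threshold."""
    return sum(1 for p in pvalues if p < threshold)


# Made-up p-values for illustration only.
unadjusted = [0.01, 0.04, 0.03, 0.5]
bonf = bonferroni(unadjusted)
bh = benjamini_hochberg_sheet(unadjusted)
counts = {
    "unadjusted": count_below(unadjusted, 0.05),
    "bonferroni": count_below(bonf, 0.05),
    "bh": count_below(bh, 0.05),
}
```

With the real sheet, the list of unadjusted p-values would have 4785 entries, and the unadjusted count under 0.05 should match the 1068 reported above.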

Data & Files

  • Thorsen Data (Thorsendata_sulfiknights.xlsx)
  • Thorsen P-value Slide (Thorsendata_sulfiknights_pvalueslide.pptx)

Conclusion

We reorganized the data into a layout suited to further ANOVA, statistical, and database analyses. Of the 4785 genes, 1068 have unadjusted p-values less than 0.05; Bonferroni- and B-H-corrected p-values were calculated as well. So far, the ANOVA has been completed only on the GSE6068_setA_family dataset.

Acknowledgments

References