Hivanson Week 9

From LMU BioDB 2024
Revision as of 22:19, 20 March 2024 by Hivanson (talk | contribs) (Sanity Check: Number of genes significantly changed: reformat questions about individual genes)

Purpose

Methods/Results

Strain name: ∆CIN5 strain

Filename: HI_BIOL367_S24_microarray-data_dCIN5.xcls

Number of replicates per strain: 4

Timepoints: 15 minutes, 30 minutes, 60 minutes, 90 minutes, 120 minutes

Statistical Analysis Part 1: ANOVA

  1. I created a new worksheet and named it "dCIN5_ANOVA".
  2. I copied all data from the "Master_Sheet" worksheet and pasted it into dCIN5_ANOVA.
  3. I created five column headers of the form dCIN5_AvgLogFC_(TIME) where (TIME) is 15, 30, 60, 90, and 120.
  4. In the cell below the dCIN5_AvgLogFC_t15 header, I typed =AVERAGE(
  5. Then I highlighted all the data in row 2 associated with t15, pressed the closing paren key, and pressed the "enter" key.
  6. I extended this formula down for all genes.
  7. I repeated this averaging process with the t30, t60, t90, and the t120 data.
  8. In the first empty column to the right of the dCIN5_AvgLogFC_t120 calculation, I created the column header dCIN5_ss_HO.
  9. In the first cell below this header, I typed =SUMSQ(
  10. I highlighted all the LogFC data in row 2, up to but not including the average columns, pressed the closing paren key, and pressed the "enter" key.
  11. In the next empty column to the right of dCIN5_ss_HO, I created the column headers dCIN5_ss_(TIME) as in (3).
  12. In the first cell below the header dCIN5_ss_t15, I typed =SUMSQ(<range of cells for logFC_t15>)-COUNTA(<range of cells for logFC_t15>)*<AvgLogFC_t15>^2 and hit enter.
  13. I extended this formula down for all genes.
  14. I repeated this computation for the t30 through t120 data points.
  15. In the first column to the right of dCIN5_ss_t120, I created the column header dCIN5_SS_full.
  16. In the first row below this header, I typed =SUM(<range of cells containing "ss" for each timepoint>) and hit enter.
  17. In the next two columns to the right, I created the headers dCIN5_Fstat and dCIN5_p-value.
  18. In the first cell of the dCIN5_Fstat column, I typed =((20-5)/5)*(<dCIN5_ss_HO>-<dCIN5_SS_full>)/<dCIN5_SS_full> and hit enter.
    • I replaced the phrase <dCIN5_ss_HO> with the cell designation.
    • I replaced the phrase <dCIN5_SS_full> with the cell designation.
    • I copied this to the whole column.
  19. In the first cell below the dCIN5_p-value header, I typed =FDIST(<dCIN5_Fstat>,5,20-5)
  20. I performed a quick sanity check to see if all of these computations were done correctly.
    • I filtered the dCIN5_p-value column so that the p value has to be less than 0.05.
    • Before further calculation, I undid this filter.
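
The spreadsheet formulas in steps 8 through 18 can be sketched in Python for a single gene. This is only an illustration of the arithmetic, not the workbook itself: the function name and the sample values are made up, and the number of data points is counted from the data instead of being fixed at 20.

```python
def anova_f_statistic(groups):
    """Compute the F statistic for one gene.

    groups: one list of log fold changes per timepoint
            (5 timepoints with 4 replicates each in the steps above).
    """
    all_values = [x for group in groups for x in group]
    n = len(all_values)   # total data points (20 in the steps above)
    k = len(groups)       # number of timepoints (5 in the steps above)

    # Null hypothesis sum of squares, like =SUMSQ(<all LogFC data>)
    ss_h0 = sum(x * x for x in all_values)

    # Per-timepoint sum of squares, like
    # =SUMSQ(<range>)-COUNTA(<range>)*<average>^2, then summed over timepoints
    ss_full = 0.0
    for group in groups:
        mean = sum(group) / len(group)
        ss_full += sum(x * x for x in group) - len(group) * mean ** 2

    # Like =((20-5)/5)*(<ss_HO>-<SS_full>)/<SS_full>
    return ((n - k) / k) * (ss_h0 - ss_full) / ss_full


# Made-up log fold changes for one hypothetical gene (not from the dataset)
example_gene = [
    [1.2, 0.9, 1.1, 1.4],      # t15
    [0.8, 1.0, 0.7, 1.1],      # t30
    [0.2, -0.1, 0.3, 0.0],     # t60
    [-0.5, -0.7, -0.4, -0.6],  # t90
    [-0.1, 0.1, -0.2, 0.0],    # t120
]
print(anova_f_statistic(example_gene))
```

The p value in step 19 is the right tail of the F distribution with 5 and 15 degrees of freedom. If SciPy is available, scipy.stats.f.sf(f_stat, 5, 15) should give a comparable value to =FDIST(<dCIN5_Fstat>,5,20-5).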

Calculating the Bonferroni p value Correction

  1. I labeled the next two columns to the right with the same label, dCIN5_Bonferroni_p-value.
  2. In the first cell below the first dCIN5_Bonferroni_p-value header, I typed the equation =<dCIN5_p-value>*6189 and copied it to all genes.
  3. I replaced any corrected p value greater than 1 with the number 1 by typing the following formula into the first cell below the second dCIN5_Bonferroni_p-value header: =IF(dCIN5_Bonferroni_p-value>1,1,dCIN5_Bonferroni_p-value), where dCIN5_Bonferroni_p-value stands for the cell in the first Bonferroni column. I then copied the formula to all genes.
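
The two Bonferroni columns amount to multiplying each p value by the number of genes tested and capping the result at 1. A minimal Python sketch follows; the function name is hypothetical, and the two inputs are the unadjusted p values reported for NSR1 and IMD3 further down this page.

```python
def bonferroni(p_value, n_tests=6189):
    """Multiply by the number of tests, then cap at 1 (the =IF(...>1,1,...) step)."""
    return min(1.0, p_value * n_tests)


print(bonferroni(6.37625e-08))   # NSR1 unadjusted p value from this page
print(bonferroni(0.111670609))   # IMD3 unadjusted p value from this page
```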

Calculating the Benjamini & Hochberg p value Correction

  1. I inserted a new worksheet named "b-h_ANOVA".
  2. I copied and pasted the "MasterIndex", "ID", and "Standard Name" columns from Master_Sheet_dCIN5 into the first three columns of the new worksheet.
  3. I copied my unadjusted p values from my ANOVA worksheet and pasted them into Column D using "paste values."
  4. I selected all of columns A, B, C, and D and sorted them by ascending values in Column D.
  5. I typed the header "Rank" in cell E1 and created a series of numbers in ascending order from 1 to 6189 in this column.
  6. To calculate the Benjamini and Hochberg p value correction, I typed dCIN5_B-H_p-value in cell F1. I typed the following formula in cell F2: =(D2*6189)/E2 and pressed enter. I copied that equation to the entire column.
  7. I typed "dCIN5_B-H_p-value" into cell G1.
  8. I typed the following formula into cell G2: =IF(F2>1,1,F2) and pressed enter. I copied that equation to the entire column.
  9. I selected columns A through G.
  10. I sorted them by my Column A MasterIndex in ascending order.
  11. I copied column G and used paste values to paste it into the next column on the right of my ANOVA_dCIN5 sheet.
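
The sort, rank, and re-sort steps above can be sketched in Python as follows. The function name and the four sample p values are made up for illustration. The sketch mirrors the spreadsheet steps exactly (p value times the number of genes, divided by rank, capped at 1); many library implementations of Benjamini & Hochberg additionally force the adjusted values to be non-decreasing with rank, which these steps do not do.

```python
def bh_spreadsheet_style(p_values):
    """Benjamini & Hochberg correction as done in the worksheet steps above."""
    m = len(p_values)  # 6189 genes in the worksheet
    # Sort positions by ascending p value (step 4), remembering where each came from
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    for rank, i in enumerate(order, start=1):   # "Rank" column, 1..m (step 5)
        corrected = p_values[i] * m / rank      # =(D2*6189)/E2 (step 6)
        adjusted[i] = min(1.0, corrected)       # =IF(F2>1,1,F2) (step 8)
    # Writing into position i restores the original MasterIndex order (step 10)
    return adjusted


# Made-up p values, not from the dataset
print(bh_spreadsheet_style([0.04, 0.01, 0.03, 0.5]))
```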

Sanity Check: Number of genes significantly changed

  1. In the ANOVA_dCIN5 worksheet, I filtered the unadjusted p value column to display only the genes with a p value of less than 0.05, then repeated the filter for 0.01, 0.001, and 0.0001.
  2. I used =SUBTOTAL(3,A:A) to count the visible rows, then subtracted 1 (for the header row) to get the number of genes that fit the filter.
  3. For the percentage, I used =(100*(<subtotal>-1))/6189. Results as follows:
  • How many genes have p < 0.05, and what is the percentage (out of 6189)?
    • 2290 genes; 37.0%
  • How many genes have p < 0.01, and what is the percentage (out of 6189)?
    • 1380 genes; 22.3%
  • How many genes have p < 0.001, and what is the percentage (out of 6189)?
    • 691 genes; 11.2%
  • How many genes have p < 0.0001, and what is the percentage (out of 6189)?
    • 358 genes; 5.8%
  4. I repeated the above steps for the Bonferroni-corrected p value of less than 0.05, and the Benjamini and Hochberg-corrected p value of less than 0.05. Results are as follows:
  • How many genes have p < 0.05 for the Bonferroni-corrected p value, and what is the percentage (out of 6189)?
    • 151 genes; 2.4%
  • How many genes have p < 0.05 for the Benjamini and Hochberg-corrected p value, and what is the percentage (out of 6189)?
    • 1453 genes; 23.5%
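
The filter-and-count check above can be sketched in Python. The function names and the short p value list are hypothetical; the second part only reproduces the percentage arithmetic, using the gene counts reported above out of 6189.

```python
def count_below(p_values, threshold):
    """Number of genes whose p value is below the threshold (the filtered rows)."""
    return sum(1 for p in p_values if p < threshold)


def percent_of_genes(count, total=6189):
    """Percentage of all genes, like =(100*(<subtotal>-1))/6189, to one decimal."""
    return round(100 * count / total, 1)


# Made-up p values to show the counting step
sample = [0.001, 0.02, 0.2, 0.6]
print(count_below(sample, 0.05), percent_of_genes(count_below(sample, 0.05), len(sample)))

# Gene counts reported above, converted to percentages of 6189
for count in (2290, 1380, 691, 358, 151, 1453):
    print(count, percent_of_genes(count))
```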

Find NSR1 in your dataset. What are its unadjusted, Bonferroni-corrected, and B-H-corrected p values? What is its average Log fold change at each of the timepoints in the experiment?

  • Unadjusted p value: 6.37625E-08
  • Bonferroni-corrected p value: 0.000394626
  • B-H-corrected p value: 2.19237E-05
  • Average Log fold change @ 15 minutes: 4.070025
  • Average Log fold change @ 30 minutes: 3.611475
  • Average Log fold change @ 60 minutes: 4.2985
  • Average Log fold change @ 90 minutes: -2.900925
  • Average Log fold change @ 120 minutes: -0.9315
  • NSR1 shows increased expression (positive average log fold change) from 15 minutes through 60 minutes. At 90 minutes, NSR1 expression is decreased, and at 120 minutes it remains decreased, though by a smaller amount. Because the p value stays below 0.05 after both the Bonferroni and B-H corrections, the change in expression is significant at one or more of these timepoints.

What are IMD3's unadjusted, Bonferroni-corrected, and B-H-corrected p values? What is IMD3's average Log fold change at each of the timepoints in the experiment?

  • Unadjusted p value: 0.111670609
  • Bonferroni-corrected p value: 1
  • B-H-corrected p value: 0.232000469
  • Average Log fold change @ 15 minutes: 1.638433333
  • Average Log fold change @ 30 minutes: -0.100766667
  • Average Log fold change @ 60 minutes: 1.659233333
  • Average Log fold change @ 90 minutes: -0.608333333
  • Average Log fold change @ 120 minutes: -0.168133333

Data & Files

Excel microarray data

p value table slide for ∆CIN5

Conclusion

Acknowledgments

References
