Hivanson Week 9

I created a new worksheet and named it "dCIN5_ANOVA"
I copied all data from the "Master_Sheet" worksheet and pasted it into dCIN5_ANOVA.
I created five column headers of the form dCIN5_AvgLogFC_(TIME), where (TIME) is t15, t30, t60, t90, and t120.
In the cell below the dCIN5_AvgLogFC_t15 header, I typed =AVERAGE(
Then I highlighted all the data in row 2 associated with t15, pressed the closing paren key, and pressed the "enter" key.
I extended this formula down for all genes.
I repeated this averaging process with the t30, t60, t90, and the t120 data.
In the first empty column to the right of the dCIN5_AvgLogFC_t120 calculation, I created the column header dCIN5_ss_HO.
In the first cell below this header, I typed =SUMSQ(
I highlighted all the LogFC data in row 2 (up to, but not including, the AvgLogFC columns), pressed the closing paren key, and pressed the "enter" key.
In the next empty columns to the right of dCIN5_ss_HO, I created the column headers dCIN5_ss_(TIME), using the same (TIME) labels as for the AvgLogFC headers.
In the first cell below the header dCIN5_ss_t15, I typed =SUMSQ(<range of cells for logFC_t15>)-COUNTA(<range of cells for logFC_t15>)*<AvgLogFC_t15>^2 and hit enter.
I extended this formula down for all genes.
I repeated this computation for the t30 through t120 data points.
In the first column to the right of dCIN5_ss_t120, I created the column header dCIN5_SS_full.
In the first row below this header, I typed =SUM(<range of cells containing "ss" for each timepoint>) and hit enter.
In the next two columns to the right, I created the headers dCIN5_Fstat and dCIN5_p-value.
In the first cell of the dCIN5_Fstat column, I typed =((20-5)/5)*(<dCIN5_ss_HO>-<dCIN5_SS_full>)/<dCIN5_SS_full> and hit enter.
- I replaced the phrase <dCIN5_ss_HO> with the cell designation.
- I replaced the phrase <dCIN5_SS_full> with the cell designation.
- I copied this to the whole column.
In the first cell below the dCIN5_p-value header, I typed =FDIST(<dCIN5_Fstat>,5,20-5)
I performed a quick sanity check to see if all of these computations were done correctly.
- I filtered the dCIN5_p-value column so that only p values less than 0.05 were displayed.
- Before further calculation, I undid this filter.
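The worksheet steps above can be sketched in Python as follows. This is a minimal illustration, not the spreadsheet itself: the function names and the example values are made up for this sketch, and it assumes every timepoint has all of its replicates present (the worksheet formula hard-codes 20 data points and 5 timepoints, while the sketch reads both from the data).

```python
def sum_sq(values):
    """Equivalent of Excel's SUMSQ: the sum of the squared values."""
    return sum(v * v for v in values)


def within_ss(values):
    """Per-timepoint sum of squares: SUMSQ(range) - COUNTA(range) * mean^2."""
    n = len(values)
    mean = sum(values) / n
    return sum_sq(values) - n * mean ** 2


def f_statistic(groups):
    """F statistic as in the worksheet: ((n - k) / k) * (SS_HO - SS_full) / SS_full.

    groups holds one list of log fold changes per timepoint, so k is the
    number of timepoints and n is the total number of data points.
    """
    all_values = [v for group in groups for v in group]
    n = len(all_values)
    k = len(groups)
    ss_h0 = sum_sq(all_values)                       # dCIN5_ss_HO
    ss_full = sum(within_ss(g) for g in groups)      # dCIN5_SS_full
    return ((n - k) / k) * (ss_h0 - ss_full) / ss_full


# Hypothetical log fold changes for one gene: 5 timepoints x 4 replicates.
# These numbers are invented for illustration and are not from the dataset.
example_gene = [
    [0.9, 1.1, 1.0, 1.2],     # t15
    [0.4, 0.6, 0.5, 0.3],     # t30
    [1.5, 1.4, 1.6, 1.7],     # t60
    [-0.8, -1.0, -0.9, -0.7], # t90
    [-0.2, -0.1, -0.3, 0.0],  # t120
]
print(f_statistic(example_gene))
```

Excel's FDIST(F, 5, 15) returns the right-tail probability of the F distribution. If SciPy is available, scipy.stats.f.sf(F, 5, 15) should give the corresponding p value; that call is left out of the sketch so the block runs with the standard library alone.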

Calculating the Bonferroni p value Correction

I labeled the next two columns to the right with the same label, dCIN5_Bonferroni_p-value.
I typed the equation =<dCIN5_p-value>*6189 into the first cell below the first header and copied it to all genes.
I replaced any corrected p value greater than 1 with the number 1 by typing the following formula into the first cell below the second dCIN5_Bonferroni_p-value header: =IF(dCIN5_Bonferroni_p-value>1,1,dCIN5_Bonferroni_p-value), where dCIN5_Bonferroni_p-value stands for the cell in the first Bonferroni column, and copied it to all genes.
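The two Bonferroni columns amount to multiplying each p value by the number of genes and capping the result at 1. A minimal Python sketch, assuming a plain list of unadjusted p values (the function name is hypothetical; the two p values are the NSR1 and IMD3 values reported later on this page):

```python
N_GENES = 6189  # number of genes tested, as used in the worksheet formula


def bonferroni(p_values, n_tests=N_GENES):
    """Multiply each p value by the number of tests and cap the result at 1."""
    return [min(1.0, p * n_tests) for p in p_values]


# Unadjusted p values for NSR1 and IMD3 as reported on this page.
unadjusted = [6.37625e-08, 0.111670609]
for p, corrected in zip(unadjusted, bonferroni(unadjusted)):
    print(p, corrected)
```

Doing the cap inside the same expression replaces the second worksheet column with the =IF(...>1,1,...) formula.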

Calculating the Benjamini & Hochberg p value Correction

I inserted a new worksheet named "b-h_ANOVA".
I copied the "MasterIndex", "ID", and "Standard Name" columns from Master_Sheet_dCIN5 and pasted them into the first three columns of the new worksheet.
I copied my unadjusted p values from my ANOVA worksheet and pasted them into Column D using "paste values."
I selected all of columns A, B, C, and D, and sorted by ascending values on Column D.
I typed the header "Rank" in cell E1 and created a series of numbers in ascending order from 1 to 6189 in this column.
To calculate the Benjamini and Hochberg p value correction, I typed dCIN5_B-H_p-value in cell F1. I typed the following formula in cell F2: =(D2*6189)/E2 and pressed enter. I copied that equation to the entire column.
I typed "dCIN5_B-H_p-value" into cell G1.
I typed the following formula into cell G2: =IF(F2>1,1,F2) and pressed enter. I copied that equation to the entire column.
I selected columns A through G.
I sorted them by my Column A MasterIndex in ascending order.
I copied column G and used paste values to paste it into the next column on the right of my ANOVA_dCIN5 sheet.
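The sort, rank, scale, cap, and re-sort steps above can be sketched in Python as follows. This is an illustration under assumptions: the function name is hypothetical, the example p values are invented, and the sketch follows the worksheet steps exactly rather than a library implementation.

```python
def bh_worksheet(p_values):
    """Benjamini & Hochberg correction as done in the worksheet.

    Each p value is multiplied by the number of tests, divided by its rank
    in ascending order, and capped at 1. Results come back in the original
    input order, which mirrors re-sorting by MasterIndex.
    """
    m = len(p_values)
    # Indices of the p values in ascending order (the sort on Column D).
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    for rank, idx in enumerate(order, start=1):   # the "Rank" column
        adjusted[idx] = min(1.0, p_values[idx] * m / rank)
    return adjusted


# Invented p values for four genes, in MasterIndex order.
print(bh_worksheet([0.04, 0.01, 0.03, 0.5]))
```

Standard implementations of this correction usually add one more step: a running minimum taken from the largest rank downward, so that adjusted values never decrease as rank increases. The worksheet steps do not include that step, so the sketch omits it as well.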

Sanity Check: Number of genes significantly changed

In the ANOVA_dCIN5 worksheet, I filtered the unadjusted p value column to display only genes with a p value of less than 0.05, then repeated the filter with cutoffs of 0.01, 0.001, and 0.0001.
I used =SUBTOTAL(3,A:A) to count the total output, then subtracted 1 to get the number of genes that fit the filter.
For the percentage, I used =(100*(<subtotal>-1))/6189. Results as follows:

How many genes have p < 0.05? and what is the percentage (out of 6189)?
- 2290 genes; 37.0%

How many genes have p < 0.01? and what is the percentage (out of 6189)?
- 1380 genes; 22.3%

How many genes have p < 0.001? and what is the percentage (out of 6189)?
- 691 genes; 11.2%

How many genes have p < 0.0001? and what is the percentage (out of 6189)?
- 358 genes; 5.8%

I repeated the above steps for the Bonferroni-corrected p value of less than 0.05, and the Benjamini and Hochberg-corrected p value of less than 0.05. Results are as follows:

How many genes are p < 0.05 for the Bonferroni-corrected p value? and what is the percentage (out of 6189)?
- 151 genes; 2.4%

How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value? and what is the percentage (out of 6189)?
- 1453 genes; 23.5%
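The counting and percentage steps behind these results can be sketched in Python. The helper names are hypothetical; the counts are the ones reported above, and the total of 6189 genes comes from the worksheet formula.

```python
TOTAL_GENES = 6189  # total number of genes, from the worksheet formula


def count_below(p_values, threshold):
    """Count the p values strictly below the threshold (the filter step)."""
    return sum(1 for p in p_values if p < threshold)


def percent_of_total(count, total=TOTAL_GENES):
    """Percentage of all genes, mirroring =(100*count)/6189."""
    return 100.0 * count / total


# Gene counts reported on this page for each filter.
reported_counts = {
    "unadjusted p < 0.05": 2290,
    "unadjusted p < 0.01": 1380,
    "unadjusted p < 0.001": 691,
    "unadjusted p < 0.0001": 358,
    "Bonferroni p < 0.05": 151,
    "B-H p < 0.05": 1453,
}
for label, count in reported_counts.items():
    print(label, count, round(percent_of_total(count), 1))
```

Given the full column of p values as a list, count_below(p_values, 0.05) plays the role of the filtered SUBTOTAL count minus the header row.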

Find NSR1 in your dataset. What are its unadjusted, Bonferroni-corrected, and B-H-corrected p values? What is its average Log fold change at each of the timepoints in the experiment?

Unadjusted p value: 6.37625E-08
Bonferroni-corrected p value: 0.000394626
B-H-corrected p value: 2.19237E-05
Average Log fold change @ 15 minutes: 4.070025
Average Log fold change @ 30 minutes: 3.611475
Average Log fold change @ 60 minutes: 4.2985
Average Log fold change @ 90 minutes: -2.900925
Average Log fold change @ 120 minutes: -0.9315
NSR1 has a positive average log fold change at 15, 30, and 60 minutes, indicating increased expression. At 90 minutes the average log fold change is negative, and it remains negative at 120 minutes, indicating decreased expression. The small unadjusted and corrected p values indicate a significant change in expression at one or more of these timepoints.

What are IMD3's unadjusted, Bonferroni-corrected, and B-H-corrected p values? What is IMD3's average Log fold change at each of the timepoints in the experiment?

Unadjusted p value: 0.111670609
Bonferroni-corrected p value: 1
B-H-corrected p value: 0.232000469
Average Log fold change @ 15 minutes: 1.638433333
Average Log fold change @ 30 minutes: -0.100766667
Average Log fold change @ 60 minutes: 1.659233333
Average Log fold change @ 90 minutes: -0.608333333
Average Log fold change @ 120 minutes: -0.168133333

Data & Files

Excel microarray data

p value table slide for ∆CIN5

Conclusion

Acknowledgments

References

