Hivanson Week 9
Purpose
Methods/Results
Strain name: ∆CIN5 strain
Filename: HI_BIOL367_S24_microarray-data_dCIN5.xcls
Number of replicates per strain: 4
Timepoints: 15 minutes, 30 minutes, 60 minutes, 90 minutes, 120 minutes
Statistical Analysis Part 1: ANOVA
- I created a new worksheet and named it "dCIN5_ANOVA"
- I copied all data from the "Master_Sheet" worksheet and pasted it into dCIN5_ANOVA.
- I created five column headers of the form dCIN5_AvgLogFC_(TIME), where (TIME) is t15, t30, t60, t90, and t120.
- In the cell below the dCIN5_AvgLogFC_t15 header, I typed
=AVERAGE(
- Then I highlighted all the data in row 2 associated with t15, pressed the closing paren key, and pressed the "enter" key.
- I extended this formula down for all genes.
- I repeated this averaging process with the t30, t60, t90, and the t120 data.
- In the first empty column to the right of the dCIN5_AvgLogFC_t120 calculation, I created the column header dCIN5_ss_HO.
- In the first cell below this header, I typed
=SUMSQ(
- I highlighted all the LogFC data in row 2 until the average, pressed the closing paren key, and pressed the "enter" key.
- In the next empty columns to the right of dCIN5_ss_HO, I created the column headers dCIN5_ss_(TIME), where (TIME) is t15, t30, t60, t90, and t120.
- In the first cell below the header dCIN5_ss_t15, I typed
=SUMSQ(<range of cells for logFC_t15>)-COUNTA(<range of cells for logFC_t15>)*<AvgLogFC_t15>^2
and hit enter.
- I extended this formula down for all genes.
- I repeated this computation for the t30 through t120 data points.
- In the first column to the right of dCIN5_ss_t120, I created the column header dCIN5_SS_full.
- In the first row below this header, I typed
=SUM(<range of cells containing "ss" for each timepoint>)
and hit enter.
- In the next two columns to the right, I created the headers dCIN5_Fstat and dCIN5_p-value.
- In the first cell of the dCIN5_Fstat column, I typed
=((20-5)/5)*(<dCIN5_ss_HO>-<dCIN5_SS_full>)/<dCIN5_SS_full>
and hit enter.
- I replaced the phrase <dCIN5_ss_HO> with the cell designation.
- I replaced the phrase <dCIN5_SS_full> with the cell designation.
- I copied this to the whole column.
- In the first cell below the dCIN5_p-value header, I typed
=FDIST(<dCIN5_Fstat>,5,20-5)
- I performed a quick sanity check to see if all of these computations were done correctly.
- I filtered the dCIN5_p-value column so that the p value has to be less than 0.05.
- Before further calculation, I undid this filter.
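The worksheet arithmetic above can also be checked outside the spreadsheet. The following is a minimal Python sketch of the same F statistic for a single gene, assuming no missing values. The function name `anova_f_stat` and the example numbers are hypothetical and are not taken from the dCIN5 dataset. The sketch stops at the F statistic and does not reproduce the FDIST step, which would need an F distribution function from a statistics library.

```python
def anova_f_stat(log_fc_by_timepoint):
    """F statistic for one gene, following the worksheet columns.

    log_fc_by_timepoint maps a timepoint label to its replicate logFC values.
    Assumption: no missing values, so the total count n matches the
    hard-coded 20 (5 timepoints x 4 replicates) used in the spreadsheet.
    """
    all_values = [v for reps in log_fc_by_timepoint.values() for v in reps]
    n = len(all_values)            # 20 in the worksheet formula
    k = len(log_fc_by_timepoint)   # 5 timepoints in the worksheet formula

    # ss_HO: =SUMSQ(<all logFC values for the gene>)
    ss_h0 = sum(v * v for v in all_values)

    # SS_full: sum over timepoints of SUMSQ(range) - COUNTA(range) * average^2
    ss_full = 0.0
    for reps in log_fc_by_timepoint.values():
        mean = sum(reps) / len(reps)
        ss_full += sum(v * v for v in reps) - len(reps) * mean ** 2

    # Fstat: =((n-k)/k) * (ss_HO - SS_full) / SS_full
    return ((n - k) / k) * (ss_h0 - ss_full) / ss_full


# Hypothetical toy gene with two timepoints and two replicates each.
toy_gene = {"t15": [1.0, 3.0], "t30": [2.0, 2.0]}
f_value = anova_f_stat(toy_gene)
```

With real data, each gene would contribute one dictionary with the five timepoints t15 through t120.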
Calculating the Bonferroni p value Correction
- I labeled the next two columns to the right with the same label, dCIN5_Bonferroni_p-value.
- Below the first header, I typed the equation
=<dCIN5_p-value>*6189
and copied it to all genes.
- I replaced any corrected p value greater than 1 with the number 1 by typing the following formula into the first cell below the second dCIN5_Bonferroni_p-value header:
=IF(<dCIN5_Bonferroni_p-value>>1,1,<dCIN5_Bonferroni_p-value>)
and copied it to all genes.
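As a cross-check on the two Bonferroni columns, the multiply-then-cap logic can be written as a short Python sketch. The function name `bonferroni` is hypothetical. The gene count of 6189 and the two unadjusted p values (NSR1 and IMD3) are the ones reported on this page.

```python
def bonferroni(p_value, n_tests=6189):
    """Multiply by the number of tests, then cap at 1 (the two worksheet columns)."""
    return min(1.0, p_value * n_tests)


# Unadjusted p values reported on this page.
nsr1_corrected = bonferroni(6.37625e-08)   # small p value, stays below 1
imd3_corrected = bonferroni(0.111670609)   # product exceeds 1, so it is capped
```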
Calculating the Benjamini & Hochberg p value Correction
- I inserted a new worksheet named "b-h_ANOVA".
- I copied and pasted the "MasterIndex", "ID", and "Standard Name" columns from Master_Sheet_dCIN5 into the first three columns of the new worksheet.
- I copied my unadjusted p values from my ANOVA worksheet and pasted them into Column D using "paste values."
- I selected all of columns A, B, C, and D and sorted by ascending values on Column D.
- I typed the header "Rank" in cell E1 and created a series of numbers in ascending order from 1 to 6189 in this column.
- To calculate the Benjamini and Hochberg p value correction, I typed dCIN5_B-H_p-value in cell F1. I typed the following formula in cell F2:
=(D2*6189)/E2
and pressed enter. I copied that equation to the entire column.
- I typed "dCIN5_B-H_p-value" into cell G1.
- I typed the following formula into cell G2:
=IF(F2>1,1,F2)
and pressed enter. I copied that equation to the entire column.
- I selected columns A through G.
- I sorted them by the MasterIndex in Column A in ascending order.
- I copied column G and used paste values to paste it into the next column on the right of my dCIN5_ANOVA sheet.
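The sort, rank, scale, cap, and re-sort steps above can be sketched in Python as follows. This mirrors the worksheet procedure only: the function name `bh_like_worksheet` and the example p values are hypothetical, and statistics libraries typically add a further step that forces the adjusted values to be monotone, which this worksheet procedure does not do.

```python
def bh_like_worksheet(p_values):
    """Benjamini & Hochberg correction as done in the b-h_ANOVA worksheet.

    Returns adjusted p values in the ORIGINAL gene order, which corresponds
    to sorting back by MasterIndex at the end of the worksheet steps.
    """
    n = len(p_values)  # 6189 genes in the worksheet
    # Sort gene positions by ascending unadjusted p value (Column D).
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    for rank, idx in enumerate(order, start=1):   # Column E: Rank
        scaled = p_values[idx] * n / rank         # Column F: =(D2*6189)/E2
        adjusted[idx] = min(1.0, scaled)          # Column G: =IF(F2>1,1,F2)
    return adjusted


# Hypothetical p values for four genes, not taken from the dataset.
example = bh_like_worksheet([0.04, 0.01, 0.5, 0.03])
```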
Sanity Check: Number of genes significantly changed
- In the dCIN5_ANOVA worksheet, I filtered the unadjusted p value column to display only genes with a p value less than 0.05, then repeated the filter for 0.01, 0.001, and 0.0001.
- For each filter, I used
=SUBTOTAL(3,A:A)
to count the visible rows, then subtracted 1 (the header row) to get the number of genes that passed the filter.
- For the percentage, I used
=(100*(<subtotal>-1))/6189
Results are as follows:
- How many genes have p < 0.05? and what is the percentage (out of 6189)?
- 2290 genes; 37.0%
- How many genes have p < 0.01? and what is the percentage (out of 6189)?
- 1380 genes; 22.3%
- How many genes have p < 0.001? and what is the percentage (out of 6189)?
- 691 genes; 11.2%
- How many genes have p < 0.0001? and what is the percentage (out of 6189)?
- 358 genes; 5.8%
- I repeated the above steps for the Bonferroni-corrected p value of less than 0.05, and the Benjamini and Hochberg-corrected p value of less than 0.05. Results are as follows:
- How many genes are p < 0.05 for the Bonferroni-corrected p value? and what is the percentage (out of 6189)?
- 151 genes; 2.4%
- How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value? and what is the percentage (out of 6189)?
- 1453 genes; 23.5%
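The count and percentage arithmetic behind these results can be sketched in Python. The helper names `count_below` and `percent_of_genes` are hypothetical. The gene counts and the total of 6189 are the values reported above, and the short p value list is made up only to show the filter.

```python
TOTAL_GENES = 6189  # number of genes in the dataset, as used on this page


def count_below(p_values, threshold):
    """Number of p values strictly below the threshold (the worksheet filter)."""
    return sum(1 for p in p_values if p < threshold)


def percent_of_genes(count, total=TOTAL_GENES):
    """Percentage of all genes, matching =(100*(<subtotal>-1))/6189."""
    return 100 * count / total


# Hypothetical p values, only to illustrate the filter step.
toy_count = count_below([0.2, 0.03, 0.0004, 0.05], 0.05)

# Percentages recomputed from the counts reported above, rounded to one decimal.
reported_counts = {"p<0.05": 2290, "p<0.01": 1380, "p<0.001": 691,
                   "p<0.0001": 358, "Bonferroni<0.05": 151, "B-H<0.05": 1453}
percentages = {label: round(percent_of_genes(c), 1)
               for label, c in reported_counts.items()}
```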
Find NSR1 in your dataset. What are its unadjusted, Bonferroni-corrected, and B-H-corrected p values? What is its average Log fold change at each of the timepoints in the experiment?
- Unadjusted p value: 6.37625E-08
- Bonferroni-corrected p value: 0.000394626
- B-H-corrected p value: 2.19237E-05
- Average Log fold change @ 15 minutes: 4.070025
- Average Log fold change @ 30 minutes: 3.611475
- Average Log fold change @ 60 minutes: 4.2985
- Average Log fold change @ 90 minutes: -2.900925
- Average Log fold change @ 120 minutes: -0.9315
- NSR1 has a positive average Log fold change (increased expression) at 15, 30, and 60 minutes. At 90 minutes the average Log fold change is negative (decreased expression), and it remains negative at 120 minutes. Because the p value stays below 0.05 after both the Bonferroni and B-H corrections, the change in expression is significant for at least one of these timepoints.
What are IMD3's unadjusted, Bonferroni-corrected, and B-H-corrected p values? What is IMD3's average Log fold change at each of the timepoints in the experiment?
- Unadjusted p value: 0.111670609
- Bonferroni-corrected p value: 1
- B-H-corrected p value: 0.232000469
- Average Log fold change @ 15 minutes: 1.638433333
- Average Log fold change @ 30 minutes: -0.100766667
- Average Log fold change @ 60 minutes: 1.659233333
- Average Log fold change @ 90 minutes: -0.608333333
- Average Log fold change @ 120 minutes: -0.168133333
Data & Files
Conclusion
Acknowledgments
References