Dmadere Week 8
Jump to navigation
Jump to search
Contents
Purpose
Method/Results
Statistical Analysis Part1: ANOVA
- Created a new worksheet named “dCIN5_ANOVA”.
- Copied the first three columns containing the "MasterIndex", "ID", and "Standard Name" from the "Master_Sheet" worksheet for dCIN5 and pasted it into your new worksheet. Copied the columns containing the data for dCIN5 and pasted it into your new worksheet.
- At the top of the first column to the right of data, created five column headers of the form dCIN5_AvgLogFC_(TIME) where (TIME) is 15, 30, etc.
- In the cell below the dCIN5_AvgLogFC_t15 header, typed =AVERAGE(
- Highlighted all the data in row 2 associated with t15, pressed the closing paren key (shift 0), and pressed the "enter" key.
- This cells contain the average of the log fold change data from the first gene at t=15 minutes.
- Click on this cell and position your cursor at the bottom right corner. You should see your cursor change to a thin black plus sign (not a chubby white one). When it does, double click, and the formula will magically be copied to the entire column of 6188 other genes.
- Repeated steps (4) through (8) with the t30, t60, t90, and the t120 data.
- In the first empty column to the right of the dCIN_AvgLogFC_t120 calculation, created the column header dCIN5_ss_HO.
- In the first cell below header, typed =SUMSQ(
- Highlighted all the LogFC data in row 2 (but not the AvgLogFC), pressed the closing paren key (shift 0), and pressed the "enter" key.
- In the next empty column to the right of dCIN5_ss_HO, created the column headers dCIN5_ss_(TIME) as in (3).
- Made a note of how many data points had at each time point for dCIN5. For dCIN5, there are 4. Counted carefully. Also, made a note of the total number of data points. For dCIN5, there was 20.
- Below the header dCIN5_ss_t15, typed =SUMSQ(<range of cells for logFC_t15>)-COUNTA(<range of cells for logFC_t15>)*<AvgLogFC_t15>^2 and hit enter.
- The COUNTA function counts the number of cells in the specified range that have data in them (i.e., does not count cells with missing values).
- The phrase <range of cells for logFC_t15> should be replaced by the data range associated with t15.
- The phrase <AvgLogFC_t15> should be replaced by the cell number in which you computed the AvgLogFC for t15, and the "^2" squares that value.
- Upon completion of this single computation, used the Step (7) trick to copy the formula throughout the column.
- Repeated this computation for the t30 through t120 data points. Obtained the data for each time point, types the right number of data points, and got average from the appropriate cell for each time point, and copied the formula to the whole column for each computation.
- In the first column to the right of dCIN5_ss_t120, created the column header dCIN5_SS_full.
- In the first row below this header, typed =sum(<range of cells containing "ss" for each timepoint>) and pressed enter.
- In the next two columns to the right, created the headers dCIN5_Fstat and dCIN5_p-value.
- Recalled the number of data points from (13): called that total n.
- In the first cell of the dCIN5_Fstat column, typed =((n-5)/5)*(<dCIN5_ss_HO>- <dCIN5_SS_full>)/<dCIN5_SS_full> and pressed enter.
- Didn’t actually type the n but instead used the number from (13). Also note that "5" is the number of timepoints.
- Replaced the phrase dCIN5_ss_HO with the cell designation.
- Replaced the phrase <dCIN5_SS_full> with the cell designation.
- Copied to the whole column.
- In the first cell below the dCIN5_p-value header, typed =FDIST(<dCIN5_Fstat>,5,n-5) replacing the phrase <dCIN5_Fstat> with the cell designation and the "n" as in (13) with the number of data points total. Copied to the whole column.
- Before moved on to the next step, performed a quick sanity check to see if we did all of these computations correctly.
- Clicked on cell A1 and click on the Data tab. Selected the Filter icon (looks like a funnel). Little drop-down arrows appeared at the top of each column. This enabled us to filter the data according to criteria we set.
- Clicked on the drop-down arrow on dCIN5_p-value column. Select "Number Filters". Set a criterion that filtered data so that the p value has to be less than 0.05.
- Excel only displayed the rows that correspond to data meeting that filtering criterion. A number appeared in the lower left hand corner of the window giving you the number of rows that meet that criterion. We will check our results with each other to make sure that the computations were performed correctly.
- Undid any filters that you have applied before making any additional calculations.
Calculated the Bonferroni and p value Correction
- Performed adjustments to the p value to correct for the multiple testing problem. Labeled the next two columns to the right with the same label, dCIN5_Bonferroni_p-value.
- Typed the equation =<dCIN5_p-value>*6189, Upon completion of this single computation, used the Step (10) trick to copy the formula throughout the column.
- Replaced any corrected p value that is greater than 1 by the number 1 by typing the following formula into the first cell below the second dCIN5_Bonferroni_p-value header: =IF(dCIN5_Bonferroni_p-value>1,1,(STRAIN)_Bonferroni_p-value), where "dCIN5_Bonferroni_p-value" referred to the cell in which the first Bonferroni p value computation was made. Used the Step (10) trick to copy the formula throughout the column.
Calculated the Benjamini & Hochberg p value Correction
- Inserted a new worksheet named "dCIN5_ANOVA_B-H".
- Copied and pasted the "MasterIndex", "ID", and "Standard Name" columns from previous worksheet into first two columns of the new worksheet.
- Used Paste special > Paste values. Copied unadjusted p values from ANOVA worksheet and pasted it into Column D.
- Selected all of columns A, B, C, and D. Sorted by ascending values on Column D. Clicked the sort button from A to Z on the toolbar, in the window that appears, sorted by column D, smallest to largest.
- Typed the header "Rank" in cell E1. Created a series of numbers in ascending order from 1 to 6189 in this column. This is the p value rank, smallest to largest. Type "1" into cell E2 and "2" into cell E3. Selected both cells E2 and E3. Double-clicked on the plus sign on the lower right-hand corner of your selection to fill the column with a series of numbers from 1 to 6189.
- Calculated the Benjamini and Hochberg p value correction. Type dCIN5_B-H_p-value in cell F1. Typed the following formula in cell F2: =(D2*6189)/E2 and press enter. Copied that equation to the entire column.
- Typed "dCIN5 _B-H_p-value" into cell G1.
- Typed the following formula into cell G2: =IF(F2>1,1,F2) and pressed enter. Copied that equation to the entire column.
- Selected columns A through G. Sorted them by MasterIndex in Column A in ascending order.
- Copied column G and used Paste special > Paste values to paste it into the next column on the right of your ANOVA sheet.
Clustering and GO Term Enrichment with Stem (Part 2)
Data & Files
DM_dCIN5 Percentages Presentation