Difference between revisions of "Aporras1 Week 8"
From LMU BioDB 2017
(added banjamini calc) |
(→Calculate the Benjamini & Hochberg p value Correction: changed present to past tense and filled STRAIN with dZAP1) |
||
Line 53: | Line 53: | ||
# Replaced any corrected p value that is greater than 1 by the number 1 by typing the following formula into the first cell below the second dZAP1_Bonferroni_p-value header: <code>=IF(dZAP1_Bonferroni_p-value>1,1,dZAP1_Bonferroni_p-value)</code>, where "dZAP1_Bonferroni_p-value" refers to the cell in which the first Bonferroni p value computation was made. Used the Step (7) trick to copy the formula throughout the column. | # Replaced any corrected p value that is greater than 1 by the number 1 by typing the following formula into the first cell below the second dZAP1_Bonferroni_p-value header: <code>=IF(dZAP1_Bonferroni_p-value>1,1,dZAP1_Bonferroni_p-value)</code>, where "dZAP1_Bonferroni_p-value" refers to the cell in which the first Bonferroni p value computation was made. Used the Step (7) trick to copy the formula throughout the column. | ||
− | + | === Calculate the Benjamini & Hochberg p value Correction === | |
− | # | + | # Inserted a new worksheet named "dZAP1_ANOVA_B-H". |
− | # | + | # Copied and pasted the "MasterIndex", "ID", and "Standard Name" columns from the previous worksheet into the first two columns of the new worksheet. |
− | # For the following, | + | # For the following, used Paste special > Paste values. Copied the unadjusted p values from theANOVA worksheet and pasted it into Column D. |
− | # | + | # Selected all of columns A, B, C, and D. Sorted by ascending values on Column D. Clicked the sort button from A to Z on the toolbar, in the window that appeared, sorted by column D, smallest to largest. |
− | # | + | # Typed the header "Rank" in cell E1. Typed "1" into cell E2 and "2" into cell E3. Selected |
− | # | + | both cells E2 and E3. Double-clicked on the plus sign on the lower right-hand corner of the selection to fill the column with a series of numbers from 1 to 6189. |
− | # | + | # Typed dZAP1_B-H_p-value in cell F1. Typed the following formula in cell F2: <code>=(D2*6189)/E2</code> and pressed enter. Copied the equation to the entire column. |
− | # | + | # Typed "dZAP1_B-H_p-value" into cell G1. |
− | # | + | # Typed the following formula into cell G2: <code>=IF(F2>1,1,F2)</code> and pressed enter. Copied the equation to the entire column. |
− | # | + | # Selected columns A through G. Sorted them by the MasterIndex in Column A in ascending order. |
+ | # Copied column G and used Paste special > Paste values to paste it into the next column on the right of the ANOVA sheet. | ||
==Acknowledgements== | ==Acknowledgements== |
Revision as of 04:06, 24 October 2017
User page: Antonio Porras
Assignment page: Week 8
Contents
Electronic Notebook
Statistical Analysis Part 1
The purpose of the witin-stain ANOVA test is to determine if any genes had a gene expression change that was significantly different than zero at any timepoint.
- Downloaded the excel file and renamed it AP dZAP1.
- Created a new worksheet, naming it dZAP1_ANOVA".
- Copied the first three columns containing the "MasterIndex", "ID", and "Standard Name" from the "Master_Sheet" worksheet for the strain and pasted it into the new worksheet dZAP1_ANOVA. Copied the columns containing the data for dZAP1 strain and pasted it into the worksheet.
- At the top of the first column to the right of data, created five column headers of the form dZAP1_AvgLogFC_t(TIME) for times 15, 30, 60, 90, 120.
- In the cell below the dZAP1_AvgLogFC_t15 header, typed
=AVERAGE(
- Then highlighted all the data in row 2 associated with dZAP1 and t15, press the closing parenthesis key,and pressed the "enter" key.
- Clicked on this cell and positioned my cursor at the bottom right corner. Saw my cursor change to a thin black plus sign and double clicked.
- Repeated steps (5) through (7) with the t30, t60, t90, and the t120 data.
- Now in the first empty column to the right of the dZAP1_AvgLogFC_t120 calculation, I created the column header dZAP1_ss_HO.
- In the first cell below this header, I typed
=SUMSQ(
- I then highlighted all the LogFC data in row 2 for dZAP1 for the respective time, excluding the AvgLogFC, pressed the closing parenthesis key, and pressed the "enter" key.
- Clicked on the cell with
=SUMSQ(
data of dZAP1_ss_HO and saw the cursor change to a thin black plus sign and double clicked. - In the next empty column to the right of dZAP1_ss_HO, I created the column headers dZAP1_ss_t(TIME) for times 15, 30, 60, 90, 120.
- I found these data points: t15 had 4 data points, t30 had 4 data points, t60 had 4 data points, t90 had 4 data points, and t120 had 4 data points.
- Counted the total data points to be 20.
- Highlighted the data columns of dZAP1_LogFC_t15 through dZAP1_LogFC_t120 excluding the AvgLogFCs and pressed CTL+F.
- In the "Replace" tab, typed "NA" and clicked "Replace All".
- Found that it replaced 5100 "NA"s.
- In the first cell below the header dZAP1_ss_t15, typed
=SUMSQ(<range of cells for logFC_t15>)-COUNTA(<range of cells for logFC_t15>)*<AvgLogFC_t15>^2
and hit enter.- The
COUNTA
function counts the number of cells in the specified range that have data in them (i.e., does not count cells with missing values). - The phrase <range of cells for logFC_t15> should be replaced by the data range associated with t15.
- The phrase <number of data points> should be replaced by the number of data points for that timepoint (either 3, 4, or 5).
- The phrase <AvgLogFC_t15> should be replaced by the cell number in which you computed the AvgLogFC for t15, and the "^2" squares that value.
- Upon completion of this single computation, used the Step (7) trick to copy the formula throughout the column.
- The
- Repeat this computation for the t30 through t120 data points.
- In the first column to the right of dZAP1_ss_t120, created the column header dZAP_SS_full.
- In the first row below this header, typed
=sum(<range of cells containing "ss" for each timepoint>)
and hit enter. - In the next two columns to the right, created the headers dZAP1_Fstat and dZAP1_p-value.
- Recall the number of data points: n=20
- In the first cell of the dZAP1_Fstat column, typed
=((20-5)/5)*(<dZAP1_ss_HO>-<dZAP1_SS_full>)/<dZAP1_SS_full>
and hit enter.- Replaced the phrase (dZAP1_ss_HO with the cell designation.
- Replaced the phrase <(dZAP1_SS_full> with the cell designation.
- Copied to the whole column using step 7 again.
- In the first cell below the dZAP1_p-value header, typed
=FDIST(<dZAP1_Fstat>,5,n-5)
replacing the phrase <dZAP1_Fstat> with the cell designation and the "n" as in 20. Copied to the whole column. - Before we move on to the next step, I performed a quick sanity check to see if I did all of these computations correctly.
- Clicked on cell A1 and clicked on the Data tab. Selected the Filter icon.
- Clicked on the drop-down arrow on dZAP1_p-value column. Selected "Number Filters". In the window that appeared, set a criterion that would filter the data so that the p value has to be less than 0.05.
- Excel will now only displayed the rows that correspond to data meeting that filtering criterion which was 2485 out of 6189.
Calculate the Bonferroni and p value Correction
- Now we will perform adjustments to the p value to correct for the multiple testing problem. Labeled the next two columns to the right with the same label, dZAP1_Bonferroni_p-value.
- Typed the equation
=<dZAP1_p-value>*6189
, Upon completion of this single computation, used the Step (7) trick to copy the formula throughout the column. - Replaced any corrected p value that is greater than 1 by the number 1 by typing the following formula into the first cell below the second dZAP1_Bonferroni_p-value header:
=IF(dZAP1_Bonferroni_p-value>1,1,dZAP1_Bonferroni_p-value)
, where "dZAP1_Bonferroni_p-value" refers to the cell in which the first Bonferroni p value computation was made. Used the Step (7) trick to copy the formula throughout the column.
Calculate the Benjamini & Hochberg p value Correction
- Inserted a new worksheet named "dZAP1_ANOVA_B-H".
- Copied and pasted the "MasterIndex", "ID", and "Standard Name" columns from the previous worksheet into the first two columns of the new worksheet.
- For the following, used Paste special > Paste values. Copied the unadjusted p values from theANOVA worksheet and pasted it into Column D.
- Selected all of columns A, B, C, and D. Sorted by ascending values on Column D. Clicked the sort button from A to Z on the toolbar, in the window that appeared, sorted by column D, smallest to largest.
- Typed the header "Rank" in cell E1. Typed "1" into cell E2 and "2" into cell E3. Selected
both cells E2 and E3. Double-clicked on the plus sign on the lower right-hand corner of the selection to fill the column with a series of numbers from 1 to 6189.
- Typed dZAP1_B-H_p-value in cell F1. Typed the following formula in cell F2:
=(D2*6189)/E2
and pressed enter. Copied the equation to the entire column. - Typed "dZAP1_B-H_p-value" into cell G1.
- Typed the following formula into cell G2:
=IF(F2>1,1,F2)
and pressed enter. Copied the equation to the entire column. - Selected columns A through G. Sorted them by the MasterIndex in Column A in ascending order.
- Copied column G and used Paste special > Paste values to paste it into the next column on the right of the ANOVA sheet.
Acknowledgements
- Met outside of class and worked alongside inside class with Katie Wright to discuss any questions we had throughout the process of completing the Week 8 assignment.
- Was assisted during the completion of the Week 8 Assignment in class by Dr. Dondi and Dr. Dalquist.
While I worked with the people noted above, this individual journal entry was completed by me and not copied from another source.
Aporras1 (talk) 20:27, 23 October 2017 (PDT)
References
- LMU BioDB 2017. (2017). Week 8. Retrieved October 27, 2017, from https://xmlpipedb.cs.lmu.edu/biodb/fall2017/index.php/Week_8