Aporras1 Week 8
From LMU BioDB 2017
								User page: Antonio Porras
Assignment page: Week 8
Contents
Electronic Notebook
Statistical Analysis Part 1
The purpose of the witin-stain ANOVA test is to determine if any genes had a gene expression change that was significantly different than zero at any timepoint.
- Downloaded the excel file and renamed it AP dZAP1.
 - Created a new worksheet, naming it dZAP1_ANOVA".
 - Copied the first three columns containing the "MasterIndex", "ID", and "Standard Name" from the "Master_Sheet" worksheet for the strain and pasted it into the new worksheet dZAP1_ANOVA. Copied the columns containing the data for dZAP1 strain and pasted it into the worksheet.
 - At the top of the first column to the right of data, created five column headers of the form dZAP1_AvgLogFC_t(TIME) for times 15, 30, 60, 90, 120.
 -  In the cell below the dZAP1_AvgLogFC_t15 header, typed 
=AVERAGE( - Then highlighted all the data in row 2 associated with dZAP1 and t15, press the closing parenthesis key,and pressed the "enter" key.
 - Clicked on this cell and positioned my cursor at the bottom right corner. Saw my cursor change to a thin black plus sign and double clicked.
 - Repeated steps (5) through (7) with the t30, t60, t90, and the t120 data.
 - Now in the first empty column to the right of the dZAP1_AvgLogFC_t120 calculation, I created the column header dZAP1_ss_HO.
 -  In the first cell below this header, I typed 
=SUMSQ( - I then highlighted all the LogFC data in row 2 for dZAP1 for the respective time, excluding the AvgLogFC, pressed the closing parenthesis key, and pressed the "enter" key.
 -  Clicked on the cell with 
=SUMSQ(data of dZAP1_ss_HO and saw the cursor change to a thin black plus sign and double clicked. - In the next empty column to the right of dZAP1_ss_HO, I created the column headers dZAP1_ss_t(TIME) for times 15, 30, 60, 90, 120.
 - I found these data points: t15 had 4 data points, t30 had 4 data points, t60 had 4 data points, t90 had 4 data points, and t120 had 4 data points.
 - Counted the total data points to be 20.
 - Highlighted the data columns of dZAP1_LogFC_t15 through dZAP1_LogFC_t120 excluding the AvgLogFCs and pressed CTL+F.
 - In the "Replace" tab, typed "NA" and clicked "Replace All".
 - Found that it replaced 5100 "NA"s.
 -  In the first cell below the header dZAP1_ss_t15, typed 
=SUMSQ(<range of cells for logFC_t15>)-COUNTA(<range of cells for logFC_t15>)*<AvgLogFC_t15>^2and hit enter.-  The 
COUNTAfunction counts the number of cells in the specified range that have data in them (i.e., does not count cells with missing values). - The phrase <range of cells for logFC_t15> should be replaced by the data range associated with t15.
 - The phrase <number of data points> should be replaced by the number of data points for that timepoint (either 3, 4, or 5).
 - The phrase <AvgLogFC_t15> should be replaced by the cell number in which you computed the AvgLogFC for t15, and the "^2" squares that value.
 - Upon completion of this single computation, used the Step (7) trick to copy the formula throughout the column.
 
 -  The 
 - Repeat this computation for the t30 through t120 data points.
 - In the first column to the right of dZAP1_ss_t120, created the column header dZAP_SS_full.
 -  In the first row below this header, typed 
=sum(<range of cells containing "ss" for each timepoint>)and hit enter. - In the next two columns to the right, created the headers dZAP1_Fstat and dZAP1_p-value.
 - Recall the number of data points: n=20
 -  In the first cell of the dZAP1_Fstat column, typed 
=((20-5)/5)*(<dZAP1_ss_HO>-<dZAP1_SS_full>)/<dZAP1_SS_full>and hit enter.- Replaced the phrase (dZAP1_ss_HO with the cell designation.
 - Replaced the phrase <(dZAP1_SS_full> with the cell designation.
 - Copied to the whole column using step 7 again.
 
 -  In the first cell below the dZAP1_p-value header, typed 
=FDIST(<dZAP1_Fstat>,5,n-5)replacing the phrase <dZAP1_Fstat> with the cell designation and the "n" as in 20. Copied to the whole column. -  Before we move on to the next step, I performed a quick sanity check to see if I did all of these computations correctly.
- Clicked on cell A1 and clicked on the Data tab. Selected the Filter icon.
 - Clicked on the drop-down arrow on dZAP1_p-value column. Selected "Number Filters". In the window that appeared, set a criterion that would filter the data so that the p value has to be less than 0.05.
 - Excel will now only displayed the rows that correspond to data meeting that filtering criterion which was 2485 out of 6189.
 
 
Calculate the Bonferroni and p value Correction
- Now we will perform adjustments to the p value to correct for the multiple testing problem. Labeled the next two columns to the right with the same label, dZAP1_Bonferroni_p-value.
 -  Typed the equation 
=<dZAP1_p-value>*6189, Upon completion of this single computation, used the Step (7) trick to copy the formula throughout the column. -  Replaced any corrected p value that is greater than 1 by the number 1 by typing the following formula into the first cell below the second dZAP1_Bonferroni_p-value header: 
=IF(dZAP1_Bonferroni_p-value>1,1,dZAP1_Bonferroni_p-value), where "dZAP1_Bonferroni_p-value" refers to the cell in which the first Bonferroni p value computation was made. Used the Step (7) trick to copy the formula throughout the column. 
Calculate the Benjamini & Hochberg p value Correction
- Inserted a new worksheet named "dZAP1_ANOVA_B-H".
 - Copied and pasted the "MasterIndex", "ID", and "Standard Name" columns from the previous worksheet into the first two columns of the new worksheet.
 - For the following, used Paste special > Paste values. Copied the unadjusted p values from theANOVA worksheet and pasted it into Column D.
 - Selected all of columns A, B, C, and D. Sorted by ascending values on Column D. Clicked the sort button from A to Z on the toolbar, in the window that appeared, sorted by column D, smallest to largest.
 - Typed the header "Rank" in cell E1. Typed "1" into cell E2 and "2" into cell E3. Selected
 
both cells E2 and E3. Double-clicked on the plus sign on the lower right-hand corner of the selection to fill the column with a series of numbers from 1 to 6189.
-  Typed dZAP1_B-H_p-value in cell F1. Typed the following formula in cell F2: 
=(D2*6189)/E2and pressed enter. Copied the equation to the entire column. - Typed "dZAP1_B-H_p-value" into cell G1.
 -  Typed the following formula into cell G2: 
=IF(F2>1,1,F2)and pressed enter. Copied the equation to the entire column. - Selected columns A through G. Sorted them by the MasterIndex in Column A in ascending order.
 - Copied column G and used Paste special > Paste values to paste it into the next column on the right of the ANOVA sheet.
 
Acknowledgements
- Met outside of class and worked alongside inside class with Katie Wright to discuss any questions we had throughout the process of completing the Week 8 assignment.
 - Was assisted during the completion of the Week 8 Assignment in class by Dr. Dondi and Dr. Dalquist.
 
While I worked with the people noted above, this individual journal entry was completed by me and not copied from another source.
Aporras1 (talk) 20:27, 23 October 2017 (PDT)
References
- LMU BioDB 2017. (2017). Week 8. Retrieved October 27, 2017, from https://xmlpipedb.cs.lmu.edu/biodb/fall2017/index.php/Week_8