Mbalducc Week 8

From LMU BioDB 2017
Revision as of 20:14, 22 October 2017 by Mbalducc (talk | contribs) (edited instructions)
Jump to: navigation, search

Microarray Data Analysis Lab Notebook

I analyzed the strain dCIN5.

5871 replacements were made when I deleted the NAs from the spreadsheet.

Statistical Data Analysis Part 1

  1. Create a new worksheet, naming it dCIN5_ANOVA".
  2. Copy the first three columns containing the "MasterIndex", "ID", and "Standard Name" from the "Master_Sheet" worksheet for the strain dCIN5 and paste it into your new worksheet. Copy the columns containing the data for dCIN5 and paste it into your new worksheet.
  3. At the top of the first column to the right of your data, create five column headers of the form dCIN5_AvgLogFC_(TIME) where (TIME) is 15, 30, 60, 90, and 120.
  4. In the cell below the dCIN5_AvgLogFC_t15 header, type =AVERAGE(D2:G2) and hit enter.
  5. Then highlight all the data in row 2 associated with dCIN5 and t15, press the closing paren key (shift 0),and press the "enter" key.
  6. This cell now contains the average of the log fold change data from the first gene at t=15 minutes.
  7. Click on this cell and position your cursor at the bottom right corner. You should see your cursor change to a thin black plus sign (not a chubby white one). When it does, double click, and the formula will magically be copied to the entire column of 6188 other genes.
  8. Repeat steps (4) through (8) with the t30, t60, t90, and the t120 data.
  9. Now in the first empty column to the right of the dCIN5_AvgLogFC_t120 calculation, create the column header dCIN5_ss_HO.
  10. In the first cell below this header, type =SUMSQ(D2:W2)
  11. Highlight all the LogFC data in row 2 for your dCIN5 (but not the AvgLogFC), press the closing paren key (shift 0),and press the "enter" key.
  12. In the next empty column to the right of dCIN5_ss_HO, create the column headers dCIN5_ss_(TIME) as in (3).
  13. Make a note of how many data points you have at each time point for your strain. For dCIN5, there are 4 data points for each time point. There are 20 total data points.
  14. In the first cell below the header dCIN5_ss_t15, type =SUMSQ(D2:G2)-COUNTA(D2:G2)*X2^2 and hit enter.
    • The COUNTA function counts the number of cells in the specified range that have data in them (i.e., does not count cells with missing values).
    • Upon completion of this single computation, use the Step (7) trick to copy the formula throughout the column.
  15. Repeat this computation for the t30 through t120 data points. Again, be sure to get the data for each time point, type the right number of data points, and get the average from the appropriate cell for each time point, and copy the formula to the whole column for each computation.
  16. In the first column to the right of dCIN5_ss_t120, create the column header dCIN5_SS_full.
  17. In the first row below this header, type =sum(AD2:AH2) and hit enter.
  18. In the next two columns to the right, create the headers dCIN5_Fstat and dCIN5_p-value.
  19. Recall the number of data points from (13): 20.
  20. In the first cell of the dCIN5_Fstat column, type =((20-5)/5)*(AC2-AI2>)/AI2 and hit enter.
    • Copy to the whole column.
  21. In the first cell below the dCIN5_p-value header, type =FDIST(AJ2,5,20-5). Copy to the whole column.
  22. Before we move on to the next step, we will perform a quick sanity check to see if we did all of these computations correctly.
    • Click on cell A1 and click on the Data tab. Select the Filter icon (looks like a funnel). Little drop-down arrows should appear at the top of each column. This will enable us to filter the data according to criteria we set.
    • Click on the drop-down arrow on your dCIN5_p-value column. Select "Number Filters". In the window that appears, set a criterion that will filter your data so that the p value has to be less than 0.05.
    • Excel will now only display the rows that correspond to data meeting that filtering criterion. A number will appear in the lower left hand corner of the window giving you the number of rows that meet that criterion. We will check our results with each other to make sure that the computations were performed correctly.