Ksherbina Week 8

From LMU BioDB 2013
Revision as of 16:54, 17 October 2013 by Ksherbina (Talk | contribs)

Jump to: navigation, search
Katrina Sherbina
Class Page    User Page
Assignment Description Week 1 Week 2 Week 3 Week 4 Week 5 Week 6 Week 7 Week 8 Week 9 Week 10 Week 11 Week 12 Week 13 Week 15
Class Journal Week 1 Week 2 Week 3 Week 4 Week 5 Week 6 Week 7 Week 8 Week 9
Individual Journal Week 2 Week 3 Week 4 Week 5 Week 6 Week 7 Week 8 Week 9 Week 10 Week 11
Other Week 5: Database Wiki
Final Project Team H(oo)KD Project Page Journal Club Presentation Project Individual Journal

Contents

Lab Journal: Working with the Vibrio cholerae Microarray Data

  • Download the file Merrell_Compiled_Raw_Data_Vibrio.xls from the Sample Microarray Analysis for Vibrio cholerae page.
    • Save the file with the following format for the filename: Merrell_Compiled_Raw_Data_Vibrio_<Initials>_<Date>.xls
  • This file contains the Log2 of Red Dye/Green Dye Normalized Ratio (Median) organized in the following manner:
    • Patient A
      • Sample 1: 24047.xls (A1)
      • Sample 2: 24048.xls (A2)
      • Sample 3: 24213.xls (A3)
      • Sample 4: 24202.xls (A4)
    • Patient B
      • Sample 5: 24049.xls (B1)
      • Sample 6: 24050.xls (B2)
      • Sample 7: 24203.xls (B3)
      • Sample 8: 24204.xls (B4)
    • Patient C
      • Sample 9: 24053.xls (C1)
      • Sample 10: 24054.xls (C2)
      • Sample 11: 24205.xls (C3)
      • Sample 12: 24206.xls (C4)

Normalize the set of microarray chips in the experiment

  • Open the file. Insert a new worksheet and name it "scaled_centered".
  • Go back to the "compiled_raw_data" worksheet. Select All and Copy. Go to your new "scaled_centered" worksheet, click on the upper, left-hand cell (cell A1) and Paste.
  • Insert two rows in between the top row of the column headers and the first row of data.
  • In cell A2, type "Average" and in cell A3, type "StdDev".
  • Compute the average log fold change for each chip, which corresponds to each column of the data. In cell B2, type the following equation:
=AVERAGE(B4:B5224)
and hit Enter.
  • Alternately, to select the range within the parenthesis of the AVERAGE formula, click on the first cell of the range for which the computation will be performed, scroll to the bottom of the the worksheet, and Shift+click on the last cell of the range.
  • Click and hold the lower right hand corner of the cell B2. Then, drag the cursor to the last column for which you would like to compute the average log fold change.
  • Compute the standard deviation of the log fold change ratios for each chip, which corresponds to each column of the data. In cell B3, type the following equation:
=STDEV(B4:B5224)
and hit Enter.
  • Click and hold the lower right hand corner of the cell B3. Then, drag the cursor to the last column for which you would like to compute the standard deviation of the log fold change ratios.
  • Copy and Paste the first column (column A) to the first blank column after the data (column N).
  • Now, the data can be scaled and centered. After column N, label a new column for each chip that will be scaled and centered: A1_scaled_centered, A2_scaled_centered, etc.
  • In cell O4, type the following equation:
=(B4-B$2)/B$3
In this case, the average of the first chip A1 (in cell B2) is subtracted from the data in cell B4. The difference is then divided by the standard deviation of the first chip A1 (in cell B3).
  • Double click on the lower right hand corner of cell O4 to performing scaling and centering for the rest of the chip (the column of the data).
  • Repeat the scaling and centering procedure for the rest of the chips changing the cells in the formula corresponding to the average and the standard deviation for each chip (column of the data) that you scale and center.

Perform statistical analysis of the normalized microarray data

  • Insert a new worksheet and name it "statistics".
  • Go back to the "scaled_centered" worksheet and copy the first column ("ID").
  • Paste the data into the first column of your new "statistics" worksheet.
  • Go back to the "scaled_centered" worksheet and click on the cell "A1_scaled_centered". Then, hold the Shift and Ctrl keys and hit the Right Arrow key to select all of the cells that have the column names for the scaled and centered data. Copy the selection of cells.
  • Go to the "statistics" worksheet and Paste the column names into the columns in first row after the "ID" column.
  • Go back to the "scaled_centered" worksheet and Select All and Copy the scaled and centered data.
  • To do so, click on the first cell of the data (cell O4). Then hold the Shift and Ctrl keys, hit the Right Arrow key, and then hit the Down Arrow key (making sure that you are still holding down the Shift and Ctrl keys). Then, Copy the selection.
  • Go to the "statistics" worksheet and right click on cell B2. Go to the "Paste Special" option. A window will open: click on the radio button for "Values" and click OK. This pastes the data as numerical values rather than equations.
  • To the right of the data you just pasted into the worksheet, type the following headers into the first cell of the next three columns: "Avg_LogFC_A", "Avg_LogFC_B", and "Avg_LogFC_C".
  • Compute the average log fold change for the replicates for each patient A by typing the following equation:
=AVERAGE(B2:E2)
into cell N2 and hit Enter.
  • Double click on the lower right hand corner of cell N2 to compute the average of the replicates for Patient A for the remainder of the genes.
  • Repeat the calculation for Patients B and C in their respective columns.
  • Type the header "Avg_LogFC_all" into the first cell in the next empty column (column Q). Compute the average of the averages by typing the following equation into cell Q2:
=AVERAGE(N2:P2)
and hit Enter.
  • Double click on the lower right hand corner of cell Q2 to compute the average of the averages for the rest of the genes.
  • Now, compute a T statistic to determine how much the average log fold change of all the patients deviates from 0, which corresponds to now change. Type the header "Tstat" into the first cell in the next empty column (column R). Type the following equation into cell R2:
=Q2/(STDEV(N2:P2)/SQRT(COUNT(N2:P2))
and hit Enter. (The command COUNT() counts the number of patients in the experiment.)
  • Double click on the lower right hand corner of cell R2 to compute the T statistic for the remainder of the genes.
  • Now, compute the P value to determine how significant is the deviation of the average log fold change of all the patients from 0. Type the header "Pvalue" into the first cell in the next empty column (column S). Type the following equation into cell S2:
=TDIST(ABS(R2),COUNT(N2:P2)-1,2)
and hit Enter. Here, the command COUNT(N2:P2)-1 computes the degrees of freedom, which is one less the number of replicates. The "2" specifies that a two-tailed distribution is used to compute the p value.
  • Double click on the lower right hand corner of cell R2 to compute the p value for the remainder of the genes.

The resulting Excel file can be downloaded here.

Format the data for GenMAPP

  • Insert a new worksheet and name it "forGenMAPP".
  • Go back to the "statistics" worksheet and Select All and Copy.
  • Go to the "forGenMAPP" worksheet and right click on cell A1. Highlight the "Paste Special..." option and then click on "Paste Special...". A window will open: click on the radio button for "Values" and click OK. This pastes the data as numerical values rather than equations.

The text file to be used for GenMapp can be downloaded here.

Lab Journal: MAPPFinder Analysis of Vibrio cholerae Microarray Data

  1. Installed GenMAPP Classic from this [page] onto my computer.
  2. Downloaded the following data files: Vc-Std_External_20090622.gdb and Vc-Std_External_20101022.gdb.
Personal tools
Namespaces

Variants
Actions
Navigation
Toolbox