Difference between revisions of "Laurmagee: Week 8"

From LMU BioDB 2013
Jump to: navigation, search
(Part 1)
(Part 1)
Line 31: Line 31:
 
*Select the menu item File > Save As, and choose "Text (Tab-delimited) (*.txt)" from the file type drop-down menu.
 
*Select the menu item File > Save As, and choose "Text (Tab-delimited) (*.txt)" from the file type drop-down menu.
  
*[[File:Merrell_Compiled_Raw_Data_Vibrio_LM.xml]]
+
*[[Merrell_Compiled_Raw_Data_Vibrio_LM(1).xls]]
 
*[[File:Merrell_Compiled_Raw_Data_Vibrio_LM(1)-1.txt]]
 
*[[File:Merrell_Compiled_Raw_Data_Vibrio_LM(1)-1.txt]]
  

Revision as of 07:33, 18 October 2013

Part 1

  • First download the following data file: File:Merrell Compiled Raw Data Vibrio.xls
  • Insert a new spreadsheet in the label it "scaled_centered".
  • Select all the data in your previous worksheet and copy it into the new spreadsheet
  • Insert two rows in between the top row of headers and the first data row.
  • In cell A2, type "Average" and in cell A3, type "StdDev".
  • Compute the Average log ratio for each chip (each column of data). In cell B2, type the following equation: =AVERAGE(B4:B5224)
  • Compute the Standard Deviation of the log ratios on each chip (each column of data). In cell B3, type the following equation: =STDEV(B4:B5224)
  • Drag these lines into the rest of the columns, because Excel will automatically change the equation to match your specific column.
  • Copy the column headings for all of your data columns and then paste them to the right of the last data column so that you have a second set of headers above blank colums of cells. Edit the names of the columns so that they now read: A1_scaled_centered, A2_scaled_centered, etc.
  • In cell N4, type the following equation: =(B4-B$2)/B$3 The dollar signs are important because we will be using the same mean (B2) and standard deviation (B3) in all the succeeding columns values as well.
  • Copy and paste the scaling and centering equation for each of the columns of data with the "_scaled_centered" column header. Be sure that your equation is correct for the column you are calculating (i.e. change the values to match the cells with the correct average and standard deviation).
  • Insert a new worksheet and name it "statistics".
  • Go back to the "scaling_centering" worksheet and copy the first column ("ID"). And paste these values into the first column of your new spreadsheet.
  • Go back to the "scaling_centering" worksheet and copy the columns that are designated "_scaled_centered". Click on the B1 cell and select "Paste Special" from the Edit menu. A window will open: click on the radio button for "Values" and click OK.
  • Go to a new column on the right of your worksheet. Type the header "Avg_LogFC_A", "Avg_LogFC_B", and "Avg_LogFC_C" into the top cell of the next three columns.
  • Compute the average log fold change for the replicates for each patient by typing the equation: =AVERAGE(B2:E2) into cell N2. Copy this equation and paste it into the rest of the column.
  • Create the equation for patients B and C and paste it into their respective columns.
  • Now you will compute the average of the averages. Type the header "Avg_LogFC_all" into the first cell in the next empty column.
  • Create the equation that will compute the average of the three previous averages you calculated and paste it into this entire column.
  • Insert a new column labeled "Tstat".
  • Enter the equation: =AVERAGE(N2:P2)/(STDEV(N2:P2)/SQRT(3)) and paste it into all rows in that column.
  • Label the top cell in the next column "Pvalue" and enter the equation: =TDIST(ABS(R2),2,2) paste it into all rows in that column.
  • Insert a new worksheet and name it "forGenMAPP".
  • Go back to the "statistics" worksheet and Select All and Copy.
  • Go to your new sheet and click on cell A1 and select Paste Special, click on the Values radio button, and click OK.
  • Select Columns B through Q (all the fold changes). Select the menu item Format > Cells. Under the number tab, select 2 decimal places. Click OK.
  • Select Columns R and S. Select the menu item Format > Cells. Under the number tab, select 4 decimal places. Click OK.
  • Select Columns N through S and Cut. Select Column B by left-clicking on the "B" at the top of the column. Then right-click on the Column B header and select "Insert Cut Cells". This will insert the data without writing over your existing columns.
  • Insert a column to the right of the "ID" column. Type the header "SystemCode" into the top cell of this column. Fill the entire column (each cell) with the letter "N".
  • Select the menu item File > Save As, and choose "Text (Tab-delimited) (*.txt)" from the file type drop-down menu.
  • To make sure that we performed our data analysis correctly, first open your spreadsheet and go to the "forGenMAPP" tab.
  • Click on cell A1 and select the menu item Data > Filter > Autofilter.
  • Click on the drop-down arrow on your "Pvalue" column. Select "Custom". In the window that appears, set a criterion that will filter your data so that the Pvalue has to be less than 0.05.
  • How many genes have p value < 0.05?
    • 5221 genes.
  • What about p < 0.01?
    • 169 genes.
  • What about p < 0.001?
    • 19 genes.
  • What about p < 0.0001?
    • 1 gene.
  • Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change greater than zero. How many are there?
    • 300 genes
  • Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change less than zero. How many are there?
    • 454 genes
  • What about an average log fold change of > 0.25 and p < 0.05?
    • 281 genes
  • Or an average log fold change of < -0.25 and p < 0.05? (These are more realistic values for the fold change cut-offs because it represents about a 20% fold change which is about the level of detection of this technology.)
    • 442 genes
  • What criteria did Merrell et al. (2002) use to determine a significant gene expression change? How does it compare to our method?
  • Merrell et al. (2002) report that genes with IDs: VC0028, VC0941, VC0869, VC0051, VC0647, VC0468, VC2350, and VCA0583 were all significantly changed in their data. Look these genes up in your spreadsheet. What are their fold changes and p values? Are they significantly changed in our analysis?

Part 2

Laurmagee (talk) 22:56, 17 October 2013 (PDT)

Personal tools
Namespaces

Variants
Actions
Navigation
Toolbox