Difference between revisions of "Laurmagee: Week 8"

Revision as of 07:33, 18 October 2013

Part 1

First download the following data file: File:Merrell Compiled Raw Data Vibrio.xls
Insert a new spreadsheet in the label it "scaled_centered".
Select all the data in your previous worksheet and copy it into the new spreadsheet
Insert two rows in between the top row of headers and the first data row.
In cell A2, type "Average" and in cell A3, type "StdDev".
Compute the Average log ratio for each chip (each column of data). In cell B2, type the following equation: =AVERAGE(B4:B5224)
Compute the Standard Deviation of the log ratios on each chip (each column of data). In cell B3, type the following equation: =STDEV(B4:B5224)
Drag these lines into the rest of the columns, because Excel will automatically change the equation to match your specific column.
Copy the column headings for all of your data columns and then paste them to the right of the last data column so that you have a second set of headers above blank colums of cells. Edit the names of the columns so that they now read: A1_scaled_centered, A2_scaled_centered, etc.
In cell N4, type the following equation: =(B4-B$2)/B$3 The dollar signs are important because we will be using the same mean (B2) and standard deviation (B3) in all the succeeding columns values as well.
Copy and paste the scaling and centering equation for each of the columns of data with the "_scaled_centered" column header. Be sure that your equation is correct for the column you are calculating (i.e. change the values to match the cells with the correct average and standard deviation).
Insert a new worksheet and name it "statistics".
Go back to the "scaling_centering" worksheet and copy the first column ("ID"). And paste these values into the first column of your new spreadsheet.
Go back to the "scaling_centering" worksheet and copy the columns that are designated "_scaled_centered". Click on the B1 cell and select "Paste Special" from the Edit menu. A window will open: click on the radio button for "Values" and click OK.
Go to a new column on the right of your worksheet. Type the header "Avg_LogFC_A", "Avg_LogFC_B", and "Avg_LogFC_C" into the top cell of the next three columns.
Compute the average log fold change for the replicates for each patient by typing the equation: =AVERAGE(B2:E2) into cell N2. Copy this equation and paste it into the rest of the column.
Create the equation for patients B and C and paste it into their respective columns.
Now you will compute the average of the averages. Type the header "Avg_LogFC_all" into the first cell in the next empty column.
Create the equation that will compute the average of the three previous averages you calculated and paste it into this entire column.
Insert a new column labeled "Tstat".
Enter the equation: =AVERAGE(N2:P2)/(STDEV(N2:P2)/SQRT(3)) and paste it into all rows in that column.
Label the top cell in the next column "Pvalue" and enter the equation: =TDIST(ABS(R2),2,2) paste it into all rows in that column.
Insert a new worksheet and name it "forGenMAPP".
Go back to the "statistics" worksheet and Select All and Copy.
Go to your new sheet and click on cell A1 and select Paste Special, click on the Values radio button, and click OK.
Select Columns B through Q (all the fold changes). Select the menu item Format > Cells. Under the number tab, select 2 decimal places. Click OK.
Select Columns R and S. Select the menu item Format > Cells. Under the number tab, select 4 decimal places. Click OK.
Select Columns N through S and Cut. Select Column B by left-clicking on the "B" at the top of the column. Then right-click on the Column B header and select "Insert Cut Cells". This will insert the data without writing over your existing columns.
Insert a column to the right of the "ID" column. Type the header "SystemCode" into the top cell of this column. Fill the entire column (each cell) with the letter "N".
Select the menu item File > Save As, and choose "Text (Tab-delimited) (*.txt)" from the file type drop-down menu.

To make sure that we performed our data analysis correctly, first open your spreadsheet and go to the "forGenMAPP" tab.
Click on cell A1 and select the menu item Data > Filter > Autofilter.
Click on the drop-down arrow on your "Pvalue" column. Select "Custom". In the window that appears, set a criterion that will filter your data so that the Pvalue has to be less than 0.05.
How many genes have p value < 0.05?
- 5221 genes.
What about p < 0.01?
- 169 genes.
What about p < 0.001?
- 19 genes.
What about p < 0.0001?
- 1 gene.
Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change greater than zero. How many are there?
- 300 genes
Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change less than zero. How many are there?
- 454 genes
What about an average log fold change of > 0.25 and p < 0.05?
- 281 genes
Or an average log fold change of < -0.25 and p < 0.05? (These are more realistic values for the fold change cut-offs because it represents about a 20% fold change which is about the level of detection of this technology.)
- 442 genes
What criteria did Merrell et al. (2002) use to determine a significant gene expression change? How does it compare to our method?
Merrell et al. (2002) report that genes with IDs: VC0028, VC0941, VC0869, VC0051, VC0647, VC0468, VC2350, and VCA0583 were all significantly changed in their data. Look these genes up in your spreadsheet. What are their fold changes and p values? Are they significantly changed in our analysis?

Part 2

Laurmagee (talk) 22:56, 17 October 2013 (PDT)

@@ Line 31: / Line 31: @@
 *Select the menu item File > Save As, and choose "Text (Tab-delimited) (*.txt)" from the file type drop-down menu.
-*[[File:Merrell_Compiled_Raw_Data_Vibrio_LM.xml]]
+*[[Merrell_Compiled_Raw_Data_Vibrio_LM(1).xls]]
 *[[File:Merrell_Compiled_Raw_Data_Vibrio_LM(1)-1.txt]]

Difference between revisions of "Laurmagee: Week 8"

Revision as of 07:33, 18 October 2013

Part 1

Part 2

Personal tools

Namespaces

Variants

Views

Actions

Search

Navigation

Toolbox