Difference between revisions of "Jwoodlee Week 8"

From LMU BioDB 2015
Jump to: navigation, search
(added beginning of Electronic Lab Notebook)
(Electronic Lab Notebook: added part of the part 1 procedure)
Line 2: Line 2:
 
I downloaded the raw data from the Merrell et al. experiment, found [http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae here].
 
I downloaded the raw data from the Merrell et al. experiment, found [http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae here].
  
I then followed the procedure on that page.
+
I then followed the procedure on that page:
 +
 
 +
===Part 1 Procedure===
 +
Procedure taken from [http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae].  Reworded to reflect what I did.
 +
 
 +
I scaled and centered the data (between chip normalization) by performing the following operations:
 +
Inserted a new Worksheet into the excel file, and named it "scaled_centered".
 +
Went back to the "compiled_raw_data" worksheet, selected all(command A) and copied(command C). Went to the "scaled_centered" worksheet, clicked on the upper, left-hand cell and pasted.
 +
 
 +
Inserted two rows in between the top row of headers and the first data row.
 +
In cell A2, typed "Average" and in cell A3, typed "StdDev".
 +
I then computed the average for the first column.  In cell B2, I typed the following equation:
 +
<code>=AVERAGE(B4:B5224)</code>
 +
and pressed "Enter".
 +
I then computed the standard deviation for the first column. In cell B3, I typed the following equation:
 +
<code>=STDEV(B4:B5224)</code>
 +
and pressed "Enter".
 +
I copied these two equations (cells B2 and B3) and pasted them into the empty cells in the rest of the columns to calculate for the rest of the columns.
 +
 
 +
I copied the column headings for all of my data columns and then pasted them to the right of the last data column so that I had a second set of headers above blank columns of cells. I then edited the names of the columns so that they now read: A1_scaled_centered, A2_scaled_centered, etc.
 +
In cell N4, I typed the following equation:
 +
=(B4-B$2)/B$3
 +
We wanted the data in cell B4 to have the average subtracted from it (cell B2) and be divided by the standard deviation (cell B3).  The dollar signs make sure that the proper cells are always used.
 +
I then copied and pasted this equation into the entire column across all rows.
 +
 
 +
In order to perform statistical analysis on the data, I inserted a new worksheet called "statistics".
 +
Went back to the "scaled_centered" worksheet and copied the ID column and pasted the data from that column into the first column of the statistics worksheet.
 +
 
 +
I went back to the "scaling_centering" worksheet and copied the columns that are designated "scaled_centered".
 +
 
 +
I went back to my new worksheet and clicked on the B1 cell. I then selected "Paste Special" from the Edit menu so I could get the values from the previous worksheet without the equations.
 +
I then deleted rows 2 and 3 that were there previously for the average and standard deviation.
 +
I then added 3 new headers to blank columns to the right: "Avg_LogFC_A", "Avg_LogFC_B", and "Avg_LogFC_C".
 +
 
 +
I then computed the average log fold change for the replicates for each patient by typing the equation:
 +
<code>=AVERAGE(B2:E2)</code> into cell N2. I then pasted this equation into the rest of the column. 
 +
I then created the equations for patients B and C and pasted them into their respective columns.  The equations are: <code>=AVERAGE(J2:M2) </code> and <code> =AVERAGE(F2:I2)</code>
 +
I then got the average of the averages and put them into the next column on the right with the header: "Avg_LogFC_all". The equation is: <code> =AVERAGE(N2:P2)</code>
 +
 
 +
 
 +
I inserted a new column next to the "Avg_LogFC_all" column and labeled the column "Tstat". This will compute a T statistic that tells us whether the scaled and centered average log ratio is significantly different than 0 (no change). Enter the equation:
 +
=AVERAGE(N2:P2)/(STDEV(N2:P2)/SQRT(number of replicates))
 +
(NOTE: in this case the number of replicates is 3. Be careful that you are using the correct number of parentheses.) Copy the equation and paste it into all rows in that column.
 +
Label the top cell in the next column "Pvalue". In the cell below the label, enter the equation:
 +
=TDIST(ABS(R2),degrees of freedom,2)
 +
The number of degrees of freedom is the number of replicates minus one, so in our case there are 2 degrees of freedom. Copy the equation and paste it into all rows in that column.
 +
Calculate the Bonferroni p value Correction
 +
Now we will perform adjustments to the p value to correct for the multiple testing problem. Label the next two columns to the right with the same label, Bonferroni_Pvalue.
 +
Type the equation =S2*5221, Upon completion of this single computation, use the trick to copy the formula throughout the column.
 +
Replace any corrected p value that is greater than 1 by the number 1 by typing the following formula into the first cell below the second Bonferroni_Pvalue header: =IF(T2>1,1,T2). Use the trick to copy the formula throughout the column.
 +
Calculate the Benjamini & Hochberg p value Correction
 +
Insert a new worksheet named "B-H_Pvalue".
 +
Copy and paste the "ID" column from your previous worksheet into the first column of the new worksheet.
 +
Insert a new column on the very left and name it "MasterIndex". We will create a numerical index of genes so that we can always sort them back into the same order.
 +
Type a "1" in cell A2 and a "2" in cell A3.
 +
Select both cells. Hover your mouse over the bottom-right corner of the selection until it makes a thin black + sign. Double-click on the + sign to fill the entire column with a series of numbers from 1 to 5221 (the number of genes on the microarray).
 +
For the following, use Paste special > Paste values. Copy your unadjusted p values from your previous worksheet and paste it into Column C.
 +
Select all of columns A, B, and C. Sort by ascending values on Column C. Click the sort button from A to Z on the toolbar, in the window that appears, sort by column C, smallest to largest.
 +
Type the header "Rank" in cell D1. We will create a series of numbers in ascending order from 1 to 5221 in this column. This is the p value rank, smallest to largest. Type "1" into cell D2 and "2" into cell D3. Select both cells D2 and D3. Double-click on the plus sign on the lower right-hand corner of your selection to fill the column with a series of numbers from 1 to 5221.
 +
Now you can calculate the Benjamini and Hochberg p value correction. Type B-H_Pvalue in cell E1. Type the following formula in cell E2: =(C2*5221)/D2 and press enter. Copy that equation to the entire column.
 +
Type "B-H_Pvalue" into cell F1.
 +
Type the following formula into cell F2: =IF(E2>1,1,E2) and press enter. Copy that equation to the entire column.
 +
Select columns A through F. Now sort them by your MasterIndex in Column A in ascending order.
 +
Copy column F and use Paste special > Paste values to paste it into the next column on the right of your "statistics" sheet.
 +
Prepare file for GenMAPP
 +
Insert a new worksheet and name it "forGenMAPP".
 +
Go back to the "statistics" worksheet and Select All and Copy.
 +
Go to your new sheet and click on cell A1 and select Paste Special, click on the Values radio button, and click OK. We will now format this worksheet for import into GenMAPP.
 +
Select Columns B through Q (all the fold changes). Select the menu item Format > Cells. Under the number tab, select 2 decimal places. Click OK.
 +
Select all the columns containing p values. Select the menu item Format > Cells. Under the number tab, select 4 decimal places. Click OK.
 +
Delete the left-most Bonferroni p value column, preserving the one that shows the result of your "if" statement.
 +
Insert a column to the right of the "ID" column. Type the header "SystemCode" into the top cell of this column. Fill the entire column (each cell) with the letter "N".
 +
Select the menu item File > Save As, and choose "Text (Tab-delimited) (*.txt)" from the file type drop-down menu. Excel will make you click through a couple of warnings because it doesn't like you going all independent and choosing a different file type than the native .xls. This is OK. Your new *.txt file is now ready for import into GenMAPP. But before we do that, we want to know a few things about our data as shown in the next section.
 +
Upload both the .xls and .txt files that you have just created to your journal page in the class wiki. Make sure that your file name is distinct from your other classmates so that nobody overwrites anyone else's file.
  
  

Revision as of 21:58, 25 October 2015

Electronic Lab Notebook

I downloaded the raw data from the Merrell et al. experiment, found here.

I then followed the procedure on that page:

Part 1 Procedure

Procedure taken from [1]. Reworded to reflect what I did.

I scaled and centered the data (between chip normalization) by performing the following operations: Inserted a new Worksheet into the excel file, and named it "scaled_centered". Went back to the "compiled_raw_data" worksheet, selected all(command A) and copied(command C). Went to the "scaled_centered" worksheet, clicked on the upper, left-hand cell and pasted.

Inserted two rows in between the top row of headers and the first data row. In cell A2, typed "Average" and in cell A3, typed "StdDev". I then computed the average for the first column. In cell B2, I typed the following equation: =AVERAGE(B4:B5224) and pressed "Enter". I then computed the standard deviation for the first column. In cell B3, I typed the following equation: =STDEV(B4:B5224) and pressed "Enter". I copied these two equations (cells B2 and B3) and pasted them into the empty cells in the rest of the columns to calculate for the rest of the columns.

I copied the column headings for all of my data columns and then pasted them to the right of the last data column so that I had a second set of headers above blank columns of cells. I then edited the names of the columns so that they now read: A1_scaled_centered, A2_scaled_centered, etc. In cell N4, I typed the following equation: =(B4-B$2)/B$3 We wanted the data in cell B4 to have the average subtracted from it (cell B2) and be divided by the standard deviation (cell B3). The dollar signs make sure that the proper cells are always used. I then copied and pasted this equation into the entire column across all rows.

In order to perform statistical analysis on the data, I inserted a new worksheet called "statistics". Went back to the "scaled_centered" worksheet and copied the ID column and pasted the data from that column into the first column of the statistics worksheet.

I went back to the "scaling_centering" worksheet and copied the columns that are designated "scaled_centered".

I went back to my new worksheet and clicked on the B1 cell. I then selected "Paste Special" from the Edit menu so I could get the values from the previous worksheet without the equations. I then deleted rows 2 and 3 that were there previously for the average and standard deviation. I then added 3 new headers to blank columns to the right: "Avg_LogFC_A", "Avg_LogFC_B", and "Avg_LogFC_C".

I then computed the average log fold change for the replicates for each patient by typing the equation: =AVERAGE(B2:E2) into cell N2. I then pasted this equation into the rest of the column. I then created the equations for patients B and C and pasted them into their respective columns. The equations are: =AVERAGE(J2:M2) and =AVERAGE(F2:I2) I then got the average of the averages and put them into the next column on the right with the header: "Avg_LogFC_all". The equation is: =AVERAGE(N2:P2)


I inserted a new column next to the "Avg_LogFC_all" column and labeled the column "Tstat". This will compute a T statistic that tells us whether the scaled and centered average log ratio is significantly different than 0 (no change). Enter the equation: =AVERAGE(N2:P2)/(STDEV(N2:P2)/SQRT(number of replicates)) (NOTE: in this case the number of replicates is 3. Be careful that you are using the correct number of parentheses.) Copy the equation and paste it into all rows in that column. Label the top cell in the next column "Pvalue". In the cell below the label, enter the equation: =TDIST(ABS(R2),degrees of freedom,2) The number of degrees of freedom is the number of replicates minus one, so in our case there are 2 degrees of freedom. Copy the equation and paste it into all rows in that column. Calculate the Bonferroni p value Correction Now we will perform adjustments to the p value to correct for the multiple testing problem. Label the next two columns to the right with the same label, Bonferroni_Pvalue. Type the equation =S2*5221, Upon completion of this single computation, use the trick to copy the formula throughout the column. Replace any corrected p value that is greater than 1 by the number 1 by typing the following formula into the first cell below the second Bonferroni_Pvalue header: =IF(T2>1,1,T2). Use the trick to copy the formula throughout the column. Calculate the Benjamini & Hochberg p value Correction Insert a new worksheet named "B-H_Pvalue". Copy and paste the "ID" column from your previous worksheet into the first column of the new worksheet. Insert a new column on the very left and name it "MasterIndex". We will create a numerical index of genes so that we can always sort them back into the same order. Type a "1" in cell A2 and a "2" in cell A3. Select both cells. Hover your mouse over the bottom-right corner of the selection until it makes a thin black + sign. Double-click on the + sign to fill the entire column with a series of numbers from 1 to 5221 (the number of genes on the microarray). For the following, use Paste special > Paste values. Copy your unadjusted p values from your previous worksheet and paste it into Column C. Select all of columns A, B, and C. Sort by ascending values on Column C. Click the sort button from A to Z on the toolbar, in the window that appears, sort by column C, smallest to largest. Type the header "Rank" in cell D1. We will create a series of numbers in ascending order from 1 to 5221 in this column. This is the p value rank, smallest to largest. Type "1" into cell D2 and "2" into cell D3. Select both cells D2 and D3. Double-click on the plus sign on the lower right-hand corner of your selection to fill the column with a series of numbers from 1 to 5221. Now you can calculate the Benjamini and Hochberg p value correction. Type B-H_Pvalue in cell E1. Type the following formula in cell E2: =(C2*5221)/D2 and press enter. Copy that equation to the entire column. Type "B-H_Pvalue" into cell F1. Type the following formula into cell F2: =IF(E2>1,1,E2) and press enter. Copy that equation to the entire column. Select columns A through F. Now sort them by your MasterIndex in Column A in ascending order. Copy column F and use Paste special > Paste values to paste it into the next column on the right of your "statistics" sheet. Prepare file for GenMAPP Insert a new worksheet and name it "forGenMAPP". Go back to the "statistics" worksheet and Select All and Copy. Go to your new sheet and click on cell A1 and select Paste Special, click on the Values radio button, and click OK. We will now format this worksheet for import into GenMAPP. Select Columns B through Q (all the fold changes). Select the menu item Format > Cells. Under the number tab, select 2 decimal places. Click OK. Select all the columns containing p values. Select the menu item Format > Cells. Under the number tab, select 4 decimal places. Click OK. Delete the left-most Bonferroni p value column, preserving the one that shows the result of your "if" statement. Insert a column to the right of the "ID" column. Type the header "SystemCode" into the top cell of this column. Fill the entire column (each cell) with the letter "N". Select the menu item File > Save As, and choose "Text (Tab-delimited) (*.txt)" from the file type drop-down menu. Excel will make you click through a couple of warnings because it doesn't like you going all independent and choosing a different file type than the native .xls. This is OK. Your new *.txt file is now ready for import into GenMAPP. But before we do that, we want to know a few things about our data as shown in the next section. Upload both the .xls and .txt files that you have just created to your journal page in the class wiki. Make sure that your file name is distinct from your other classmates so that nobody overwrites anyone else's file.



File:ModifiedDataMerrellJwoodlee.gmf

File:ModifiedDataMerrellJwoodlee-Criterion1-GO filters.xls

File:ModifiedDataMerrellJwoodlee-Criterion1-GO.txt

File:ModifiedDataMerrellJwoodlee.gex

File:ModifiedDataMerrellJwoodlee.EX.txt

File:ModifiedDataMerrellJwoodleeText.txt

File:ModifiedDataMerrellJwoodlee.xls

File:Metal ion binding Jwoodlee.mapp


BIOL 367, Fall 2015, User Page, Team Page

Weekly Assignments Individual Journal Pages Shared Journal Pages