Difference between revisions of "Jwoodlee Week 8"
(→Electronic Lab Notebook: added part of the part 1 procedure) |
(→Part 1 Procedure: added part 1 procedure) |
||
Line 42: | Line 42: | ||
− | I inserted a new column next to the "Avg_LogFC_all" column and labeled the column "Tstat". | + | I inserted a new column next to the "Avg_LogFC_all" column and labeled the column "Tstat". I then entered the equation: |
− | =AVERAGE(N2:P2)/(STDEV(N2:P2)/SQRT(number of replicates)) | + | <code> =AVERAGE(N2:P2)/(STDEV(N2:P2)/SQRT(number of replicates)) </code> |
− | ( | + | (In this case the number of replicates was 3). I spread this equation out to the whole column. |
− | + | I labeled the top cell in the next column "Pvalue". In the cell below the label, I entered the equation: <code>=TDIST(ABS(R2),degrees of freedom,2)</code> | |
− | =TDIST(ABS(R2),degrees of freedom,2) | + | |
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
+ | The number of degrees of freedom is the number of replicates minus one, so in this case there were 2 degrees of freedom. I then pasted the equation throughout the whole column. | ||
+ | |||
+ | Now I had to adjust the pvalues to account for the multiple testing problem. I labeled the next two columns to the right with the same label, Bonferroni_Pvalue. | ||
+ | I typed the equation <code>=S2*5221</code>, and copied it to the entire first Bonferroni_Pvalue column. | ||
+ | In the second Bonferroni_Pvalue column I replaced any corrected p value that is greater than 1 by the number 1 by typing <code>=IF(T2>1,1,T2)</code> into the first cell below the second Bonferroni_Pvalue header. I then copied that equation into the entire column. | ||
+ | |||
+ | |||
+ | |||
+ | I inserted a new worksheet named "B-H_Pvalue" where the Benjamini and Hochberg Pvalue correction would be calculated. | ||
+ | I copy and pasted the ID column from the previous worksheet into this new one. | ||
+ | I inserted a new column on the very left and name it "MasterIndex". | ||
+ | I typed a "1" in cell A2 and a "2" in cell A3 and used this pattern to fill out the rest of the column with numbers 1-5221. | ||
+ | I pasted the only values(not the equations) of my unadjusted pvalues into my new worksheet at column C. | ||
+ | I then selected all of columns A, B, and C, and sorted by ascending values on Column C. | ||
+ | I added the header "Rank" in cell D1, where a series of numbers in ascending order from 1-5221 will reside. | ||
+ | This is the p value rank, smallest to largest. | ||
+ | I used the same trick as before to populate this column with ascending numbers. | ||
+ | I typed B-H_Pvalue in cell E1, and typed the following formula in cell E2: <code>=(C2*5221)/D2</code> and pressed enter and applied that to the whole column. | ||
+ | I then typed B_HPvalue into cell F1. | ||
+ | Then I added<code> =IF(E2>1,1,E2)</code> into cell F2 and pressed enter and applied it to the whole column. | ||
+ | I then sorted columns A through F in order by the MasterIndex. | ||
+ | I copied column F and used paste special to paste it into the next column on the right of the "statistics" sheet. | ||
+ | |||
+ | Now it was time to prepare the file for GenMapp. | ||
+ | I inserted a new worksheet and named it "forGenMAPP". | ||
+ | I then pasted the statistic worksheet into the forGenMapp worksheet by values. | ||
+ | To prepare the data for GenMapp, I selected Columns B through Q (all the fold changes), selected the menu item Format > Cells, and under the number tab, selected 2 decimal places. | ||
+ | I selected all the columns containing p values. Selected the menu item Format > Cells, and under the number tab, selected 4 decimal places. | ||
+ | I deleted the left-most Bonferroni p value column, and preserved the one that shows the result of the "if" statement. | ||
+ | I inserted a column to the right of the "ID" column, typed the header "SystemCode" into the top cell of this column, and filled the entire column (each cell) with the letter "N". | ||
+ | I selected the menu item File > Save As, and choose "Text (Tab-delimited) (*.txt)" from the file type drop-down menu. This text file would be what I use later on when I am working with GenMapp | ||
Revision as of 22:32, 25 October 2015
Electronic Lab Notebook
I downloaded the raw data from the Merrell et al. experiment, found here.
I then followed the procedure on that page:
Part 1 Procedure
Procedure taken from [1]. Reworded to reflect what I did.
I scaled and centered the data (between chip normalization) by performing the following operations: Inserted a new Worksheet into the excel file, and named it "scaled_centered". Went back to the "compiled_raw_data" worksheet, selected all(command A) and copied(command C). Went to the "scaled_centered" worksheet, clicked on the upper, left-hand cell and pasted.
Inserted two rows in between the top row of headers and the first data row.
In cell A2, typed "Average" and in cell A3, typed "StdDev".
I then computed the average for the first column. In cell B2, I typed the following equation:
=AVERAGE(B4:B5224)
and pressed "Enter".
I then computed the standard deviation for the first column. In cell B3, I typed the following equation:
=STDEV(B4:B5224)
and pressed "Enter".
I copied these two equations (cells B2 and B3) and pasted them into the empty cells in the rest of the columns to calculate for the rest of the columns.
I copied the column headings for all of my data columns and then pasted them to the right of the last data column so that I had a second set of headers above blank columns of cells. I then edited the names of the columns so that they now read: A1_scaled_centered, A2_scaled_centered, etc. In cell N4, I typed the following equation: =(B4-B$2)/B$3 We wanted the data in cell B4 to have the average subtracted from it (cell B2) and be divided by the standard deviation (cell B3). The dollar signs make sure that the proper cells are always used. I then copied and pasted this equation into the entire column across all rows.
In order to perform statistical analysis on the data, I inserted a new worksheet called "statistics". Went back to the "scaled_centered" worksheet and copied the ID column and pasted the data from that column into the first column of the statistics worksheet.
I went back to the "scaling_centering" worksheet and copied the columns that are designated "scaled_centered".
I went back to my new worksheet and clicked on the B1 cell. I then selected "Paste Special" from the Edit menu so I could get the values from the previous worksheet without the equations. I then deleted rows 2 and 3 that were there previously for the average and standard deviation. I then added 3 new headers to blank columns to the right: "Avg_LogFC_A", "Avg_LogFC_B", and "Avg_LogFC_C".
I then computed the average log fold change for the replicates for each patient by typing the equation:
=AVERAGE(B2:E2)
into cell N2. I then pasted this equation into the rest of the column.
I then created the equations for patients B and C and pasted them into their respective columns. The equations are: =AVERAGE(J2:M2)
and =AVERAGE(F2:I2)
I then got the average of the averages and put them into the next column on the right with the header: "Avg_LogFC_all". The equation is: =AVERAGE(N2:P2)
I inserted a new column next to the "Avg_LogFC_all" column and labeled the column "Tstat". I then entered the equation:
=AVERAGE(N2:P2)/(STDEV(N2:P2)/SQRT(number of replicates))
(In this case the number of replicates was 3). I spread this equation out to the whole column.
I labeled the top cell in the next column "Pvalue". In the cell below the label, I entered the equation: =TDIST(ABS(R2),degrees of freedom,2)
The number of degrees of freedom is the number of replicates minus one, so in this case there were 2 degrees of freedom. I then pasted the equation throughout the whole column.
Now I had to adjust the pvalues to account for the multiple testing problem. I labeled the next two columns to the right with the same label, Bonferroni_Pvalue.
I typed the equation =S2*5221
, and copied it to the entire first Bonferroni_Pvalue column.
In the second Bonferroni_Pvalue column I replaced any corrected p value that is greater than 1 by the number 1 by typing =IF(T2>1,1,T2)
into the first cell below the second Bonferroni_Pvalue header. I then copied that equation into the entire column.
I inserted a new worksheet named "B-H_Pvalue" where the Benjamini and Hochberg Pvalue correction would be calculated.
I copy and pasted the ID column from the previous worksheet into this new one.
I inserted a new column on the very left and name it "MasterIndex".
I typed a "1" in cell A2 and a "2" in cell A3 and used this pattern to fill out the rest of the column with numbers 1-5221.
I pasted the only values(not the equations) of my unadjusted pvalues into my new worksheet at column C.
I then selected all of columns A, B, and C, and sorted by ascending values on Column C.
I added the header "Rank" in cell D1, where a series of numbers in ascending order from 1-5221 will reside.
This is the p value rank, smallest to largest.
I used the same trick as before to populate this column with ascending numbers.
I typed B-H_Pvalue in cell E1, and typed the following formula in cell E2: =(C2*5221)/D2
and pressed enter and applied that to the whole column.
I then typed B_HPvalue into cell F1.
Then I added =IF(E2>1,1,E2)
into cell F2 and pressed enter and applied it to the whole column.
I then sorted columns A through F in order by the MasterIndex.
I copied column F and used paste special to paste it into the next column on the right of the "statistics" sheet.
Now it was time to prepare the file for GenMapp. I inserted a new worksheet and named it "forGenMAPP". I then pasted the statistic worksheet into the forGenMapp worksheet by values. To prepare the data for GenMapp, I selected Columns B through Q (all the fold changes), selected the menu item Format > Cells, and under the number tab, selected 2 decimal places. I selected all the columns containing p values. Selected the menu item Format > Cells, and under the number tab, selected 4 decimal places. I deleted the left-most Bonferroni p value column, and preserved the one that shows the result of the "if" statement. I inserted a column to the right of the "ID" column, typed the header "SystemCode" into the top cell of this column, and filled the entire column (each cell) with the letter "N". I selected the menu item File > Save As, and choose "Text (Tab-delimited) (*.txt)" from the file type drop-down menu. This text file would be what I use later on when I am working with GenMapp
File:ModifiedDataMerrellJwoodlee.gmf
File:ModifiedDataMerrellJwoodlee-Criterion1-GO filters.xls
File:ModifiedDataMerrellJwoodlee-Criterion1-GO.txt
File:ModifiedDataMerrellJwoodlee.gex
File:ModifiedDataMerrellJwoodlee.EX.txt
File:ModifiedDataMerrellJwoodleeText.txt
File:ModifiedDataMerrellJwoodlee.xls
File:Metal ion binding Jwoodlee.mapp
BIOL 367, Fall 2015, User Page, Team Page
Weekly Assignments | Individual Journal Pages | Shared Journal Pages |
---|---|---|
|
|
|