Difference between revisions of "Jkuroda Week 8"
From LMU BioDB 2015
(added exception file) |
(gex file) |
||
Line 4: | Line 4: | ||
[[Media:Merrell Compiled Raw Data Vibrio JK 2015-10-20.EX.txt | exception file (2010 version)]] | [[Media:Merrell Compiled Raw Data Vibrio JK 2015-10-20.EX.txt | exception file (2010 version)]] | ||
+ | |||
+ | [[Media:Merrell Compiled Raw Data Vibrio JK 2015-10-20.gex | gex file (2010 version)]] | ||
== Normalized the log ratios for the set of slides in the experiment == | == Normalized the log ratios for the set of slides in the experiment == |
Revision as of 22:50, 20 October 2015
Contents
Normalized the log ratios for the set of slides in the experiment
To scale and center the data (between chip normalization) I performed the following operations:
- Inserted a new Worksheet into my Excel file, and named it "scaled_centered".
- Went back to the "compiled_raw_data" worksheet, Selected All and Copy. Went to my new "scaled_centered" worksheet, clicked on the upper, left-hand cell (cell A1) and Pasted.
- Inserted two rows in between the top row of headers and the first data row.
- In cell A2, typed "Average" and in cell A3, typed "StdDev".
- I computed the Average log ratio for each chip (each column of data). In cell B2, I typed the following equation:
=AVERAGE(B4:B5224)
- and pressed "Enter". Excel computed the average value of the cells specified in the range given inside the parentheses. Instead of typing the cell designations, I clicked on the beginning cell, scrolled down to the bottom of the worksheet, and shift-clicked on the ending cell.
- I then computed the Standard Deviation of the log ratios on each chip (each column of data). In cell B3, I typed the following equation:
=STDEV(B4:B5224)
- and pressed "Enter".
- Excel then did some of the work for me. I copied these two equations (cells B2 and B3) and pasted them into the empty cells in the rest of the columns. Excel automatically changed the equation to match the cell designations for those columns.
- I have now computed the average and standard deviation of the log ratios for each chip. Now I actually did the scaling and centering based on these values.
- I copied the column headings for all of my data columns and pasted them to the right of the last data column so that I had a second set of headers above the blank columns of cells. I edited the names of the columns so that they read as: A1_scaled_centered, A2_scaled_centered, etc.
- In cell N4, I typed the following equation:
=(B4-B$2)/B$3
- In this case, I wanted the data in cell B4 to have the average subtracted from it (cell B2) and be divided by the standard deviation (cell B3). I used the dollar sign symbols in front of the "2" and "3" to tell Excel to always reference that row in the equation, even though I pasted it for the entire column of 5221 genes.
- I copied and pasted this equation into the entire column.
- I then copied and pasted the scaling and centering equation for each of the columns of data with the "_scaled_centered" column header.
Performed statistical analysis on the ratios
I performed this step on the scaled and centered data I produced in the previous step.
- Inserted a new worksheet and named it "statistics".
- Went back to the "scaling_centering" worksheet and copied the first column ("ID").
- Pasted the data into the first column of my "statistics" worksheet.
- Went back to the "scaling_centering" worksheet and coped the columns that were designated "_scaled_centered".
- Went to my new worksheet and clicked on the B1 cell. Selected "Paste Special" from the Edit menu. A window opened: clicked on the button for "Values" and clicked OK. This pasted the numerical result into my new worksheet instead of the equation which must make calculations on the fly.
- Deleted Rows 2 and 3 where it said "Average" and "StDev" so that my data rows with gene IDs were immediately below the header row 1.
- Went to a new column on the right of my worksheet. Typed the header "Avg_LogFC_A", "Avg_LogFC_B", and "Avg_LogFC_C" into the top cell of the next three columns.
- Computed the average log fold change for the replicates for each patient by typing the equation:
=AVERAGE(B2:E2) into cell N2. Copied this equation and pasted it into the rest of the column.
- Created the equation for patients B and C and pasted it into their respective columns.
- Then I computed the average of the averages. Typed the header "Avg_LogFC_all" into the first cell in the next empty column. Created the equation that will compute the average of the three previous averages I calculated and pasted it into this entire column.
- Inserted a new column next to the "Avg_LogFC_all" column that I computed in the previous step. Labeled the column "Tstat". This computed a T statistic that told me whether the scaled and centered average log ratio was significantly different than 0 (no change). Entered the equation:
=AVERAGE(N2:P2)/(STDEV(N2:P2)/SQRT(3))
Copied the equation and pasted it into all rows in that column.
- Labeled the top cell in the next column "Pvalue". In the cell below the label, entered the equation:
=TDIST(ABS(R2),2,2)
Copied the equation and pasted it into all rows in that column.
Calculated the Bonferroni p value Correction
- Next I performed adjustments to the p value to correct for the multiple testing problem. Labeled the next two columns to the right with the same label, Bonferroni_Pvalue.
- Typed the equation
=S2*5221
, Upon completion of this single computation, used the trick to copy the formula throughout the column. - Replaced all corrected p values that were greater than 1 by the number 1 by typing the following formula into the first cell below the second Bonferroni_Pvalue header:
=IF(T2>1,1,T2)
. Used the trick to copy the formula throughout the column.
Calculated the Benjamini & Hochberg p value Correction
- Inserted a new worksheet named "B-H_Pvalue".
- Copied and pasted the "ID" column from my previous worksheet into the first column of the new worksheet.
- Inserted a new column on the very left and named it "MasterIndex". I created a numerical index of genes so that I can always sort them back into the same order.
- Typed a "1" in cell A2 and a "2" in cell A3.
- Selected both cells and filled the entire column with a series of numbers from 1 to 5221.
- Copied my unadjusted p values from my previous worksheet and pasted it into Column C.
- Selected all of columns A, B, and C. Sorted by ascending values on Column C. Clicked the sort button from A to Z on the toolbar and sorted by column C, smallest to largest.
- Typed the header "Rank" in cell D1. I created a series of numbers in ascending order from 1 to 5221 in this column. Typed "1" into cell D2 and "2" into cell D3. Selected both cells D2 and D3 and filled the column with a series of numbers from 1 to 5221.
- Then I calculated the Benjamini and Hochberg p value correction by typing B-H_Pvalue in cell E1. Typed the following formula in cell E2:
=(C2*5221)/D2
and pressed enter. Copied that equation to the entire column. - Typed "B-H_Pvalue" into cell F1.
- Typed the following formula into cell F2:
=IF(E2>1,1,E2)
and pressed enter. Copied that equation to the entire column. - Selected columns A through F. Then I sorted them by my MasterIndex in Column A in ascending order.
- Copied column F and pasted it into the next column on the right of my "statistics" sheet.
Prepare file for GenMAPP
- Inserted a new worksheet and named it "forGenMAPP".
- Went back to the "statistics" worksheet and Selected All and Copied.
- Went to my new sheet and clicked on cell A1 and selected Paste Special, clicked on the Values button, and clicked OK. I then formated this worksheet for import into GenMAPP and:
- Selected Columns B through Q (all the fold changes). Selected the menu item Format > Cells. Under the number tab, selected 2 decimal places. Clicked OK.
- Selected all the columns containing p values. Selected the menu item Format > Cells. Under the number tab, selected 4 decimal places. Clicked OK.
- Deleted the left-most Bonferroni p value column, thus preserving the one that shows the result of my "if" statement.
- Inserted a column to the right of the "ID" column. Typed the header "SystemCode" into the top cell of this column. Filled the entire column with the letter "N".
- Selected the menu item File > Save As, and chose "Text (Tab-delimited) (*.txt)" from the file type drop-down menu. My new *.txt file is now ready for import into GenMAPP.
- Uploaded both the .xls and .txt files that you have just created to my journal page in the class wiki.
Individual Journal Entries
- Week 2
- Week 3
- Week 4
- Week 5
- Week 6
- Week 7
- Week 8
- Week 9
- Week 10
- Week 11
- Week 12
- Week 13
- Week 14
- Week 15