Kzebrows Week 8

From LMU BioDB 2015
Revision as of 17:22, 25 October 2015 by Kzebrows (Talk | contribs) (links to protocol and assignment pages)

Jump to: navigation, search

Electronic Lab Notebook

Statistical Analysis of Vibrio cholerae Microarray Data

The instructions below are adapted from the Sample Microarray Analysis page and the Protocols page. This page is hosted by OpenWetWare.org.

Normalize the log ratios for the set of slides in the experiment

First, I scaled and centered the data. I inserted a new Worksheet into my excel file and named it "scaled_centered". I went back to the "compiled_raw_data" worksheet, selected All and Copy, and went to the new "scaled_centered" worksheet. I copy and pasted by clicking on the upper hand left cell (A1) and pasting. I then inserted two rows in between the top row of headers and the first data row and typed "Average" in cell A2 and "StdDev" in type A3. By using the equation =AVERAGE(B4:B5224) and pressing enter in cell B2 and typing STDEV(B4:B5224) in cell B3, I calculated the average and standard deviation of the log ratios for each chip. I then copy and pasted all columns to the right of the last column and renamed them all AI_scaled_centered, A2_scaled_centered, etc. In cell N4 I typed the equation =(B4-B$2)/(B$3) which is the log change minus the mean divided by the st. dev. The mean and standard deviation values both have dollar signs because we did not want to change the reference mean and standard deviation. I copy and pasted this into the entire column using ctrl + copy and double clicking on the corner to copy the formula across the column, a trick that Dr. Dahlquist taught us that I ended up using often in this assignment. I then used the same scaling and centering equation in all "scaled_centered" columns.

Perform statistical analysis on the ratios

I inserted a new worksheet and named it "statistics" and copied the first ID column from the scaling_centering worksheet. I pasted the data into the new Statistics worksheet and then went back and copied all columns designated "scaled_centered" and pasted them as values into the new worksheet (starting in cell B1). I deleted rows 1 and 2 (average and standard deviation) and added a new column on the right with the headers "Avg_LogFC_A" and then two more columns with the same formula for B and C. I computed the average log fold change for each patient in each of these columns. Then, I computed an average of the patients' averages in the next column called "Avg_LogFC_all" all. I then inserted a new column next to that column and called it Tstat, into which I entered the equation =Average(N2:P2)/(STDEV(N2:P2)/SQRT(3)) and I copied and pasted this into all rows of the column. In this case, the number of replicates was 3. The next column I labeled "Pvalue". I entered the equation <code>=TDIST(ABS(R2),2,2) where the middle 2 is degrees of freedom. I copied and pasted it into all rows of the column.

Calculating the Bonferroni p value Correction

Next, to calculate the Bonferroni p value, I labeled the next two right columns Bonferroni and Bonferroni_Pvalue. I then typed the equation =S2 x 5221 into the column and replaced any corrected p value greater than 1 by the number 1 by using the formula =IF(T2>1,1,T2) in the Bonferroni_Pvalue column.

Calculate the Benjamini & Hochberg p value Correction

I inserted a new worksheet named B-H_Pvalue and copy and pasted the ID column from the statistics worksheet into this one. I inserted a new column on the left and named it MasterIndex. I then typed a 1 in cell A2 and a 2 in cell A3 and selected both cells. By double-clicking the + sign at the bottom right I filled the entire column with a series of numbers. I copied the unadjusted Pvalues from the previous worksheet and pasted them into Column C. I selected columns A, B, and C, and sorted them by ascending value. I ten typed Rank into cell D1 and ranked them from 1 to 5221 into this new column. To calculate the B-H p value I typed the equation =(C2*5221)/D2 into cell E1 and named the column B-H_Pvalue. In cell F2, I typed the formula =IF(E2>1,1,E2) and copied the equation in all of column F under the column heading B-H_Pvalue. I then sorted columns A through F in ascending order. I copied only column F, the B-H P value, and pasted the values into the next column on the right of the statistics worksheet.

Prepare File for GenMAPP

I inserted a new worksheet and named it forGenMAPP. I selected all from the statistics worksheet and copied the values and pasted them into the new worksheet. I selected columns B through Q and formatted them to 2 decimal places. I then selected all p value columns and formatted them to 4 decimal places. I deleted the left-most Bonferroni p value column and inserted a column to the right of the ID column, naming it SystemCode, and filled each column with the letter N.

I saved this as a txt file and as an Excel file in class. NOTE: Unfortunately, only my text file saved accurately. Because I used the txt file for GenMAPP, i was able to carry on for Part 2 of this assignment; however, I had to re-do parts of my Excel file and re-upload it. Below are the text file that I used and the original Excel file, which did not save correctly, as well as the re-done Excel spreadsheet.

File:Kzebrows microarray20151020.txt

File:Kzebrows microarray20151020.xls

File:Kzebrows microarrayanalysis20151025.xls


Part 2

  • I did 2009 and got 772 errors.
  • Mary did 2010 and got 121 errors.