Difference between revisions of "Kzebrows Week 14"

From LMU BioDB 2015
Jump to: navigation, search
(performing stat analysis on ratios, Bonferroni p-value corrections)
(minor heading edits)
Line 1: Line 1:
 
==Electronic Lab Notebook==
 
==Electronic Lab Notebook==
  
===Corrections from talk page===
+
====Corrections from talk page====
 
*Downloaded the most recent version of the file from the [[OTS Files | OTS Files]] page
 
*Downloaded the most recent version of the file from the [[OTS Files | OTS Files]] page
 
*Renamed all columns by replacing LR with LogFC
 
*Renamed all columns by replacing LR with LogFC
Line 16: Line 16:
 
The next portion of the assignment was done by following the instructions in [http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae Sample Microarray Analysis Vibrio cholerae].
 
The next portion of the assignment was done by following the instructions in [http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae Sample Microarray Analysis Vibrio cholerae].
  
===Normalizing the log ratios for the set of slides in the experiment===
+
====Normalizing the log ratios for the set of slides in the experiment====
 
*I inserted a new worksheet and named it Scaled_Centered
 
*I inserted a new worksheet and named it Scaled_Centered
 
*I copied all data from the MasterSheet and pasted it into cell A1 of the Scaled_Centered sheet
 
*I copied all data from the MasterSheet and pasted it into cell A1 of the Scaled_Centered sheet
Line 23: Line 23:
 
*I then copied the column headings for all data columns and pasted them to the right of the last column. Using the copy/paste tool I renamed each column with "_Scaled_Centered" at the end.  
 
*I then copied the column headings for all data columns and pasted them to the right of the last column. Using the copy/paste tool I renamed each column with "_Scaled_Centered" at the end.  
 
*In cell AM4 I typed =(C4-C$2)/C$3 indicating that I wanted data in cell C4 to have the average subtracted from it and then to divide it by the standard deviation. I used the "$" sign to indicate that I did not want the average and standard deviation values to change even when the equation was pasted for the entire column of genes.
 
*In cell AM4 I typed =(C4-C$2)/C$3 indicating that I wanted data in cell C4 to have the average subtracted from it and then to divide it by the standard deviation. I used the "$" sign to indicate that I did not want the average and standard deviation values to change even when the equation was pasted for the entire column of genes.
*I then copy and pasted that equation across the entire column by clikcing on the original cell and double-clicking on the black plus sign. I copy and pasted this equation for each column of the data.
+
*I then copy and pasted that equation across the entire column by clicking on the original cell and double-clicking on the black plus sign. I copy and pasted this equation for each column of the data.
  
===Perform statistical analysis on the ratios===
+
====Perform statistical analysis on the ratios====
 
*I inserted a new worksheet and named it "Statistics"
 
*I inserted a new worksheet and named it "Statistics"
 
*I copied and pasted the ID column from the ScalingCentering worksheet into the first column of the new worksheet
 
*I copied and pasted the ID column from the ScalingCentering worksheet into the first column of the new worksheet
Line 37: Line 37:
 
*I inserted a new column and called it "Pvalue". I entered the equation =TDIST(ABS(AA2),5,2) for AA2, the T-stat column and 5 degrees of freedom.
 
*I inserted a new column and called it "Pvalue". I entered the equation =TDIST(ABS(AA2),5,2) for AA2, the T-stat column and 5 degrees of freedom.
  
===Calculate Bonferroni p value correction===
+
====Calculate Bonferroni p value correction====
 
*I inserted a new column to the right with the label Bonferroni_Pvalue (did this twice)
 
*I inserted a new column to the right with the label Bonferroni_Pvalue (did this twice)
 
*In the first Bonferroni_Pvalue column I typed =AB2*3926 and copied this equation for the entire column.
 
*In the first Bonferroni_Pvalue column I typed =AB2*3926 and copied this equation for the entire column.
 
*In order to replace any corrected P value that was greater than 1 with the number 1, I typed the equation =IF(AC2>1,1,AC2) and pasted it through the entire second Bonferroni_Pvalue column.  
 
*In order to replace any corrected P value that was greater than 1 with the number 1, I typed the equation =IF(AC2>1,1,AC2) and pasted it through the entire second Bonferroni_Pvalue column.  
 
*I then saved the most updated version of the file to my ThawSpace to resume working on it on Thursday.
 
*I then saved the most updated version of the file to my ThawSpace to resume working on it on Thursday.

Revision as of 00:43, 2 December 2015

Electronic Lab Notebook

Corrections from talk page

  • Downloaded the most recent version of the file from the OTS Files page
  • Renamed all columns by replacing LR with LogFC
  • Re-named Sheet 1 "CompiledRawData" and copied all of the data from it and pasted it into Sheet 2, which I re-named "MasterSheet":
  • Deleted all ID columns except for column A, which I re-named "ID"
  • Inserted a column to the left of column B and re-named it "MasterIndex"
  • Typed "1" and cell B2 and 2 and Cell B3 and selected both cells. I double-clicked on the + sign to fill the entire column with numbers 1-4208.
  • Selected the data and sorted A-->Z on the "ID" column
  • Deleted rows that have "Empty" or "Blank####" ID. This left me with 3,926 files (3,927 minus 1 header row).
  • Sorted by MasterIndex column to put IDs back in original order, smallest to largest.
  • Replaced "Error" with nothing and got 585 replacements.
  • Copy and pasted data into Sheet 3, which I re-named "ScalingCentering".

The next portion of the assignment was done by following the instructions in Sample Microarray Analysis Vibrio cholerae.

Normalizing the log ratios for the set of slides in the experiment

  • I inserted a new worksheet and named it Scaled_Centered
  • I copied all data from the MasterSheet and pasted it into cell A1 of the Scaled_Centered sheet
  • I inserted two rows in between the top row of headers and first data row. I named cell A2 "Average" and named cell A3 "StdDev".
  • In cell C2 I typed =AVERAGE(C4:C3929) and in cell C3 I typed =STDEV(C4:C3929). I pressed enter and copied this equation across the rest of the columns through column AL.
  • I then copied the column headings for all data columns and pasted them to the right of the last column. Using the copy/paste tool I renamed each column with "_Scaled_Centered" at the end.
  • In cell AM4 I typed =(C4-C$2)/C$3 indicating that I wanted data in cell C4 to have the average subtracted from it and then to divide it by the standard deviation. I used the "$" sign to indicate that I did not want the average and standard deviation values to change even when the equation was pasted for the entire column of genes.
  • I then copy and pasted that equation across the entire column by clicking on the original cell and double-clicking on the black plus sign. I copy and pasted this equation for each column of the data.

Perform statistical analysis on the ratios

  • I inserted a new worksheet and named it "Statistics"
  • I copied and pasted the ID column from the ScalingCentering worksheet into the first column of the new worksheet
  • I copied all Scaled_Centered columns from the ScalingCentering worksheet and pasted the values into column B1 of the new sheet
  • Deleted "Average" and "StDev" columns
  • Decided to divide and conquer with the rest of the statistical testing. I did RX and Erich did RP. I deleted Erich's RP columns to make my calculations easier.
  • Inserted column to the right and typed headers Avg_LogFC RX-0.5-10, Avg_LogFC RX-0.5-30, Avg_LogFC RX-0.5-60, Avg_LogFC RX-1-10, Avg_LogFC RX-1-30, and Avg_LogFC RX-1-60 into the top cells of the next 6 columns.
  • Computed the average log fold change per treatment and time period by typing the following equations: =AVERAGE(B2:D2), =AVERAGE(E2:G2), =AVERAGE(H2:J2), =AVERAGE(K2:M2), =AVERAGE(N2:P2), and =AVERAGE(Q2:S2) below each corresponding Avg_LogFC column. I copied and pasted these equations for the whole column.
  • I then inserted a new column to the right and named it "Avg_LogFC_all" and typed the equation =AVERAGE(T2:Y2).
  • I inserted a new column next to Avg_LogFC_all and labeled it "Tstat". Into column AA2 I typed =AVERAGE(T2:Y2)/(STDEV(T2:Y2)/SQRT(6)), indicating 6 replicates. I copied this equation into the whole column.
  • I inserted a new column and called it "Pvalue". I entered the equation =TDIST(ABS(AA2),5,2) for AA2, the T-stat column and 5 degrees of freedom.

Calculate Bonferroni p value correction

  • I inserted a new column to the right with the label Bonferroni_Pvalue (did this twice)
  • In the first Bonferroni_Pvalue column I typed =AB2*3926 and copied this equation for the entire column.
  • In order to replace any corrected P value that was greater than 1 with the number 1, I typed the equation =IF(AC2>1,1,AC2) and pasted it through the entire second Bonferroni_Pvalue column.
  • I then saved the most updated version of the file to my ThawSpace to resume working on it on Thursday.