Difference between revisions of "Eyanosch Week 14"

From LMU BioDB 2015
Jump to: navigation, search
(set up a layout for this weeks electronic lab notebook)
 
(Electronic lab notebook: added the notes for this past weeks notebook, the steps taken to where we are now.)
Line 3: Line 3:
 
==Electronic lab notebook==
 
==Electronic lab notebook==
  
Dividing the Statistical analysis by the two types of drugs (Kristin is doing RX and Erich is doing RP) Since we performed the same tasks for the two different types of drugs, I borrowed Kristin's electronic lab notebook layout.  
+
Dividing the Statistical analysis by the two types of drugs (Kristin is doing RX and I am doing RP) Since we performed the same tasks for the two different types of drugs, I borrowed Kristin's electronic lab notebook layout and some information.  
  
 
====Corrections from talk page====
 
====Corrections from talk page====
 +
*Renamed the columns by replacing LR with LogFC, Re-named Sheet 1 "MasterSheet".
 +
*Renamed column A "ID"
 +
*Inserted a ccolumn to the left of Column B and re-named it "MasterIndex". Typed "1" and cell B2 and 2 and Cell B3 and selected both cells. Filled in the column with numbers 1-4224
 +
*Selected the data and sorted from A to Z on the "ID" column. Any rows that had "Empty" or "Blank#### IDs were deleted
 +
*Sorted by the MasterIndex column to put the IDs back in order from 1 to 3926(new total # of files)
 +
*585 cells with "Error" messages replaced with nothing (left them empty)
 +
*copied into Sheet 2 that was named "scalingCentering'
 +
 +
Followed the instructions from [http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae Sample Microarray Analysis Vibrio cholerae].
  
 
====Normalizing the log ratios for the set of slides in the experiment====
 
====Normalizing the log ratios for the set of slides in the experiment====
 +
*Copied all the data from MasterSheet and named cell A2 "Average" and named cell A3 "StdDev".
 +
*In cell C2 I typed =AVERAGE(C4:C3929) and in cell C3 I typed =STDEV(C4:C3929). I pressed enter and copied this equation across the rest of the columns through column AL.
 +
*I then copied the column headings for all data columns and pasted them to the right of the last column, I renamed each column with "_Scaled_Centered" at the end
 +
*In cell AM4 I typed =(C4-C$2)/C$3 indicating that I wanted data in cell C4 to have the average subtracted from it and then to divide it by the standard deviation.
 +
*I then copy and pasted that equation across the entire column. I copied and pasted this equation for each column of the data.
 +
 +
====Perform statistical analysis on the ratios====
 +
*Created a new sheet called "Statistics" and copied the information over
 +
*I copied and pasted the ID column from the ScalingCentering worksheet into the first column of the new worksheet, copied all Scaled_Centered columns from the ScalingCentering worksheet, pasted the values into column B1 of the new sheet, and deleted "Average" and "StDev" columns
 +
*Here is when we divided the work, Kristin worked on RX and I worked on RP. (I left the RX data in my excel sheet but only worked on RP)
 +
*Inserted column to the right and typed headers Avg_LogFC RP-0.5-10, Avg_LogFC RP-0.5-30, Avg_LogFC RP-0.5-60, Avg_LogFC RP-1-10, Avg_LogFC RP-1-30, and Avg_LogFC RP-1-60 into the top cells of the next 6 columns.
 +
*Computed the average log fold change per treatment and time period by typing the following equations: =AVERAGE(T2:V2), =AVERAGE(AB2:AD2), =AVERAGE(AJ2:AL2), =AVERAGE(AR2:AT2)), =AVERAGE(AZ2:BB2), and =AVERAGE(BH2:BJ2) below each corresponding Avg_LogFC column. Copied and pasted the equations throughout the entire column.
 +
*I then inserted a new column to the right and named it "Avg_LogFC_all" and typed the equation =AVERAGE(T2:Y2).
 +
*I inserted a new "Tstat" column next to Avg_LogFC. Into column AA2 I typed =AVERAGE(BE2:BG2)/(STDEV(BE2:BG2)/SQRT(6)), indicating 6 replicates. I copied this equation into the whole column. I also used this format for the Tstats of the other RP trials.
 +
*I inserted a new column and called it "Pvalue". I entered the equation =TDIST(ABS(BI2),5,2) for BI2, the T-stat column and 5 degrees of freedom.
  
 
====Calculate Bonferroni p value correction====
 
====Calculate Bonferroni p value correction====
 +
*Inserted a new column to the right labeled Bonferroni_Pvalue (did this twice)
 +
*In the first Bonferroni_Pvalue column I typed =BJ2*3928 and copied this equation for the entire column.
 +
*In order to replace any corrected P value that was greater than 1 with the number 1, I typed the equation =IF(BK2>1,1,BK2) and pasted it through the entire second Bonferroni_Pvalue column.
 +
*I then saved the most updated version of the file to my ThawSpace and completed the work on thursday.
  
 
====Analysis corrections====
 
====Analysis corrections====
 +
After uploading my 12/1 file to the [[OTS Files | OTS Files]] page and consulting with Dr. Dahlquist, I realized that I needed to make several corrections. Originally, I calculated the T-stat, Pvalue, Bonferroni P value, and adjusted Bonferroni P value using a LogFC_all column, which was incorrect. Because this experiment involved multiple treatments (of which I was analyzing 6 for the RX samples) I needed to calculate those four things for all 6 treatments (RX-0.5-10, RX-0.5-30, RX-0.5-60, RX-1-10, RX-1-30, and RX-1-60). I inserted columns for all four items for all 6 treatments I was looking at, and following the same procedure as before, I calculated the T-stat, P value, Bonferroni P value, and adjusted Bonferroni P value for each treatment. Here are some sample equations for the RX-0.5-10 treatment:
 +
*Column Z named Tstat_LogFC RX-0.5-10: =AVERAGE(B2:D2)/(STDEV(B2:D2)/SQRT(3)), denoting that I want a T statistic representative of all three RX-0.5-10 replicates.
 +
*Column AA named Pvalue_LogFC RX-0.5-10: =TDIST(ABS(Z2),2,2), indicating that i wanted the P value calculated from the T statistic column.
 +
*Column AB named Bonferroni_Pvalue_LogFC RX-0.5-10: =AA2*3926
 +
*Column AC named Bonferroni_Pvalue_LogFC RX-0.5-10: =IF(AB2>1,1,AB2) in reference to the first Bonferroni P value column.
 +
 +
Dr Dahlquist also made a change to the file where she replaced all of the nothing boxes with a space/character/space. The amount of replacements was still the same as before but this was done to insure that the analysis was as accurate as possible. I proceeded to do a sanity check before moving on to the Benjamini-Hochberg corrections.
 +
*This was taken from Kristens page, she talked with Dr. Dhalquist about the necessary corrections to complete our portion of the project.
  
 
====Prepare file for GenMAPP====
 
====Prepare file for GenMAPP====
 +
*I inserted a new worksheet and named it "forGenMAPP."
 +
*I selected all from the Statistics worksheet and pasted the values on the new sheet.
 +
*I selected all fold changes and changed them to 2 decimal places by selecting Format > Cells.
 +
*I selected all columns with p values including the T stat column and changed them to 4 decimal places by selecting Format > Cells.
 +
*I deleted the left-most Bonferroni p value columns (the ones without the "IF" statements) from each treatment.
 +
*I inserted a column to the right of the ID column and named it "SystemCode". I filled the whole column with the letter N by typing N in the first cell after the heading and double-clicking on the little black cross.
 +
*I selected File > Save As "Text (Tab-delimited) (*.txt). I clicked through the different warnings and uploaded both files to the team's [[OTS Files | Wiki page]]. (CompiledRaw_data_GenMAPP_ready)
  
 
====Sanity Check====
 
====Sanity Check====
 
+
Next I performed a sanity check, I opened the "forGenMAPP" tab and selected "Custom Filter" from the filter options. I then performed the following procedure for all 6 RX treatments.
====Benjamini & Hochberg p value correction====
+
*Filtered P value column by less than 0.05, 0.01, and 0.001. Recorded results and percentages out of 3,926.
 +
*Filtered Bonferroni p value column by less than 0.05 and recorded results and percentages out of 3,926. The table can be located on my updated excel file from 12/3 found on the team Wiki.
 +
The RP tests that showed the most gene manipulations were the 1 MIC at 60 minutes and 30 minutes with 64.32% and 64.34% respectively.  The next closest was RP at 0.5 MIC at the 30 minute marker with 41.14%.
  
 
====Running GenMAPP and MAPPFinder====
 
====Running GenMAPP and MAPPFinder====
 +
*upon running genmapp and producing the GO tree. A runtime error occurs ("Run-time error '53': File not found") when clicking on the GO terms.
  
 
====Instructions from Dr. Dahlquist====
 
====Instructions from Dr. Dahlquist====
Line 25: Line 71:
 
*Decreased LogFC<-0.25 and p<0.05
 
*Decreased LogFC<-0.25 and p<0.05
 
*Do Benjamini-Hochberg correction for the treatment with the most significant genes out of all 6
 
*Do Benjamini-Hochberg correction for the treatment with the most significant genes out of all 6
 +
*need to do this by tomorrow (the Benjamini-Hochberg correction)
  
 
+
All the current files are located under RP (Erich) of the [[OTS Files | Wiki page]]
*3,926 records left after deleting blanks and empty ID's
+
*585 cells with "Error" messages replaced with nothing (left them empty)
+
 
+
created MasterIndex Sheet and imported the data from Sheet CompiledRawData
+
added a MasterIndex Column with values 1-4224
+
 
+
*created new statistics worksheet and imported the data from the ID column and the _Scaled_Centered Columns
+
*Then the Standard deviation row and average are deleted
+
 
+
Dividing the Statistical analysis by the two types of drugs (Kristin is doing RX and Erich is doing RP)
+
There will be 6 averages (3 for the 1 x MIC at time points 10, 30, 60) and (3 for the 0.5 x MIC at time points 10, 30, 60)
+
 
+
 
+
 
+
48 replacements (replacing periods with the letters  pt)
+
 
+
Number of errors--> 416
+

Revision as of 07:47, 8 December 2015

Erich Yanoschik

Electronic lab notebook

Dividing the Statistical analysis by the two types of drugs (Kristin is doing RX and I am doing RP) Since we performed the same tasks for the two different types of drugs, I borrowed Kristin's electronic lab notebook layout and some information.

Corrections from talk page

  • Renamed the columns by replacing LR with LogFC, Re-named Sheet 1 "MasterSheet".
  • Renamed column A "ID"
  • Inserted a ccolumn to the left of Column B and re-named it "MasterIndex". Typed "1" and cell B2 and 2 and Cell B3 and selected both cells. Filled in the column with numbers 1-4224
  • Selected the data and sorted from A to Z on the "ID" column. Any rows that had "Empty" or "Blank#### IDs were deleted
  • Sorted by the MasterIndex column to put the IDs back in order from 1 to 3926(new total # of files)
  • 585 cells with "Error" messages replaced with nothing (left them empty)
  • copied into Sheet 2 that was named "scalingCentering'

Followed the instructions from Sample Microarray Analysis Vibrio cholerae.

Normalizing the log ratios for the set of slides in the experiment

  • Copied all the data from MasterSheet and named cell A2 "Average" and named cell A3 "StdDev".
  • In cell C2 I typed =AVERAGE(C4:C3929) and in cell C3 I typed =STDEV(C4:C3929). I pressed enter and copied this equation across the rest of the columns through column AL.
  • I then copied the column headings for all data columns and pasted them to the right of the last column, I renamed each column with "_Scaled_Centered" at the end
  • In cell AM4 I typed =(C4-C$2)/C$3 indicating that I wanted data in cell C4 to have the average subtracted from it and then to divide it by the standard deviation.
  • I then copy and pasted that equation across the entire column. I copied and pasted this equation for each column of the data.

Perform statistical analysis on the ratios

  • Created a new sheet called "Statistics" and copied the information over
  • I copied and pasted the ID column from the ScalingCentering worksheet into the first column of the new worksheet, copied all Scaled_Centered columns from the ScalingCentering worksheet, pasted the values into column B1 of the new sheet, and deleted "Average" and "StDev" columns
  • Here is when we divided the work, Kristin worked on RX and I worked on RP. (I left the RX data in my excel sheet but only worked on RP)
  • Inserted column to the right and typed headers Avg_LogFC RP-0.5-10, Avg_LogFC RP-0.5-30, Avg_LogFC RP-0.5-60, Avg_LogFC RP-1-10, Avg_LogFC RP-1-30, and Avg_LogFC RP-1-60 into the top cells of the next 6 columns.
  • Computed the average log fold change per treatment and time period by typing the following equations: =AVERAGE(T2:V2), =AVERAGE(AB2:AD2), =AVERAGE(AJ2:AL2), =AVERAGE(AR2:AT2)), =AVERAGE(AZ2:BB2), and =AVERAGE(BH2:BJ2) below each corresponding Avg_LogFC column. Copied and pasted the equations throughout the entire column.
  • I then inserted a new column to the right and named it "Avg_LogFC_all" and typed the equation =AVERAGE(T2:Y2).
  • I inserted a new "Tstat" column next to Avg_LogFC. Into column AA2 I typed =AVERAGE(BE2:BG2)/(STDEV(BE2:BG2)/SQRT(6)), indicating 6 replicates. I copied this equation into the whole column. I also used this format for the Tstats of the other RP trials.
  • I inserted a new column and called it "Pvalue". I entered the equation =TDIST(ABS(BI2),5,2) for BI2, the T-stat column and 5 degrees of freedom.

Calculate Bonferroni p value correction

  • Inserted a new column to the right labeled Bonferroni_Pvalue (did this twice)
  • In the first Bonferroni_Pvalue column I typed =BJ2*3928 and copied this equation for the entire column.
  • In order to replace any corrected P value that was greater than 1 with the number 1, I typed the equation =IF(BK2>1,1,BK2) and pasted it through the entire second Bonferroni_Pvalue column.
  • I then saved the most updated version of the file to my ThawSpace and completed the work on thursday.

Analysis corrections

After uploading my 12/1 file to the OTS Files page and consulting with Dr. Dahlquist, I realized that I needed to make several corrections. Originally, I calculated the T-stat, Pvalue, Bonferroni P value, and adjusted Bonferroni P value using a LogFC_all column, which was incorrect. Because this experiment involved multiple treatments (of which I was analyzing 6 for the RX samples) I needed to calculate those four things for all 6 treatments (RX-0.5-10, RX-0.5-30, RX-0.5-60, RX-1-10, RX-1-30, and RX-1-60). I inserted columns for all four items for all 6 treatments I was looking at, and following the same procedure as before, I calculated the T-stat, P value, Bonferroni P value, and adjusted Bonferroni P value for each treatment. Here are some sample equations for the RX-0.5-10 treatment:

  • Column Z named Tstat_LogFC RX-0.5-10: =AVERAGE(B2:D2)/(STDEV(B2:D2)/SQRT(3)), denoting that I want a T statistic representative of all three RX-0.5-10 replicates.
  • Column AA named Pvalue_LogFC RX-0.5-10: =TDIST(ABS(Z2),2,2), indicating that i wanted the P value calculated from the T statistic column.
  • Column AB named Bonferroni_Pvalue_LogFC RX-0.5-10: =AA2*3926
  • Column AC named Bonferroni_Pvalue_LogFC RX-0.5-10: =IF(AB2>1,1,AB2) in reference to the first Bonferroni P value column.

Dr Dahlquist also made a change to the file where she replaced all of the nothing boxes with a space/character/space. The amount of replacements was still the same as before but this was done to insure that the analysis was as accurate as possible. I proceeded to do a sanity check before moving on to the Benjamini-Hochberg corrections.

  • This was taken from Kristens page, she talked with Dr. Dhalquist about the necessary corrections to complete our portion of the project.

Prepare file for GenMAPP

  • I inserted a new worksheet and named it "forGenMAPP."
  • I selected all from the Statistics worksheet and pasted the values on the new sheet.
  • I selected all fold changes and changed them to 2 decimal places by selecting Format > Cells.
  • I selected all columns with p values including the T stat column and changed them to 4 decimal places by selecting Format > Cells.
  • I deleted the left-most Bonferroni p value columns (the ones without the "IF" statements) from each treatment.
  • I inserted a column to the right of the ID column and named it "SystemCode". I filled the whole column with the letter N by typing N in the first cell after the heading and double-clicking on the little black cross.
  • I selected File > Save As "Text (Tab-delimited) (*.txt). I clicked through the different warnings and uploaded both files to the team's Wiki page. (CompiledRaw_data_GenMAPP_ready)

Sanity Check

Next I performed a sanity check, I opened the "forGenMAPP" tab and selected "Custom Filter" from the filter options. I then performed the following procedure for all 6 RX treatments.

  • Filtered P value column by less than 0.05, 0.01, and 0.001. Recorded results and percentages out of 3,926.
  • Filtered Bonferroni p value column by less than 0.05 and recorded results and percentages out of 3,926. The table can be located on my updated excel file from 12/3 found on the team Wiki.

The RP tests that showed the most gene manipulations were the 1 MIC at 60 minutes and 30 minutes with 64.32% and 64.34% respectively. The next closest was RP at 0.5 MIC at the 30 minute marker with 41.14%.

Running GenMAPP and MAPPFinder

  • upon running genmapp and producing the GO tree. A runtime error occurs ("Run-time error '53': File not found") when clicking on the GO terms.

Instructions from Dr. Dahlquist

  • Increased LogFC>0.25 and p<0.05
  • Decreased LogFC<-0.25 and p<0.05
  • Do Benjamini-Hochberg correction for the treatment with the most significant genes out of all 6
  • need to do this by tomorrow (the Benjamini-Hochberg correction)

All the current files are located under RP (Erich) of the Wiki page