Eyanosch Week 14
Contents
- 1 Erich Yanoschik
- 2 Electronic lab notebook
- 2.1 Corrections from talk page
- 2.2 Normalizing the log ratios for the set of slides in the experiment
- 2.3 Perform statistical analysis on the ratios
- 2.4 Calculate Bonferroni p value correction
- 2.5 Analysis corrections
- 2.6 Prepare file for GenMAPP
- 2.7 Sanity Check
- 2.8 Running GenMAPP and MAPPFinder
- 2.9 Instructions from Dr. Dahlquist
Erich Yanoschik
Electronic lab notebook
Dividing the Statistical analysis by the two types of drugs (Kristin is doing RX and I am doing RP) Since we performed the same tasks for the two different types of drugs, I borrowed Kristin's electronic lab notebook layout and some information.
Corrections from talk page
- Renamed the columns by replacing LR with LogFC, Re-named Sheet 1 "MasterSheet".
- Renamed column A "ID"
- Inserted a ccolumn to the left of Column B and re-named it "MasterIndex". Typed "1" and cell B2 and 2 and Cell B3 and selected both cells. Filled in the column with numbers 1-4224
- Selected the data and sorted from A to Z on the "ID" column. Any rows that had "Empty" or "Blank#### IDs were deleted
- Sorted by the MasterIndex column to put the IDs back in order from 1 to 3926(new total # of files)
- 585 cells with "Error" messages replaced with nothing (left them empty)
- copied into Sheet 2 that was named "scalingCentering'
Followed the instructions from Sample Microarray Analysis Vibrio cholerae.
Normalizing the log ratios for the set of slides in the experiment
- Copied all the data from MasterSheet and named cell A2 "Average" and named cell A3 "StdDev".
- In cell C2 I typed =AVERAGE(C4:C3929) and in cell C3 I typed =STDEV(C4:C3929). I pressed enter and copied this equation across the rest of the columns through column AL.
- I then copied the column headings for all data columns and pasted them to the right of the last column, I renamed each column with "_Scaled_Centered" at the end
- In cell AM4 I typed =(C4-C$2)/C$3 indicating that I wanted data in cell C4 to have the average subtracted from it and then to divide it by the standard deviation.
- I then copy and pasted that equation across the entire column. I copied and pasted this equation for each column of the data.
Perform statistical analysis on the ratios
- Created a new sheet called "Statistics" and copied the information over
- I copied and pasted the ID column from the ScalingCentering worksheet into the first column of the new worksheet, copied all Scaled_Centered columns from the ScalingCentering worksheet, pasted the values into column B1 of the new sheet, and deleted "Average" and "StDev" columns
- Here is when we divided the work, Kristin worked on RX and I worked on RP. (I left the RX data in my excel sheet but only worked on RP)
- Inserted column to the right and typed headers Avg_LogFC RP-0.5-10, Avg_LogFC RP-0.5-30, Avg_LogFC RP-0.5-60, Avg_LogFC RP-1-10, Avg_LogFC RP-1-30, and Avg_LogFC RP-1-60 into the top cells of the next 6 columns.
- Computed the average log fold change per treatment and time period by typing the following equations: =AVERAGE(T2:V2), =AVERAGE(AB2:AD2), =AVERAGE(AJ2:AL2), =AVERAGE(AR2:AT2)), =AVERAGE(AZ2:BB2), and =AVERAGE(BH2:BJ2) below each corresponding Avg_LogFC column. Copied and pasted the equations throughout the entire column.
- I then inserted a new column to the right and named it "Avg_LogFC_all" and typed the equation =AVERAGE(T2:Y2).
- I inserted a new "Tstat" column next to Avg_LogFC. Into column AA2 I typed =AVERAGE(BE2:BG2)/(STDEV(BE2:BG2)/SQRT(6)), indicating 6 replicates. I copied this equation into the whole column. I also used this format for the Tstats of the other RP trials.
- I inserted a new column and called it "Pvalue". I entered the equation =TDIST(ABS(BI2),5,2) for BI2, the T-stat column and 5 degrees of freedom.
Calculate Bonferroni p value correction
- Inserted a new column to the right labeled Bonferroni_Pvalue (did this twice)
- In the first Bonferroni_Pvalue column I typed =BJ2*3928 and copied this equation for the entire column.
- In order to replace any corrected P value that was greater than 1 with the number 1, I typed the equation =IF(BK2>1,1,BK2) and pasted it through the entire second Bonferroni_Pvalue column.
- I then saved the most updated version of the file to my ThawSpace and completed the work on thursday.
Analysis corrections
After uploading my 12/1 file to the OTS Files page and consulting with Dr. Dahlquist, I realized that I needed to make several corrections. Originally, I calculated the T-stat, Pvalue, Bonferroni P value, and adjusted Bonferroni P value using a LogFC_all column, which was incorrect. Because this experiment involved multiple treatments (of which I was analyzing 6 for the RX samples) I needed to calculate those four things for all 6 treatments (RX-0.5-10, RX-0.5-30, RX-0.5-60, RX-1-10, RX-1-30, and RX-1-60). I inserted columns for all four items for all 6 treatments I was looking at, and following the same procedure as before, I calculated the T-stat, P value, Bonferroni P value, and adjusted Bonferroni P value for each treatment. Here are some sample equations for the RX-0.5-10 treatment:
- Column Z named Tstat_LogFC RX-0.5-10: =AVERAGE(B2:D2)/(STDEV(B2:D2)/SQRT(3)), denoting that I want a T statistic representative of all three RX-0.5-10 replicates.
- Column AA named Pvalue_LogFC RX-0.5-10: =TDIST(ABS(Z2),2,2), indicating that i wanted the P value calculated from the T statistic column.
- Column AB named Bonferroni_Pvalue_LogFC RX-0.5-10: =AA2*3926
- Column AC named Bonferroni_Pvalue_LogFC RX-0.5-10: =IF(AB2>1,1,AB2) in reference to the first Bonferroni P value column.
Dr Dahlquist also made a change to the file where she replaced all of the nothing boxes with a space/character/space. The amount of replacements was still the same as before but this was done to insure that the analysis was as accurate as possible. I proceeded to do a sanity check before moving on to the Benjamini-Hochberg corrections.
- This was taken from Kristens page, she talked with Dr. Dhalquist about the necessary corrections to complete our portion of the project.
Prepare file for GenMAPP
- I inserted a new worksheet and named it "forGenMAPP."
- I selected all from the Statistics worksheet and pasted the values on the new sheet.
- I selected all fold changes and changed them to 2 decimal places by selecting Format > Cells.
- I selected all columns with p values including the T stat column and changed them to 4 decimal places by selecting Format > Cells.
- I deleted the left-most Bonferroni p value columns (the ones without the "IF" statements) from each treatment.
- I inserted a column to the right of the ID column and named it "SystemCode". I filled the whole column with the letter N by typing N in the first cell after the heading and double-clicking on the little black cross.
- I selected File > Save As "Text (Tab-delimited) (*.txt). I clicked through the different warnings and uploaded both files to the team's Wiki page. (CompiledRaw_data_GenMAPP_ready)
Sanity Check
Next I performed a sanity check, I opened the "forGenMAPP" tab and selected "Custom Filter" from the filter options. I then performed the following procedure for all 6 RX treatments.
- Filtered P value column by less than 0.05, 0.01, and 0.001. Recorded results and percentages out of 3,926.
- Filtered Bonferroni p value column by less than 0.05 and recorded results and percentages out of 3,926. The table can be located on my updated excel file from 12/3 found on the team Wiki.
The RP tests that showed the most gene manipulations were the 1 MIC at 60 minutes and 30 minutes with 64.32% and 64.34% respectively. The next closest was RP at 0.5 MIC at the 30 minute marker with 41.14%.
Running GenMAPP and MAPPFinder
- upon running genmapp and producing the GO tree. A runtime error occurs ("Run-time error '53': File not found") when clicking on the GO terms.
Instructions from Dr. Dahlquist
- Increased LogFC>0.25 and p<0.05
- Decreased LogFC<-0.25 and p<0.05
- Do Benjamini-Hochberg correction for the treatment with the most significant genes out of all 6
- need to do this by tomorrow (the Benjamini-Hochberg correction)
All the current files are located under RP (Erich) of the Wiki page
Class Journals
Weekly Assignments
Personal Journal
Eyanosch Week 14
Electronic Notes (E-notes)
Class (personal) Notes