New Wiki Page
From LMU BioDB 2013
Revision as of 21:28, 18 May 2015 by Kevinmcgee (Talk | contribs)
Contents |
5/18/2015
Microarray Data analysis Workflow
- Set browser to send downloads to Desktop
- Followed the Protocal found on OpenWetWare:
Installing R 3.1.0 and the limma package
The following protocol was developed to normalize GCAT and Ontario DNA microarray chip data from the Dahlquist lab using the R Statistical Software and the limma package (part of the Bioconductor Project).
- The normalization procedure has been verified to work with version 3.1.0 of R released in April 2014 (link to download site) and and version 3.20.1 of the limma package ( direct link to download zipped file) on the Windows 7 platform.
- Note that using other versions of R or the limma package might give different results.
- Note also that using the 32-bit versus the 64-bit versions of R 3.1.0 will give different results for the normalization out in the 10-13 or 10-14 decimal place. The Dahlquist Lab is standardizing on using the 64-bit version of R.
- To install R for the first time, download and run the installer from the link above, accepting the default installation.
- To use the limma package, unzip the file and place the contents into a folder called "limma" in the library directory of the R program. If you accept the default location, that will be C:\Program Files\R\R-3.1.0\library (this will be different on the computers in S120 since you do not have administrator rights).
Running the Normalization Scripts
- Create a folder on your Desktop to store your files for the microarray analysis procedure.
- Download the zipped file that contains the
.gpr
files and save it to this folder (or move it if it saved in a different folder).- Unzip this file using 7-zip. Right-click on the file and select the menu item, "7-zip > Extract Here".
- Download the GCAT_Targets.csv file and Ontario_Targets_wt-dCIN5-dGLN3-dHAP4-dHMO1-dSWI4-dZAP1-Spar_20150514.csv files and save them to this folder (or move them if they saved to a different folder).
- Download the Ontario_Chip_Within-Array_Normalization_modified_20150514.R script and save (or move) it to this folder.
- Download the Within-Array_Normalization_GCAT_and_Merged_Ontario-GCAT_Between-Chip_Normalization_modified_20150514.R script and save (or move) it to this folder.
Within Array Normalization for the Ontario Chips
- Launch R x64 3.1.0 (make sure you are using the 64-bit version).
- Change the directory to the folder containing the targets file and the GPR files for the Ontario chips by selecting the menu item File > Change dir... and clicking on the appropriate directory. You will need to click on the + sign to drill down to the right directory. Once you have selected it, click OK.
- In R, select the menu item File > Source R code..., and select the Ontario_Chip_Within-Array_Normalization_modified_20150514.R script.
- You will be prompted by an Open dialog for the Ontario targets file. Select the file Ontario_Targets_wt-dCIN5-dGLN3-dHAP4-dHMO1-dSWI4-dZAP1-Spar_20150514.csv and click Open.
- Wait while R processes your files.
Within Array Normalization for the GCAT Chips and Between Array Normalization for All Chips
- These instructions assume that you have just completed the Within Array Normalization for the Ontario Chips in the section above.
- In R, select the menu item File > Source R code..., and select the Within-Array_Normalization_GCAT_and_Merged_Ontario-GCAT_Between-Chip_Normalization_modified_20150514.R script.
- You will be prompted by an Open dialog for the GCAT targets file. Select the file GCAT_Targets.csv and click Open.
- Wait while R processes your files.
- When the processing has finished, you will find two files called GCAT_and_Ontario_Within_Array_Normalization.csv and GCAT_and_Ontario_Final_Normalized_Data.csv in the same folder.
- Save these files to LionShare and/or to a flash drive.
Visualizing the Normalized Data
Create MA Plots and Box Plots for the GCAT Chips
Input the following code, line by line, into the main R window. Press the enter key after each block of code.
GCAT.GeneList<-RGG$genes$ID
lg<-log2((RGG$R-RGG$Rb)/(RGG$G-RGG$Gb))
- If you get a message saying "NaNs produced" this is OK, proceed to the next step.
r0<-length(lg[1,]) rx<-tapply(lg[,1],as.factor(GCAT.GeneList),mean) r1<-length(rx) MM<-matrix(nrow=r1,ncol=r0)
for(i in 1:r0) {MM[,i]<-tapply(lg[,i],as.factor(GCAT.GeneList),mean)}
MC<-matrix(nrow=r1,ncol=r0)
for(i in 1:r0) {MC[,i]<-dw[i]*MM[,i]}
MCD<-as.data.frame(MC) colnames(MCD)<-chips rownames(MCD)<-gcatID
la<-(1/2*log2((RGG$R-RGG$Rb)*(RGG$G-RGG$Gb)))
- If you get these Warning messages, it's OK:
- 1: In (RGG$R - RGG$Rb) * (RGG$G - RGG$Gb) :
- NAs produced by integer overflow
- 2: NaNs produced
r2<-length(la[1,]) ri<-tapply(la[,1],as.factor(GCAT.GeneList),mean) r3<-length(ri) AG<-matrix(nrow=r3,ncol=r2)
for(i in 1:r2) {AG[,i]<-tapply(la[,i],as.factor(GCAT.GeneList),mean)}
par(mfrow=c(3,3))
for(i in 1:r2) {plot(AG[,i],MC[,i],main=chips[i],xlab='A',ylab='M',ylim=c(-5,5),xlim=c(0,15))} browser()
- Maximize the window in which the graphs have appeared. Save the graphs as a JPEG (File>Save As>JPEG>100% quality...). Once the graphs have been saved, close the window. To continue with the rest of the code, press Enter.
- To make sure that you save the clearest image, do not scroll in the window because a grey bar will appear if you do so.
- The next set of code is for the generation of the GCAT boxplots for the wild-type data.
x0<-tapply(MAG$A[,1],as.factor(MAG$genes$ID),mean) y0<-length(MAG$A[1,]) x1<-length(x0) AAG<-matrix(nrow=x1,ncol=y0)
for(i in 1:y0) {AAG[,i]<-tapply(MAG$A[,i],as.factor(MAG$genes$ID),mean)}
par(mfrow=c(3,3))
for(i in 1:y0) {plot(AAG[,i],MG2[,i],main=chips[i],xlab='A',ylab='M',ylim=c(-5,5),xlim=c(0,15))} browser()
- Maximize the window in which the graphs have appeared. Save the graphs as a JPEG (File>Save As>JPEG>100% quality...). Once the graphs have been saved, close the window. To continue with the rest of the code, press Enter.
par(mfrow=c(1,3))
boxplot(MCD,main="Before Normalization",ylab='Log Fold Change',ylim=c(-5,5),xaxt='n')
axis(1,at=xy.coords(chips)$x,tick=TRUE,labels=FALSE)
text(xy.coords(chips)$x-1,par('usr')[3]-0.6,labels=chips,srt=45,cex=0.9,xpd=TRUE)
boxplot(MG2,main='After Within Array Normalization',ylab='Log Fold Change',ylim=c(-5,5),xaxt='n')
axis(1,at=xy.coords(chips)$x,labels=FALSE)
text(xy.coords(chips)$x-1,par('usr')[3]-0.6,labels=chips,srt=45,cex=0.9,xpd=TRUE)
boxplot(MAD[,Gtop$MasterList],main='After Between Array Normalization',ylab='Log Fold Change',ylim=c(-5,5),xaxt='n')
axis(1, at=xy.coords(chips)$x,labels=FALSE)
text(xy.coords(chips)$x-1,par('usr')[3]-0.6,labels=chips,srt=45,cex=0.9,xpd=TRUE)
- Maximize the window in which the plots have appeared. You may not want to actually maximize them because you might lose the labels on the x axis, but make them as large as you can. Save the plots as a JPEG (File>Save As>JPEG>100% quality...). Once the graphs have been saved, close the window.
Create MA Plots and Box Plots for the Ontario Chips
Input the following code, line by line, into the main R window. Press the enter key after each block of code.
Ontario.GeneList<-RGO$genes$Name
lr<-log2((RGO$R-RGO$Rb)/(RGO$G-RGO$Gb))
- Warning message: "NaNs produced" is OK.
z0<-length(lr[1,]) v0<-tapply(lr[,1],as.factor(Ontario.GeneList),mean) z1<-length(v0) MT<-matrix(nrow=z1,ncol=z0)
for(i in 1:z0) {MT[,i]<-tapply(lr[,i],as.factor(Ontario.GeneList),mean)}
MI<-matrix(nrow=z1,ncol=z0)
for(i in 1:z0) {MI[,i]<-ds[i]*MT[,i]}
MID<-as.data.frame(MI) colnames(MID)<-headers rownames(MID)<-ontID
ln<-(1/2*log2((RGO$R-RGO$Rb)*(RGO$G-RGO$Gb)))
- Warning messages are OK:
- 1: In (RGO$R - RGO$Rb) * (RGO$G - RGO$Gb) :
- NAs produced by integer overflow
- 2: NaNs produced
z2<-length(ln[1,]) zi<-tapply(ln[,1],as.factor(Ontario.GeneList),mean) z3<-length(zi) AO<-matrix(nrow=z3,ncol=z2)
for(i in 1:z0) {AO[,i]<-tapply(ln[,i],as.factor(Ontario.GeneList),mean)}
strains<-c('wt','dCIN5','dGLN3','dHAP4','dHMO1','dSWI4','dZAP1','Spar')
- After entering the call browser() below, maximize the window in which the graphs have appeared. Save the graphs as a JPEG (File>Save As>JPEG>100% quality...). Once the graphs have been saved, close the window and press Enter for the next set of graphs to appear.
- The last graph to appear will be the spar graphs.
- The graphs generated from this code are the before Ontario chips
- Be sure to save the 9 graphs before moving on to the next step
for (i in 1:length(strains)) { st<-strains[i] lt<-which(Otargets$Strain %in% st) if (st=='wt') { par(mfrow=c(3,5)) } else { par(mfrow=c(4,5)) } for (i in lt) { plot(AO[,i],MI[,i],main=headers[i],xlab="A",ylab="M",ylim=c(-5,5),xlim=c(0,15)) } browser() }
- To continue generating plots, press enter.
j0<-tapply(MAO$A[,1],as.factor(MAO$genes[,5]),mean) k0<-length(MAO$A[1,]) j1<-length(j0) AAO<-matrix(nrow=j1,ncol=k0)
for(i in 1:k0) {AAO[,i]<-tapply(MAO$A[,i],as.factor(MAO$genes[,5]),mean)}
- Remember, that after entering the call readline('Press Enter to continue'), maximize the window in which the graphs have appeared. Save the graphs as a JPEG (File>Save As>JPEG>100% quality...). Once the graphs have been saved, close the window and press Enter for the next set of graphs to appear.
- Again, the last graphs to appear will be the spar graphs.
- These graphs that are produced are for the after Ontario chips
- Again, be sure to save 9 graphs before moving on to the next part of the code.
for (i in 1:length(strains)) { st<-strains[i] lt<-which(Otargets$Strain %in% st) if (st=='wt') { par(mfrow=c(3,5)) } else { par(mfrow=c(4,5)) } for (i in lt) { plot(AAO[,i],MD2[,i],main=headers[i],xlab="A",ylab="M",ylim=c(-5,5),xlim=c(0,15)) } browser() }
- To continue generating plots, press enter.
for (i in 1:length(strains)) { par(mfrow=c(1,3)) st<-strains[i] lt<-which(Otargets$Strain %in% st) if (st=='wt') { xcoord<-xy.coords(lt)$x-1 fsize<-0.9 } else { xcoord<-xy.coords(lt)$x-1.7 fsize<-0.8 } boxplot(MID[,lt],main='Before Normalization',ylab='Log Fold Change',ylim=c(-5,5),xaxt='n') axis(1,at=xy.coords(lt)$x,labels=FALSE) text(xcoord,par('usr')[3]-0.65,labels=headers[lt],srt=45,cex=fsize,xpd=TRUE) boxplot(MD2[,lt],main='After Within Array Normalization',ylab='Log Fold Change',ylim=c(-5,5),xaxt='n') axis(1,at=xy.coords(lt)$x,labels=FALSE) text(xcoord,par('usr')[3]-0.65,labels=headers[lt],srt=45,cex=fsize,xpd=TRUE) ft<-Otargets$MasterList[which(Otargets$Strain %in% st)] boxplot(MAD[,ft],main='After Between Array Normalization',ylab='Log Fold Change',ylim=c(-5,5),xaxt='n') axis(1,at=xy.coords(lt)$x,labels=FALSE) text(xcoord,par('usr')[3]-0.65,labels=headers[lt],srt=45,cex=fsize,xpd=TRUE) browser() }
- To continue generating the box plots, press enter.
- You will have to save 9 plots before you have completed the procedure. The last box plot is for spar.
- Warnings are OK.
- Zip the files of the plots together and upload to LionShare and/or save to a flash drive.