Overview of Microarray Data Analysis

Lab Journal

Part 1

Accessed BIOL398-01:Bioinformatics Laboratory
Downloaded the Merrell_Compiled_Raw_Data_Vibrio.xls file to Desktop.
Renamed the file Merrell_Compiled_Raw_Data_Vibrio_SL_10102013.xls to reflect ownership and date.
Inserted a new worksheet into the Excel file and named it "scaled_centered".
Copied data from "compiled_raw_data" worksheet into "scaled_centered" worksheet.
Inserted two rows in between the top row of headers and the first data row.
- Typed "Average" in cell A2 and "StdDev" in cell A3
Typed "=AVERAGE(B4:B5224)" in cell B2.
Typed "=STDEV(B4:B5224)" in cell B3.
Copied both equations in cells B2 and B3 and pasted them into the empty cells in the rest of the columns.
Copied the column headings for all data columns and then pasted them to the right of the last data column.
Edited the names of the columns to A1_scaled_centered, A2_scaled_centered, etc.
Typed "=(B4-B$2)/B$3" in cell N4.
Copied and pasted equation into the entire column.
- Copied and pasted the scaling and centering equation for each of the columns of data with the "_scaled_centered" column header.
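For reference, the scaling and centering step above can be sketched outside Excel. This is only a minimal illustration, assuming each chip's log ratios are held in a plain Python list; the function name and the example values are hypothetical and are not taken from the Merrell data.

```python
from statistics import mean, stdev

def scale_and_center(column):
    """Return (x - average) / standard deviation for every value in one chip's column."""
    avg = mean(column)   # plays the role of =AVERAGE(B4:B5224)
    sd = stdev(column)   # sample standard deviation, like Excel's STDEV
    return [(x - avg) / sd for x in column]

# Hypothetical log ratios for one chip (illustration only).
example_column = [1.0, 2.0, 3.0, 4.0, 5.0]
scaled = scale_and_center(example_column)
```

After this transformation each column has an average of 0 and a standard deviation of 1, which is what makes the chips comparable to one another.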
Created new worksheet "statistics".
Copied the first column ("ID") of "scaled_centered" worksheet and pasted the data into the first column of "statistics" worksheet.
Copied the columns that are designated "_scaled_centered" of "scaled_centered" worksheet.
Clicked on the B1 cell of the "statistics" worksheet. Selected "Paste Special" from the Edit menu. Clicked on the radio button for "Values" and clicked OK.
Typed the headers "Avg_LogFC_A", "Avg_LogFC_B", and "Avg_LogFC_C" into the top cells of the next three columns.
Typed "=AVERAGE(B2:E2)" into cell N2. Copied equation and pasted it into the rest of the column.
Typed the equivalent equations for Patients B and C and pasted them into their columns.
Typed the header "Avg_LogFC_all" into the first cell in the next empty column. Created equation to compute the average of the three previous averages and pasted it into entire column.
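The averaging steps above could be sketched as follows. This is a rough illustration, assuming one gene's scaled values are stored in a dictionary keyed by headers such as "A1_scaled_centered", and that the first character of each header identifies the patient. The function name and the numbers are hypothetical.

```python
from statistics import mean

def average_log_fold_changes(row):
    """Average the replicates for each patient, then average those patient averages."""
    by_patient = {}
    for header, value in row.items():
        patient = header[0]  # "A", "B", or "C" (assumed header convention)
        by_patient.setdefault(patient, []).append(value)
    averages = {f"Avg_LogFC_{p}": mean(vals) for p, vals in sorted(by_patient.items())}
    overall = mean(list(averages.values()))  # average of the per-patient averages
    averages["Avg_LogFC_all"] = overall
    return averages

# Hypothetical scaled values for one gene, two replicates per patient.
example_row = {
    "A1_scaled_centered": 0.5, "A2_scaled_centered": 1.5,
    "B1_scaled_centered": -1.0, "B2_scaled_centered": 0.0,
    "C1_scaled_centered": 2.0, "C2_scaled_centered": 4.0,
}
result = average_log_fold_changes(example_row)
```

Averaging the patient averages, rather than all replicates at once, gives each patient equal weight even if the number of usable replicates differs.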
Inserted a new column next to the "Avg_LogFC_all" column. Labeled the column "Tstat". Typed "=AVERAGE(N2:P2)/(STDEV(N2:P2)/SQRT(number of replicates))". Copied the equation and pasted it into all rows in that column.
Labeled the top cell in the next column "Pvalue". In the cell below the label typed "=TDIST(ABS(R2),degrees of freedom,2)". Copied the equation and pasted it into all rows in that column.
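The Tstat and Pvalue formulas above can be approximated with the standard library alone. This is a sketch under stated assumptions: the number of replicates is taken as the length of the input list and the degrees of freedom as that number minus one (the usual one-sample t-test convention; the journal entry leaves both as placeholders). The function names and example values are hypothetical, and the p-value is obtained by numerically integrating the t density rather than by whatever method Excel uses internally.

```python
import math
from statistics import mean, stdev

def t_statistic(values):
    """One-sample t statistic against zero: average / (sample sd / sqrt(n))."""
    n = len(values)
    return mean(values) / (stdev(values) / math.sqrt(n))

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value, analogous to TDIST(ABS(t), df, 2), using Simpson's rule."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability between 0 and |t|
    return 1 - 2 * area   # probability in both tails beyond |t|

# Hypothetical averages for one gene (illustration only).
example_values = [1.0, 2.0, 3.0]
t = t_statistic(example_values)
p = two_tailed_p(t, df=len(example_values) - 1)
```

Taking the absolute value of the T statistic mirrors the ABS() call in the Excel formula, since only the size of the statistic matters for a two-tailed test.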
Created new worksheet "forGenMAPP".
Selected All and Copy on the "statistics" worksheet.
Clicked on cell A1 of "forGenMAPP" and selected Paste Special, clicked on the Values radio button, and clicked OK.
Selected Columns B through Q (all the fold changes). Selected the menu item Format > Cells. Under the number tab, selected 2 decimal places. Clicked OK.
Selected Columns R and S. Selected the menu item Format > Cells. Under the number tab, selected 4 decimal places. Clicked OK.
Selected Columns N through S and Cut. Selected Column B by left-clicking on the "B" at the top of the column. Then right-clicked on the Column B header and selected "Insert Cut Cells".
Deleted Rows 2 and 3 where it says "Average" and "StdDev".
Inserted a column to the right of the "ID" column. Typed the header "SystemCode" into the top cell of this column. Filled the entire column (each cell) with the letter "N".
Selected the menu item File > Save As, and chose "Text (Tab-delimited) (*.txt)" from the file type drop-down menu.
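The formatting and export steps above might look like this when scripted. It is a minimal sketch, assuming each gene is a dictionary of already-computed values. The function name is hypothetical, only a few of the columns are shown, and the numbers attached to VC0028 are made up for illustration.

```python
import csv
import io

def write_genmapp_table(rows, value_columns, out):
    """Write rows as tab-delimited text with a SystemCode column right after ID."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["ID", "SystemCode"] + value_columns)
    for row in rows:
        formatted = []
        for name in value_columns:
            # Tstat and Pvalue get 4 decimal places; fold changes get 2.
            places = 4 if name in ("Tstat", "Pvalue") else 2
            formatted.append(f"{row[name]:.{places}f}")
        writer.writerow([row["ID"], "N"] + formatted)

# Hypothetical values for one gene (illustration only).
example_rows = [{"ID": "VC0028", "Avg_LogFC_all": 0.3149, "Tstat": 2.51234, "Pvalue": 0.04567}]
buffer = io.StringIO()
write_genmapp_table(example_rows, ["Avg_LogFC_all", "Tstat", "Pvalue"], buffer)
text = buffer.getvalue()
```

Writing to an in-memory buffer keeps the sketch self-contained; passing an open file object instead would produce the .txt file directly.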
Uploaded both the .xls and .txt files to journal page in the class wiki.

Part 2

Launched GenMAPP
Downloaded the new Vc-Std_External_20101022.gdb Gene Database from the XMLPipeDB SourceForge Download page.
Clicked on the link for the Gene Database, downloaded the file, and saved it into the folder C:\GenMAPP 2 Data\Gene Databases, and extracted it.

Launched the GenMAPP Program.
Looked in the lower, left-hand corner of the main GenMAPP Drafting Board window to see the name of the Gene Database that is loaded.
Selected the Data menu from the main Drafting Board window and chose Expression Dataset Manager from the drop-down list.
Selected New Dataset from the Expression Datasets menu. Selected the tab-delimited text file (.txt) in the procedure above from the file dialog box that appears.
Allowed the Expression Dataset Manager to convert data.
After a few minutes, the converted dataset was active in the Expression Dataset Manager window. The file was saved in the same folder as the raw data file, with the same name except for a .gex extension.
A message appeared saying that the Expression Dataset Manager could not convert one or more lines of data.
- 121 errors were detected. This was far fewer than the 722 errors found with my partner's database, likely because my partner was using an older version of the Vibrio cholerae Gene Database that had not been recently proofread.
Uploaded exceptions file: EX.txt to wiki page.
Customized the new Expression Dataset by creating new Color Sets, which contain the instructions to GenMAPP for displaying data on MAPPs.
- Red = Increased expression, Blue = Decreased expression, Gray = No change, White = No data
Selected "Avg_LogFC_all" for Gene Value.
Activated the Criteria Builder by clicking the "New" button.
Entered a name for the criterion in the Label in Legend field.
Created two criteria by entering a name and choosing a color for each: "increased" colored red and "decreased" colored blue.
Set increasing results as an average log fold change greater than 0.25 and a p-value less than 0.05 {([AvgLogFC_all]>0.25 AND [Pvalue]<0.05)}
Set decreasing results as an average log fold change less than -0.25 and a p-value less than 0.05 {([AvgLogFC_all]<-0.25 AND [Pvalue]<0.05)}
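The two criteria and the color legend above amount to a simple decision rule per gene. The following is only a sketch of that logic, not GenMAPP's own implementation. The function name and default parameter names are hypothetical, and missing values are represented as None by assumption.

```python
def classify(avg_logfc_all, pvalue, cutoff=0.25, alpha=0.05):
    """Return the legend label for one gene based on fold change and p-value."""
    if avg_logfc_all is None or pvalue is None:
        return "no data"      # shown in white
    if pvalue < alpha and avg_logfc_all > cutoff:
        return "increased"    # shown in red
    if pvalue < alpha and avg_logfc_all < -cutoff:
        return "decreased"    # shown in blue
    return "no change"        # shown in gray

# Hypothetical genes (illustration only).
labels = [classify(0.5, 0.01), classify(-0.5, 0.01), classify(0.5, 0.2), classify(None, None)]
```

Both conditions must hold at once, so a gene with a large fold change but a p-value of 0.05 or more still falls under "no change".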
Selected Save from Expression Dataset menu, saved as .gex file
Launched MAPPFinder
Chose "calculate new results"
Chose "find file" and selected the saved .gex file from previous steps
Selected the "increased" criterion in the right-hand box and checked the boxes for "Gene Ontology" and "p-value"
Clicked "browse" and saved file
Clicked "run MAPPFinder"
Clicked "show ranked list"
Top Ten Gene Ontology Terms:
- Branched chain family amino acid metabolic process
- Branched chain family amino acid biosynthetic process
- IMP metabolic process
- IMP biosynthetic process
- Purine nucleoside monophosphate metabolic process
- Purine ribonucleoside monophosphate biosynthetic process
- Purine ribonucleoside monophosphate metabolic process
- Purine nucleoside monophosphate biosynthetic process
- ‘de novo’ IMP biosynthetic process
- Arginine metabolic process
These search results were different from what my partner had come up with. This was most likely because my Gene Database was more up to date and therefore carried more current gene and GO annotations.
Clicked on the button "Collapse the Tree" in the main MAPPFinder Browser window. Searched for the genes that were mentioned by Merrell et al. (2002), VC0028, VC0941, VC0869, VC0051, VC0647, VC0468, VC2350, and VCA0583.
Typed the identifier for one of these genes into the MAPPFinder browser gene ID search field. Chose "OrderedLocusNames" from the drop-down menu to the right of the search field.
Clicked on the GeneID Search button. The GO terms associated with that gene were highlighted in blue. Listed the GO terms associated with each of those genes below.
GO Terms Search Results:
- VC0028: Branched chain family amino acid biosynthetic process, Cellular amino acid biosynthetic process, Metabolic process, Metal ion binding, Iron-Sulfur cluster binding, 4 iron, 4 sulfur cluster binding, Catalytic activity, Lyase activity, Dihydroxy-acid dehydratase activity
- VC0941: Glycine Metabolic Process, L-serine Metabolic Process, One-Carbon Metabolic Process, Cytoplasm, Pyridoxal Phosphate Binding, Catalytic Activity, Transferase Activity, Glycine Hydroxymethyltransferase Activity
- VC0869: Glutamine Metabolic Process, Purine Nucleotide Biosynthetic Process, 'de novo' IMP Biosynthetic Process, Cytoplasm, Nucleotide Binding, ATP binding, Catalytic Activity, Ligase Activity, Phosphoribosylformylglycinamidine Synthase Activity
- VC0051: Purine Nucleotide Biosynthetic Process, 'de novo' IMP Biosynthetic Process, Nucleotide Binding, ATP Binding, Catalytic Activity, Lyase Activity, Carboxy-lyase Activity, Phosphoribosylaminoimidazole
- VC0647: mRNA Catabolic Process, RNA Processing, Cytoplasm, Mitochondrion, RNA Binding, 3'-5'-exoribonuclease Activity, Transferase Activity, Nucleotidyltransferase Activity, Polyribonucleotide Nucleotidyltransferase Activity