Revision as of 11:44, 18 October 2013

Overview of Microarray Data Analysis

Lab Journal

Part 1

Accessed BIOL398-01:Bioinformatics Laboratory
Downloaded the Merrell_Compiled_Raw_Data_Vibrio.xls file to Desktop.
Renamed the file Merrell_Compiled_Raw_Data_Vibrio_SL_10102013.xls to reflect ownership and date.
Inserted a new Worksheet into Excel file, and named it "scaled_centered"
Copied data from "compiled_raw_data" worksheet into "scaled_centered" worksheet
Inserted two rows in between the top row of headers and the first data row.
- Typed "Average" in cell A2 and "StdDev" in cell A3
Typed "=AVERAGE(B4:B5224)" in cell B2.
Typed "=STDEV(B4:B5224)" in cell B3.
Copied both equations in cells B2 and B3 and pasted them into the empty cells in the rest of the columns.
Copied the column headings for all data columns and then pasted them to the right of the last data column.
Edited the names of the columns to A1_scaled_centered, A2_scaled_centered, etc.
Typed "=(B4-B$2)/B$3" in cell N4.
Copied and pasted equation into the entire column.
- Copied and pasted the scaling and centering equation for each of the columns of data with the "_scaled_centered" column header.
Created new worksheet "statistics".
Copied the first column ("ID") of the "scaled_centered" worksheet and pasted the data into the first column of the "statistics" worksheet.
Copied the columns that are designated "_scaled_centered" from the "scaled_centered" worksheet.
Clicked on the B1 cell. Selected "Paste Special" from the Edit menu. Clicked on the radio button for "Values" and clicked OK.
Typed the headers "Avg_LogFC_A", "Avg_LogFC_B", and "Avg_LogFC_C" into the top cells of the next three columns.
Typed "=AVERAGE(B2:E2)" into cell N2. Copied equation and pasted it into the rest of the column.
Typed the corresponding AVERAGE equations for Patients B and C and pasted them into the rest of their columns.
Typed the header "Avg_LogFC_all" into the first cell in the next empty column. Created equation to compute the average of the three previous averages and pasted it into entire column.
Inserted a new column next to the "Avg_LogFC_all" column. Labeled the column "Tstat". Typed "=AVERAGE(N2:P2)/(STDEV(N2:P2)/SQRT(number of replicates))". Copied the equation and pasted it into all rows in that column.
Labeled the top cell in the next column "Pvalue". In the cell below the label, typed "=TDIST(ABS(R2),degrees of freedom,2)". Copied the equation and pasted it into all rows in that column.
Created new worksheet "forGenMAPP".
Selected All and Copy on the "statistics" worksheet
Clicked on cell A1 of "forGenMAPP" and selected Paste Special, clicked on the Values radio button, and clicked OK.
Selected Columns B through Q (all the fold changes). Selected the menu item Format > Cells. Under the number tab, selected 2 decimal places. Clicked OK.
Selected Columns R and S. Selected the menu item Format > Cells. Under the number tab, selected 4 decimal places. Clicked OK.
Selected Columns N through S and Cut. Selected Column B by left-clicking on the "B" at the top of the column. Then right-clicked on the Column B header and selected "Insert Cut Cells"
Deleted Rows 2 and 3, which contained the "Average" and "StdDev" values.
Inserted a column to the right of the "ID" column. Typed the header "SystemCode" into the top cell of this column. Filled the entire column (each cell) with the letter "N".
Selected the menu item File > Save As, and chose "Text (Tab-delimited) (*.txt)" from the file type drop-down menu.
Uploaded both the .xls and .txt files to journal page in the class wiki.
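
The scaling, centering, and t-test steps of Part 1 can also be sketched outside Excel. The Python below is only an illustration of the same arithmetic, not the procedure actually used in this journal. The function names are hypothetical, the sample numbers are made up, and the degrees of freedom are taken as the number of replicates minus one, which is an assumption since the journal leaves that value as a placeholder. The p-value is approximated by numerically integrating the t density so that only the standard library is needed.

```python
import math
import statistics


def scale_and_center(column):
    """Mirror =(B4-B$2)/B$3: subtract the column mean, divide by the sample std dev."""
    mean = statistics.mean(column)
    stdev = statistics.stdev(column)  # sample standard deviation, like Excel's STDEV
    return [(value - mean) / stdev for value in column]


def t_statistic(values):
    """Mirror =AVERAGE(range)/(STDEV(range)/SQRT(n)) for one gene's row."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))


def two_tailed_p(t, df, steps=20000):
    """Approximate TDIST(ABS(t), df, 2) with Simpson's rule (steps must be even)."""
    t = abs(t)
    if t == 0:
        return 1.0
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        # Density of Student's t distribution with df degrees of freedom.
        return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central_area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * central_area)


# Made-up values standing in for one gene's three per-patient averages.
row = [0.5, 0.7, 0.6]
t = t_statistic(row)
p = two_tailed_p(t, df=len(row) - 1)  # df = n - 1 is an assumption
print(round(t, 2), round(p, 4))
```

Keeping the degrees of freedom as an explicit argument makes it easy to match whatever value was entered in the TDIST formula.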

Part 2

Launched GenMAPP
Downloaded the new Vc-Std_External_20101022.gdb Gene Database from the XMLPipeDB SourceForge download page.
Clicked on the link for the Gene Database, downloaded the file, and saved it into the folder C:\GenMAPP 2 Data\Gene Databases, and extracted it.

Launched the GenMAPP Program.
Looked in the lower, left-hand corner of the main GenMAPP Drafting Board window to see the name of the Gene Database that is loaded.
Selected the Data menu from the main Drafting Board window and chose Expression Dataset Manager from the drop-down list.
Selected New Dataset from the Expression Datasets menu. Selected the tab-delimited text file (.txt) created in the procedure above from the file dialog box that appeared.
Allowed the Expression Dataset Manager to convert data.
After a few minutes, the converted dataset was active in the Expression Dataset Manager window, and the file was saved in the same folder as the raw data file, with the same name but a .gex extension.
A message appeared saying that the Expression Dataset Manager could not convert one or more lines of data.
- 121 errors were detected. This was far fewer than the 722 errors found in my partner's dataset, likely because my partner was using an older version of the Vibrio cholerae database that had not been recently proofread.
Uploaded exceptions file: EX.txt to wiki page.
Customized the new Expression Dataset by creating new Color Sets, which contain the instructions to GenMAPP for displaying data on MAPPs.
- Red = Increased expression, Blue = Decreased expression, Gray = No change, White = No data
Selected "Avg_LogFC_all" for Gene Value.
Activated the Criteria Builder by clicking the "New" button.
Entered a name for each criterion in the Label in Legend field and chose a color for it.
Created two criteria this way: "increased" colored red and "decreased" colored blue.
Set increasing results as Avg_LogFC_all > 0.25 and a p-value less than 0.05 {([Avg_LogFC_all]>0.25 AND [Pvalue]<0.05)}
Set decreasing results as Avg_LogFC_all < -0.25 and a p-value less than 0.05 {([Avg_LogFC_all]<-0.25 AND [Pvalue]<0.05)}
Selected Save from Expression Dataset menu, saved as .gex file
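
The two color-set criteria amount to a simple rule applied to each gene. The following is a minimal sketch of that rule in Python, for illustration only. The function name, the "no change" label, and the sample values are hypothetical, while the cutoffs of 0.25 and 0.05 and the colors come from the steps above.

```python
# Colors follow the legend in this journal; the labels are illustrative.
COLORS = {"increased": "red", "decreased": "blue", "no change": "gray"}


def classify(avg_log_fc, p_value, fc_cutoff=0.25, p_cutoff=0.05):
    """Apply the 'increased' and 'decreased' criteria to one gene."""
    if p_value < p_cutoff and avg_log_fc > fc_cutoff:
        return "increased"
    if p_value < p_cutoff and avg_log_fc < -fc_cutoff:
        return "decreased"
    return "no change"


# Made-up (Avg_LogFC_all, Pvalue) pairs, not values from the dataset.
for avg, p in [(0.80, 0.01), (-0.60, 0.03), (0.90, 0.20), (0.10, 0.001)]:
    label = classify(avg, p)
    print(avg, p, label, COLORS[label])
```

A gene must pass both the fold-change cutoff and the p-value cutoff to be colored, so a large change with a weak p-value still falls into the gray "no change" group.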

Files

Part 1

Merrell_Compiled_Raw_Data_Vibrio_SL_10102013.txt

Merrell_Compiled_Raw_Data_Vibrio_SL_10102013.xls

Part 2

Increased_SL-Criterion0-GO.txt

Increased_SL-Criterion0-GO.xls

Merrell_Compiled_Raw_Data_Vibrio_SL_10102013.EX.txt

Merrell_Compiled_Raw_Data_Vibrio_SL_10102013.gex

Merrell_Compiled_Raw_Data_Vibrio_SL_10102013.gmf


Difference between revisions of "Stephen Louie Week 8"
