Kmeilak Week 8
From LMU BioDB 2013
Overview of Microarray Data Analysis
Electronic Lab Notebook
10/10/13
- Downloaded the Merrell Compiled Raw Data file from the Sample Microarray Analysis for Vibrio cholerae page.
- Saved as Merrell_Compiled_Raw_Data_Vibrio_KM_20131010.xls
- Opened the file in Excel, created a second worksheet, and named it scaled_centered.
- Copied all data from the compiled_raw_data worksheet into the scaled_centered worksheet.
- Inserted two rows underneath the header row (ID, A1, etc.) to hold each column's average and standard deviation.
- Calculated the average and standard deviation of each column {e.g. =AVERAGE(B4:B5224); =STDEV(B4:B5224)} by typing each function into its labeled row and copying the formulas across all columns.
- Calculated the scaled and centered values by subtracting each column's average from each value in that column and dividing by that column's standard deviation {e.g. =(B4-B$2)/B$3}.
- Inserted a new worksheet and named it "statistics".
- Copied all of the scaled_centered worksheet and pasted it into the statistics worksheet (note: used Paste Special > Values only).
- Added three new columns: "Avg_LogFC_A", "Avg_LogFC_B", "Avg_LogFC_C"
- Computed the average log fold change for each patient {e.g. =AVERAGE(B2:E2)}.
- Computed the average of the three patient averages in a new column titled "Avg_LogFC_all".
- Created a new column titled "Tstat" to run a one-sample t test using the equation {=AVERAGE(N2:P2)/(STDEV(N2:P2)/SQRT(number of replicates))}. The t test checks which, if any, of the scaled and centered average log ratios differ significantly from 0 (no change).
- Created a new column titled "Pvalue" and calculated the two-tailed p value using the equation {=TDIST(ABS(R2),degrees of freedom,2)}, where the degrees of freedom equal the number of replicates minus 1.
- Created a new worksheet titled "forGENMAPP".
- Copied everything in the "statistics" worksheet and pasted it into the "forGENMAPP" worksheet (note: used Paste Special > Values only).
- Selected all fold-change cells and formatted them to 2 decimal places under the Number tab.
- Formatted columns R and S (Tstat and Pvalue) to 4 decimal places in the same way.
- Cut columns N through S and inserted them next to column B.
- Deleted the "Average" and "StDev" rows.
- Added a "SystemCode" column to the right of the "ID" column and entered "N" as the value for every row.
- Saved the worksheet as a tab-delimited text file.
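The scaling and centering steps on the scaled_centered worksheet amount to converting each column to z-scores. The following is a minimal Python sketch. It assumes each chip column is held as a list of floats keyed by its header. The dictionary layout and the toy values are hypothetical and are not taken from the data file.

```python
import statistics


def scale_and_center(columns):
    """Z-score every column: (value - column mean) / column sample SD.

    Mirrors =(B4-B$2)/B$3, where row 2 holds =AVERAGE(...) and row 3 holds
    =STDEV(...), Excel's sample standard deviation.
    """
    scaled = {}
    for name, values in columns.items():
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample SD, like Excel's STDEV
        scaled[name] = [(v - mean) / sd for v in values]
    return scaled


# Hypothetical toy data: two chip columns with four genes each.
raw = {"A1": [0.5, 1.0, -0.2, 0.3], "A2": [1.2, 0.8, 0.1, -0.4]}
result = scale_and_center(raw)
```

After scaling, every column has mean 0 and sample standard deviation 1, which is what lets chips from different hybridizations be compared directly.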
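The per-patient averaging on the statistics worksheet can be sketched in Python as follows. The column grouping is an assumption read off the formulas: four replicate columns per patient, with B to E for patient A, so that the three patient averages land in N, O and P. The column names in PATIENT_COLUMNS are hypothetical placeholders.

```python
import statistics

# Hypothetical column names: four replicate columns per patient, mirroring
# columns B-E (patient A), F-I (patient B) and J-M (patient C).
PATIENT_COLUMNS = {
    "A": ["A1", "A2", "A3", "A4"],
    "B": ["B1", "B2", "B3", "B4"],
    "C": ["C1", "C2", "C3", "C4"],
}


def average_log_fc(row):
    """Return Avg_LogFC_A/B/C and Avg_LogFC_all for one gene.

    row maps each replicate column name to its scaled and centered log ratio.
    """
    averages = {}
    for patient, columns in PATIENT_COLUMNS.items():
        # Like =AVERAGE(B2:E2) for patient A.
        averages[f"Avg_LogFC_{patient}"] = statistics.mean(row[c] for c in columns)
    # Average of the three patient averages.
    averages["Avg_LogFC_all"] = statistics.mean(
        averages[f"Avg_LogFC_{p}"] for p in PATIENT_COLUMNS
    )
    return averages
```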
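The Tstat and Pvalue columns can be cross-checked outside Excel. The sketch below is not Excel or GenMAPP code. It computes the one-sample t statistic in the same form as the Tstat formula. It then computes a two-tailed p value from Student's t distribution using the identity p = I_x(df/2, 1/2) with x = df/(df + t^2). Here I_x is the regularized incomplete beta function, evaluated by the continued fraction described in Numerical Recipes. The result is intended to agree with TDIST(ABS(t), df, 2). The function names are hypothetical, and only the standard library is used.

```python
import math
import statistics


def t_statistic(values):
    """One-sample t statistic against 0, same form as the Tstat formula:
    mean / (sample standard deviation / sqrt(number of values))."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))


def _beta_continued_fraction(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def t_two_tailed_p(t, df):
    """Two-tailed p value for Student's t, intended to match TDIST(ABS(t), df, 2)."""
    return _regularized_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))


# Hypothetical example: three patient averages for one gene (df = 3 - 1).
patient_averages = [0.8, 1.1, 0.9]
t = t_statistic(patient_averages)
p = t_two_tailed_p(abs(t), len(patient_averages) - 1)
```

With SciPy available, `scipy.stats.t.sf(abs(t), df) * 2` would be the usual shortcut. The version above avoids the dependency.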
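The file layout prepared for GenMAPP can be sketched as a small tab-delimited writer. The layout puts ID first, then a SystemCode column filled with "N", then the statistic columns, with fold changes at 2 decimal places and Tstat/Pvalue at 4. The function name, the example ID and the values are hypothetical. Only Python's standard csv module is assumed.

```python
import csv
import io


def write_for_genmapp(rows, fold_change_cols, stat_cols):
    """Return tab-delimited text with a SystemCode column of "N" after ID.

    rows: list of dicts with an "ID" key plus numeric columns. Fold changes are
    written with 2 decimals and the statistic columns with 4.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["ID", "SystemCode"] + fold_change_cols + stat_cols)
    for row in rows:
        out = [row["ID"], "N"]
        out += [f"{row[c]:.2f}" for c in fold_change_cols]
        out += [f"{row[c]:.4f}" for c in stat_cols]
        writer.writerow(out)
    return buf.getvalue()


# Hypothetical single gene row.
rows = [{"ID": "VC0001", "Avg_LogFC_all": 0.123456, "Tstat": 2.5, "Pvalue": 0.012345}]
text = write_for_genmapp(rows, ["Avg_LogFC_all"], ["Tstat", "Pvalue"])
```

Returning a string keeps the sketch easy to inspect. Writing to disk would instead open the file with `open(path, "w", newline="")` and pass that to `csv.writer`.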
10/15/13
- Launched GenMAPP
- Selected Data > Choose Gene Database and selected the Vc-Std_External_20090622.gdb gene database (2009). (note: this had to be downloaded from the XMLPipeDB download page and then extracted)
- Selected Data > Expression Dataset Manager, which opened the Expression Dataset Manager window.
- Selected "New Dataset", then selected the tab-delimited text file saved on 10/10/13.
- The Data Type Specification window appeared. Did not select any columns as containing character data.
- Allowed the Expression Dataset Manager to convert the data. The conversion recorded 772 errors, far more than my partner's 121. This is most likely because I used an older gene database and she used a newer one. A newer, more up-to-date database presumably contains more of the known V. cholerae genes, which would explain her lower error count.
- Created a Color Set for the Expression Dataset (pink = increased expression; green = decreased expression; gray = no change; white = no data).
- Used Avg_LogFC_all as the gene value.
- Clicked the New button to open the Criteria Builder.
- Created two criteria by entering a name and choosing a color for each: "increased" (pink) and "decreased" (green).
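The color set amounts to a rule that maps each gene's values to a color. The notebook does not record the expressions typed into the Criteria Builder. The cutoffs below (LOGFC_CUTOFF and P_CUTOFF), and the use of Pvalue at all, are therefore hypothetical placeholders. The colors follow the notebook: pink for increased, green for decreased, gray when no criterion is met, and white when there is no data. This sketch checks the criteria in order and returns the first match.

```python
# Hypothetical cutoffs: the actual criteria expressions are not recorded here.
LOGFC_CUTOFF = 0.25
P_CUTOFF = 0.05


def color_for(avg_logfc_all, pvalue):
    """Return the color-set color for one gene.

    None stands for a missing value (no data). Criteria are checked in order.
    """
    if avg_logfc_all is None or pvalue is None:
        return "white"  # no data
    if avg_logfc_all > LOGFC_CUTOFF and pvalue < P_CUTOFF:
        return "pink"  # "increased" criterion
    if avg_logfc_all < -LOGFC_CUTOFF and pvalue < P_CUTOFF:
        return "green"  # "decreased" criterion
    return "gray"  # no criterion met
```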
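The explanation for the different conversion error counts can be made concrete with a simplified model. The model assumes that each conversion error corresponds to a dataset ID with no entry in the selected gene database. The notebook does not say exactly what the Expression Dataset Manager counts as an error. Under that assumption, a database missing more of the V. cholerae genes leaves more IDs unmatched. The IDs below are hypothetical.

```python
def unmatched_ids(dataset_ids, database_ids):
    """Return dataset IDs with no match in the gene database, in input order."""
    known = set(database_ids)
    return [gene_id for gene_id in dataset_ids if gene_id not in known]


# Hypothetical IDs: the newer database knows one more gene than the older one.
dataset = ["VC0001", "VC0002", "VCA0003"]
older_db = ["VC0001"]
newer_db = ["VC0001", "VC0002"]
print(len(unmatched_ids(dataset, older_db)), len(unmatched_ids(dataset, newer_db)))
# prints: 2 1
```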
Top 10 Gene Ontology terms
- macromolecule metabolic process
- cellular macromolecule metabolic process
- macromolecule biosynthetic process
- biopolymer metabolic process
- cell projection organization
- branched chain family amino acid metabolic process
- amino acid metabolic process
- cellular amino acid and derivative metabolic process
- cellular nitrogen compound metabolic process
- cellular amine metabolic process
Questions
1.