Difference between revisions of "Kmeilak Week 8"

From LMU BioDB 2013

Revision as of 05:13, 17 October 2013

Overview of Microarray Data Analysis

Electronic Lab Notebook

10/10/13

  • Downloaded Merrell Compiled Raw Data file from Sample Microarray Analysis for Vibrio cholerae page
  • Saved as Merrell_Compiled_Raw_Data_Vibrio_KM_20131010.xls
  • Opened file in Excel; created second worksheet and named it scaled_centered
  • Copied all data from compiled_raw_data worksheet into scaled_centered worksheet
  • Inserted two rows underneath header row (ID, A1, etc)
  • Calculated average and standard deviation for each column {i.e. =AVERAGE(B4:B5224); =STDEV(B4:B5224)} by typing each function into the appropriately labeled row and copying and pasting the formulas across all columns.
  • Calculated the scaled and centered values by subtracting the column average from each value in that column and dividing by the column standard deviation {i.e. =(B4-B$2)/B$3}
  • Inserted a new worksheet and named it "statistics".
  • Copied and pasted all of scaled_centered worksheet into statistics worksheet (note: did paste special values only).
  • Added three new columns: "Avg_LogFC_A", "Avg_LogFC_B", "Avg_LogFC_C"
  • Computed average log fold change for each patient {i.e. =AVERAGE(B2:E2)}
  • Computed the average of the three patient averages in a new column titled "Avg_LogFC_all"
  • Created a new column titled "Tstat" in order to run a T test using the following equation {=AVERAGE(N2:P2)/(STDEV(N2:P2)/SQRT(number of replicates))}. The T test was run in order to see which, if any, of the scaled and centered average log ratios are significantly different from 0 (no change).
  • Created a new column titled "Pvalue". Calculated P value using the following equation {=TDIST(ABS(R2),degrees of freedom,2)}
  • Created a new worksheet titled "forGENMAPP".
  • Copied and pasted everything in "statistics" worksheet into "forGENMAPP" worksheet (note: did paste special values only).
  • Selected all fold changes and formatted cells under number tab to 2 decimal places.
  • Columns R and S were set to 4 decimal places in the same manner
  • Columns N through S were cut and inserted next to column B
  • Deleted rows "Average" and "StDev"
  • Added "SystemCode" column to the right of "ID" column and put "N" as value for all rows.
  • Saved as Tab-delimited Text file.
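
The spreadsheet calculations above can be sketched in Python as a rough cross-check. This is only an illustration, not the procedure actually used: the function names and the sample numbers are made up, and the p-value helper assumes three replicates (patients A, B and C), which gives 2 degrees of freedom. For that special case the two-tailed p-value has a simple closed form, so no statistics library is needed.

```python
import math
import statistics


def scale_and_center(column):
    """Subtract the column mean from each value and divide by the column SD."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)  # sample SD (n - 1), like Excel's STDEV
    return [(x - mean) / sd for x in column]


def t_statistic(values):
    """One-sample t statistic against 0: mean / (SD / sqrt(n))."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))


def two_tailed_p_df2(t):
    """Two-tailed p-value for a t distribution with exactly 2 degrees of freedom.

    Assumption: three replicates, so df = 3 - 1 = 2. For df = 2 the CDF is
    F(t) = 1/2 + t / (2 * sqrt(t^2 + 2)), which gives the formula below.
    """
    return 1.0 - abs(t) / math.sqrt(t * t + 2.0)


# Hypothetical values for one chip column (not taken from the real data file).
raw_column = [1.0, 2.0, 3.0, 4.0, 5.0]
scaled = scale_and_center(raw_column)

# Hypothetical per-patient average log fold changes for a single gene.
patient_averages = [0.9, 1.1, 1.0]
t = t_statistic(patient_averages)
p = two_tailed_p_df2(t)

print("scaled and centered:", [round(x, 3) for x in scaled])
print("Tstat:", round(t, 3), "Pvalue:", round(p, 4))
```

After scaling and centering, each column should have a mean of about 0 and a standard deviation of about 1, which is a quick way to check that the spreadsheet formulas were copied across correctly.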

10/15/13

  • Launched GenMAPP
  • Selected Data > Choose Gene Database and selected Vc-Std_External_20090622.gdb Gene Database (2009). (note: this had to be downloaded from [XMLPipeDB Download Page] and then extracted)
  • Selected the Data menu then the Expression Dataset Manager which opened the Expression Dataset Manager window.
  • Selected "new dataset", then selected the Tab-delimited Text file from previous day.
  • The Data Type Specification window appeared. Did not select any columns as containing character data.
  • Allowed the Expression Dataset Manager to convert the data. 772 errors were recorded by the completion of the conversion, far more than my partner's 121 errors. This is most likely because I used an older Gene Database and she used a newer one. Because her database was more up to date, it likely contained more of the known genes for V. cholerae than mine, and therefore her conversion produced fewer errors.
  • Created a Color Set for the Expression Dataset (pink = increased expression; green = decreased expression; gray = no change; white = no data)
  • Used Avg_LogFC_all as the gene value.
  • Clicked the new button to activate the Criteria Builder.
  • Created and named two criteria by entering the name of the criteria and choosing a color. The two criteria created were "increased" colored pink and "decreased" colored green.
  • Selected increasing results which had an AvgLogFC change > 0.25 and a p-value less than 0.05 {([AvgLogFC_all]>0.25 AND [Pvalue]<0.05)}
  • Selected decreasing results which had an AvgLogFC change < -0.25 and a p-value less than 0.05 {([AvgLogFC_all]<-0.25 AND [Pvalue]<0.05)}
  • Selected Save from Expression Dataset menu, saved as .gex file
  • Launched MAPPFinder
  • Chose "calculate new results"
  • Chose "find file" and selected the saved .gex file from previous steps
  • Selected "increased" and checked boxes for "Gene Ontology" and "p-value"
  • Clicked "browse" and saved file
  • Clicked "run MAPPFinder"
  • Clicked "show ranked list" (seen below). These results differ from my partner's, who used the more recently updated version of the database. The difference is most likely due to new information about existing processes, as well as the incorporation of new processes, which may show a higher level of significance than some of those found in the 2009 version of the database.
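
The "increased" and "decreased" criteria above amount to a simple filter on each gene's average log fold change and p-value. A minimal sketch of that logic follows, assuming the same cutoffs (0.25 and 0.05). The function name, the "no change" label for genes that meet neither criterion, and the example genes are hypothetical, and this is not how GenMAPP itself evaluates criteria internally.

```python
def classify(avg_log_fc, p_value, fc_cutoff=0.25, p_cutoff=0.05):
    """Return the criterion a gene meets, mirroring the two Criteria Builder rules."""
    if p_value < p_cutoff and avg_log_fc > fc_cutoff:
        return "increased"
    if p_value < p_cutoff and avg_log_fc < -fc_cutoff:
        return "decreased"
    return "no change"


# Hypothetical genes: (ID, Avg_LogFC_all, Pvalue). Values are made up.
genes = [
    ("gene_1", 0.80, 0.010),
    ("gene_2", -0.60, 0.030),
    ("gene_3", 0.90, 0.200),   # large change, but not significant
    ("gene_4", 0.10, 0.001),   # significant, but change too small
]

labels = {gene_id: classify(fc, p) for gene_id, fc, p in genes}
for gene_id, label in labels.items():
    print(gene_id, label)
```

Both conditions must hold at once, so a gene with a large fold change but a p-value of 0.05 or more is left out of both criteria.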

Top 10 Gene Ontology terms

  1. macromolecule metabolic process
  2. cellular macromolecule metabolic process
  3. macromolecule biosynthetic process
  4. biopolymer metabolic process
  5. cell projection organization
  6. branched chain family amino acid metabolic process
  7. amino acid metabolic process
  8. cellular amino acid and derivative metabolic process
  9. cellular nitrogen compound metabolic process
  10. cellular amine metabolic process

Questions

1.

Files
