Rlegaspi Week 15

From LMU BioDB 2015

Shewanella oneidensis

Our Gene Database Testing Report

Group Paper - File:Final Report 20151218 2 HMH.docx

Goals for Week 15

Data Preparation and Statistical Analysis for GenMAPP

  1. Consult with Dr. Dahlquist on how to process the data (normalization, statistics)
  2. Perform the statistical analysis in Excel.
  3. Format the gene expression data for import into GenMAPP.

*Similar to goals from Week 14.

Summary of Progress and Procedure

Compiling Raw Data and Statistical Analysis

December 8, 2015

  • Calculated averages from the split data
    • Discovered that there are a total of 5408 genes.
  • Calculated biological averages of each time point
  • Calculated the AverageLogRatio for C5, C20, and C60 relative to C0, and for F5, F20, and F60 relative to C60
    • Subtracted rather than divided, because the values are in log space (a difference of logs corresponds to the log of a ratio)
  • Performed a t test on each of the above comparisons to obtain the unadjusted p value (Pvalue)
  • Applied the Bonferroni correction to the p values
  • Applied the Benjamini & Hochberg correction to the p values
  • The Excel file produced by these steps was uploaded to XMLPipeDB: File:StatisticalAnalysis Shewanella RARL 20151207 HMH.xlsx
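
The log-ratio and multiple-testing steps above were done in Excel. As a rough illustration only, they could be sketched in Python as follows. The function names are hypothetical, and the Benjamini & Hochberg function uses the standard step-up adjustment, which may differ slightly from the spreadsheet formula that was actually used.

```python
from statistics import mean


def avg_log_ratio(treatment_logs, control_logs):
    """Mean log value of the treatment minus mean log value of the control.

    Subtraction is used because the inputs are already logs:
    log(a) - log(b) == log(a / b).
    """
    return mean(treatment_logs) - mean(control_logs)


def bonferroni(pvalues):
    """Multiply each p value by the number of tests, capped at 1."""
    n = len(pvalues)
    return [min(1.0, p * n) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Standard step-up adjustment: p * n / rank, made monotone, capped at 1."""
    n = len(pvalues)
    # Indices of the p values, ordered from smallest to largest p.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value (rank n) down to the smallest (rank 1).
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

In this journal n would be 5408, the number of genes, so the Bonferroni correction multiplies each p value by 5408, which is why so few genes pass it.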

December 10, 2015

Preparing compiled raw data for GenMAPP and creation of a .txt file

File:StatisticalAnalysis Shewanella RARL 20151210 HMH.xlsx

.txt file: File:CompiledRawDataforGenMAPP Shewanella RARL 20151210 HMH.txt
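
The .txt file was created from the Excel workbook. Purely as a sketch of what a tab-delimited export looks like, the snippet below writes rows to tab-separated text. The column names and the gene ID shown are made-up placeholders, not the columns of the actual file, and the exact layout GenMAPP expects should be checked against its own documentation.

```python
import csv
import io


def write_tab_delimited(rows, columns, handle):
    """Write a header line and one line per row, separated by tabs."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        # Missing values become empty cells.
        writer.writerow([row.get(col, "") for col in columns])


# Hypothetical example row; the ID and column names are placeholders.
example_columns = ["ID", "AvgLogRatio_C5", "Pvalue_C5"]
example_rows = [{"ID": "GENE_0001", "AvgLogRatio_C5": 0.31, "Pvalue_C5": 0.012}]

buffer = io.StringIO()
write_tab_delimited(example_rows, example_columns, buffer)
print(buffer.getvalue())
```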

Sanity Check

Importance of Sanity Check (from the DNA Microarray Analysis Activity): In summary, the p value cut-off should not be thought of as some magical number at which data becomes "significant". Instead, it is a moveable confidence level. If we want to be very confident of our data, use a small p value cut-off. If we are OK with being less confident about a gene expression change and want to include more genes in our analysis, we can use a larger p value cut-off. For the GenMAPP analysis below, we will use the fold change cut-off of greater than 0.25 or less than -0.25 and the unadjusted p value cut-off of p < 0.05 for our analysis because we want to include several hundred genes in our analysis. (Note: The "AvgLogRatio" tells us the size of the gene expression change and in which direction. Positive values are increases relative to the control; negative values are decreases relative to the control.)
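
The counts in the list that follows were obtained with Excel filters. A minimal Python sketch of the same kind of filter is shown here for illustration. The function names and the (AvgLogRatio, Pvalue) tuple layout are assumptions, and the fold change conversion assumes the log ratios are base 2, which is consistent with a cut-off of 0.25 corresponding to roughly a 20% change.

```python
def count_genes(rows, p_cutoff=0.05, min_lfc=None, max_lfc=None):
    """Count genes with p < p_cutoff and, optionally, a log fold change
    strictly above min_lfc and/or strictly below max_lfc.

    rows is an iterable of (avg_log_ratio, pvalue) pairs.
    """
    count = 0
    for avg_log_ratio, pvalue in rows:
        if pvalue >= p_cutoff:
            continue
        if min_lfc is not None and avg_log_ratio <= min_lfc:
            continue
        if max_lfc is not None and avg_log_ratio >= max_lfc:
            continue
        count += 1
    return count


def fold_change(avg_log_ratio):
    """Convert a log ratio to a linear fold change, assuming base 2 logs."""
    return 2 ** avg_log_ratio


# Made-up values, only to show the calls.
example = [(0.4, 0.01), (-0.3, 0.04), (0.1, 0.03), (0.5, 0.2)]
print(count_genes(example))                  # p < 0.05
print(count_genes(example, min_lfc=0.25))    # p < 0.05 and AvgLogRatio > 0.25
print(count_genes(example, max_lfc=-0.25))   # p < 0.05 and AvgLogRatio < -0.25
```

Tightening `p_cutoff` or widening the fold change cut-offs shrinks the gene list, which is the trade-off the passage describes.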

  • C5 and C0
    • How many genes have p value < 0.05? and what is the percentage (out of 5408)?
      • 344 genes, 6.36%
    • What about p < 0.01? and what is the percentage (out of 5408)?
      • 94 genes, 1.74%
    • What about p < 0.001? and what is the percentage (out of 5408)?
      • 18 genes, 0.33%
    • What about p < 0.0001? and what is the percentage (out of 5408)?
      • 5 genes, 0.09%
    • How many genes are p < 0.05 for the Bonferroni-corrected p value? and what is the percentage (out of 5408)?
      • 2 genes, 0.04%
    • How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value? and what is the percentage (out of 5408)?
      • 2 genes, 0.04%
    • Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there? (and %)
      • 180 genes, 3.33%
    • Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there? (and %)
      • 164 genes, 3.03%
    • What about an average log fold change of > 0.25 and p < 0.05? (and %)
      • 161 genes, 2.98%
    • Or an average log fold change of < -0.25 and p < 0.05? (and %) (These are more realistic values for the fold change cut-offs because it represents about a 20% fold change which is about the level of detection of this technology.)
      • 149 genes, 2.76%
  • C20 and C0
    • How many genes have p value < 0.05? and what is the percentage (out of 5408)?
      • 868 genes, 16.05%
    • What about p < 0.01? and what is the percentage (out of 5408)?
      • 342 genes, 6.32%
    • What about p < 0.001? and what is the percentage (out of 5408)?
      • 79 genes, 1.46%
    • What about p < 0.0001? and what is the percentage (out of 5408)?
      • 14 genes, 0.26%
    • How many genes are p < 0.05 for the Bonferroni-corrected p value? and what is the percentage (out of 5408)?
      • 1 gene, 0.02%
    • How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value? and what is the percentage (out of 5408)?
      • 34 genes, 0.63%
    • Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there? (and %)
      • 452 genes, 8.36%
    • Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there? (and %)
      • 416 genes, 7.69%
    • What about an average log fold change of > 0.25 and p < 0.05? (and %)
      • 437 genes, 8.08%
    • Or an average log fold change of < -0.25 and p < 0.05? (and %) (These are more realistic values for the fold change cut-offs because it represents about a 20% fold change which is about the level of detection of this technology.)
      • 405 genes, 7.49%
  • C60 and C0
    • How many genes have p value < 0.05? and what is the percentage (out of 5408)?
      • 1017 genes, 18.81%
    • What about p < 0.01? and what is the percentage (out of 5408)?
      • 471 genes, 8.71%
    • What about p < 0.001? and what is the percentage (out of 5408)?
      • 163 genes, 3.01%
    • What about p < 0.0001? and what is the percentage (out of 5408)?
      • 53 genes, 0.98%
    • How many genes are p < 0.05 for the Bonferroni-corrected p value? and what is the percentage (out of 5408)?
      • 13 genes, 0.24%
    • How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value? and what is the percentage (out of 5408)?
      • 229 genes, 4.23%
    • Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there? (and %)
      • 487 genes, 9.01%
    • Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there? (and %)
      • 530 genes, 9.80%
    • What about an average log fold change of > 0.25 and p < 0.05? (and %)
      • 475 genes, 8.78%
    • Or an average log fold change of < -0.25 and p < 0.05? (and %) (These are more realistic values for the fold change cut-offs because it represents about a 20% fold change which is about the level of detection of this technology.)
      • 513 genes, 9.49%
  • F5 and C60
    • How many genes have p value < 0.05? and what is the percentage (out of 5408)?
      • 969 genes, 17.92%
    • What about p < 0.01? and what is the percentage (out of 5408)?
      • 315 genes, 5.82%
    • What about p < 0.001? and what is the percentage (out of 5408)?
      • 40 genes, 0.74%
    • What about p < 0.0001? and what is the percentage (out of 5408)?
      • 7 genes, 0.13%
    • How many genes are p < 0.05 for the Bonferroni-corrected p value? and what is the percentage (out of 5408)?
      • 1 gene, 0.02%
    • How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value? and what is the percentage (out of 5408)?
      • 4 genes, 0.07%
    • Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there? (and %)
      • 479 genes, 8.86%
    • Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there? (and %)
      • 490 genes, 9.06%
    • What about an average log fold change of > 0.25 and p < 0.05? (and %)
      • 441 genes, 8.15%
    • Or an average log fold change of < -0.25 and p < 0.05? (and %) (These are more realistic values for the fold change cut-offs because it represents about a 20% fold change which is about the level of detection of this technology.)
      • 431 genes, 7.97%
  • F20 and C60
    • How many genes have p value < 0.05? and what is the percentage (out of 5408)?
      • 1838 genes, 33.99%
    • What about p < 0.01? and what is the percentage (out of 5408)?
      • 892 genes, 16.49%
    • What about p < 0.001? and what is the percentage (out of 5408)?
      • 239 genes, 4.42%
    • What about p < 0.0001? and what is the percentage (out of 5408)?
      • 54 genes, 1.00%
    • How many genes are p < 0.05 for the Bonferroni-corrected p value? and what is the percentage (out of 5408)?
      • 10 genes, 0.18%
    • How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value? and what is the percentage (out of 5408)?
      • 707 genes, 13.07%
    • Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there? (and %)
      • 826 genes, 15.27%
    • Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there? (and %)
      • 1012 genes, 18.71%
    • What about an average log fold change of > 0.25 and p < 0.05? (and %)
      • 788 genes, 14.57%
    • Or an average log fold change of < -0.25 and p < 0.05? (and %) (These are more realistic values for the fold change cut-offs because it represents about a 20% fold change which is about the level of detection of this technology.)
      • 963 genes, 17.81%
  • F60 and C60
    • How many genes have p value < 0.05? and what is the percentage (out of 5408)?
      • 2070 genes, 38.28%
    • What about p < 0.01? and what is the percentage (out of 5408)?
      • 1140 genes, 21.08%
    • What about p < 0.001? and what is the percentage (out of 5408)?
      • 387 genes, 7.16%
    • What about p < 0.0001? and what is the percentage (out of 5408)?
      • 120 genes, 2.22%
    • How many genes are p < 0.05 for the Bonferroni-corrected p value? and what is the percentage (out of 5408)?
      • 33 genes, 0.61%
    • How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value? and what is the percentage (out of 5408)?
      • 1193 genes, 22.06%
    • Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there? (and %)
      • 870 genes, 16.09%
    • Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there? (and %)
      • 1200 genes, 22.19%
    • What about an average log fold change of > 0.25 and p < 0.05? (and %)
      • 828 genes, 15.31%
    • Or an average log fold change of < -0.25 and p < 0.05? (and %) (These are more realistic values for the fold change cut-offs because it represents about a 20% fold change which is about the level of detection of this technology.)
      • 1146 genes, 21.19%
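
The percentages in this list are each count divided by the 5408 genes. The short sketch below, with a hypothetical helper name, recomputes them and also checks that the increased and decreased counts at p < 0.05 add up to the p < 0.05 total for each comparison. The counts are copied from the list above.

```python
TOTAL_GENES = 5408


def percent_of_total(count, total=TOTAL_GENES):
    """Format a gene count as a percentage of all genes, to two decimals."""
    return f"{100 * count / total:.2f}%"


# (genes with p < 0.05, AvgLogRatio > 0, AvgLogRatio < 0) per comparison.
counts = {
    "C5 vs C0": (344, 180, 164),
    "C20 vs C0": (868, 452, 416),
    "C60 vs C0": (1017, 487, 530),
    "F5 vs C60": (969, 479, 490),
    "F20 vs C60": (1838, 826, 1012),
    "F60 vs C60": (2070, 870, 1200),
}

for comparison, (significant, up, down) in counts.items():
    # Every significant gene is either increased or decreased.
    assert up + down == significant
    print(comparison, percent_of_total(significant),
          percent_of_total(up), percent_of_total(down))
```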

External Links

Ron Legaspi
BIOL 367, Fall 2015
