Difference between revisions of "Rlegaspi Week 15"
Revision as of 02:04, 12 December 2015
Shewanella oneidensis
Our Gene Database Testing Report
Group Paper - File:Final Report 20151218 2 HMH.docx
Group Members
- Coder: Mary Alverson
 - GenMAPP User & Project Manager: Ron Legaspi
 - Quality Assurance: Josh Kuroda
 - GenMAPP User: Emily Simso
 
Important Links
Our Files
Our Deliverables
Goals for Week 15
Data Preparation and Statistical Analysis for GenMAPP
- Consult with Dr. Dahlquist on how to process the data (normalization, statistics)
 - Perform the statistical analysis in Excel.
 - Format the gene expression data for import into GenMAPP.
 
*Similar to goals from Week 14.
Summary of Progress and Procedure
Compiling Raw Data and Statistical Analysis
December 8, 2015
- Calculated averages from the split data.
  - Discovered that there are a total of 5408 genes.
- Calculated the biological average for each time point.
- Calculated the AverageLogRatio comparing C5, C20, and C60 to C0, and F5, F20, and F60 to C60.
  - Subtracted rather than divided, because the values are in log space.
- Performed a TTest on the above comparisons to get the Pvalue.
- Performed the Bonferroni correction.
- Performed the Benjamini & Hochberg correction.
- Uploaded the Excel file produced by these procedures to XMLPipeDB: File:StatisticalAnalysis Shewanella RARL 20151207 HMH.xlsx
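The log-space subtraction and the two multiple-testing corrections listed above can be sketched in Python. This is only an illustration under stated assumptions, not the formulas used in the Excel workbook: the function names and the toy p values are hypothetical, and the Benjamini & Hochberg function uses the standard step-up form, which may differ in detail from the spreadsheet procedure.

```python
def avg_log_ratio(treatment_avg_log, control_avg_log):
    """Log ratio of treatment to control.

    In log space, log(a / b) == log(a) - log(b), so the averaged
    log values are subtracted rather than divided.
    """
    return treatment_avg_log - control_avg_log


def bonferroni(pvalues):
    """Multiply each p value by the number of tests, capped at 1."""
    n = len(pvalues)
    return [min(1.0, p * n) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini & Hochberg adjusted p values (standard step-up form)."""
    n = len(pvalues)
    # Indices of the p values, sorted from smallest to largest p value.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down so that an adjusted value
    # never exceeds the adjusted value of the next-larger p value.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p values for five hypothetical genes (not taken from the dataset).
toy_pvalues = [0.001, 0.008, 0.039, 0.041, 0.60]
bonf = bonferroni(toy_pvalues)
bh = benjamini_hochberg(toy_pvalues)

print("Bonferroni:", bonf)
print("Benjamini & Hochberg:", bh)
```

Bonferroni scales every p value by the full number of tests, while Benjamini & Hochberg scales by the number of tests divided by the rank of the p value, so it is the less strict of the two corrections.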
 
December 10, 2015
Prepared the compiled raw data for GenMAPP and created a .txt file.
Excel file: File:StatisticalAnalysis Shewanella RARL 20151210 HMH.xlsx
.txt file: File:CompiledRawDataforGenMAPP Shewanella RARL 20151210 HMH.txt
Sanity Check
Importance of Sanity Check (from the DNA Microarray Analysis Activity, http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae): In summary, the p value cut-off should not be thought of as some magical number at which data becomes "significant". Instead, it is a moveable confidence level. If we want to be very confident of our data, we use a small p value cut-off. If we are OK with being less confident about a gene expression change and want to include more genes in our analysis, we can use a larger p value cut-off. For the GenMAPP analysis below, we will use an average log fold change cut-off of greater than 0.25 or less than -0.25 and an unadjusted p value cut-off of p < 0.05, because we want to include several hundred genes in our analysis. (Note: The "AvgLogRatio" tells us the size of the gene expression change and in which direction. Positive values are increases relative to the control; negative values are decreases relative to the control.)
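A minimal sketch of how the two cut-offs combine. It assumes each gene is a dict with the hypothetical keys `Pvalue` and `AvgLogRatio` (named after the spreadsheet columns), and it assumes the log ratios are base 2, which this page does not state. The gene records are made up for illustration.

```python
def passes_cutoffs(gene, p_cutoff=0.05, log_fc_cutoff=0.25):
    """True if the unadjusted p value is below the cut-off and the
    average log ratio lies outside +/- log_fc_cutoff."""
    return gene["Pvalue"] < p_cutoff and abs(gene["AvgLogRatio"]) > log_fc_cutoff


# Hypothetical genes, not taken from the Shewanella dataset.
genes = [
    {"ID": "gene_a", "Pvalue": 0.003, "AvgLogRatio": 0.80},   # increased
    {"ID": "gene_b", "Pvalue": 0.020, "AvgLogRatio": -0.40},  # decreased
    {"ID": "gene_c", "Pvalue": 0.030, "AvgLogRatio": 0.10},   # change too small
    {"ID": "gene_d", "Pvalue": 0.200, "AvgLogRatio": 1.20},   # p value too large
]

selected = [g["ID"] for g in genes if passes_cutoffs(g)]
increased = [g["ID"] for g in genes if passes_cutoffs(g) and g["AvgLogRatio"] > 0]
decreased = [g["ID"] for g in genes if passes_cutoffs(g) and g["AvgLogRatio"] < 0]

# Assumption: base-2 logs, so a log ratio of 0.25 is a fold change of 2 ** 0.25.
fold_change_at_cutoff = 2 ** 0.25

print(selected, increased, decreased)
print(round(fold_change_at_cutoff, 3))
```

Under the base-2 assumption, 2 ** 0.25 is about 1.19, a change of roughly 19%. Raising `p_cutoff` admits more genes at lower confidence; lowering it does the opposite.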
- C5 and C0
  - How many genes have p value < 0.05, and what is the percentage (out of 5408)?
    - 344 genes, 6.36%
  - What about p < 0.01, and what is the percentage (out of 5408)?
    - 94 genes, 1.74%
  - What about p < 0.001, and what is the percentage (out of 5408)?
    - 18 genes, 0.33%
  - What about p < 0.0001, and what is the percentage (out of 5408)?
    - 5 genes, 0.09%
  - How many genes are p < 0.05 for the Bonferroni-corrected p value, and what is the percentage (out of 5408)?
    - 2 genes, 0.04%
  - How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value, and what is the percentage (out of 5408)?
    - 2 genes, 0.04%
  - Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there? (and %)
    - 180 genes, 3.33%
  - Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there? (and %)
    - 164 genes, 3.03%
  - What about an average log fold change of > 0.25 and p < 0.05? (and %)
    - 161 genes, 2.98%
  - Or an average log fold change of < -0.25 and p < 0.05? (and %) (These are more realistic values for the fold change cut-offs because they represent about a 20% fold change, which is about the level of detection of this technology.)
    - 149 genes, 2.76%
- C20 and C0
  - How many genes have p value < 0.05, and what is the percentage (out of 5408)?
    - 868 genes, 16.05%
  - What about p < 0.01, and what is the percentage (out of 5408)?
    - 342 genes, 6.32%
  - What about p < 0.001, and what is the percentage (out of 5408)?
    - 79 genes, 1.46%
  - What about p < 0.0001, and what is the percentage (out of 5408)?
    - 14 genes, 0.26%
  - How many genes are p < 0.05 for the Bonferroni-corrected p value, and what is the percentage (out of 5408)?
    - 1 gene, 0.02%
  - How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value, and what is the percentage (out of 5408)?
    - 34 genes, 0.63%
  - Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there? (and %)
    - 452 genes, 8.36%
  - Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there? (and %)
    - 416 genes, 7.69%
  - What about an average log fold change of > 0.25 and p < 0.05? (and %)
    - 437 genes, 8.08%
  - Or an average log fold change of < -0.25 and p < 0.05? (and %) (These are more realistic values for the fold change cut-offs because they represent about a 20% fold change, which is about the level of detection of this technology.)
    - 405 genes, 7.49%
- C60 and C0
  - How many genes have p value < 0.05, and what is the percentage (out of 5408)?
    - 1017 genes, 18.81%
  - What about p < 0.01, and what is the percentage (out of 5408)?
    - 471 genes, 8.71%
  - What about p < 0.001, and what is the percentage (out of 5408)?
    - 163 genes, 3.01%
  - What about p < 0.0001, and what is the percentage (out of 5408)?
    - 53 genes, 0.98%
  - How many genes are p < 0.05 for the Bonferroni-corrected p value, and what is the percentage (out of 5408)?
    - 13 genes, 0.24%
  - How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value, and what is the percentage (out of 5408)?
    - 229 genes, 4.23%
  - Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there? (and %)
    - 487 genes, 9.01%
  - Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there? (and %)
    - 530 genes, 9.80%
  - What about an average log fold change of > 0.25 and p < 0.05? (and %)
    - 475 genes, 8.78%
  - Or an average log fold change of < -0.25 and p < 0.05? (and %) (These are more realistic values for the fold change cut-offs because they represent about a 20% fold change, which is about the level of detection of this technology.)
    - 513 genes, 9.49%
- F5 and C60
  - How many genes have p value < 0.05, and what is the percentage (out of 5408)?
    - 969 genes, 17.92%
  - What about p < 0.01, and what is the percentage (out of 5408)?
    - 315 genes, 5.82%
  - What about p < 0.001, and what is the percentage (out of 5408)?
    - 40 genes, 0.74%
  - What about p < 0.0001, and what is the percentage (out of 5408)?
    - 7 genes, 0.13%
  - How many genes are p < 0.05 for the Bonferroni-corrected p value, and what is the percentage (out of 5408)?
    - 1 gene, 0.02%
  - How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value, and what is the percentage (out of 5408)?
    - 4 genes, 0.07%
  - Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there? (and %)
    - 479 genes, 8.86%
  - Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there? (and %)
    - 490 genes, 9.06%
  - What about an average log fold change of > 0.25 and p < 0.05? (and %)
    - 441 genes, 8.15%
  - Or an average log fold change of < -0.25 and p < 0.05? (and %) (These are more realistic values for the fold change cut-offs because they represent about a 20% fold change, which is about the level of detection of this technology.)
    - 431 genes, 7.97%
- F20 and C60
  - How many genes have p value < 0.05, and what is the percentage (out of 5408)?
    - 1838 genes, 33.99%
  - What about p < 0.01, and what is the percentage (out of 5408)?
    - 892 genes, 16.49%
  - What about p < 0.001, and what is the percentage (out of 5408)?
    - 239 genes, 4.42%
  - What about p < 0.0001, and what is the percentage (out of 5408)?
    - 54 genes, 1.00%
  - How many genes are p < 0.05 for the Bonferroni-corrected p value, and what is the percentage (out of 5408)?
    - 10 genes, 0.18%
  - How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value, and what is the percentage (out of 5408)?
    - 707 genes, 13.07%
  - Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there? (and %)
    - 826 genes, 15.27%
  - Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there? (and %)
    - 1012 genes, 18.71%
  - What about an average log fold change of > 0.25 and p < 0.05? (and %)
    - 788 genes, 14.57%
  - Or an average log fold change of < -0.25 and p < 0.05? (and %) (These are more realistic values for the fold change cut-offs because they represent about a 20% fold change, which is about the level of detection of this technology.)
    - 963 genes, 17.81%
- F60 and C60
  - How many genes have p value < 0.05, and what is the percentage (out of 5408)?
    - 2070 genes, 38.28%
  - What about p < 0.01, and what is the percentage (out of 5408)?
    - 1140 genes, 21.08%
  - What about p < 0.001, and what is the percentage (out of 5408)?
    - 387 genes, 7.16%
  - What about p < 0.0001, and what is the percentage (out of 5408)?
    - 120 genes, 2.22%
  - How many genes are p < 0.05 for the Bonferroni-corrected p value, and what is the percentage (out of 5408)?
    - 33 genes, 0.61%
  - How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value, and what is the percentage (out of 5408)?
    - 1193 genes, 22.06%
  - Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there? (and %)
    - 870 genes, 16.09%
  - Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there? (and %)
    - 1200 genes, 22.19%
  - What about an average log fold change of > 0.25 and p < 0.05? (and %)
    - 828 genes, 15.31%
  - Or an average log fold change of < -0.25 and p < 0.05? (and %) (These are more realistic values for the fold change cut-offs because they represent about a 20% fold change, which is about the level of detection of this technology.)
    - 1146 genes, 21.19%
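Each percentage in this sanity check is a gene count divided by the 5408 genes in the dataset. The sketch below reproduces that arithmetic from the counts reported above and also checks that, for each comparison, the increased and decreased counts at unadjusted p < 0.05 add up to the total. The helper name `percent_of_genes` and the dictionary layout are hypothetical.

```python
TOTAL_GENES = 5408


def percent_of_genes(count, total=TOTAL_GENES):
    """Percentage of all genes, rounded to two decimal places."""
    return round(100 * count / total, 2)


# (genes with p < 0.05, of those increased, of those decreased),
# copied from the counts reported in the sanity check.
counts = {
    "C5 vs C0": (344, 180, 164),
    "C20 vs C0": (868, 452, 416),
    "C60 vs C0": (1017, 487, 530),
    "F5 vs C60": (969, 479, 490),
    "F20 vs C60": (1838, 826, 1012),
    "F60 vs C60": (2070, 870, 1200),
}

# The increased and decreased genes should partition the p < 0.05 set.
consistent = {
    name: up + down == total for name, (total, up, down) in counts.items()
}

for name, (total, up, down) in counts.items():
    print(
        name,
        percent_of_genes(total),
        percent_of_genes(up),
        percent_of_genes(down),
        consistent[name],
    )
```

The partition check only holds if no gene with p < 0.05 has an average log ratio of exactly zero, which is the case for the counts above.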
 
External Links
 Ron Legaspi
 BIOL 367, Fall 2015
Assignment Links
- Week 1 Assignment
 - Week 2 Assignment
 - Week 3 Assignment
 - Week 4 Assignment
 - Week 5 Assignment
 - Week 6 Assignment
 - Week 7 Assignment
 - Week 8 Assignment
 - Week 9 Assignment
 - Week 10 Assignment
 - Week 11 Assignment
 - Week 12 Assignment
 - Week 14 Assignment
 - Week 15 Assignment
 
Individual Weekly Journals
- Individual Journal Week 1 - This is my User Page
 - Individual Journal Week 2
 - Individual Journal Week 3
 - Individual Journal Week 4
 - Individual Journal Week 5
 - Individual Journal Week 6
 - Individual Journal Week 7
 - Individual Journal Week 8
 - Individual Journal Week 9
 - Individual Journal Week 10
 - Individual Journal Week 11
 - Individual Journal Week 12
 - Individual Journal Week 14
 - Individual Journal Week 15
 
