Difference between revisions of "Vpachec3 Week 14"

Revision as of 20:06, 5 December 2015

Thursday, December 3

Kevin and I looked at Dr. Dahlquist's feedback on our spreadsheet so far.
Based on that feedback, we used the file Raw_compiled_data_KD20151124.xls to continue the process, because Dr. Dahlquist pointed out that our chip has each gene spotted in quadruplicate. These spots are technical replicates and should be averaged before doing any further analysis.
Thus, we took the average of the 4 technical replicates with the AVERAGE function in Excel, selecting the cell for each replicate individually. The function looked like

=AVERAGE(C2,M2,W2,AG2)

We then clicked on the small black square (the fill handle) at the bottom-right corner of the cell to copy the formula down so that it adjusts for the remaining rows.
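The same averaging can be sketched outside Excel. This is a minimal Python sketch, not the procedure we actually ran: the function name `average_replicates` is hypothetical, and the four values standing in for cells C2, M2, W2, and AG2 are made up for illustration.

```python
from statistics import mean

def average_replicates(values):
    """Average the technical replicates for one gene on one chip.

    Mirrors =AVERAGE(C2,M2,W2,AG2): blank cells (None here) are skipped,
    since Excel's AVERAGE ignores empty cells in referenced ranges.
    Returns None when every replicate is blank.
    """
    present = [v for v in values if v is not None]
    if not present:
        return None
    return mean(present)

# Hypothetical log ratios for the four spots of one gene
spots = [0.5, 0.7, 0.6, 0.6]
gene_average = average_replicates(spots)
```

Applying the function to every gene row plays the same role as copying the formula down the column.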

- Then you need to average the averages for biofilm and for tobramycin. (It doesn't make sense to average biofilm and tobramycin together since they are separate treatments).
- Because your reference sample is genomic DNA and not RNA, you need to then take the ratio of the averages for the biofilm and tobramycin samples to get the ratio of tobramycin to control (tobramycin over biofilm). Because the numbers are in log space, you will subtract the biofilm average from the tobramycin average to get this number.
- You will conduct a two-sample t test comparing the 5 biofilm samples to the 3 tobramycin samples using the TTEST function in Excel, not the equation we did for Vibrio. It will directly compute the p value.
- Then you can compute the Bonferroni and Benjamini and Hochberg corrected p values like you did in the Vibrio exercise.
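The averaging and subtraction steps in the feedback above can be sketched in Python. This is a minimal sketch under stated assumptions: the sample values are made up, the names `biofilm`, `tobramycin`, and `log_ratio` are hypothetical, and the logs are assumed to be base 2 (the feedback only says the numbers are in log space).

```python
from statistics import mean

def log_ratio(treatment_logs, control_logs):
    """Log ratio of treatment over control.

    Because log(a / b) = log(a) - log(b), taking a ratio in log space
    is a subtraction: average log of treatment minus average log of control.
    """
    return mean(treatment_logs) - mean(control_logs)

# Hypothetical replicate-averaged log values for one gene
biofilm = [1.0, 1.2, 0.8, 1.1, 0.9]   # 5 biofilm (control) samples
tobramycin = [2.0, 2.2, 1.8]          # 3 tobramycin samples

ratio_log = log_ratio(tobramycin, biofilm)  # tobramycin minus biofilm
fold_change = 2 ** ratio_log                # only valid if the logs are base 2
```

A positive `ratio_log` means the gene is higher in tobramycin than in biofilm; a negative value means it is lower.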

Sanity Check: Number of genes significantly changed

Before we move on to the GenMAPP/MAPPFinder analysis, we want to perform a sanity check to make sure that we performed our data analysis correctly. We are going to find out the number of genes that are significantly changed at various p value cut-offs and also compare our data analysis with the published results of Merrell et al. (2002).

Open your spreadsheet and go to the "forGenMAPP" tab.
Click on cell A1 and select the menu item Data > Filter > Autofilter. Little drop-down arrows should appear at the top of each column. This will enable us to filter the data according to criteria we set.
Click on the drop-down arrow on your "Pvalue" column. Select "Custom". In the window that appears, set a criterion that will filter your data so that the Pvalue has to be less than 0.05.
- How many genes have p value < 0.05? and what is the percentage (out of 7251)?
  - 4318 genes which is 60%
- What about p < 0.01? and what is the percentage (out of 7251)?
  - 2971 genes which is 41%
- What about p < 0.001? and what is the percentage (out of 7251)?
  - 1460 genes which is 20%
- What about p < 0.0001? and what is the percentage (out of 7251)?
  - 645 genes which is 9%
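The percentages above are simple arithmetic on the filtered counts. A minimal Python sketch follows; the counts and the total of 7251 come from this page, while the helper names `percent_of_total` and `count_below` are hypothetical.

```python
TOTAL_GENES = 7251  # denominator used in this notebook

# Gene counts reported above for each unadjusted p value cut-off
counts = {0.05: 4318, 0.01: 2971, 0.001: 1460, 0.0001: 645}

def percent_of_total(count, total=TOTAL_GENES):
    """Percentage of all genes, rounded to the nearest whole percent."""
    return round(100 * count / total)

def count_below(p_values, cutoff):
    """Count p values strictly below the cut-off, like the Excel custom filter."""
    return sum(1 for p in p_values if p < cutoff)

percentages = {cutoff: percent_of_total(n) for cutoff, n in counts.items()}
```

With the real p value column loaded as a list, `count_below` would stand in for the Autofilter step.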

When we use a p value cut-off of p < 0.05, what we are saying is that we would have seen a gene expression change that deviates this far from zero by chance less than 5% of the time.
We have just performed 7251 t tests for significance. Another way to state what we are seeing with p < 0.05 is that we would expect to see this magnitude of a gene expression change by chance in about 5% of our t tests, or about 363 times. (Test your understanding: http://xkcd.com/882/.) Since we have more than 363 genes that pass this cut-off, we know that some genes are significantly changed. However, we don't know which ones. To apply a more stringent criterion to our p values, we performed the Bonferroni and Benjamini and Hochberg corrections to these unadjusted p values. The Bonferroni correction is very stringent. The Benjamini-Hochberg correction is less stringent. To see this relationship, filter your data to determine the following:
- How many genes are p < 0.05 for the Bonferroni-corrected p value? and what is the percentage (out of 7251)?
  - 179 genes which is 2.5%
- How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value? and what is the percentage (out of 7251)?
  - 605 genes which is 8.3%
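Why Bonferroni passes fewer genes than Benjamini and Hochberg can be seen in a small Python sketch. The p values are made up, and the functions use the standard textbook formulations of the two corrections, which may differ in detail from the spreadsheet steps used in the Vibrio exercise.

```python
def bonferroni(p_values):
    """Multiply each p value by the number of tests, capped at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]

def benjamini_hochberg(p_values):
    """Standard B&H adjustment: p * n / rank, kept monotone and capped at 1."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value (rank n) down to the smallest (rank 1)
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical unadjusted p values for four genes
p = [0.01, 0.04, 0.03, 0.005]
bonf = bonferroni(p)          # approximately [0.04, 0.16, 0.12, 0.02]
bh = benjamini_hochberg(p)    # approximately [0.02, 0.04, 0.04, 0.02]
```

Every Benjamini and Hochberg value is less than or equal to the matching Bonferroni value, which is why more genes stay below 0.05 after that correction.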

In summary, the p value cut-off should not be thought of as some magical number at which data becomes "significant". Instead, it is a moveable confidence level. If we want to be very confident of our data, use a small p value cut-off. If we are OK with being less confident about a gene expression change and want to include more genes in our analysis, we can use a larger p value cut-off.

The "biofilm_tobramycin_ratio" tells us the size of the gene expression change and in which direction. Positive values are increases relative to the control; negative values are decreases relative to the control.
- Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "biofilm_tobramycin_ratio" column to show all genes with an average log fold change greater than zero. How many are there? (and %)
  - 3279 genes which is 45%
- Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "biofilm_tobramycin_ratio" column to show all genes with an average log fold change less than zero. How many are there? (and %)
  - 3127 genes which is 43%

- What about an average log fold change of > 0.25 and p < 0.05? (and %)
  - 1613 genes which is 22%

- Or an average log fold change of < -0.25 and p < 0.05? (and %) (These are more realistic values for the fold change cut-offs because they represent about a 20% change in expression, which is about the level of detection of this technology.)
  - 1519 genes which is 21%
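The "about 20%" figure for a log fold change of 0.25 can be checked with a short Python sketch. It assumes the log ratios are base 2, which this page does not state explicitly; the helper `passes_filter` is hypothetical and only mirrors the two combined Excel filters.

```python
LOG_CUTOFF = 0.25

# Convert the log cut-off back to a plain fold change (assumes base-2 logs)
fold_up = 2 ** LOG_CUTOFF       # about 1.19, roughly a 19% increase
fold_down = 2 ** -LOG_CUTOFF    # about 0.84, roughly a 16% decrease

def passes_filter(log_fc, p_value, log_cutoff=LOG_CUTOFF, alpha=0.05):
    """True if the change exceeds the cut-off in either direction and p < alpha."""
    return abs(log_fc) > log_cutoff and p_value < alpha
```

Counting the rows where `passes_filter` is true, split by the sign of the log fold change, would correspond to the two counts above.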

Sanity Check: Compare individual genes with known data

Merrell et al. (2002) report that genes with IDs: VC0028, VC0941, VC0869, VC0051, VC0647, VC0468, VC2350, and VCA0583 were all significantly changed in their data. Look these genes up in your spreadsheet. What are their fold changes and p values? Are they significantly changed in our analysis?

VC0028

Fold Change: 1.65, 1.27

P-Value: 0.0474, 0.0692

Significance: statistically significant, not statistically significant

VC0941

Fold Change: 0.09, -0.28

P-Value: 0.6759, 0.1636

Significance: not statistically significant, not statistically significant

VC0869

Fold Change: 1.59, 1.95, 2.20, 1.50, 2.12

P-Value: 0.0463, 0.0227, 0.0020, 0.0174, 0.0200

Significance: statistically significant for all five entries

VC0051

Fold Change: 1.92, 1.89

P-Value: 0.0139, 0.0160

Significance: statistically significant, statistically significant

VC0468

Fold Change: -0.17

P-Value: 0.3350

Significance: not statistically significant

VC2350

Fold Change: -2.40

P-Value: 0.0130

Significance: statistically significant

VCA0583

Fold Change: 1.06

P-Value: 0.1011

Significance: not statistically significant
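The significance calls above follow from comparing each unadjusted p value with 0.05. A minimal Python sketch, using the p values listed above (one per spreadsheet entry found for each gene); the names `p_values`, `classify`, and `calls` are hypothetical.

```python
ALPHA = 0.05  # unadjusted p value cut-off used for the calls above

# p values copied from the entries above
p_values = {
    "VC0028": [0.0474, 0.0692],
    "VC0941": [0.6759, 0.1636],
    "VC0869": [0.0463, 0.0227, 0.0020, 0.0174, 0.0200],
    "VC0051": [0.0139, 0.0160],
    "VC0468": [0.3350],
    "VC2350": [0.0130],
    "VCA0583": [0.1011],
}

def classify(p, alpha=ALPHA):
    """Label one p value against the cut-off."""
    return "significant" if p < alpha else "not significant"

calls = {gene: [classify(p) for p in ps] for gene, ps in p_values.items()}
```

Using a corrected p value column instead would only require swapping the numbers in `p_values`.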

Links

Vpachec3 User Page
