Difference between revisions of "Kevin Wyllie Week 14"

Revision as of 19:58, 7 December 2015

Electronic Lab Notebook

Statistical Analysis and Formatting

We looked at Dr. Dahlquist's comments on our prior worksheet to make a second attempt at processing and formatting for GenMAPP. Our biggest error in the the previous attempt was forgetting to account for the fact that each gene was spotted in (technical) quadruplicates.

Dr. Dahlquist prepared a sheet for us to start with, which separated each quadruplicate of the genome across columns. So the ID column for the first technical replicate was in column A, followed by fold change data for one of the four technical replicates within each of the biological replicates, for the (untreated) biofilm and tobramycin-treated biofilm samples. Next was the second ID column (identical to the first) in column J, followed by the second set of technical replicates within each biological replicate. These sheet was named "quadruplicated_spots_separated".
After all four sets of technical replicates, yet another ID column (column AK) was added, following by the averaged fold change values across each technical replicate (for each biological replicate - columns AL through AS).
- For these averages the =AVERAGE(*four technical replicates*) command was used and pasted through the entirety of each column. The *four technical replicates* is a placeholder for the cell coordinates corresponding to each technical replicate within a biological replicate.
- Because all five untreated and all three treated samples will be averaged later in the protocol, the word "consolidated" (instead of "average") was used in the headings for these columns. For example, the column heading for the first biological untreated sample was "consolidated_Biofilm_1_scaled_centered_4".
Next, a new sheet was created, named "statistics". Pasted into this sheet (using the "Paste Values" function) were the "consolidated" columns (preceded by the ID column). Thus, the ID's were in column A while the consolidated fold changes for each biological replicate were in columns B through J. The average across the five untreated samples were calculated in column K (using the previously mentioned command), with the header "AVG_Biofilm_scaled_centered". In column L, the same was done using the three treated samples.
In column M, the ratios between the averages for treated and untreated samples were calculated. However, because these fold changes are in log space the fold changes for the treated samples were subtracted from that of the untreated samples. The header for this column was "Biofilm_Tobramycin_ratio".
- An example command for the gene in 2, is =L2-K2.
Next, the p-value's for each fold change ratio was calculated in column M. A type-3, 2-tailed T-test was used.
- The command for the first gene (row 2) is =TTEST(C2:G2,H2:J2,2,3). As before, this was pasted through all genes.
The next sheet was named "bonferroni_pval". Pasted into this sheet were the gene ID's (column A), the biofilm-tobramycin fold change ratios (column B) and the previously calculated P-values (column C).
The header for column D was "bonferroni_p_value," and in this column were the Bonferroni adjusted P-values. To calculate this value, each previous P-value was multiplied by 7,251 (the number of genes on the sheet).
- For example, the command for the first gene (cell D2) was =C2*7251.
Because this yields many P-values which are over 1 (which statistically speaking, should be impossible), a second "bonferroni_p_value" column was added next to the previous one (column E). In this column, P-values over 1 were replaced with a 1.
- The command for cell E2 was =IF(D2>1,1,D2).
The next sheet was named "BH_pval".
First, the same information pasted into the "bonferroni_p_value" sheet was pasted into this new sheet.
A column was then inserted between columns A and B. The header for this new column (now column B), was "Master Index." With the gene ID's in numerical order, an ascending count was added to the Master Index column. This was done by adding values 1 and 2 to cells B2 and B3 respectively, highlighting these two cells, and right-clicking on the black square at the bottom-right of the highlighted region. This will automatically fill in the ascending count.
A filter was then added to the P-value column (these are the unadjusted P-values, not the previously calculated Bonferroni-adjusted ones), and was used to sort this column in ascending order.

Sanity Check

There are 7251 genes in the sheet.

How many genes have p value < 0.05? And what is the percentage?
- 4318, ~60%
What about p < 0.01?
- 2971, ~41%
What about p < 0.001?
- 1460, ~20%
What about p < 0.0001?
- 645, ~9%

How many genes are p < 0.05 for the Bonferroni-corrected p value?
- 179, ~2.4%
How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value?
- 609, ~8.4%

"Pvalue" < 0.05, and "Biofilm_Tobramycin_ratio" > 0.
"Pvalue" < 0.05, and "Biofilm_Tobramycin_ratio" > 0.

This is a more realistic values for the fold change cut-offs because it represents about a 20% fold change which is about the level of detection of this technology:

"Biofilm_Tobramycin_ratio" > 0.25 and "Pvalue" < 0.05.
"Biofilm_tobramycin_ratio" < -0.25 and "Pvalue" < 0.05.

Links

Weekly Group Assignments

Shared Group Journals

Project Links

Team Members

User:kwyllie

Kevin Wyllie Week 2 (See the original assignment and class journal.)
Kevin Wyllie Week 3 (See the original assignment and class journal.)
Kevin Wyllie Week 4 (See the original assignment and class journal.)
Kevin Wyllie Week 5 (See the original assignment and class journal.)
Kevin Wyllie Week 6 (See the original assignment and class journal.)
Kevin Wyllie Week 7 (See the original assignment and class journal.)
Kevin Wyllie Week 8 (See the original assignment and class journal.)
Kevin Wyllie Week 9 (See the original assignment and class journal.)
Kevin Wyllie Week 10 (See the original assignment.)
Kevin Wyllie Week 11 (See the original assignment.)
Kevin Wyllie Week 12 (See the original assignment.)
Kevin Wyllie Week 14 (See the original assignment.)
Kevin Wyllie Week 15 (See the original assignment.)

@@ Line 3: / Line 3: @@
 === Statistical Analysis and Formatting ===
-'''We looked at Dr. Dahlquist's comments on our prior worksheet to make a second attempt at processing and formatting for GenMAPP. Primarily, our biggest error in the the previous attempt was forgetting to account for the fact that each gene was spotted in (technical) quadruplicates.'''
+'''We looked at Dr. Dahlquist's comments on our prior worksheet to make a second attempt at processing and formatting for GenMAPP. Our biggest error in the the previous attempt was forgetting to account for the fact that each gene was spotted in (technical) quadruplicates.'''
 # Dr. Dahlquist prepared a sheet for us to start with, which separated each quadruplicate of the genome across columns. So the ID column for the first technical replicate was in column A, followed by fold change data for one of the four technical replicates within each of the biological replicates, for the (untreated) biofilm and tobramycin-treated biofilm samples. Next was the second ID column (identical to the first) in column J, followed by the second set of technical replicates within each biological replicate. These sheet was named "'''quadruplicated_spots_separated'''".
@@ Line 10: / Line 10: @@
 #* Because all five untreated and all three treated samples will be averaged later in the protocol, the word "consolidated" (instead of "average") was used in the headings for these columns. For example, the column heading for the first biological untreated sample was "consolidated_Biofilm_1_scaled_centered_4".
 # Next, a new sheet was created, named "'''statistics'''". Pasted into this sheet (using the "Paste Values" function) were the "consolidated" columns (preceded by the ID column). Thus, the ID's were in column A while the consolidated fold changes for each biological replicate were in columns B through J. The average across the five untreated samples were calculated in column K (using the previously mentioned command), with the header "AVG_Biofilm_scaled_centered". In column L, the same was done using the three treated samples.
-# In column M, the ratios between the averages for treated and untreated samples were calculated. However, because these fold changes are in log space the fold changes for the treated samples were ''subtracted'' from that of the untreated samples.
+# In column M, the ratios between the averages for treated and untreated samples were calculated. However, because these fold changes are in log space the fold changes for the treated samples were ''subtracted'' from that of the untreated samples. The header for this column was "Biofilm_Tobramycin_ratio".
 #* An example command for the gene in 2, is <code>=L2-K2</code>.
-# Next, the p-value's for each fold change ratio was calculated. A type-3, 2-tailed T-test was used.
+# Next, the p-value's for each fold change ratio was calculated in column M. A type-3, 2-tailed T-test was used.
 #* The command for the first gene (row 2) is <code>=TTEST(C2:G2,H2:J2,2,3)</code>. As before, this was pasted through all genes.
-# The next sheet was named "'''bonferroni_pval'''".
+# The next sheet was named "'''bonferroni_pval'''". Pasted into this sheet were the gene ID's (column A), the biofilm-tobramycin fold change ratios (column B) and the previously calculated P-values (column C).
+# The header for column D was "bonferroni_p_value," and in this column were the Bonferroni adjusted P-values. To calculate this value, each previous P-value was multiplied by 7,251 (the number of genes on the sheet).
+#* For example, the command for the first gene (cell D2) was <code>=C2*7251</code>.
+# Because this yields many P-values which are over 1 (which statistically speaking, should be impossible), a second "bonferroni_p_value" column was added next to the previous one (column E). In this column, P-values over 1 were replaced with a 1.
+#* The command for cell E2 was <code>=IF(D2>1,1,D2)</code>.
+# The next sheet was named "'''BH_pval'''".
+# First, the same information pasted into the "bonferroni_p_value" sheet was pasted into this new sheet.
+# A column was then inserted between columns A and B. The header for this new column (now column B), was "Master Index." With the gene ID's in numerical order, an ascending count was added to the Master Index column. This was done by adding values 1 and 2 to cells B2 and B3 respectively, highlighting these two cells, and right-clicking on the black square at the bottom-right of the highlighted region. This will automatically fill in the ascending count.
+# A filter was then added to the P-value column (these are the unadjusted P-values, not the previously calculated Bonferroni-adjusted ones), and was used to sort this column in ascending order.
 === Sanity Check ===

Difference between revisions of "Kevin Wyllie Week 14"

Revision as of 19:58, 7 December 2015

Contents

Electronic Lab Notebook

Statistical Analysis and Formatting

Sanity Check

Links

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools