LMU BioDB 2015 - User contributions [en]

Rlegaspi Week 15

2015-12-12T22:19:20Z

Rlegaspi: /* Sanity Check */ Edited link to microarray analysis procedure for sanity check and did some stylistic changes to some text.

{{Heavy Metal HaterZ}}

= Goals for Week 15 =
== Data Preparation and Statistical Analysis for GenMAPP ==
# Consult with Dr. Dahlquist on how to process the data (normalization, statistics).
# Perform the statistical analysis in Excel.
# Format the gene expression data for import into GenMAPP.
<nowiki>*</nowiki>Similar to goals from Week 14.

= Summary of Progress and Procedure =
== Compiling Raw Data and Statistical Analysis ==
=== December 8, 2015 ===
*Calculated averages from the split data.
**Found that the dataset contains a total of '''5408''' genes.
*Calculated the biological averages for each time point.
*Calculated the AverageLogRatio comparing C5, C20, and C60 to C0, and F5, F20, and F60 to C60.
**Subtracted rather than divided, because the values are already in log space.
*Performed a t-test (TTest) on each of the comparisons above to obtain the Pvalue.
*Applied the Bonferroni correction to the p values.
*Applied the Benjamini & Hochberg correction to the p values.
*The Excel file containing the results of all of these procedures was uploaded to XMLPipeDB: [[File:StatisticalAnalysis Shewanella RARL 20151207 HMH.xlsx]]
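The log-ratio and multiple-testing steps above can be sketched in Python as follows. This is a minimal illustration, not the spreadsheet's actual formulas: the function names are hypothetical, the inputs are assumed to be lists of already log-transformed values, and the Benjamini & Hochberg version includes the usual monotonicity step, which a spreadsheet using the simpler p × m / rank form would omit, so adjusted values may differ slightly. The t-test itself is left to Excel's TTEST (or a statistics library), since its p value requires the t distribution.

```python
from statistics import mean


def avg_log_ratio(treatment_logs, control_logs):
    """Average log ratio of a treatment time point vs. its control.

    Inputs are already log-transformed, so the ratio becomes a difference:
    log(a / b) = log(a) - log(b).
    """
    return mean(treatment_logs) - mean(control_logs)


def bonferroni(pvalues):
    """Bonferroni correction: multiply each p value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(p * m, 1.0) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the original order."""
    m = len(pvalues)
    # Gene indices ordered from the smallest to the largest p value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down to the smallest, scaling each by
    # m / rank and keeping a running minimum so adjusted values never
    # increase as the rank gets smaller (and never exceed 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example with four genes.
pvals = [0.01, 0.04, 0.03, 0.20]
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

Bonferroni controls the chance of any false positive and is therefore very strict with 5408 tests, while Benjamini & Hochberg controls the expected fraction of false positives among the genes called significant, which is why it retains more genes.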

=== December 10, 2015 ===
Prepared the compiled raw data for GenMAPP and created a .txt file.

[[File:StatisticalAnalysis Shewanella RARL 20151210 HMH.xlsx]]

.txt file: [[File:CompiledRawDataforGenMAPP Shewanella RARL 20151210 HMH.txt]]
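The .txt export can be sketched as writing a tab-delimited text file with one header row and one row per gene. The column names and the example gene ID below are hypothetical placeholders; GenMAPP's import format may require specific columns (for example, a gene ID system code), so the actual layout should be checked against the GenMAPP documentation.

```python
import csv
import io


def write_genmapp_txt(rows, columns, handle):
    """Write rows (dicts) as tab-delimited text: a header line, then one line per gene.

    The column layout is an assumption for illustration, not GenMAPP's required format.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow([row[c] for c in columns])


# Hypothetical gene ID and column names.
buf = io.StringIO()
write_genmapp_txt(
    [{"ID": "SO_0001", "AvgLogRatio_C5": 0.31, "Pvalue_C5": 0.02}],
    ["ID", "AvgLogRatio_C5", "Pvalue_C5"],
    buf,
)
print(buf.getvalue())
```

Writing to a file instead of `io.StringIO` only requires opening it with `open(path, "w", newline="")`.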

=== Sanity Check ===
Importance of the sanity check (from [http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae DNA Microarray Analysis Activity]): In summary, the p value cut-off should not be thought of as some magical number at which data becomes "significant". Instead, it is a moveable confidence level. If we want to be very confident of our data, we use a small p value cut-off. If we are OK with being less confident about a gene expression change and want to include more genes in our analysis, we can use a larger p value cut-off. For the GenMAPP analysis below, we will use a fold change cut-off of greater than 0.25 or less than -0.25 and an unadjusted p value cut-off of p < 0.05, because we want to include several hundred genes in our analysis. (Note: The "AvgLogRatio" tells us the size of the gene expression change and its direction. Positive values are increases relative to the control; negative values are decreases relative to the control.)
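The cut-off logic described above can be sketched as a filter-and-count over per-gene results, reporting each count as a percentage of the total. This is a minimal illustration assuming each gene is a record holding its AvgLogRatio and its unadjusted, Bonferroni, and Benjamini & Hochberg p values; the `Gene` record, field names, and criterion labels are hypothetical, and the actual counts were obtained with Excel filters.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    # Hypothetical per-gene record; field names are illustrative only.
    avg_log_ratio: float
    pvalue: float
    bonferroni_p: float
    bh_p: float


# Each criterion mirrors one question in the sanity check below.
CRITERIA = {
    "p < 0.05": lambda g: g.pvalue < 0.05,
    "p < 0.01": lambda g: g.pvalue < 0.01,
    "p < 0.001": lambda g: g.pvalue < 0.001,
    "p < 0.0001": lambda g: g.pvalue < 0.0001,
    "Bonferroni p < 0.05": lambda g: g.bonferroni_p < 0.05,
    "B&H p < 0.05": lambda g: g.bh_p < 0.05,
    "p < 0.05 and ratio > 0": lambda g: g.pvalue < 0.05 and g.avg_log_ratio > 0,
    "p < 0.05 and ratio < 0": lambda g: g.pvalue < 0.05 and g.avg_log_ratio < 0,
    "p < 0.05 and ratio > 0.25": lambda g: g.pvalue < 0.05 and g.avg_log_ratio > 0.25,
    "p < 0.05 and ratio < -0.25": lambda g: g.pvalue < 0.05 and g.avg_log_ratio < -0.25,
}


def percent_of(count, total):
    """Percentage of total, rounded to two decimal places."""
    return round(100 * count / total, 2)


def sanity_check(genes, total=None):
    """Return {criterion: (gene count, percentage of total)} for one comparison."""
    total = len(genes) if total is None else total
    results = {}
    for label, passes in CRITERIA.items():
        count = sum(1 for g in genes if passes(g))
        results[label] = (count, percent_of(count, total))
    return results
```

For example, `percent_of(344, 5408)` gives the 6.36% reported for C5 vs. C0 at p < 0.05.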
*'''C5 and C0'''
**''How many genes have p value < 0.05, and what is the percentage (out of 5408)?''
***344 genes, 6.36%
**''What about p < 0.01, and what is the percentage (out of 5408)?''
***94 genes, 1.74%
**''What about p < 0.001, and what is the percentage (out of 5408)?''
***18 genes, 0.33%
**''What about p < 0.0001, and what is the percentage (out of 5408)?''
***5 genes, 0.09%
**''How many genes have a Bonferroni-corrected p value < 0.05, and what is the percentage (out of 5408)?''
***2 genes, 0.04%
**''How many genes have a Benjamini and Hochberg-corrected p value < 0.05, and what is the percentage (out of 5408)?''
***2 genes, 0.04%
**''Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there, and what is the percentage?''
***180 genes, 3.33%
**''Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there, and what is the percentage?''
***164 genes, 3.03%
**''What about an average log fold change > 0.25 and p < 0.05, and what is the percentage?''
***161 genes, 2.98%
**''Or an average log fold change < -0.25 and p < 0.05, and what is the percentage? (These are more realistic fold change cut-offs because they represent about a 20% change in expression, which is roughly the detection limit of this technology.)''
***149 genes, 2.76%
*'''C20 and C0'''
**''How many genes have p value < 0.05, and what is the percentage (out of 5408)?''
***868 genes, 16.05%
**''What about p < 0.01, and what is the percentage (out of 5408)?''
***342 genes, 6.32%
**''What about p < 0.001, and what is the percentage (out of 5408)?''
***79 genes, 1.46%
**''What about p < 0.0001, and what is the percentage (out of 5408)?''
***14 genes, 0.26%
**''How many genes have a Bonferroni-corrected p value < 0.05, and what is the percentage (out of 5408)?''
***1 gene, 0.02%
**''How many genes have a Benjamini and Hochberg-corrected p value < 0.05, and what is the percentage (out of 5408)?''
***34 genes, 0.63%
**''Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there, and what is the percentage?''
***452 genes, 8.36%
**''Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there, and what is the percentage?''
***416 genes, 7.69%
**''What about an average log fold change > 0.25 and p < 0.05, and what is the percentage?''
***437 genes, 8.08%
**''Or an average log fold change < -0.25 and p < 0.05, and what is the percentage? (These are more realistic fold change cut-offs because they represent about a 20% change in expression, which is roughly the detection limit of this technology.)''
***405 genes, 7.49%
*'''C60 and C0'''
**''How many genes have p value < 0.05, and what is the percentage (out of 5408)?''
***1017 genes, 18.81%
**''What about p < 0.01, and what is the percentage (out of 5408)?''
***471 genes, 8.71%
**''What about p < 0.001, and what is the percentage (out of 5408)?''
***163 genes, 3.01%
**''What about p < 0.0001, and what is the percentage (out of 5408)?''
***53 genes, 0.98%
**''How many genes have a Bonferroni-corrected p value < 0.05, and what is the percentage (out of 5408)?''
***13 genes, 0.24%
**''How many genes have a Benjamini and Hochberg-corrected p value < 0.05, and what is the percentage (out of 5408)?''
***229 genes, 4.23%
**''Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there, and what is the percentage?''
***487 genes, 9.01%
**''Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there, and what is the percentage?''
***530 genes, 9.80%
**''What about an average log fold change > 0.25 and p < 0.05, and what is the percentage?''
***475 genes, 8.78%
**''Or an average log fold change < -0.25 and p < 0.05, and what is the percentage? (These are more realistic fold change cut-offs because they represent about a 20% change in expression, which is roughly the detection limit of this technology.)''
***513 genes, 9.49%
*'''F5 and C60'''
**''How many genes have p value < 0.05, and what is the percentage (out of 5408)?''
***969 genes, 17.92%
**''What about p < 0.01, and what is the percentage (out of 5408)?''
***315 genes, 5.82%
**''What about p < 0.001, and what is the percentage (out of 5408)?''
***40 genes, 0.74%
**''What about p < 0.0001, and what is the percentage (out of 5408)?''
***7 genes, 0.13%
**''How many genes have a Bonferroni-corrected p value < 0.05, and what is the percentage (out of 5408)?''
***1 gene, 0.02%
**''How many genes have a Benjamini and Hochberg-corrected p value < 0.05, and what is the percentage (out of 5408)?''
***4 genes, 0.07%
**''Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there, and what is the percentage?''
***479 genes, 8.86%
**''Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there, and what is the percentage?''
***490 genes, 9.06%
**''What about an average log fold change > 0.25 and p < 0.05, and what is the percentage?''
***441 genes, 8.15%
**''Or an average log fold change < -0.25 and p < 0.05, and what is the percentage? (These are more realistic fold change cut-offs because they represent about a 20% change in expression, which is roughly the detection limit of this technology.)''
***431 genes, 7.97%
*'''F20 and C60'''
**''How many genes have p value < 0.05, and what is the percentage (out of 5408)?''
***1838 genes, 33.99%
**''What about p < 0.01, and what is the percentage (out of 5408)?''
***892 genes, 16.49%
**''What about p < 0.001, and what is the percentage (out of 5408)?''
***239 genes, 4.42%
**''What about p < 0.0001, and what is the percentage (out of 5408)?''
***54 genes, 1.00%
**''How many genes have a Bonferroni-corrected p value < 0.05, and what is the percentage (out of 5408)?''
***10 genes, 0.18%
**''How many genes have a Benjamini and Hochberg-corrected p value < 0.05, and what is the percentage (out of 5408)?''
***707 genes, 13.07%
**''Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there, and what is the percentage?''
***826 genes, 15.27%
**''Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there, and what is the percentage?''
***1012 genes, 18.71%
**''What about an average log fold change > 0.25 and p < 0.05, and what is the percentage?''
***788 genes, 14.57%
**''Or an average log fold change < -0.25 and p < 0.05, and what is the percentage? (These are more realistic fold change cut-offs because they represent about a 20% change in expression, which is roughly the detection limit of this technology.)''
***963 genes, 17.81%
*'''F60 and C60'''
**''How many genes have p value < 0.05, and what is the percentage (out of 5408)?''
***2070 genes, 38.28%
**''What about p < 0.01, and what is the percentage (out of 5408)?''
***1140 genes, 21.08%
**''What about p < 0.001, and what is the percentage (out of 5408)?''
***387 genes, 7.16%
**''What about p < 0.0001, and what is the percentage (out of 5408)?''
***120 genes, 2.22%
**''How many genes have a Bonferroni-corrected p value < 0.05, and what is the percentage (out of 5408)?''
***33 genes, 0.61%
**''How many genes have a Benjamini and Hochberg-corrected p value < 0.05, and what is the percentage (out of 5408)?''
***1193 genes, 22.06%
**''Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there, and what is the percentage?''
***870 genes, 16.09%
**''Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there, and what is the percentage?''
***1200 genes, 22.19%
**''What about an average log fold change > 0.25 and p < 0.05, and what is the percentage?''
***828 genes, 15.31%
**''Or an average log fold change < -0.25 and p < 0.05, and what is the percentage? (These are more realistic fold change cut-offs because they represent about a 20% change in expression, which is roughly the detection limit of this technology.)''
***1146 genes, 21.19%
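The claim that a ±0.25 cut-off corresponds to roughly a 20% change can be checked by converting the log ratio back to a linear fold change. This sketch assumes base-2 logarithms, which the notebook does not state: 2^0.25 is about 1.19 (roughly a 19% increase) and 2^-0.25 is about 0.84 (roughly a 16% decrease). The function names are hypothetical.

```python
def fold_change(log_ratio, base=2.0):
    """Convert a log ratio to a linear fold change (base 2 is an assumption)."""
    return base ** log_ratio


def percent_change(log_ratio, base=2.0):
    """Signed percentage change in expression implied by a log ratio."""
    return 100 * (fold_change(log_ratio, base) - 1)


for cutoff in (0.25, -0.25):
    print(f"log ratio {cutoff:+.2f}: fold change {fold_change(cutoff):.2f}, "
          f"{percent_change(cutoff):+.1f}% change")
```

If the data were instead in natural-log or log10 space, the same cut-off would correspond to a larger change, so the base should be confirmed before interpreting the threshold.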

= External Links =
{{Template:Rlegaspi}}

Rlegaspi Week 15

2015-12-12T02:04:29Z

Rlegaspi: /* Compiling Raw Data and Statistical Analysis */ Fixed formatting of bullet points

{{Heavy Metal HaterZ}}

= Goals for Week 15 =
== Data Preparation and Statistical Analysis for GenMAPP ==
# Consult with Dr. Dahlquist on how to process the data (normalization, statistics)
# Perform the statistical analysis in Excel.
# Format the gene expression data for import into GenMAPP.
<nowiki>*</nowiki>Similar to goals from Week 14.

= Summary of Progress and Procedure =
== Compiling Raw Data and Statistical Analysis ==
=== December 8, 2015 ===
*Calculated averages from the split data
**Discovered that there are a total of '''5408''' genes.
*Calculated biological averages of each time point
*Calculated AverageLogRatio comparing C5, C20, and C60 to C0 and F5, F20, and F60 to C60
**Subtracted not divided due to log space
*Performed TTest on the above relationships to get the Pvalue
*Performed Bonferroni
*Performed Benjamini & Hochberg
*Excel file after all of these procedures were uploaded to XMLPipeDB and link to file is as follows: [[File:StatisticalAnalysis Shewanella RARL 20151207 HMH.xlsx]]

=== December 10, 2015 ===
Preparing compiled raw data for GenMAPP and creation of a .txt file

[[File:StatisticalAnalysis Shewanella RARL 20151210 HMH.xlsx]]

.txt file: [[File:CompiledRawDataforGenMAPP Shewanella RARL 20151210 HMH.txt]]

=== Sanity Check ===
Importance of Sanity Check (from [[http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae | DNA Microarray Analysis Activity]]: In summary, the p value cut-off should not be thought of as some magical number at which data becomes "significant". Instead, it is a moveable confidence level. If we want to be very confident of our data, use a small p value cut-off. If we are OK with being less confident about a gene expression change and want to include more genes in our analysis, we can use a larger p value cut-off. For the GenMAPP analysis below, we will use the fold change cut-off of greater than 0.25 or less than -0.25 and the unadjusted p value cut off of p < 0.05 for our analysis because we want to include several hundred genes in our analysis. (Note: The "AvgLogRatio" tells us the size of the gene expression change and in which direction. Positive values are increases relative to the control; negative values are decreases relative to the control.
*'''C5 and C0'''
**How many genes have p value < 0.05? and what is the percentage (out of 5408)?
***344 genes, 6.36%
**What about p < 0.01? and what is the percentage (out of 5408)?
***94 genes, 1.74%
**What about p < 0.001? and what is the percentage (out of 5408)?
***18 genes, 0.33%
**What about p < 0.0001? and what is the percentage (out of 5408)?
***5 genes, 0.09%
**How many genes are p < 0.05 for the Bonferroni-corrected p value? and what is the percentage (out of 5408)?
***2 genes, 0.04%
**How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value? and what is the percentage (out of 5408)?
***2 genes, 0.037%
**Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there? (and %)
***180 genes, 3.33%
**Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there? (and %)
***164 genes, 3.03%
**What about an average log fold change of > 0.25 and p < 0.05? (and %)
***161 genes, 2.98%
**Or an average log fold change of < -0.25 and p < 0.05? (and %) (These are more realistic values for the fold change cut-offs because it represents about a 20% fold change which is about the level of detection of this technology.)
***149 genes, 2.76%
*'''C20 and C0'''
**How many genes have p value < 0.05? and what is the percentage (out of 5408)?
***868 genes, 16.05%
**What about p < 0.01? and what is the percentage (out of 5408)?
***342 genes, 6.32%
**What about p < 0.001? and what is the percentage (out of 5408)?
***79 genes, 1.46%
**What about p < 0.0001? and what is the percentage (out of 5408)?
***14 genes, 0.26%
**How many genes are p < 0.05 for the Bonferroni-corrected p value? and what is the percentage (out of 5408)?
***1 gene, 0.01%
**How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value? and what is the percentage (out of 5408)?
***34 genes, 0.63%
**Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there? (and %)
***452 genes,
**Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there? (and %)
***416 genes,
**What about an average log fold change of > 0.25 and p < 0.05? (and %)
***437 genes,
**Or an average log fold change of < -0.25 and p < 0.05? (and %) (These are more realistic values for the fold change cut-offs because it represents about a 20% fold change which is about the level of detection of this technology.)
***405 genes,
*'''C60 and C0'''
**How many genes have p value < 0.05? and what is the percentage (out of 5408)?
***1017 genes, 18.81%
**What about p < 0.01? and what is the percentage (out of 5408)?
***471 genes, 8.71%
**What about p < 0.001? and what is the percentage (out of 5408)?
***163 genes, 3.01%
**What about p < 0.0001? and what is the percentage (out of 5408)?
***53 genes, 0.98%
**How many genes are p < 0.05 for the Bonferroni-corrected p value? and what is the percentage (out of 5408)?
***13 genes, 0.24%
**How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value? and what is the percentage (out of 5408)?
***229 genes, 4.23%
**Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there? (and %)
***487 genes,
**Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there? (and %)
***530 genes,
**What about an average log fold change of > 0.25 and p < 0.05? (and %)
***475 genes,
**Or an average log fold change of < -0.25 and p < 0.05? (and %) (These are more realistic values for the fold change cut-offs because it represents about a 20% fold change which is about the level of detection of this technology.)
***513 genes,
*'''F5 and C60'''
**How many genes have p value < 0.05? and what is the percentage (out of 5408)?
***969 genes, 17.92%
**What about p < 0.01? and what is the percentage (out of 5408)?
***315 genes, 5.82%
**What about p < 0.001? and what is the percentage (out of 5408)?
***40 genes, 0.74%
**What about p < 0.0001? and what is the percentage (out of 5408)?
***7 genes, 0.13%
**How many genes are p < 0.05 for the Bonferroni-corrected p value? and what is the percentage (out of 5408)?
***1 gene, 0.01%
**How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value? and what is the percentage (out of 5408)?
***4 genes, 0.07%
**Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there? (and %)
***479 genes,
**Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there? (and %)
***490 genes,
**What about an average log fold change of > 0.25 and p < 0.05? (and %)
***441 genes,
**Or an average log fold change of < -0.25 and p < 0.05? (and %) (These are more realistic values for the fold change cut-offs because it represents about a 20% fold change which is about the level of detection of this technology.)
***431 genes,
*'''F20 and C60'''
**How many genes have p value < 0.05? and what is the percentage (out of 5408)?
***1838 genes, 33.99%
**What about p < 0.01? and what is the percentage (out of 5408)?
***892 genes, 16.49%
**What about p < 0.001? and what is the percentage (out of 5408)?
***239 genes, 4.42%
**What about p < 0.0001? and what is the percentage (out of 5408)?
***54 genes, 1.00%
**How many genes are p < 0.05 for the Bonferroni-corrected p value? and what is the percentage (out of 5408)?
***10 genes, 0.18%
**How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value? and what is the percentage (out of 5408)?
***707 genes, 13.07%
**Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there? (and %)
***826 genes,
**Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there? (and %)
***1012 genes,
**What about an average log fold change of > 0.25 and p < 0.05? (and %)
***788 genes,
**Or an average log fold change of < -0.25 and p < 0.05? (and %) (These are more realistic values for the fold change cut-offs because it represents about a 20% fold change which is about the level of detection of this technology.)
***963 genes,
*'''F60 and C60'''
**How many genes have p value < 0.05? and what is the percentage (out of 5408)?
***2070 genes, 38.28%
**What about p < 0.01? and what is the percentage (out of 5408)?
***1140 genes, 21.08%
**What about p < 0.001? and what is the percentage (out of 5408)?
***387 genes, 7.16%
**What about p < 0.0001? and what is the percentage (out of 5408)?
***120 genes, 2.22%
**How many genes are p < 0.05 for the Bonferroni-corrected p value? and what is the percentage (out of 5408)?
***33 genes, 0.61%
**How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value? and what is the percentage (out of 5408)?
***1193 genes, 22.06%
**Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there? (and %)
***870 genes, 16.09%
**Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there? (and %)
***1200 genes, 22.19%
**What about an average log fold change of > 0.25 and p < 0.05? (and %)
***828 genes, 15.31%
**Or an average log fold change of < -0.25 and p < 0.05? (and %) (These are more realistic values for the fold change cut-offs because it represents about a 20% fold change which is about the level of detection of this technology.)
***1146 genes, 21.19%
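The counts above were obtained with Excel filters. As a cross-check, the same counts and percentages could be computed in a few lines of Python. This is a minimal sketch, assuming the p values and AvgLogRatio values for one comparison have been exported as two parallel lists, with blank cells as None; the function names are hypothetical. Percentages are out of the 5408 genes reported on this page.

```python
TOTAL_GENES = 5408  # number of genes reported on this page


def percent_of_total(count, total=TOTAL_GENES):
    """Express a gene count as a percentage of all genes, to two decimals."""
    return round(100 * count / total, 2)


def count_significant(pvalues, cutoff):
    """Count genes whose p value is strictly below the cut-off (blanks skipped)."""
    return sum(1 for p in pvalues if p is not None and p < cutoff)


def count_changed(log_ratios, pvalues, fc_cutoff, direction, p_cutoff=0.05):
    """Count genes passing the unadjusted p value filter and a fold change filter.

    direction="up" keeps AvgLogRatio > fc_cutoff;
    direction="down" keeps AvgLogRatio < -fc_cutoff.
    """
    hits = 0
    for ratio, p in zip(log_ratios, pvalues):
        if ratio is None or p is None or p >= p_cutoff:
            continue  # blank cell or not significant at p_cutoff
        if direction == "up" and ratio > fc_cutoff:
            hits += 1
        elif direction == "down" and ratio < -fc_cutoff:
            hits += 1
    return hits
```

For example, `percent_of_total(344)` gives 6.36, matching the C5 vs C0 entry. Calling `count_changed` with `fc_cutoff=0.0` reproduces the "greater than zero" and "less than zero" questions, and `fc_cutoff=0.25` the more realistic cut-offs.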

= External Links =
{{Template:Rlegaspi}}

Rlegaspi Week 15

2015-12-12T02:02:45Z

Rlegaspi: /* Sanity Check */ Finishing sanity check of Shewanella data and p values to check which genes significantly changed; still need to input percentage calculations and make sense of all the values.

{{Heavy Metal HaterZ}}

= Goals for Week 15 =
== Data Preparation and Statistical Analysis for GenMAPP ==
# Consult with Dr. Dahlquist on how to process the data (normalization, statistics)
# Perform the statistical analysis in Excel.
# Format the gene expression data for import into GenMAPP.
<nowiki>*</nowiki>Similar to goals from Week 14.

= Summary of Progress and Procedure =
== Compiling Raw Data and Statistical Analysis ==
=== December 8, 2015 ===
*Calculated averages of the replicate spots from the split data
**Found that there are '''5408''' genes in total.
*Calculated the biological average of each timepoint
*Calculated the AverageLogRatio comparing C5, C20, and C60 to C0, and F5, F20, and F60 to C60
**Subtracted rather than divided, because the values are in log space
*Performed a t-test on each of the comparisons above to get the p value
*Applied the Bonferroni correction to the p values
*Applied the Benjamini & Hochberg correction to the p values
*The Excel file produced by all of these procedures was uploaded to XMLPipeDB; link to the file: [[File:StatisticalAnalysis Shewanella RARL 20151207 HMH.xlsx]]
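The Bonferroni and Benjamini & Hochberg steps were done in Excel. The sketch below shows what the two corrections compute, as a hedged cross-check rather than the spreadsheet's exact formulas: Bonferroni multiplies each p value by the number of tests (here, the number of genes), capped at 1, and Benjamini & Hochberg multiplies each p value by the number of tests divided by its rank. This version also enforces the usual monotonicity step; a spreadsheet that uses only p × m / rank may report slightly different adjusted values. It assumes there are no blank p values.

```python
def bonferroni(pvalues):
    """Bonferroni: multiply each p value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(p * m, 1.0) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the original order."""
    m = len(pvalues)
    # Indices of the p values sorted from smallest to largest (rank 1 = smallest).
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values never increase
    # as the rank decreases (the monotonicity step).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Counting how many adjusted values fall below 0.05 then corresponds to the "Bonferroni-corrected" and "Benjamini and Hochberg-corrected" questions in the sanity check.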

=== December 10, 2015 ===
Prepared the compiled raw data for GenMAPP and created a .txt file.

[[File:StatisticalAnalysis Shewanella RARL 20151210 HMH.xlsx]]

.txt file: [[File:CompiledRawDataforGenMAPP Shewanella RARL 20151210 HMH.txt]]
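GenMAPP imports expression data from a tab-delimited text file, which is why the forGenMAPP sheet was saved as .txt. A minimal sketch of writing such a file with Python's standard `csv` module follows. The column names and the gene ID in the example are hypothetical placeholders, and any extra columns the GenMAPP importer expects are not shown here and should be checked against its documentation.

```python
import csv
import io


def write_genmapp_txt(rows, columns, handle):
    """Write one header line, then one tab-separated line per gene.

    rows    -- list of dicts keyed by column name
    columns -- column order; the gene ID column should come first
    handle  -- any writable text file object
    Missing values are written as empty fields (blank cells).
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow(["" if row.get(col) is None else row[col] for col in columns])


# Hypothetical example with placeholder column names and values.
columns = ["ID", "AvgLogRatio_C5_C0", "Pvalue_C5_C0"]
rows = [{"ID": "SO0001", "AvgLogRatio_C5_C0": 0.31, "Pvalue_C5_C0": 0.02}]
buffer = io.StringIO()
write_genmapp_txt(rows, columns, buffer)
```

To write a real file, pass a handle from `open("forGenMAPP.txt", "w", newline="")` instead of the in-memory buffer.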

==== Sanity Check ====
Importance of the sanity check (from [http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae DNA Microarray Analysis Activity]): In summary, the p value cut-off should not be thought of as some magical number at which data becomes "significant". Instead, it is a moveable confidence level. If we want to be very confident of our data, we use a small p value cut-off. If we are OK with being less confident about a gene expression change and want to include more genes in our analysis, we can use a larger p value cut-off. For the GenMAPP analysis below, we will use a fold change cut-off of greater than 0.25 or less than -0.25 and an unadjusted p value cut-off of p < 0.05, because we want to include several hundred genes in our analysis. (Note: The "AvgLogRatio" tells us the size of the gene expression change and its direction. Positive values are increases relative to the control; negative values are decreases relative to the control.)
*'''C5 and C0'''
**How many genes have p value < 0.05? and what is the percentage (out of 5408)?
***344 genes, 6.36%
**What about p < 0.01? and what is the percentage (out of 5408)?
***94 genes, 1.74%
**What about p < 0.001? and what is the percentage (out of 5408)?
***18 genes, 0.33%
**What about p < 0.0001? and what is the percentage (out of 5408)?
***5 genes, 0.09%
**How many genes are p < 0.05 for the Bonferroni-corrected p value? and what is the percentage (out of 5408)?
***2 genes, 0.04%
**How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value? and what is the percentage (out of 5408)?
***2 genes, 0.04%
**Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there? (and %)
***180 genes, 3.33%
**Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there? (and %)
***164 genes, 3.03%
**What about an average log fold change of > 0.25 and p < 0.05? (and %)
***161 genes, 2.98%
**Or an average log fold change of < -0.25 and p < 0.05? (and %) (These are more realistic values for the fold change cut-offs because it represents about a 20% fold change which is about the level of detection of this technology.)
***149 genes, 2.76%
*'''C20 and C0'''
**How many genes have p value < 0.05? and what is the percentage (out of 5408)?
***868 genes, 16.05%
**What about p < 0.01? and what is the percentage (out of 5408)?
***342 genes, 6.32%
**What about p < 0.001? and what is the percentage (out of 5408)?
***79 genes, 1.46%
**What about p < 0.0001? and what is the percentage (out of 5408)?
***14 genes, 0.26%
**How many genes are p < 0.05 for the Bonferroni-corrected p value? and what is the percentage (out of 5408)?
***1 gene, 0.02%
**How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value? and what is the percentage (out of 5408)?
***34 genes, 0.63%
**Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there? (and %)
***452 genes, 8.36%
**Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there? (and %)
***416 genes, 7.69%
**What about an average log fold change of > 0.25 and p < 0.05? (and %)
***437 genes, 8.08%
**Or an average log fold change of < -0.25 and p < 0.05? (and %) (These are more realistic values for the fold change cut-offs because it represents about a 20% fold change which is about the level of detection of this technology.)
***405 genes, 7.49%
*'''C60 and C0'''
**How many genes have p value < 0.05? and what is the percentage (out of 5408)?
***1017 genes, 18.81%
**What about p < 0.01? and what is the percentage (out of 5408)?
***471 genes, 8.71%
**What about p < 0.001? and what is the percentage (out of 5408)?
***163 genes, 3.01%
**What about p < 0.0001? and what is the percentage (out of 5408)?
***53 genes, 0.98%
**How many genes are p < 0.05 for the Bonferroni-corrected p value? and what is the percentage (out of 5408)?
***13 genes, 0.24%
**How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value? and what is the percentage (out of 5408)?
***229 genes, 4.23%
**Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there? (and %)
***487 genes, 9.01%
**Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there? (and %)
***530 genes, 9.80%
**What about an average log fold change of > 0.25 and p < 0.05? (and %)
***475 genes, 8.78%
**Or an average log fold change of < -0.25 and p < 0.05? (and %) (These are more realistic values for the fold change cut-offs because it represents about a 20% fold change which is about the level of detection of this technology.)
***513 genes, 9.49%
*'''F5 and C60'''
**How many genes have p value < 0.05? and what is the percentage (out of 5408)?
***969 genes, 17.92%
**What about p < 0.01? and what is the percentage (out of 5408)?
***315 genes, 5.82%
**What about p < 0.001? and what is the percentage (out of 5408)?
***40 genes, 0.74%
**What about p < 0.0001? and what is the percentage (out of 5408)?
***7 genes, 0.13%
**How many genes are p < 0.05 for the Bonferroni-corrected p value? and what is the percentage (out of 5408)?
***1 gene, 0.02%
**How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value? and what is the percentage (out of 5408)?
***4 genes, 0.07%
**Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there? (and %)
***479 genes, 8.86%
**Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there? (and %)
***490 genes, 9.06%
**What about an average log fold change of > 0.25 and p < 0.05? (and %)
***441 genes, 8.15%
**Or an average log fold change of < -0.25 and p < 0.05? (and %) (These are more realistic values for the fold change cut-offs because it represents about a 20% fold change which is about the level of detection of this technology.)
***431 genes, 7.97%
*'''F20 and C60'''
**How many genes have p value < 0.05? and what is the percentage (out of 5408)?
***1838 genes, 33.99%
**What about p < 0.01? and what is the percentage (out of 5408)?
***892 genes, 16.49%
**What about p < 0.001? and what is the percentage (out of 5408)?
***239 genes, 4.42%
**What about p < 0.0001? and what is the percentage (out of 5408)?
***54 genes, 1.00%
**How many genes are p < 0.05 for the Bonferroni-corrected p value? and what is the percentage (out of 5408)?
***10 genes, 0.18%
**How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value? and what is the percentage (out of 5408)?
***707 genes, 13.07%
**Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there? (and %)
***826 genes, 15.27%
**Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there? (and %)
***1012 genes, 18.71%
**What about an average log fold change of > 0.25 and p < 0.05? (and %)
***788 genes, 14.57%
**Or an average log fold change of < -0.25 and p < 0.05? (and %) (These are more realistic values for the fold change cut-offs because it represents about a 20% fold change which is about the level of detection of this technology.)
***963 genes, 17.81%
*'''F60 and C60'''
**How many genes have p value < 0.05? and what is the percentage (out of 5408)?
***2070 genes, 38.28%
**What about p < 0.01? and what is the percentage (out of 5408)?
***1140 genes, 21.08%
**What about p < 0.001? and what is the percentage (out of 5408)?
***387 genes, 7.16%
**What about p < 0.0001? and what is the percentage (out of 5408)?
***120 genes, 2.22%
**How many genes are p < 0.05 for the Bonferroni-corrected p value? and what is the percentage (out of 5408)?
***33 genes, 0.61%
**How many genes are p < 0.05 for the Benjamini and Hochberg-corrected p value? and what is the percentage (out of 5408)?
***1193 genes, 22.06%
**Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change greater than zero. How many are there? (and %)
***870 genes, 16.09%
**Keeping the (unadjusted) "Pvalue" filter at p < 0.05, filter the "AvgLogRatio" column to show all genes with an average log fold change less than zero. How many are there? (and %)
***1200 genes, 22.19%
**What about an average log fold change of > 0.25 and p < 0.05? (and %)
***828 genes, 15.31%
**Or an average log fold change of < -0.25 and p < 0.05? (and %) (These are more realistic values for the fold change cut-offs because it represents about a 20% fold change which is about the level of detection of this technology.)
***1146 genes, 21.19%

= External Links =
{{Template:Rlegaspi}}

Rlegaspi Week 15

2015-12-10T23:41:47Z

Rlegaspi: /* Compiling Raw Data and Statistical Analysis */ Did initial sanity check for C5/C0

Rlegaspi Week 15

2015-12-10T23:21:33Z

Rlegaspi: /* Compiling Raw Data and Statistical Analysis */ Inserting files for GenMAPP, procedure descriptions to follow.

File:CompiledRawDataforGenMAPP Shewanella RARL 20151210 HMH.txt

2015-12-10T23:21:07Z

Rlegaspi: The .txt file for GenMAPP contains the averages, average log ratios, p values from the t-test, and Bonferroni-corrected p values.

The .txt file for GenMAPP contains the averages, average log ratios, p values from the t-test, and Bonferroni-corrected p values.

File:StatisticalAnalysis Shewanella RARL 20151210 HMH.xlsx

2015-12-10T23:18:19Z

Rlegaspi: This Excel file contains all of the compiled raw data, the statistical analysis, and the forGenMAPP sheet.

This Excel file contains all of the compiled raw data, the statistical analysis, and the forGenMAPP sheet.

File:StatisticalAnalysis Shewanella RARL 20151207 HMH.xlsx

2015-12-10T23:03:39Z

Rlegaspi: Rlegaspi uploaded a new version of File:StatisticalAnalysis Shewanella RARL 20151207 HMH.xlsx

Rlegaspi Week 15

2015-12-09T08:26:13Z

Rlegaspi: Included template under external links portion of page.

Rlegaspi Week 15

2015-12-09T08:25:26Z

Rlegaspi: Emily did a similar procedure; we still need to compare our data with each other.

File:StatisticalAnalysis Shewanella RARL 20151207 HMH.xlsx

2015-12-09T08:19:35Z

Rlegaspi: Rlegaspi uploaded a new version of File:StatisticalAnalysis Shewanella RARL 20151207 HMH.xlsx

Rlegaspi Week 14

2015-12-08T07:39:25Z

Rlegaspi: /* December 3, 2015 thru December 8, 2015 */ updated file link

{{Heavy Metal HaterZ}}

= Goals for Week 14 =
== Data Preparation and Statistical Analysis ==
# Create a Master Raw Data file that contains the IDs and columns of data required for further analysis.
# Consult with Dr. Dahlquist on how to process the data (normalization, statistics)
# Perform the statistical analysis in Excel.
# Format the gene expression data for import into GenMAPP.

= Summary of Progress and Procedure =
== Compiling Raw Data and Statistical Analysis ==
=== December 1, 2015 through December 3, 2015 ===
Referencing the Week 12 feedback provided by Dr. Dahlquist, I began compiling the raw data in a single Excel file:
*Created an Excel File and named file ''Raw Data Shewanella RARL 20151201''
*Sheet 1 was entitled CompiledRawData Sheet:
**Column 1 = Gene ID
**Column 2 = MasterIndex (numbered from 1 to 11520)
**The remaining columns contained the log data from the 0, 5, 20, and 60 timepoints
***7 timepoints (C0, C5, C20, C60, F5, F20, F60) with 4 replicates each, for a total of 28 columns of data
*Created a MasterSheet and copied information from CompiledRawData Sheet into this new sheet
**Sorted the Gene IDs in alphabetical order (A-Z) and deleted the rows whose ID was '''Blank''', '''blank''', or '''gDNA''', or started with '''NC-''' or '''ORF''', resulting in the deletion of '''705 rows.'''
**Deleted the contents of cells containing the error value <code>#NUM!</code>, which affected '''2,118 cells.'''
**Deleted the contents of cells containing the error value <code>#DIV/0!</code>, which affected '''23 cells.'''
*Created a ScalingCentering Sheet
**Copied over data from the MasterSheet
**Added two rows directly below the title row to hold the Average and the Standard Deviation of each column
**For the scaled and centered columns, typed the equation <code>=(C4-C$2)/C$3</code> in the first cell of the scaled and centered column for replicate 1 at timepoint C0, then copied it to the remaining cells so that the rest of the data was scaled and centered with the same template.
*Sent this file to Dr. Dahlquist to separate the duplicated spots so they could be averaged as technical replicates: [[File:Raw Data Shewanella RARL 20151201.xlsx]]
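The <code>=(C4-C$2)/C$3</code> formula subtracts the column average (row 2) and divides by the column standard deviation (row 3). The same operation on one column can be sketched in Python as below. This assumes blank cells are represented as None and that the standard deviation row used Excel's sample standard deviation (STDEV); if the population version was used, swap `stdev` for `pstdev`.

```python
from statistics import mean, stdev


def scale_and_center(column):
    """Return (value - column average) / column standard deviation for each cell.

    Mirrors =(C4-C$2)/C$3, where row 2 holds the column average and row 3 the
    standard deviation. Blank cells (None) are skipped when computing both
    statistics, as Excel's AVERAGE and STDEV skip empty cells, and stay blank.
    """
    present = [v for v in column if v is not None]
    average = mean(present)
    sd = stdev(present)  # sample SD, assumed to match Excel STDEV
    return [None if v is None else (v - average) / sd for v in column]
```

After this step every column has an average of 0 and a standard deviation of 1, which is what makes the replicates comparable.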
=== December 3, 2015 thru December 8, 2015 ===
Dr. Dahlquist pointed out discrepancies between my data and that of my partner, Emily Simso, so I reviewed the compiled raw data to make our Excel sheets match before continuing with the statistical analysis:
*Repeated the procedure from [[File:Raw Data Shewanella RARL 20151201.xlsx]] with Dr. Dahlquist's feedback in mind, and created a new Excel file called ''UpdatedCompiledRawData Shewanella RARL 20151201 HMH''
**The correct set of timepoints had been used in my previous Excel file, so no changes were needed there
**Ensured that I had 11520 Gene IDs; the last row, which had been labeled "Gene ID", was changed to its correct Gene ID, "SO4357".
**Once all the necessary changes were made and I had touched base with my partner Emily on Sunday and Monday, I uploaded the file for Dr. Dahlquist to split: [[File:UpdatedCompiledRawData Shewanella RARL 20151201 HMH.xlsx]]
*Split data was received and posted as a file on our team's file page by Dr. Dahlquist: [[File:UpdatedCompiledRawData Shewanella RARL 20151201 HMH forsplitting.xlsx]]
**Downloaded this file and copied the sheets of data into a new Excel file entitled ''StatisticalAnalysis Shewanella RARL 20151207 HMH''
**Created a new sheet called Averages
***Averaged the replicate data from the two now-split spots, using the equation <code>=AVERAGE(C2,AG2)</code> in the column for C0 replicate 1
***Copied this equation down the entire column and across to the other replicate columns, letting Excel adjust the cell references for each replicate
**Created a new sheet called Statistics
***Copied and pasted values from the Averages sheet into this new sheet
***Computed the average of the biological replicates for each treatment and timepoint, using <code>=AVERAGE(C2:F2)</code> for C0 and the same equation with the appropriate cell range for every other timepoint.
***Calculated the average log ratios of C5/C0, C20/C0, C60/C0, F5/C60, F20/C60, and F60/C60
****Since the data are in log space, I subtracted the C0 average from the C5 average (and likewise for the other ratios) instead of dividing.
***Performed a two-sample t-test between C5 and C0, C20 and C0, C60 and C0, F5 and C60, and so on, using:
<code>=TTEST(<range of cells containing the biological replicates for C0>, <range of cells containing the biological replicates for C5>, 2, 3)</code>
: This returned the p value (the arguments 2 and 3 request a two-tailed test and a two-sample test assuming unequal variances). I then uploaded the file to the team's file page for Dr. Dahlquist to review while I performed the sanity check: [[File:StatisticalAnalysis Shewanella RARL 20151207 HMH.xlsx]]
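The averaging and log-ratio steps above can be sketched in Python for a single gene. This is a minimal illustration, assuming blank cells are None; the function names and the numbers in the example are hypothetical, not values from the spreadsheet.

```python
from statistics import mean


def average_ignoring_blanks(values):
    """Mean of the non-blank values, like Excel's AVERAGE skipping empty cells.

    Returns None when every value is blank (Excel would show #DIV/0! instead).
    """
    present = [v for v in values if v is not None]
    return mean(present) if present else None


def average_log_ratio(treatment_avg, control_avg):
    """Ratio in log space: subtract the control average instead of dividing."""
    if treatment_avg is None or control_avg is None:
        return None
    return treatment_avg - control_avg


# Technical replicates: the two split spots, like =AVERAGE(C2,AG2).
rep1_c0 = average_ignoring_blanks([0.25, 0.75])
# Biological averages for two timepoints, like =AVERAGE(C2:F2).
c0_avg = average_ignoring_blanks([0.5, 1.0, None, 1.5])
c5_avg = average_ignoring_blanks([2.0, 2.5, 3.0, 2.5])
# Average log ratio for C5/C0.
c5_vs_c0 = average_log_ratio(c5_avg, c0_avg)
```

Subtracting works because log(a/b) = log(a) - log(b), so the difference of the averaged log values is the log of the ratio.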

= External Links =
{{Template:Rlegaspi}}

Rlegaspi Week 14

2015-12-08T07:35:40Z

Rlegaspi: /* Summary of Progress and Procedure */ Finished writing down my progress and procedure.

{{Heavy Metal HaterZ}}

= Goals for Week 14 =
== Data Preparation and Statistical Analysis ==
# Create a Master Raw Data file that contains the IDs and columns of data required for further analysis.
# Consult with Dr. Dahlquist on how to process the data (normalization, statistics)
# Perform the statistical analysis in Excel.
# Format the gene expression data for import into GenMAPP.

= Summary of Progress and Procedure =
== Compiling Raw Data and Statistical Analysis ==
=== December 1, 2015 through December 3, 2015 ===
Referencing the Week 12 feedback provided by Dr. Dahlquist, I began compiling the raw data in a single Excel file:
*Created an Excel File and named file ''Raw Data Shewanella RARL 20151201''
*Sheet 1 was entitled CompiledRawData Sheet:
**Column 1 = Gene ID
**Column 2 = MasterIndex (numbered from 1 to 11520)
**The remaining columns contained the log data from the 0, 5, 20, and 60 timepoints
***7 timepoints (C0, C5, C20, C60, F5, F20, F60) with 4 replicates each, for a total of 28 columns of data
*Created a MasterSheet and copied information from CompiledRawData Sheet into this new sheet
**Sorted the Gene IDs in alphabetical order (A-Z) and deleted the rows whose ID was '''Blank''', '''blank''', or '''gDNA''', or started with '''NC-''' or '''ORF''', resulting in the deletion of '''705 rows.'''
**Deleted the contents of cells containing the error value <code>#NUM!</code>, which affected '''2,118 cells.'''
**Deleted the contents of cells containing the error value <code>#DIV/0!</code>, which affected '''23 cells.'''
*Created a ScalingCentering Sheet
**Copied over data from the MasterSheet
**Added two rows directly below the title row to hold the Average and the Standard Deviation of each column
**For the scaled and centered columns, typed the equation <code>=(C4-C$2)/C$3</code> in the first cell of the scaled and centered column for replicate 1 at timepoint C0, then copied it to the remaining cells so that the rest of the data was scaled and centered with the same template.
*Sent this file to Dr. Dahlquist to separate the duplicated spots so they could be averaged as technical replicates: [[File:Raw Data Shewanella RARL 20151201.xlsx]]
=== December 3, 2015 thru December 8, 2015 ===
Dr. Dahlquist pointed out discrepancies between my data and that of my partner, Emily Simso, so I reviewed the compiled raw data to make our Excel sheets match before continuing with the statistical analysis:
*Repeated the procedure from [[File:Raw Data Shewanella RARL 20151201.xlsx]] with Dr. Dahlquist's feedback in mind, and created a new Excel file called ''UpdatedCompiledRawData Shewanella RARL 20151201 HMH''
**The correct set of timepoints had been used in my previous Excel file, so no changes were needed there
**Ensured that I had 11520 Gene IDs; the last row, which had been labeled "Gene ID", was changed to its correct Gene ID, "SO4357".
**Once all the necessary changes were made and I had touched base with my partner Emily on Sunday and Monday, I uploaded the file for Dr. Dahlquist to split: [[File:UpdatedCompiledRawData Shewanella RARL 20151201 HMH.xlsx]]
*Split data was received and posted as a file on our team's file page by Dr. Dahlquist: [[File:UpdatedCompiledRawData Shewanella RARL 20151201 HMH forsplitting.xlsx]]
**Downloaded this file and copied the sheets of data into a new Excel file entitled ''StatisticalAnalysis Shewanella RARL 20151207 HMH''
**Created a new sheet called Averages
***Averaged the replicate data from the two now-split spots, using the equation <code>=AVERAGE(C2,AG2)</code> in the column for C0 replicate 1
***Copied this equation down the entire column and across to the other replicate columns, letting Excel adjust the cell references for each replicate
**Created a new sheet called Statistics
***Copied and pasted values from the Averages sheet into this new sheet
***Computed the average of the biological replicates for each treatment and timepoint, using <code>=AVERAGE(C2:F2)</code> for C0 and the same equation with the appropriate cell range for every other timepoint.
***Calculated the average log ratios of C5/C0, C20/C0, C60/C0, F5/C60, F20/C60, and F60/C60
****Since the data are in log space, I subtracted the C0 average from the C5 average (and likewise for the other ratios) instead of dividing.
***Performed a two-sample t-test between C5 and C0, C20 and C0, C60 and C0, F5 and C60, and so on, using:
<code>=TTEST(<range of cells containing the biological replicates for C0>, <range of cells containing the biological replicates for C5>, 2, 3)</code>
: This returned the p value (the arguments 2 and 3 request a two-tailed test and a two-sample test assuming unequal variances). I then uploaded the file to the team's file page for Dr. Dahlquist to review while I performed the sanity check: [[File:StatisticalAnalysis Shewanella RARL 20151207 HMH.xlsx]]

= External Links =
{{Template:Rlegaspi}}

Rlegaspi Week 14

2015-12-08T07:11:11Z

Rlegaspi: Inserted HMH template and my template with links.

{{Heavy Metal HaterZ}}

= Goals for Week 14 =
== Data Preparation and Statistical Analysis ==
# Create a Master Raw Data file that contains the IDs and columns of data required for further analysis.
# Consult with Dr. Dahlquist on how to process the data (normalization, statistics)
# Perform the statistical analysis in Excel.
# Format the gene expression data for import into GenMAPP.

= Summary of Progress and Procedure =
== Compiling Raw Data and Statistical Analysis ==
=== December 1, 2015 through December 3, 2015 ===
Referencing the Week 12 feedback provided by Dr. Dahlquist, I began compiling the raw data in a single Excel file:
*Created an Excel File and named file ''Raw Data Shewanella RARL 20151201''
*Sheet 1 was entitled CompiledRawData Sheet:
**Column 1 = Gene ID
**Column 2 = MasterIndex (numbered from 1 to 11520)
**The remaining columns contained the log data from the 0, 5, 20, and 60 timepoints
***7 timepoints (C0, C5, C20, C60, F5, F20, F60) with 4 replicates each, for a total of 28 columns of data
*Created a MasterSheet and copied information from CompiledRawData Sheet into this new sheet
**Sorted the Gene IDs in alphabetical order (A-Z) and deleted the rows whose ID was '''Blank''', '''blank''', or '''gDNA''', or started with '''NC-''' or '''ORF''', resulting in the deletion of '''705 rows.'''
**Deleted the contents of cells containing the error value <code>#NUM!</code>, which affected '''2,118 cells.'''
**Deleted the contents of cells containing the error value <code>#DIV/0!</code>, which affected '''23 cells.'''
*Created a ScalingCentering Sheet
**Copied over data from the MasterSheet
**Added two rows directly below the title row to hold the Average and the Standard Deviation of each column
**For the scaled and centered columns, typed the equation <code>=(C4-C$2)/C$3</code> in the first cell of the scaled and centered column for replicate 1 at timepoint C0, then copied it to the remaining cells so that the rest of the data was scaled and centered with the same template.
*Sent this file to Dr. Dahlquist to separate the duplicated spots so they could be averaged as technical replicates: [[File:Raw Data Shewanella RARL 20151201.xlsx]]
=== December 3, 2015 thru December 8, 2015 ===
Dr. Dahlquist pointed out discrepancies between my data and that of my partner, Emily Simso, so I reviewed the compiled raw data to make our Excel sheets match before continuing with the statistical analysis:
*Repeated the procedure from [[File:Raw Data Shewanella RARL 20151201.xlsx]] with Dr. Dahlquist's feedback in mind, and created a new Excel file called ''UpdatedCompiledRawData Shewanella RARL 20151201 HMH''
**The correct set of timepoints had been used in my previous Excel file, so no changes were needed there
**Ensured that I had 11520 Gene IDs; the last row, which had been labeled "Gene ID", was changed to its correct Gene ID, "SO4357".
**Once all the necessary changes were made and I had touched base with my partner Emily on Sunday and Monday, I uploaded the file for Dr. Dahlquist to split: [[File:UpdatedCompiledRawData Shewanella RARL 20151201 HMH.xlsx]]
*Split data was received and posted as a file on our team's file page by Dr. Dahlquist: [[File:UpdatedCompiledRawData Shewanella RARL 20151201 HMH forsplitting.xlsx]]
**Downloaded this file and copied the sheets of data into a new Excel file entitled ''StatisticalAnalysis Shewanella RARL 20151207 HMH''

[[File:StatisticalAnalysis Shewanella RARL 20151207 HMH.xlsx]]

You now need to do the following:
# Average together the replicate data from the two spots that are now split. This means that you need to average the "Log2FC-C0-rep1-scaledandcentered" in cell C2 with the value in cell AG2, for example.
# Copy and paste special > paste values into a new sheet called "statistics".
# Compute the average of the biological replicates for each treatment and timepoint. For example, average together all four biological replicates for Log2FC-C0. Repeat for each timepoint.
# Compute the ratio of the average log ratios so that you have values for the Average Log Ratio of C5/C0, C20/C0, C60/C0, F5/C60, F20/C60, and F60/C60. Since this is in log space, to take the ratio, you will actually subtract instead of divide.
# Perform a two-sample T test between C5 and C0, C20 and C0, C60 and C0, F5 and C60, etc. Use the equation:
=TTEST(<range of cells containing the biological replicates for C0>, <range of cells containing the biological replicates for C5>, 2, 3)
: This will return the p value. Send me the link to the file at this point so I can check the results. You can also perform the sanity check. Let me know how it goes.
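In Excel's <code>TTEST(array1, array2, tails, type)</code>, tails = 2 requests a two-tailed test and type = 3 a two-sample test assuming unequal variances (Welch's t-test). For checking a p value outside Excel, a self-contained Python sketch of the same test follows. It computes the Welch-Satterthwaite degrees of freedom and evaluates the two-tailed Student t probability through the regularized incomplete beta function (a standard continued-fraction evaluation). Dropping blank cells (None) and using unrounded degrees of freedom are assumptions about Excel's behavior, so the last digits may differ from the spreadsheet.

```python
import math
from statistics import mean, variance

_TINY = 1e-300  # guards against division by zero in the continued fraction


def _beta_continued_fraction(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        h *= d * c
        # Odd step.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b), the regularized incomplete beta function, for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where it converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def welch_ttest_p(group_a, group_b):
    """Two-tailed p value of Welch's unequal-variance t-test.

    Intended to mirror Excel's TTEST(range_a, range_b, 2, 3). Blank cells
    (None) are dropped; each group needs at least two values, and the test
    is undefined (ZeroDivisionError) if both groups have zero variance.
    """
    a = [v for v in group_a if v is not None]
    b = [v for v in group_b if v is not None]
    va = variance(a) / len(a)  # squared standard error of each mean
    vb = variance(b) / len(b)
    se2 = va + vb
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite degrees of freedom (not rounded).
    df = se2 ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    # Two-tailed tail probability of Student's t with df degrees of freedom.
    return regularized_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))
```

Each call compares the four biological replicates of one timepoint with those of its control, for example `welch_ttest_p(c5_reps, c0_reps)` for one gene, where `c5_reps` and `c0_reps` are hypothetical lists of that gene's replicate values.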

914 instances of the "#DIV/0!" error were replaced with blank cells.

*# Sort the data A-->Z based on the "ID" column. Delete all rows that have an ID of "Blank", "blank", or "gDNA", or that start with "NC-" or "ORF". Record how many rows got deleted.
*# Some of your cells are going to have error messages in them because of the previous calculations you did. Find and replace all of these with nothing, record how many cells that is.
*# Create a new worksheet called "ScalingCentering" and copy and paste special all of your data into this new sheet. [http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae#Normalize_the_log_ratios_for_the_set_of_slides_in_the_experiment You will perform the scaling and centering operations like you did for the ''Vibrio cholerae'' data.]
*#* Once you have done this, e-mail Dr. Dahlquist and provide a link to your file. Your microarray data has duplicated spots. I have a script that will separate these out so that you can average them as technical replicates.
*# Create a new worksheet called "Statistics", copy and paste the values into this new sheet.
*# You will average the technical replicate spots for each sample to get one value for each sample.
*# You will average the biological replicates for each timepoint to get an average for each timepoint (C0, C5, C20, C60, F5, F20, F60).
*# You will take the ratio of the average log ratios so that you have values for the Average Log Ratio of C5/C0, C20/C0, C60/C0, F5/C60, F20/C60, and F60/C60. Since this is in log space, to take the ratio, you will actually subtract instead of divide.
*# You will perform a two-sample T test between C5 and C0, C20 and C0, C60 and C0, F5 and C60, etc. and perform the Bonferroni and Benjamini and Hochberg corrections on these p values. This computation is not the same as what we did for ''Vibrio''. Instead you will use the <code>TTEST</code> function in Excel (see me when you are ready to do this). The corrections will be the same as what you did before.
*# After that, you will need to format the data for GenMAPP, and then you will be ready to import it into GenMAPP and run MAPPFinder.
* Let me know if you have any questions.
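The first two cleanup steps in the list above (dropping control-spot rows and blanking error cells) can be sketched in Python as a cross-check of the recorded counts. This is a minimal sketch, assuming each row is a (gene ID, list of cell values) pair and that error cells arrive as the literal strings "#NUM!" and "#DIV/0!"; the sort is case-insensitive to approximate Excel's default A-Z order.

```python
EXACT_CONTROL_IDS = {"Blank", "blank", "gDNA"}
CONTROL_PREFIXES = ("NC-", "ORF")
ERROR_VALUES = {"#NUM!", "#DIV/0!"}


def is_control_spot(gene_id):
    """True for rows the instructions say to delete."""
    return gene_id in EXACT_CONTROL_IDS or gene_id.startswith(CONTROL_PREFIXES)


def clean_rows(rows):
    """Sort by ID, drop control spots, and blank out Excel error values.

    rows -- list of (gene_id, list_of_cell_values) pairs
    Returns (kept_rows, rows_deleted, cells_blanked) so the counts can be recorded.
    """
    kept, rows_deleted, cells_blanked = [], 0, 0
    # Case-insensitive sort, approximating Excel's default A-Z order.
    for gene_id, cells in sorted(rows, key=lambda row: row[0].lower()):
        if is_control_spot(gene_id):
            rows_deleted += 1
            continue
        cleaned = []
        for cell in cells:
            if isinstance(cell, str) and cell in ERROR_VALUES:
                cells_blanked += 1
                cleaned.append(None)  # blank cell
            else:
                cleaned.append(cell)
        kept.append((gene_id, cleaned))
    return kept, rows_deleted, cells_blanked
```

The two counts returned are the numbers the instructions ask to record (rows deleted and cells replaced).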

= External Links =
{{Template:Rlegaspi}}

Rlegaspi Week 14

2015-12-08T07:09:53Z

Rlegaspi: Summary of progress was updated to the point at which I acquired the split data; I still need to describe what I did with the data.

= Goals for Week 14 =
== Data Preparation and Statistical Analysis ==
# Create a Master Raw Data file that contains the IDs and columns of data required for further analysis.
# Consult with Dr. Dahlquist on how to process the data (normalization, statistics)
# Perform the statistical analysis in Excel.
# Format the gene expression data for import into GenMAPP.

= Summary of Progress and Procedure =
== Compiling Raw Data and Statistical Analysis ==
=== December 1, 2015 through December 3, 2015 ===
Referencing the Week 12 feedback provided by Dr. Dahlquist, I began compiling the raw data in a single Excel file:
*Created an Excel File and named file ''Raw Data Shewanella RARL 20151201''
*Sheet 1 was entitled CompiledRawData Sheet:
**Column 1 = Gene ID
**Column 2 = MasterIndex (numbered from 1 to 11520)
**The remaining columns contained the log data from the 0, 5, 20, and 60 timepoints
***7 timepoints (C0, C5, C20, C60, F5, F20, F60) with 4 replicates each, for a total of 28 columns of data
*Created a MasterSheet and copied information from CompiledRawData Sheet into this new sheet
**Sorted the Gene IDs in alphabetical order (A-Z) and deleted the rows whose ID was '''Blank''', '''blank''', or '''gDNA''', or started with '''NC-''' or '''ORF''', resulting in the deletion of '''705 rows.'''
**Deleted the contents of cells containing the error value <code>#NUM!</code>, which affected '''2,118 cells.'''
**Deleted the contents of cells containing the error value <code>#DIV/0!</code>, which affected '''23 cells.'''
*Created a ScalingCentering Sheet
**Copied over data from the MasterSheet
**Added two rows right below the title row to represent the calculations for the Average and the Standard Deviation of each column
**For the Scaled and Centered Columns of data, typed the equation <code>=(C4-C$2)/C$3</code> in the first cell under scaled and centered column for replicate 1 at timepoint C0, and used Excel functions in order to scale and center the rest of the data with the equation as a template.
*Sent this file to Dr. Dahlquist so that the duplicated spots could be split out and averaged as technical replicates: [[File:Raw Data Shewanella RARL 20151201.xlsx]]
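As a cross-check on the spreadsheet, the row filtering, error clearing, and scaling/centering steps above can be sketched in Python. This is a minimal sketch and not the method actually used (the work was done in Excel). It assumes each data column is a list of floats, that Excel error cells such as <code>#NUM!</code> and <code>#DIV/0!</code> correspond to NaN or infinity, and that the standard deviation in row 3 is the sample standard deviation (Excel <code>STDEV</code>). The function names are hypothetical.

```python
import math
import statistics

# Control and empty spots removed on the MasterSheet (IDs listed in the steps above).
EXACT_DROP = {"Blank", "blank", "gDNA"}
PREFIX_DROP = ("NC-", "ORF")


def keep_row(gene_id):
    """Return True unless the ID is a blank, genomic DNA, or control spot."""
    return gene_id not in EXACT_DROP and not gene_id.startswith(PREFIX_DROP)


def clear_error(value):
    """Stand-in for 'find and replace errors with nothing': NaN/inf become None (blank)."""
    if value is None or not math.isfinite(value):
        return None
    return value


def scale_and_center(column):
    """Apply (x - average) / stdev to one column, like =(C4-C$2)/C$3.

    Assumes the sample standard deviation (Excel STDEV); blank cells stay blank
    and are left out of the average and standard deviation.
    """
    present = [v for v in column if v is not None]
    avg = statistics.mean(present)
    sd = statistics.stdev(present)
    return [None if v is None else (v - avg) / sd for v in column]


# Usage with made-up values: "Blank" is dropped and the NaN becomes a blank cell.
rows = [("SO0001", 1.0), ("Blank", 0.5), ("SO0002", float("nan")), ("SO0003", 3.0)]
column = [clear_error(v) for gene_id, v in rows if keep_row(gene_id)]
print(scale_and_center(column))  # about [-0.707, None, 0.707]
```

Leaving blanks out of the average and standard deviation mirrors Excel, whose <code>AVERAGE</code> and <code>STDEV</code> functions skip empty cells.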
=== December 3, 2015 through December 8, 2015 ===
Dr. Dahlquist brought to our attention discrepancies between my data and that of my partner, Emily Simso; thus, the compiled raw data had to be reviewed so that our Excel sheets matched before we continued with the statistical analysis:
*Repeated the procedure from [[File:Raw Data Shewanella RARL 20151201.xlsx]], keeping Dr. Dahlquist's feedback in mind, and created a new Excel file called ''UpdatedCompiledRawData Shewanella RARL 20151201 HMH''
**The correct set of timepoints had been used in my previous Excel file, so no changes were needed there
**Ensured that there were 11520 Gene IDs; the last row, which was labeled "Gene ID", was changed to the correct Gene ID, "SO4357."
**Once all the necessary changes were made and I had touched base with my partner Emily on Sunday and Monday, I uploaded the file for splitting by Dr. Dahlquist: [[File:UpdatedCompiledRawData Shewanella RARL 20151201 HMH.xlsx]]
*Dr. Dahlquist posted the split data as a file on our team's Files page: [[Media:UpdatedCompiledRawData_Shewanella_RARL_20151201_HMH_forsplitting.xlsx | UpdatedCompiledRawData_Shewanella_RARL_20151201_HMH_forsplitting.xlsx]]
**Downloaded this file and copied the sheets of data into a new Excel file entitled ''StatisticalAnalysis Shewanella RARL 20151207 HMH''

[[File:StatisticalAnalysis Shewanella RARL 20151207 HMH.xlsx]]

You now need to do the following:
# Average together the replicate data from the two spots that are now split. This means that you need to average the "Log2FC-C0-rep1-scaledandcentered" in cell C2 with the value in cell AG2, for example.
# Copy and paste special > paste values into a new sheet called "statistics".
# Compute the average of the biological replicates for each treatment and timepoint. For example, average together all four biological replicates for Log2FC-C0. Repeat for each timepoint.
# Compute the ratio of the average log ratios so that you have values for the Average Log Ratio of C5/C0, C20/C0, C60/C0, F5/C60, F20/C60, and F60/C60. Since this is in log space, to take the ratio, you will actually subtract instead of divide.
# Perform a two-sample T test between C5 and C0, C20 and C0, C60 and C0, F5 and C60, etc. Use the equation:
=TTEST(<range of cells containing the biological replicates for C0>, <range of cells containing the biological replicates for C5>, 2, 3)
: This will return the p value (the argument 2 requests a two-tailed test and 3 requests a two-sample unequal variance test). Send me the link to the file at this point so I can check the results. You can also perform the sanity check. Let me know how it goes.
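Steps 1 through 4 can be sketched in Python to double-check the averaged values. This is a hypothetical sketch, not the spreadsheet itself: it assumes each gene is stored as a dictionary that maps a timepoint name (C0, C5, ..., F60) to one (spot 1, spot 2) pair per biological replicate, with <code>None</code> standing in for a blank cell.

```python
from statistics import mean

# Reference timepoint for each Average Log Ratio named in step 4.
COMPARISONS = [("C5", "C0"), ("C20", "C0"), ("C60", "C0"),
               ("F5", "C60"), ("F20", "C60"), ("F60", "C60")]


def average_ignoring_blanks(values):
    """Mean of the non-blank values; None if every value is blank."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None


def replicate_values(spot_pairs):
    """Step 1: average the two split spots of each biological replicate."""
    return [average_ignoring_blanks(pair) for pair in spot_pairs]


def average_log_ratios(gene):
    """Steps 3-4: average the replicates per timepoint, then subtract the reference.

    In log space, log2(T/R) = log2(T) - log2(R), so the "ratio" is a difference.
    """
    timepoint_avg = {tp: average_ignoring_blanks(replicate_values(pairs))
                     for tp, pairs in gene.items()}
    ratios = {}
    for treatment, reference in COMPARISONS:
        t_avg = timepoint_avg.get(treatment)
        r_avg = timepoint_avg.get(reference)
        if t_avg is not None and r_avg is not None:
            ratios[f"{treatment}/{reference}"] = t_avg - r_avg
    return timepoint_avg, ratios


# Made-up values for one gene with only two timepoints filled in.
gene = {
    "C0": [(0.1, 0.3), (0.2, None), (0.0, 0.2), (0.4, 0.2)],
    "C5": [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (None, 1.3)],
}
timepoint_avg, ratios = average_log_ratios(gene)
print(ratios)  # about {'C5/C0': 0.875}
```

Skipping blanks matches Excel's <code>AVERAGE</code>, which ignores empty cells; the replicate-level values returned by <code>replicate_values</code> are the numbers that would go into the <code>TTEST</code> ranges.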

914 instances of error "#DIV/0!" replaced with a blank cell.

*# Sort the data A-->Z based on the "ID" column. Delete all rows that have an ID of "Blank", "blank", "gDNA", start with "NC-", start with "ORF". Record how many rows got deleted.
*# Some of your cells are going to have error messages in them because of the previous calculations you did. Find and replace all of these with nothing, record how many cells that is.
*# Create a new worksheet called "ScalingCentering" and copy and paste special all of your data into this new sheet. [http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae#Normalize_the_log_ratios_for_the_set_of_slides_in_the_experiment You will perform the scaling and centering operations like you did for the ''Vibrio cholerae'' data.]
*#* Once you have done this, e-mail Dr. Dahlquist and provide a link to your file. Your microarray data has duplicated spots. I have a script that will separate these out so that you can average them as technical replicates.
*# Create a new worksheet called "Statistics", copy and paste the values into this new sheet.
*# You will average the technical replicate spots for each sample to get one value for each sample.
*# You will average the biological replicates for each timepoint to get an average for each timepoint (C0, C5, C20, C60, F5, F20, F60).
*# You will take the ratio of the average log ratios so that you have values for the Average Log Ratio of C5/C0, C20/C0, C60/C0, F5/C60, F20/C60, and F60/C60. Since this is in log space, to take the ratio, you will actually subtract instead of divide.
*# You will perform a two-sample T test between C5 and C0, C20 and C0, C60 and C0, F5 and C60, etc. and perform the Bonferroni and Benjamini and Hochberg corrections on these p values. This computation is not the same as what we did for ''Vibrio''. Instead you will use the <code>TTEST</code> function in Excel (see me when you are ready to do this). The corrections will be the same as what you did before.
*# After that, you will need to format the data for GenMAPP, and then you'll be ready to import it into GenMAPP and run MAPPFinder.
* Let me know if you have any questions.
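The t test in the step list can be reproduced outside Excel as a check. <code>TTEST(range1, range2, 2, 3)</code> is a two-tailed, two-sample unequal variance (Welch) t test; the sketch below implements that test with only the Python standard library, computing the p value from the regularized incomplete beta function with the continued-fraction method described in ''Numerical Recipes''. It assumes blank replicates are dropped before testing, and its results should be close to, though not guaranteed identical to, Excel's. If SciPy is available, <code>scipy.stats.ttest_ind(a, b, equal_var=False)</code> performs the same Welch test.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=200, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def two_tailed_p(t, df):
    """Two-tailed p value of Student's t: P(|T| >= |t|) = I_{df/(df+t^2)}(df/2, 1/2)."""
    return _reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def welch_ttest(sample_a, sample_b):
    """Two-tailed, unequal variance t test (intended to mirror TTEST(a, b, 2, 3)).

    Blank replicates (None) are dropped; each sample needs at least two values.
    Returns (t, df, p), or None if a sample is too small or both variances are zero.
    """
    a = [v for v in sample_a if v is not None]
    b = [v for v in sample_b if v is not None]
    if len(a) < 2 or len(b) < 2:
        return None
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    if va + vb == 0:
        return None
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite degrees of freedom (usually not a whole number).
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, two_tailed_p(t, df)


# Made-up replicate values for one gene: C0 replicates first, then C5 replicates.
t, df, p = welch_ttest([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
print(t, df, p)  # t is about -1.095 and df is 6.0 for these values
```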

Heavy Metal HaterZ

2015-12-08T06:20:32Z

Rlegaspi: /* Week 14 Assignment */ Inserted reflection and status report.

{{Heavy Metal HaterZ}}
==Week 14 Assignment==
===Goals===
*Coder/QA
**Analyze the initial exports and make any necessary changes to the custom species profile to capture all of the IDs for your species
*GenMAPP Users
**Finish statistical analysis of compiled microarray data
**Prepare file for GenMAPP
===Status Report===
*Mary- I finished customizing the genMappBuilder and uploaded it onto this wiki so that Josh could test whether it works, which it does for the most part. A few genes that are not in the MOD but are present elsewhere are not picked up.
* Josh- Completed the export using the customized GenMAPP Builder from Mary. Checked the .gdb file and everything checked out. Used GenMAPP to see if the gene ID links worked and they did. Made more progress on the Gene Database Testing Report.
*Emily - I worked on manipulating the data to import it into GenMAPP and while Ron and I initially had some problems, I think we worked them out and are ready to continue this coming week.
*Ron - Worked on manipulation of the data along with Emily to prepare data for GenMAPP. Difficulties arose when it came to performing particular calculations and having data match with Emily's, but I think that they have been resolved with the help of Dr. Dahlquist; thus, we are closer to having a GenMAPP ready file.

====Reflections====
#What worked?
#What didn't work?
#What will I do next to fix what didn't work?

*Mary:
*#Dr. Dionisio's instructions were very clear, so I was able to customize the genMappBuilder, and it is working as it should.
*#There are still 11 "lost" genes that may need to be "found" somehow with my code, even though they are not located within the MOD.
*#First I need to determine whether those genes are necessary for GenMAPP to catch; if so, I will need to re-customize the code.

*Josh:
*# Our customized export seemed to work, since we got the 4196 count for Ordered Locus Names for which we were looking.
*#The 11 IDs we found that did not have gene tags in the XML file are a small issue for us. None of them exist in our model organism database, but 8 of them are present in our microarray data.
*#We are waiting on input from Professor Dahlquist regarding our next steps with these 11 IDs. Once we find out, we will act accordingly and possibly edit our code.

*Emily
*#It was very helpful to have Dr. Dahlquist's instructions for manipulating our data, so I was able to calculate the Pvalues and adjust them using the two tests.
*#Ron and I had to do a lot of work late this week to make our data match, but we worked together well and were able to solve the problems we encountered.
*#I will redo the Pvalues and the two adjustment calculations. Then I will get the data ready for GenMAPP.

*Ron
*#Dr. Dahlquist's instructions and feedback helped with ensuring the manipulation of the data was done accurately.
*#It was difficult to calculate the averages from the split data because the equations would not fill down the entire column where the data contained blank cells; calculating the p values was also difficult. In addition, there was a lot of work to be done because issues were encountered with the data and the equations.
*#Hopefully, with all the feedback and instructions, Emily and I will be able to do all the necessary calculations and statistical analysis to have a file ready for GenMAPP by the end of this week.

==Week 12 Assignment==
===Goals===
*Coder/QA
**Prepare for journal club presentation
**Perform an initial Gene Database export and Gene Database Testing Report
*GenMAPP Users
**Compile the raw data in preparation for normalization and statistical analysis.
===Status Report===
*Emily: uploaded and formatted all microarray files for the samples that were repleted with ferrous sulfate
*Mary: Prepared for the genome paper journal club presentation. I also copied the code from GitHub onto a computer in the lab, which included downloading Eclipse and Git for Windows on that computer.
*Josh: Prepared for genome paper presentation with Mary. Completed the initial import/export cycle and made significant progress on the Gene Database Testing Report.
*Ron: Similar to Emily, downloaded the microarray raw data files, followed the procedure given by Dr. Dahlquist for data processing (I worked with the files related to iron depletion with the iron chelator), and I uploaded the files to the wiki.

==Week 11 Assignment : Journal Club Presentation==
===Presentation Slides===
*These can also be accessed by going to our [[Heavy Metal HaterZ Files | Files]] page.

*[[File:Genome_Paper_Presentation_20151124_HMH.pptx]]
*[[File:SoMicroarrayPaperPresentation 20151117 HMH.pptx]]

===Goals===
*Prepare for journal club presentations
*Begin initial tasks on your research project
**Coder/QA
***Set up coding/testing environment
***Determine the regular expression for the ordered locus ID for your species
***Identify the appropriate model organism database for your species.
***Perform an initial Gene Database export and Gene Database Testing Report
**GenMAPP Users
***Describe the experimental design of the microarray data, including treatments, number of replicates (biological and/or technical), dye swaps.
***Determine the sample and data relationships, i.e., which files in the data correspond to which samples in the experimental design.
***Compile the raw data in preparation for normalization and statistical analysis.
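The goal of determining a regular expression for the ordered locus ID can be illustrated with a short Python sketch. The pattern below is hypothetical and is not the expression the team actually used: it assumes that ''S. oneidensis'' ordered locus names look like "SO_4357", that megaplasmid genes carry an extra "A" (as in "SO_A0110"), and that the microarray IDs drop the underscore (as in "SO4357").

```python
import re

# Hypothetical pattern: "SO", an optional underscore, an optional "A"
# (assumed prefix for megaplasmid genes), then exactly four digits.
ORDERED_LOCUS = re.compile(r"SO_?A?\d{4}")


def is_ordered_locus(gene_id):
    """True if the whole (trimmed) ID matches the assumed ordered locus format."""
    return ORDERED_LOCUS.fullmatch(gene_id.strip()) is not None


for gene_id in ["SO_4357", "SO4357", "SO_A0110", "gDNA", "NC-3"]:
    print(gene_id, is_ordered_locus(gene_id))
```

Using <code>fullmatch</code> rather than <code>search</code> keeps control spots or longer strings that merely contain "SO" plus digits from being counted as genes.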
===Status Report===
*Emily: worked on journal club presentation and created flow chart diagrams for the experimental design
*Mary: Completed journal club presentation slides with Josh. I downloaded Eclipse on my personal laptop, so, along with the lab computers, my coding/testing environment should be set up. With Josh, I determined the regular expression for the ordered locus ID of our species. However, I have not yet been able to perform an initial export.
*Ron: Completed journal club presentation slides with Emily and uploaded slides in HMH Files pages. [[Media:SoMicroarrayPaperPresentation 20151117 HMH.pptx | Link to Microarray Paper Presentation here.]] Looked over sample and data relationships file from ArrayExpress entry (E-GEOD-15334) and converted .txt file into .xlsx file. I have not been able to compile raw data with Emily, as we still need clarification on which files are to be used for statistical analysis.
*Josh: Completed the genome paper presentation with Mary and did more research on our organism. Haven't done an initial import/export cycle yet. Planning to complete that later this week.

==Week 10 Assignment : Annotated Bibliography==

===Our Genome Paper===

Heidelberg, J. F., Paulsen, I. T., Nelson, K. E., Gaidos, E. J., Nelson, W. C., Read, T. D., ... & Fraser, C. M. (2002). Genome sequence of the dissimilatory metal ion–reducing bacterium Shewanella oneidensis. ''Nature Biotechnology, 20''(11), 1118-1123. doi:10.1038/nbt749
*The [http://www.ncbi.nlm.nih.gov/pubmed/?term=Genome+sequence+of+the+dissimilatory+metal+ion%E2%80%93reducing+bacterium+Shewanella+oneidensis abstract] from PubMed.
*The full text of the article in PubMed Central: not available.
*The [http://www.nature.com/nbt/journal/v20/n11/full/nbt749.html full text] of the article from the publisher web site. 
*The [http://www.nature.com/nbt/journal/v20/n11/pdf/nbt749.pdf full PDF version] of the article from the publisher web site.
*Who owns the rights to the article?
**The Nature Publishing Group, which is the publisher of this article, according to this [https://s100.copyright.com/AppDispatchServlet?publisherName=NPG&publication=Nature%20Biotechnology&title=Genome%20sequence%20of%20the%20dissimilatory%20metal%20ion-reducing%20bacterium%20Shewanella%20oneidensis&author=John%20F.%20Heidelberg,%20Ian%20T.%20Paulsen,%20Karen%20E.%20Nelson,%20Eric%20J.%20Gaidos,%20William%20C.%20Nelson%20et%20al.&contentID=10.1038/nbt749&publicationDate=10/07/2002&volumeNum=20&issueNum=11&numPages=6&pageNumbers=pp1118-1123 site].
*Do the authors own the rights under a Creative Commons license?
**Yes, according to this [http://oaspa.org/member/nature-publishing-group-palgrave-macmillan/ site].
*Is the article available “Open Access”?
**According to [http://oaspa.org/membership/members/ this site], the article is available "Open Access".
*What organization is the publisher of the article? What type of organization is it?
**According to the site above, this publisher is a "Professional OA Publisher (Large)".
*Is this article available in print or online only?
**Online only. It was published online in November, 2002.
*Has LMU paid a subscription or other fee for your access to this article?
**No.
*We performed a search in the ISI Web of Science/Knowledge database by typing in the title "Genome sequence of the dissimilatory metal ion–reducing bacterium Shewanella oneidensis" to the search bar.
**Three articles came up as results. The titles of the first two articles did not exactly match, and each had been cited fewer than 15 times. The third article was the article we were searching for.
*How many articles does this article cite?
**This article has 41 cited references within the Web of Science Core Collection, according to this [https://apps.webofknowledge.com/full_record.do?product=UA&search_mode=GeneralSearch&qid=3&SID=3Evs6J6HvCojNOHG6K3&page=1&doc=3 site].
*How many articles cite this article?
**It has been cited 1079 times in all databases, and 426 within the Web of Science Core Collection, according to this [https://apps.webofknowledge.com/full_record.do?product=UA&search_mode=GeneralSearch&qid=3&SID=3Evs6J6HvCojNOHG6K3&page=1&doc=3 site].
*Based on the titles and abstracts of the papers, what type of research directions have been taken now that the genome for that organism has been sequenced?
**Examples of titles that reference the genome paper:
***Environmental genome shotgun sequencing of the Sargasso Sea
***Deciphering the evolution and metabolism of an anammox bacterium from a community genome
***Genome of Geobacter sulfurreducens: Metal reduction in subsurface environments
***More can be found by clicking this [https://apps.webofknowledge.com/summary.do?product=WOS&parentProduct=UA&search_mode=CitingArticles&qid=8&SID=3Evs6J6HvCojNOHG6K3&page=1&action=sort&sortBy=LC.D;PY.D;AU.A.en;SO.A.en;VL.D;PG.A&showFirstPage=1 link].
**These papers include studies within the species, sequencing of the genomes of other species, and studies of the metabolic versatility of microorganisms and of metal ion reduction in the environment. This shows that a sequenced genome can aid experiments of many kinds.

===Our Microarray Paper===
*Dataset can be found at this [https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-15334/?keywords=&organism=Shewanella+oneidensis&exptype%5B%5D=%22rna+assay%22&exptype%5B%5D=%22array+assay%22&array= link].


====E-GEOD-15334: Yang et al. (2009)====

This paper is suitable for your project. ''— [[User:Kdahlquist|Kdahlquist]] ([[User talk:Kdahlquist|talk]]) 09:41, 10 November 2015 (PST)''

Yang, Y., Harris, D. P., Luo, F., Xiong, W., Joachimiak, M., Wu, L., ... & Zhou, J. (2009). Snapshot of iron response in Shewanella oneidensis by gene network reconstruction. ''BMC Genomics, 10''(1), 131.
*The link to the abstract from [http://www.ncbi.nlm.nih.gov/pubmed/?term=Yang%2C+Y.%2C+Harris%2C+D.+P.%2C+Luo%2C+F.%2C+Xiong%2C+W.%2C+Joachimiak%2C+M.%2C+Wu%2C+L.%2C+...+%26+Zhou%2C+J.+%282009%29.+Snapshot+of+iron+response+in+Shewanella+oneidensis+by+gene+network+reconstruction.+BMC+genomics%2C+10%281%29%2C+131. PubMed].
*The link to the full text of the article in [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2667191/ PubMedCentral]
*The link to the full text of the article (HTML format) from the publisher [http://www.biomedcentral.com/1471-2164/10/131 web site].
*The link to the full [http://www.biomedcentral.com/content/pdf/1471-2164-10-131.pdf PDF] version of the article from the publisher web site.
*Who owns the rights to the article?
**The article is Open Access and the authors own the rights under a Creative Commons license.
*What organization is the publisher of the article? What type of organization is it?
**BMC Genomics is a journal published by BioMed Central, a commercial open access publisher rather than a scientific society.
*Is this article available in print or online only?
**It is available online only.
*Has LMU paid a subscription or other fee for your access to this article?
**No
*How many articles does this article cite?
**This paper cites 48 other articles.
*How many articles cite this article?
**3
***Roles of UndA and MtrC of ''Shewanella putrefaciens'' W3-18-1 in iron reduction
***Global transcriptional response of ''Caulobacter crescentus'' to iron availability
***Molecular ecological network analysis
*Based on the titles and abstracts of the papers, what type of research directions have been taken now that the genome for that organism has been sequenced?
**This article has mostly been used to look at the iron response of other strains or organisms. It may have been used for comparison's sake or to modify the original methodology to fit the new experiment.
*Link to microarray data
**Found it on [https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-15334/ ArrayExpress]
**This contains the raw data that we will use for our research
*What experiment was performed? What was the "treatment" and what was the "control" in the experiment?
**Strains of ''Shewanella oneidensis'' were put under iron depletion and repletion conditions. The control would be a regular strain of the organism, while the treatments would be either increasing or decreasing the iron levels.
*Were replicate experiments of the "treatment" and "control" conditions conducted? Were these biological or technical replicates? How many of each?
**4 biological replicates of each treatment condition were performed

File:StatisticalAnalysis Shewanella RARL 20151207 HMH.xlsx

2015-12-08T02:41:59Z

Rlegaspi: Rlegaspi uploaded a new version of File:StatisticalAnalysis Shewanella RARL 20151207 HMH.xlsx

Heavy Metal HaterZ Files

2015-12-08T02:35:49Z

Rlegaspi: /* Statistical Analysis Excel Sheets */ Files containing averages of the data and ttests done to the data.

==All Files==
*All files will be listed here.
*Appropriate way to title files:
**FileName_YYYYMMDD_HMH

===Initial Flow Chart===
*Initial flow chart for experimental design - [[File:Experimental Design Flow Chart 20151115 HMH.pptx]]

===Journal Club Presentation Power Points===
*Microarray Paper Presentation - [[File:SoMicroarrayPaperPresentation 20151117 HMH.pptx]]
*Genome Paper Presentation - [[File:GenomePPT_20151123_HMH.pdf]]

===Data Processing Notes from Dr. Dahlquist===
*Page 1 of Notes - [[Media:DrDDataProcessNotes1 20151119 HMH.JPG]]
*Page 2 of Notes - [[Media:DrDDataProcessNotes2 20151119 HMH.JPG]]

===GenMapp Builder===
*[[File:ShewanellaOneidensisGMBuilder_20151201_HMH.zip]]

===Statistical Analysis Excel Sheets===
*Prior to Splitting:
*#[[File:UpdatedCompiledRawData Shewanella RARL 20151201 HMH.xlsx]]
*#[[File:UpdatedCompiledRawData 20151206 HMH.xlsx]]
* After splitting, use this one:
*# [[Media:UpdatedCompiledRawData_Shewanella_RARL_20151201_HMH_forsplitting.xlsx | UpdatedCompiledRawData_Shewanella_RARL_20151201_HMH_forsplitting.xlsx]]
*After splitting, with averages taken and t-tests performed on the data sets:
*#[[File:StatisticalAnalysis Shewanella RARL 20151207 HMH.xlsx]]
*#[[File:UpdatedCompiledRawData Shewanella RARL 20151201 ES HMH forsplitting.xlsx]]

[[Category:Heavy Metal HaterZ]]
[[Category:Group Projects]]

File:StatisticalAnalysis Shewanella RARL 20151207 HMH.xlsx

2015-12-08T02:32:34Z

Rlegaspi:

Rlegaspi Week 14

2015-12-08T02:22:52Z

Rlegaspi: Copying and pasting procedure from Dr. D.

= Goals for Week 14 =
== Milestone 2: Data Preparation ==
# Create a Master Raw Data file that contains the IDs and columns of data required for further analysis.
# Consult with Dr. Dahlquist on how to process the data (normalization, statistics)
# Perform the statistical analysis in Excel.
# Format the gene expression data for import into GenMAPP.
# Import data into GenMAPP, create ColorSets, and run MAPPFinder.
# Document and take notes on test runs with GenMAPP.
#* Use the ''EX.txt'' file to help the Coder/Quality Assurance team members to validate the ''.gdb''.
# Do a journal club outline of the paper so that you can use it in the Discussion section of your group report and your final presentation.
# Create a ''.mapp'' file showing one pathway that is changed in your data.

= Summary of Progress =
==Compiling Raw Data and Statistical Analysis==
Referencing the Week 12 Feedback provided by Dr. Dahlquist, I was able to begin compiling the raw data into a single data sheet:

*Created a CompiledRawData Sheet.
*Created a MasterSheet and deleted the rows whose Gene ID was Blank, blank, or gDNA, or started with NC- or ORF:
*#Number of deletions: 705
*Found the error message <code>#NUM!</code> and replaced with a blank space ("nothing") and there were 2118 replacements made.
*Found the error message <code>#DIV/0!</code> and replaced with blank space ("nothing") and there were 23 replacements made.

File was sent to Dr. Dahlquist: [[File:Raw Data Shewanella RARL 20151201.xlsx]]

== Week 12 Feedback ==

* In light of the effort that it took to get to this point and to cut down on the additional workload that your particular dataset entails, from this point forward, let's cut out two of the timepoints for the depletion and repletion experiments (ignoring the 10 minute and 40 minute timepoints and keeping the 0, 5, 20, and 60 minute timepoints).
*# All of your calculations at this point exist as individual files; you need to compile all of the log ratios you computed into a single file. In a new workbook, name the first sheet "CompiledRawData". Name Column A "ID" and copy and paste in the list of IDs from one of your files.
*# Create a "MasterIndex" column as follows. Insert a new column to the right of the "ID" column and name it "MasterIndex". In this column you will create a numerical index of genes so that you can always sort them back into the same order that they started out in.
*#* Type a "1" in cell B2 and a "2" in cell B3.
*#* Select both cells. Hover your mouse over the bottom-right corner of the selection until it makes a thin black + sign. Double-click on the + sign to fill the entire column with a series of numbers from 1 to 11520 (the number of spots on the microarray).
*# Then you need to copy and paste (values) the "log2" column from your individual files. They should be in order Log2FC-C0-rep1, Log2FC-C0-rep2, Log2FC-C0-rep3, Log2FC-C0-rep4, Log2FC-C5-rep1, etc.
*# The next set of manipulations should be performed in a new sheet called "MasterSheet".
*# Sort the data A-->Z based on the "ID" column. Delete all rows that have an ID of "Blank", "blank", "gDNA", start with "NC-", start with "ORF". Record how many rows got deleted.
*# Some of your cells are going to have error messages in them because of the previous calculations you did. Find and replace all of these with nothing, record how many cells that is.
*# Create a new worksheet called "ScalingCentering" and copy and paste special all of your data into this new sheet. [http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae#Normalize_the_log_ratios_for_the_set_of_slides_in_the_experiment You will perform the scaling and centering operations like you did for the ''Vibrio cholerae'' data.]
*#* Once you have done this, e-mail Dr. Dahlquist and provide a link to your file. Your microarray data has duplicated spots. I have a script that will separate these out so that you can average them as technical replicates.
*# Create a new worksheet called "Statistics", copy and paste the values into this new sheet.
*# You will average the technical replicate spots for each sample to get one value for each sample.
*# You will average the biological replicates for each timepoint to get an average for each timepoint (C0, C5, C20, C60, F5, F20, F60).
*# You will take the ratio of the average log ratios so that you have values for the Average Log Ratio of C5/C0, C20/C0, C60/C0, F5/C60, F20/C60, and F60/C60. Since this is in log space, to take the ratio, you will actually subtract instead of divide.
*# You will perform a two-sample T test between C5 and C0, C20 and C0, C60 and C0, F5 and C60, etc. and perform the Bonferroni and Benjamini and Hochberg corrections on these p values. This computation is not the same as what we did for ''Vibrio''. Instead you will use the <code>TTEST</code> function in Excel (see me when you are ready to do this). The corrections will be the same as what you did before.
*# After that, you will need to format the data for GenMAPP, and then you'll be ready to import it into GenMAPP and run MAPPFinder.
* Let me know if you have any questions.

== 2015-12-07 ==

* A new file with the split data has been uploaded to your team's files page: UpdatedCompiledRawData_Shewanella_RARL_20151201_HMH_forsplitting.xlsx
** Note that this file is based on "UpdatedCompiledRawData_Shewanella_RARL_20151201_HMH.xlsx". I still found an error in the other version of the file that there was a gene called "Gene ID" on the CompiledRawData sheet. This led to a missing gene on the MasterSheet and a discrepancy in the data for the scaling and centering between the two files in the fourth decimal place.

You now need to do the following:
# Average together the replicate data from the two spots that are now split. This means that you need to average the "Log2FC-C0-rep1-scaledandcentered" in cell C2 with the value in cell AG2, for example.
# Copy and paste special > paste values into a new sheet called "statistics".
# Compute the average of the biological replicates for each treatment and timepoint. For example, average together all four biological replicates for Log2FC-C0. Repeat for each timepoint.
# Compute the ratio of the average log ratios so that you have values for the Average Log Ratio of C5/C0, C20/C0, C60/C0, F5/C60, F20/C60, and F60/C60. Since this is in log space, to take the ratio, you will actually subtract instead of divide.
# Perform a two-sample T test between C5 and C0, C20 and C0, C60 and C0, F5 and C60, etc. Use the equation:
=TTEST(<range of cells containing the biological replicates for C0>, <range of cells containing the biological replicates for C5>, 2, 3)
: This will return the p value (the argument 2 requests a two-tailed test and 3 requests a two-sample unequal variance test). Send me the link to the file at this point so I can check the results. You can also perform the sanity check. Let me know how it goes.

914 instances of error "#DIV/0!" replaced with a blank cell.

Heavy Metal HaterZ Files

2015-12-07T00:44:38Z

Rlegaspi: /* Statistical Analysis Excel Sheets */ Updated file linked.

File:UpdatedCompiledRawData Shewanella RARL 20151201 HMH.xlsx

2015-12-07T00:44:02Z

Rlegaspi:

Heavy Metal HaterZ Files

2015-12-07T00:24:07Z

Rlegaspi: /* All Files */ Added statistical analysis sheets.

Rlegaspi Week 14

2015-12-07T00:19:48Z

Rlegaspi: /* Compiling Raw Data and Statistical Analysis */ Change link to file.

{{Heavy Metal HaterZ}}

= GenMAPP User Milestones in relation to Week 14 =
== Milestone 2: Data Preparation ==
# Create a Master Raw Data file that contains the IDs and columns of data required for further analysis.
# Consult with Dr. Dahlquist on how to process the data (normalization, statistics)

== Milestone 3: On-going Analysis Cycle ==
# Perform the statistical analysis in Excel.
# Format the gene expression data for import into GenMAPP.
# Import data into GenMAPP, create ColorSets, and run MAPPFinder.
# Document and take notes on test runs with GenMAPP.
#* Use the ''EX.txt'' file to help the Coder/Quality Assurance team members to validate the ''.gdb''.
# Do a journal club outline of the paper so that you can use it in the Discussion section of your group report and your final presentation.
# Create a ''.mapp'' file showing one pathway that is changed in your data.

= Summary of Progress =
==Compiling Raw Data and Statistical Analysis==
Referencing the Week 12 Feedback provided by Dr. Dahlquist, I was able to begin compiling the raw data into a single data sheet:

*Created a CompiledRawData Sheet.
*Created a MasterSheet and deleted the rows whose Gene ID was Blank, blank, or gDNA, or started with NC- or ORF:
*#Number of deletions: 705
*Found the error message <code>#NUM!</code> and replaced with a blank space ("nothing") and there were 2118 replacements made.
*Found the error message <code>#DIV/0!</code> and replaced with blank space ("nothing") and there were 23 replacements made.

File was sent to Dr. Dahlquist: [[File:Raw Data Shewanella RARL 20151201.xlsx]]

== Week 12 Feedback ==

* In light of the effort that it took to get to this point and to cut down on the additional workload that your particular dataset entails, from this point forward, let's cut out two of the timepoints for the depletion and repletion experiments (ignoring the 10 minute and 40 minute timepoints and keeping the 0, 5, 20, and 60 minute timepoints).
*# All of your calculations at this point exist as individual files; you need to compile all of the log ratios you computed into a single file. In a new workbook, name the first sheet "CompiledRawData". Name Column A "ID" and copy and paste in the list of IDs from one of your files.
*# Create a "MasterIndex" column as follows. Insert a new column to the right of the "ID" column and name it "MasterIndex". In this column you will create a numerical index of genes so that you can always sort them back into the same order that they started out in.
*#* Type a "1" in cell B2 and a "2" in cell B3.
*#* Select both cells. Hover your mouse over the bottom-right corner of the selection until it makes a thin black + sign. Double-click on the + sign to fill the entire column with a series of numbers from 1 to 11520 (the number of spots on the microarray).
*# Then you need to copy and paste (values) the "log2" column from your individual files. They should be in order Log2FC-C0-rep1, Log2FC-C0-rep2, Log2FC-C0-rep3, Log2FC-C0-rep4, Log2FC-C5-rep1, etc.
*# The next set of manipulations should be performed in a new sheet called "MasterSheet".
*# Sort the data A-->Z based on the "ID" column. Delete all rows that have an ID of "Blank", "blank", "gDNA", start with "NC-", start with "ORF". Record how many rows got deleted.
*# Some of your cells are going to have error messages in them because of the previous calculations you did. Find and replace all of these with nothing, record how many cells that is.
*# Create a new worksheet called "ScalingCentering" and copy and paste special all of your data into this new sheet. [http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae#Normalize_the_log_ratios_for_the_set_of_slides_in_the_experiment You will perform the scaling and centering operations like you did for the ''Vibrio cholerae'' data.]
*#* Once you have done this, e-mail Dr. Dahlquist and provide a link to your file. Your microarray data has duplicated spots. I have a script that will separate these out so that you can average them as technical replicates.
*# Create a new worksheet called "Statistics", copy and paste the values into this new sheet.
*# You will average the technical replicate spots for each sample to get one value for each sample.
*# You will average the biological replicates for each timepoint to get an average for each timepoint (C0, C5, C20, C60, F5, F20, F60).
*# You will take the ratio of the average log ratios so that you have values for the Average Log Ratio of C5/C0, C20/C0, C60/C0, F5/C60, F20/C60, and F60/C60. Since the data are in log space, you will actually subtract instead of divide to take the ratio.
*# You will perform a two-sample t test between C5 and C0, C20 and C0, C60 and C0, F5 and C60, etc., and perform the Bonferroni and Benjamini and Hochberg corrections on these p values. This computation is not the same as what we did for ''Vibrio''. Instead, you will use the <code>TTEST</code> function in Excel (see me when you are ready to do this). The corrections will be the same as what you did before.
*# After that, you will need to format the data for GenMAPP; then you'll be ready to import it into GenMAPP and run MAPPFinder.
* Let me know if you have any questions.
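The scaling and centering step above points back to the ''Vibrio cholerae'' procedure rather than restating it. As a minimal sketch, assuming that procedure centers each slide's log2 ratios on their mean and divides by their sample standard deviation (the values Excel's <code>AVERAGE</code> and <code>STDEV</code> would return for a column), one slide could be normalized in Python as below. Blank cells left by the error-replacement step are represented as <code>None</code> and skipped; the function name and the data are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Optional


def scale_and_center(column: List[Optional[float]]) -> List[Optional[float]]:
    """Center one slide's log2 ratios on 0 and scale them to unit standard deviation.

    Assumption: "scaling and centering" means (value - column mean) / column
    sample standard deviation. Blank cells are None and are skipped, the way
    Excel's AVERAGE and STDEV ignore empty cells; they stay None in the output.
    At least two non-blank values are needed for a standard deviation.
    """
    values = [v for v in column if v is not None]
    mu = mean(values)   # like AVERAGE(column)
    sd = stdev(values)  # sample standard deviation, like STDEV(column)
    return [None if v is None else (v - mu) / sd for v in column]


# Hypothetical log2 ratios for one slide (one column of the sheet).
slide = [1.0, None, 3.0, 2.0]
print(scale_and_center(slide))  # -> [-1.0, None, 1.0, 0.0]
```

Each slide (column) would be normalized separately, so that every slide ends up with mean 0 and standard deviation 1 before replicates are averaged.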

''— [[User:Kdahlquist|Kdahlquist]] ([[User talk:Kdahlquist|talk]]) 13:34, 24 November 2015 (PST)''

Rlegaspi Week 14

2015-12-03T23:52:36Z

Rlegaspi: Changing headers and new section entitled summary of progress

{{Heavy Metal HaterZ}}

= GenMAPP User Milestones in relation to Week 14 =
== Milestone 2: Data Preparation ==
# Create a Master Raw Data file that contains the IDs and columns of data required for further analysis.
# Consult with Dr. Dahlquist on how to process the data (normalization, statistics)

== Milestone 3: On-going Analysis Cycle ==
# Perform the statistical analysis in Excel.
# Format the gene expression data for import into GenMAPP.
# Import data into GenMAPP, create ColorSets, and run MAPPFinder.
# Document and take notes on test runs with GenMAPP.
#* Use the ''EX.txt'' file to help the Coder/Quality Assurance team members to validate the ''.gdb''.
# Do a journal club outline of the paper so that you can use it in the Discussion section of your group report and your final presentation.
# Create a ''.mapp'' file showing one pathway that is changed in your data.

= Summary of Progress =
==Compiling Raw Data and Statistical Analysis==
Referencing the Week 12 Feedback provided by Dr. Dahlquist, I was able to begin compiling the raw data on a single data sheet.

*Created a CompiledRawData Sheet.
*Created a MasterSheet and deleted the rows whose ID was "Blank", "blank", or "gDNA", or started with "NC-" or "ORF":
*#Number of deletions: 705
*Found the error message <code>#NUM!</code> and replaced it with nothing (an empty cell); 2118 replacements were made.
*Found the error message <code>#DIV/0!</code> and replaced it with nothing (an empty cell); 23 replacements were made.

File was sent to Dr. Dahlquist: [[Media:Raw Data Shewanella RARL 20151201.xlsx]]
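The row deletions and error replacements counted above can be sketched in Python as follows. This only illustrates the bookkeeping, assuming each row is held as an ID, its MasterIndex, and its log ratio cells as text; the row data and the names <code>clean</code> and <code>is_control</code> are hypothetical. It also shows why the MasterIndex column is useful: after the A-->Z sort by ID, sorting on MasterIndex restores the original spot order.

```python
from typing import List, Tuple

# Control and empty spots to drop, as listed in the Week 12 feedback.
EXACT_IDS = {"Blank", "blank", "gDNA"}
PREFIXES = ("NC-", "ORF")
# The two Excel error strings recorded on the MasterSheet.
ERRORS = {"#NUM!", "#DIV/0!"}

# One spreadsheet row: (ID, MasterIndex, log2 ratio cells as text).
Row = Tuple[str, int, List[str]]


def is_control(gene_id: str) -> bool:
    """True for IDs the feedback says to delete."""
    return gene_id in EXACT_IDS or gene_id.startswith(PREFIXES)


def clean(rows: List[Row]) -> Tuple[List[Row], int, int]:
    """Drop control rows, blank out error cells, and restore MasterIndex order.

    Returns the cleaned rows, the number of rows deleted, and the number of
    error cells replaced (counted after the deletions, as in the procedure).
    """
    kept = [row for row in rows if not is_control(row[0])]
    deleted = len(rows) - len(kept)
    replaced = 0
    cleaned: List[Row] = []
    for gene_id, index, cells in kept:
        new_cells = []
        for cell in cells:
            if cell in ERRORS:
                new_cells.append("")  # "replace with nothing" leaves an empty cell
                replaced += 1
            else:
                new_cells.append(cell)
        cleaned.append((gene_id, index, new_cells))
    cleaned.sort(key=lambda row: row[1])  # sort back by MasterIndex
    return cleaned, deleted, replaced


# Hypothetical rows, already sorted A-->Z by ID as in the feedback.
rows: List[Row] = [
    ("NC-7", 3, ["0.2", "0.3"]),
    ("ORF9", 5, ["1.0", "2.0"]),
    ("blank", 1, ["#DIV/0!", "0.1"]),
    ("geneA", 4, ["#DIV/0!", "-1.2"]),
    ("geneB", 2, ["0.5", "#NUM!"]),
]
cleaned, deleted, replaced = clean(rows)
print(deleted, replaced)  # -> 3 2
```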

== Week 12 Feedback ==

* In light of the effort that it took to get to this point and to cut down on the additional workload that your particular dataset entails, from this point forward, let's cut out two of the timepoints for the depletion and repletion experiments (ignoring the 10 minute and 40 minute timepoints and keeping the 0, 5, 20, and 60 minute timepoints).
*# All of your calculations at this point exist as individual files; you need to compile all of the log ratios you computed into a single file. In a new workbook, name the first sheet "CompiledRawData". Name Column A "ID" and copy and paste in the list of IDs from one of your files.
*# Create a "MasterIndex" column as follows. Insert a new column to the right of the "ID" column and name it "MasterIndex". In this column you will create a numerical index of genes so that you can always sort them back into the same order that they started out in.
*#* Type a "1" in cell B2 and a "2" in cell B3.
*#* Select both cells. Hover your mouse over the bottom-right corner of the selection until it makes a thin black + sign. Double-click on the + sign to fill the entire column with a series of numbers from 1 to 11520 (the number of spots on the microarray).
*# Then copy and paste (values only) the "log2" column from each of your individual files. They should be in the order Log2FC-C0-rep1, Log2FC-C0-rep2, Log2FC-C0-rep3, Log2FC-C0-rep4, Log2FC-C5-rep1, etc.
*# The next set of manipulations should be performed in a new sheet called "MasterSheet".
*# Sort the data A-->Z based on the "ID" column. Delete all rows whose ID is "Blank", "blank", or "gDNA", or starts with "NC-" or "ORF". Record how many rows were deleted.
*# Some of your cells will contain error messages because of your previous calculations. Find and replace all of these with nothing, and record how many cells that is.
*# Create a new worksheet called "ScalingCentering" and copy and paste special all of your data into this new sheet. [http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae#Normalize_the_log_ratios_for_the_set_of_slides_in_the_experiment You will perform the scaling and centering operations like you did for the ''Vibrio cholerae'' data.]
*#* Once you have done this, e-mail Dr. Dahlquist and provide a link to your file. Your microarray data has duplicated spots. I have a script that will separate these out so that you can average them as technical replicates.
*# Create a new worksheet called "Statistics", copy and paste the values into this new sheet.
*# You will average the technical replicate spots for each sample to get one value for each sample.
*# You will average the biological replicates for each timepoint to get an average for each timepoint (C0, C5, C20, C60, F5, F20, F60).
*# You will take the ratio of the average log ratios so that you have values for the Average Log Ratio of C5/C0, C20/C0, C60/C0, F5/C60, F20/C60, and F60/C60. Since the data are in log space, you will actually subtract instead of divide to take the ratio.
*# You will perform a two-sample t test between C5 and C0, C20 and C0, C60 and C0, F5 and C60, etc., and perform the Bonferroni and Benjamini and Hochberg corrections on these p values. This computation is not the same as what we did for ''Vibrio''. Instead, you will use the <code>TTEST</code> function in Excel (see me when you are ready to do this). The corrections will be the same as what you did before.
*# After that, you will need to format the data for GenMAPP; then you'll be ready to import it into GenMAPP and run MAPPFinder.
* Let me know if you have any questions.
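The two averaging steps and the "ratio by subtraction" step can be illustrated for a single gene. This is a minimal sketch, assuming each timepoint has four biological replicates with two duplicate spots each, and that blank cells are skipped the way Excel's <code>AVERAGE</code> skips empty cells; the gene, its values, and the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional


def avg(values: List[Optional[float]]) -> Optional[float]:
    """Mean of the non-blank values (like Excel AVERAGE); None if all are blank."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None


def timepoint_average(bio_reps: List[List[Optional[float]]]) -> Optional[float]:
    """Average the duplicate spots within each sample, then average the samples."""
    per_sample = [avg(spots) for spots in bio_reps]
    return avg(per_sample)


# Hypothetical log2 ratios for one gene: 4 biological replicates x 2 duplicate spots.
gene: Dict[str, List[List[Optional[float]]]] = {
    "C0": [[0.1, 0.3], [0.2, 0.2], [0.0, 0.4], [0.2, None]],
    "C5": [[1.1, 0.9], [1.0, 1.2], [0.8, 1.0], [1.0, 1.0]],
}
averages = {tp: timepoint_average(reps) for tp, reps in gene.items()}

# log2(C5/C0) = log2(C5) - log2(C0), so the "ratio" is a subtraction here.
avg_log_ratio_c5_c0 = averages["C5"] - averages["C0"]
```

For this made-up gene, the C0 and C5 averages come out to about 0.2 and 1.0, so the C5/C0 Average Log Ratio is about 0.8, roughly a 1.7-fold increase in linear terms.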

''— [[User:Kdahlquist|Kdahlquist]] ([[User talk:Kdahlquist|talk]]) 13:34, 24 November 2015 (PST)''

Heavy Metal HaterZ

2015-12-03T23:25:44Z

Rlegaspi: Goals for the week.

{{Heavy Metal HaterZ}}
==Week 14 Assignment==
===Goals===
*Coder/QA
**Insert your goals here.
*GenMAPP Users
**Finish statistical analysis of compiled microarray data
**Prepare file for GenMAPP
===Status Report===

==Week 12 Assignment==
===Goals===
*Coder/QA
**Prepare for journal club presentation
**Perform an initial Gene Database export and Gene Database Testing Report
*GenMAPP Users
**Compile the raw data in preparation for normalization and statistical analysis.
===Status Report===
*Emily: uploaded and formatted all microarray files for the samples repleted with ferrous sulfate
*Mary: Prepared for the genome paper journal club presentation. I also downloaded the code from GitHub onto a computer in the lab, which included installing Eclipse and Git for Windows on the lab computer.
*Josh: Prepared for genome paper presentation with Mary. Completed the initial import/export cycle and made significant progress on the Gene Database Testing Report.
*Ron: Similar to Emily, downloaded the microarray raw data files, followed the procedure given by Dr. Dahlquist for data processing (I worked with the files related to iron depletion with the iron chelator), and I uploaded the files to the wiki.

==Week 11 Assignment : Journal Club Presentation==
===Presentation Slides===
*These can also be accessed by going to our [[Heavy Metal HaterZ Files | Files]] page.

*[[File:Genome_Paper_Presentation_20151124_HMH.pptx]]
*[[File:SoMicroarrayPaperPresentation 20151117 HMH.pptx]]

===Goals===
*Prepare for journal club presentations
*Begin initial tasks on your research project
**Coder/QA
***Set up coding/testing environment
***Determine the regular expression for the ordered locus ID for your species
***Identify the appropriate model organism database for your species.
***Perform an initial Gene Database export and Gene Database Testing Report
**GenMAPP Users
***Describe the experimental design of the microarray data, including treatments, number of replicates (biological and/or technical), dye swaps.
***Determine the sample and data relationships, i.e., which files in the data correspond to which samples in the experimental design.
***Compile the raw data in preparation for normalization and statistical analysis.
===Status Report===
*Emily: worked on journal club presentation and created flow chart diagrams for the experimental design
*Mary: Completed journal club presentation slides with Josh. I downloaded Eclipse on my personal laptop, so along with the lab computers my coding/testing environment should be set up. I determined with Josh the regular expression of the ordered locus ID for our species. However, I have not yet been able to perform an initial export.
*Ron: Completed journal club presentation slides with Emily and uploaded slides in HMH Files pages. [[Media:SoMicroarrayPaperPresentation 20151117 HMH.pptx | Link to Microarray Paper Presentation here.]] Looked over sample and data relationships file from ArrayExpress entry (E-GEOD-15334) and converted .txt file into .xlsx file. I have not been able to compile raw data with Emily, as we still need clarification on which files are to be used for statistical analysis.
*Josh: Completed the genome paper presentation with Mary and did more research on our organism. Haven't done an initial import/export cycle yet. Planning to complete that later this week.

==Week 10 Assignment : Annotated Bibliography==

===Our Genome Paper===

Heidelberg, J. F., Paulsen, I. T., Nelson, K. E., Gaidos, E. J., Nelson, W. C., Read, T. D., ... & Fraser, C. M. (2002). Genome sequence of the dissimilatory metal ion–reducing bacterium ''Shewanella oneidensis''. ''Nature Biotechnology, 20''(11), 1118-1123. doi:10.1038/nbt749
*The [http://www.ncbi.nlm.nih.gov/pubmed/?term=Genome+sequence+of+the+dissimilatory+metal+ion%E2%80%93reducing+bacterium+Shewanella+oneidensis abstract] from PubMed.
*The full text of the article in PubMedCentral: Not available.
*The [http://www.nature.com/nbt/journal/v20/n11/full/nbt749.html full text] of the article from the publisher web site. 
*The [http://www.nature.com/nbt/journal/v20/n11/pdf/nbt749.pdf full PDF version] of the article from the publisher web site.
*Who owns the rights to the article?
**The Nature Publishing Group, which is the publisher of this article, according to this [https://s100.copyright.com/AppDispatchServlet?publisherName=NPG&publication=Nature%20Biotechnology&title=Genome%20sequence%20of%20the%20dissimilatory%20metal%20ion-reducing%20bacterium%20Shewanella%20oneidensis&author=John%20F.%20Heidelberg,%20Ian%20T.%20Paulsen,%20Karen%20E.%20Nelson,%20Eric%20J.%20Gaidos,%20William%20C.%20Nelson%20et%20al.&contentID=10.1038/nbt749&publicationDate=10/07/2002&volumeNum=20&issueNum=11&numPages=6&pageNumbers=pp1118-1123 site].
*Do the authors own the rights under a Creative Commons license?
**Yes, according to this [http://oaspa.org/member/nature-publishing-group-palgrave-macmillan/ site].
*Is the article available “Open Access”?
**According to [http://oaspa.org/membership/members/ this site], the article is available "Open Access".
*What organization is the publisher of the article? What type of organization is it?
**According to the site above, this publisher is a "Professional OA Publisher (Large)".
*Is this article available in print or online only?
**Online only. It was published online in November, 2002.
*Has LMU paid a subscription or other fee for your access to this article?
**No.
*We performed a search in the ISI Web of Science/Knowledge database by typing the title "Genome sequence of the dissimilatory metal ion–reducing bacterium Shewanella oneidensis" into the search bar.
**Three articles came up as results. The titles of the first two articles did not exactly match, and each had been cited fewer than 15 times. The third article was the one we were searching for.
*How many articles does this article cite?
**This article has 41 cited references within the Web of Science Core Collection, according to this [https://apps.webofknowledge.com/full_record.do?product=UA&search_mode=GeneralSearch&qid=3&SID=3Evs6J6HvCojNOHG6K3&page=1&doc=3 site].
*How many articles cite this article?
**It has been cited 1079 times in all databases, and 426 within the Web of Science Core Collection, according to this [https://apps.webofknowledge.com/full_record.do?product=UA&search_mode=GeneralSearch&qid=3&SID=3Evs6J6HvCojNOHG6K3&page=1&doc=3 site].
*Based on the titles and abstracts of the papers, what type of research directions have been taken now that the genome for that organism has been sequenced?
**Examples of titles that reference the genome paper:
***Environmental genome shotgun sequencing of the Sargasso Sea
***Deciphering the evolution and metabolism of an anammox bacterium from a community genome
***Genome of Geobacter sulfurreducens: Metal reduction in subsurface environments
***More can be found by clicking this [https://apps.webofknowledge.com/summary.do?product=WOS&parentProduct=UA&search_mode=CitingArticles&qid=8&SID=3Evs6J6HvCojNOHG6K3&page=1&action=sort&sortBy=LC.D;PY.D;AU.A.en;SO.A.en;VL.D;PG.A&showFirstPage=1 link].
**These papers include studies within the species, sequencing of the genomes of other species, and work on the metabolic versatility of microorganisms and on metal ion reduction in the environment. This shows that a sequenced genome can aid experiments of many kinds.

===Our Microarray Paper===
*Dataset can be found at this [https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-15334/?keywords=&organism=Shewanella+oneidensis&exptype%5B%5D=%22rna+assay%22&exptype%5B%5D=%22array+assay%22&array= link].


====E-GEOD-15334: Yang et al. (2009)====

This paper is suitable for your project. ''— [[User:Kdahlquist|Kdahlquist]] ([[User talk:Kdahlquist|talk]]) 09:41, 10 November 2015 (PST)''

Yang, Y., Harris, D. P., Luo, F., Xiong, W., Joachimiak, M., Wu, L., ... & Zhou, J. (2009). Snapshot of iron response in ''Shewanella oneidensis'' by gene network reconstruction. ''BMC Genomics, 10''(1), 131.
*The link to the abstract from [http://www.ncbi.nlm.nih.gov/pubmed/?term=Yang%2C+Y.%2C+Harris%2C+D.+P.%2C+Luo%2C+F.%2C+Xiong%2C+W.%2C+Joachimiak%2C+M.%2C+Wu%2C+L.%2C+...+%26+Zhou%2C+J.+%282009%29.+Snapshot+of+iron+response+in+Shewanella+oneidensis+by+gene+network+reconstruction.+BMC+genomics%2C+10%281%29%2C+131. PubMed].
*The link to the full text of the article in [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2667191/ PubMedCentral]
*The link to the full text of the article (HTML format) from the publisher [http://www.biomedcentral.com/1471-2164/10/131 web site].
*The link to the full [http://www.biomedcentral.com/content/pdf/1471-2164-10-131.pdf PDF] version of the article from the publisher web site.
*Who owns the rights to the article?
**The article is Open Access and the authors own the rights under a Creative Commons license.
*What organization is the publisher of the article? What type of organization is it?
**The publisher is BioMed Central (''BMC Genomics'' is the journal); BioMed Central is a commercial open-access publisher rather than a scientific society.
*Is this article available in print or online only?
**It is online only
*Has LMU paid a subscription or other fee for your access to this article?
**No
*How many articles does this article cite?
**This paper cites 48 other articles.
*How many articles cite this article?
**3
***Roles of UndA and MtrC of ''Shewanella putrefaciens'' W3-18-1 in iron reduction
***Global transcriptional response of ''Caulobacter crescentus'' to iron availability
***Molecular ecological network analysis
*Based on the titles and abstracts of the papers, what type of research directions have been taken now that the genome for that organism has been sequenced?
**This article has mostly been used to look at the iron response of other strains or organisms. It may have been used for comparison's sake or to modify the original methodology to fit the new experiment.
*Link to microarray data
**Found it on [https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-15334/ ArrayExpress]
**This contains the raw data that we will use for our research
*What experiment was performed? What was the "treatment" and what was the "control" in the experiment?
**Strains of ''Shewanella oneidensis'' were put under iron depletion and repletion conditions. The control would be a regular strain of the organism, while the treatments would be either increasing or decreasing the iron levels.
*Were replicate experiments of the "treatment" and "control" conditions conducted? Were these biological or technical replicates? How many of each?
**4 biological replicates of each treatment condition were performed

Rlegaspi Week 14

2015-12-03T23:16:36Z

Rlegaspi: Inserted file that was sent to Dr. Dahlquist.

{{Heavy Metal HaterZ}}

= GenMAPP User Milestones =

== Milestone 1: Startup from Original Microarray Paper and Data ==

# Download microarray data in its “rawest” form that you can find. (Consult with Dr. Dahlquist about this.)
# Verify that the gene IDs in the microarray data match the chosen species and '''''strain''''' that is being used to create the ''.gdb''. (Needs to be done in conjunction with the QA and Coder.)
# File management: on your team's home page
#* Link to the source of the microarray data
#* Upload the microarray data files to the wiki

== Milestone 2: Data Preparation ==

# Create a table or list that shows the correspondence between the samples in the experiment and the files you have downloaded.
# Determine how many biological or technical replicates, and which samples were labeled with Cy3 or Cy5.
# Create a Master Raw Data file that contains the IDs and columns of data required for further analysis.
# Consult with Dr. Dahlquist on how to process the data (normalization, statistics)

== Milestone 3: On-going Analysis Cycle ==

This milestone represents the on-going work of the GenMAPP user, as the project gradually converges toward a final version of its gene database and compares its own results to the results in the original microarray paper. (Note: These milestones may continue on to Week 13.)

# Perform the statistical analysis in Excel.
# Format the gene expression data for import into GenMAPP.
# Import data into GenMAPP, create ColorSets, and run MAPPFinder.
# Document and take notes on test runs with GenMAPP.
#* Use the ''EX.txt'' file to help the Coder/Quality Assurance team members to validate the ''.gdb''.
# Do a journal club outline of the paper so that you can use it in the Discussion section of your group report and your final presentation.
# Create a ''.mapp'' file showing one pathway that is changed in your data.

== Week 14 Notes ==
*Compiling Raw Data and Statistical Analysis
*Created a CompiledRawData Sheet.
*Created a MasterSheet and deleted the rows whose ID was "Blank", "blank", or "gDNA", or started with "NC-" or "ORF":
*#Number of deletions: 705
*Found the error message <code>#NUM!</code> and replaced it with nothing (an empty cell); 2118 replacements were made.
*Found the error message <code>#DIV/0!</code> and replaced it with nothing (an empty cell); 23 replacements were made.

File was sent to Dr. Dahlquist: [[Media:Raw Data Shewanella RARL 20151201.xlsx]]

== Week 12 Feedback ==

* I have reviewed the work that Emily and you did to compile your raw data this last week and want to acknowledge the hard work that you both put in to organize the files and perform the calculations. I'm putting the feedback here on your page and will put a note on Emily's talk page to refer her to this page.
* In light of the effort that it took to get to this point and to cut down on the additional workload that your particular dataset entails, from this point forward, let's cut out two of the timepoints for the depletion and repletion experiments (ignoring the 10 minute and 40 minute timepoints and keeping the 0, 5, 20, and 60 minute timepoints).
*# All of your calculations at this point exist as individual files; you need to compile all of the log ratios you computed into a single file. In a new workbook, name the first sheet "CompiledRawData". Name Column A "ID" and copy and paste in the list of IDs from one of your files.
*# Create a "MasterIndex" column as follows. Insert a new column to the right of the "ID" column and name it "MasterIndex". In this column you will create a numerical index of genes so that you can always sort them back into the same order that they started out in.
*#* Type a "1" in cell B2 and a "2" in cell B3.
*#* Select both cells. Hover your mouse over the bottom-right corner of the selection until it makes a thin black + sign. Double-click on the + sign to fill the entire column with a series of numbers from 1 to 11520 (the number of spots on the microarray).
*# Then copy and paste (values only) the "log2" column from each of your individual files. They should be in the order Log2FC-C0-rep1, Log2FC-C0-rep2, Log2FC-C0-rep3, Log2FC-C0-rep4, Log2FC-C5-rep1, etc.
*# The next set of manipulations should be performed in a new sheet called "MasterSheet".
*# Sort the data A-->Z based on the "ID" column. Delete all rows whose ID is "Blank", "blank", or "gDNA", or starts with "NC-" or "ORF". Record how many rows were deleted.
*# Some of your cells will contain error messages because of your previous calculations. Find and replace all of these with nothing, and record how many cells that is.
*# Create a new worksheet called "ScalingCentering" and copy and paste special all of your data into this new sheet. [http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae#Normalize_the_log_ratios_for_the_set_of_slides_in_the_experiment You will perform the scaling and centering operations like you did for the ''Vibrio cholerae'' data.]
*#* Once you have done this, e-mail Dr. Dahlquist and provide a link to your file. Your microarray data has duplicated spots. I have a script that will separate these out so that you can average them as technical replicates.
*# Create a new worksheet called "Statistics", copy and paste the values into this new sheet.
*# You will average the technical replicate spots for each sample to get one value for each sample.
*# You will average the biological replicates for each timepoint to get an average for each timepoint (C0, C5, C20, C60, F5, F20, F60).
*# You will take the ratio of the average log ratios so that you have values for the Average Log Ratio of C5/C0, C20/C0, C60/C0, F5/C60, F20/C60, and F60/C60. Since the data are in log space, you will actually subtract instead of divide to take the ratio.
*# You will perform a two-sample t test between C5 and C0, C20 and C0, C60 and C0, F5 and C60, etc., and perform the Bonferroni and Benjamini and Hochberg corrections on these p values. This computation is not the same as what we did for ''Vibrio''. Instead, you will use the <code>TTEST</code> function in Excel (see me when you are ready to do this). The corrections will be the same as what you did before.
*# After that, you will need to format the data for GenMAPP; then you'll be ready to import it into GenMAPP and run MAPPFinder.
* Let me know if you have any questions.
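The feedback leaves the exact <code>TTEST</code> arguments to be settled with Dr. Dahlquist, so the tails and type are not fixed here. As a sketch only, assuming a two-tailed test with equal variances (what Excel's <code>TTEST(array1, array2, 2, 2)</code> computes; type 3 would instead assume unequal variances and give different p values), the p value can be computed with just the Python standard library. The t statistic uses the pooled sample variance, and the two-tailed p value comes from the regularized incomplete beta function, evaluated with the standard continued-fraction (modified Lentz) method. All function names are hypothetical, and a tested statistics library would be the safer choice for real analysis.

```python
import math
from statistics import mean, variance
from typing import List


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny, eps = 1e-300, 3e-14
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 301):
        m2 = 2 * m
        # Each iteration applies an even step and then an odd step of the fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # the last factor is ~1, so h has converged
            break
    return h


def _betai(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Use the fraction directly or via the symmetry I_x(a, b) = 1 - I_(1-x)(b, a),
    # whichever converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def two_tailed_p(t: float, df: float) -> float:
    """Two-tailed p value for Student's t with df degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def ttest_equal_var(x: List[float], y: List[float]) -> float:
    """Two-sample, two-tailed t test with pooled variance (like TTEST(x, y, 2, 2))."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1.0 / nx + 1.0 / ny))
    return two_tailed_p(t, df)
```

For example, <code>ttest_equal_var([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])</code> gives a p value of about 0.021. If both groups have zero variance, the pooled variance is 0 and the function raises a division error, so such genes would need separate handling.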

''— [[User:Kdahlquist|Kdahlquist]] ([[User talk:Kdahlquist|talk]]) 13:34, 24 November 2015 (PST)''

File:Raw Data Shewanella RARL 20151201.xlsx

2015-12-03T22:56:24Z

Rlegaspi: Shewanella Raw Data File statistical analysis up to scaling and centering

Shewanella Raw Data File statistical analysis up to scaling and centering

Rlegaspi Week 14

2015-12-03T22:15:43Z

Rlegaspi: /* Week 14 Notes */ found another error message.

{{Heavy Metal HaterZ}}

= GenMAPP User Milestones =

== Milestone 1: Startup from Original Microarray Paper and Data ==

# Download microarray data in its “rawest” form that you can find. (Consult with Dr. Dahlquist about this.)
# Verify that the gene IDs in the microarray data match the chosen species and '''''strain''''' that is being used to create the ''.gdb''. (Needs to be done in conjunction with the QA and Coder.)
# File management: on your team's home page
#* Link to the source of the microarray data
#* Upload the microarray data files to the wiki

== Milestone 2: Data Preparation ==

# Create a table or list that shows the correspondence between the samples in the experiment and the files you have downloaded.
# Determine how many biological or technical replicates, and which samples were labeled with Cy3 or Cy5.
# Create a Master Raw Data file that contains the IDs and columns of data required for further analysis.
# Consult with Dr. Dahlquist on how to process the data (normalization, statistics)

== Milestone 3: On-going Analysis Cycle ==

This milestone represents the on-going work of the GenMAPP user, as the project gradually converges toward a final version of its gene database and compares its own results to the results in the original microarray paper. (Note: These milestones may continue on to Week 13.)

# Perform the statistical analysis in Excel.
# Format the gene expression data for import into GenMAPP.
# Import data into GenMAPP, create ColorSets, and run MAPPFinder.
# Document and take notes on test runs with GenMAPP.
#* Use the ''EX.txt'' file to help the Coder/Quality Assurance team members to validate the ''.gdb''.
# Do a journal club outline of the paper so that you can use it in the Discussion section of your group report and your final presentation.
# Create a ''.mapp'' file showing one pathway that is changed in your data.

== Week 14 Notes ==
*Compiling Raw Data and Statistical Analysis
*Created a CompiledRawData Sheet.
*Created a MasterSheet and deleted the rows whose ID was "Blank", "blank", or "gDNA", or started with "NC-" or "ORF":
*#Number of deletions: 705
*Found the error message <code>#NUM!</code> and replaced it with nothing (an empty cell); 2118 replacements were made.
*Found the error message <code>#DIV/0!</code> and replaced it with nothing (an empty cell); 23 replacements were made.

== Week 12 Feedback ==

* I have reviewed the work that Emily and you did to compile your raw data this last week and want to acknowledge the hard work that you both put in to organize the files and perform the calculations. I'm putting the feedback here on your page and will put a note on Emily's talk page to refer her to this page.
* In light of the effort that it took to get to this point and to cut down on the additional workload that your particular dataset entails, from this point forward, let's cut out two of the timepoints for the depletion and repletion experiments (ignoring the 10 minute and 40 minute timepoints and keeping the 0, 5, 20, and 60 minute timepoints).
*# All of your calculations at this point exist as individual files; you need to compile all of the log ratios you computed into a single file. In a new workbook, name the first sheet "CompiledRawData". Name Column A "ID" and copy and paste in the list of IDs from one of your files.
*# Create a "MasterIndex" column as follows. Insert a new column to the right of the "ID" column and name it "MasterIndex". In this column you will create a numerical index of genes so that you can always sort them back into the same order that they started out in.
*#* Type a "1" in cell B2 and a "2" in cell B3.
*#* Select both cells. Hover your mouse over the bottom-right corner of the selection until it makes a thin black + sign. Double-click on the + sign to fill the entire column with a series of numbers from 1 to 11520 (the number of spots on the microarray).
*# Then copy and paste (values only) the "log2" column from each of your individual files. They should be in the order Log2FC-C0-rep1, Log2FC-C0-rep2, Log2FC-C0-rep3, Log2FC-C0-rep4, Log2FC-C5-rep1, etc.
*# The next set of manipulations should be performed in a new sheet called "MasterSheet".
*# Sort the data A-->Z based on the "ID" column. Delete all rows whose ID is "Blank", "blank", or "gDNA", or starts with "NC-" or "ORF". Record how many rows were deleted.
*# Some of your cells will contain error messages because of your previous calculations. Find and replace all of these with nothing, and record how many cells that is.
*# Create a new worksheet called "ScalingCentering" and copy and paste special all of your data into this new sheet. [http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae#Normalize_the_log_ratios_for_the_set_of_slides_in_the_experiment You will perform the scaling and centering operations like you did for the ''Vibrio cholerae'' data.]
*#* Once you have done this, e-mail Dr. Dahlquist and provide a link to your file. Your microarray data has duplicated spots. I have a script that will separate these out so that you can average them as technical replicates.
*# Create a new worksheet called "Statistics", copy and paste the values into this new sheet.
*# You will average the technical replicate spots for each sample to get one value for each sample.
*# You will average the biological replicates for each timepoint to get an average for each timepoint (C0, C5, C20, C60, F5, F20, F60).
*# You will take the ratio of the average log ratios so that you have values for the Average Log Ratio of C5/C0, C20/C0, C60/C0, F5/C60, F20/C60, and F60/C60. Since the data are in log space, you will actually subtract instead of divide to take the ratio.
*# You will perform a two-sample t test between C5 and C0, C20 and C0, C60 and C0, F5 and C60, etc., and perform the Bonferroni and Benjamini and Hochberg corrections on these p values. This computation is not the same as what we did for ''Vibrio''. Instead, you will use the <code>TTEST</code> function in Excel (see me when you are ready to do this). The corrections will be the same as what you did before.
*# After that, you will need to format the data for GenMAPP; then you'll be ready to import it into GenMAPP and run MAPPFinder.
* Let me know if you have any questions.
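The Bonferroni and Benjamini and Hochberg corrections mentioned above can be sketched as follows. Bonferroni multiplies each p value by the number of tests; Benjamini and Hochberg multiplies each p value by the number of tests divided by its rank (smallest p value = rank 1). This sketch also applies the usual running minimum from the largest rank down, which keeps the adjusted values in rank order; if a spreadsheet uses p × n / rank on its own, a few genes may show slightly larger values than these. The function names and p values are hypothetical.

```python
from typing import List


def bonferroni(pvalues: List[float]) -> List[float]:
    """Bonferroni: multiply each p value by the number of tests, capped at 1."""
    n = len(pvalues)
    return [min(1.0, p * n) for p in pvalues]


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values, returned in the input (gene) order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # smallest p value first
    adjusted = [0.0] * n
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p value down so adjusted values never decrease with rank.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical unadjusted p values for four genes.
p = [0.01, 0.04, 0.03, 0.2]
print(bonferroni(p))
print(benjamini_hochberg(p))
```

Here Bonferroni gives [0.04, 0.16, 0.12, 0.8], while Benjamini and Hochberg gives about [0.04, 0.053, 0.053, 0.2]: the third gene's raw p × n / rank of 0.06 is lowered to 0.053 by the running minimum.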

''— [[User:Kdahlquist|Kdahlquist]] ([[User talk:Kdahlquist|talk]]) 13:34, 24 November 2015 (PST)''

Rlegaspi Week 14

2015-12-01T23:41:30Z

Rlegaspi: Deleting error messages and replacing them with blank spaces and recorded how many error messages contained on the mastersheet.

{{Heavy Metal HaterZ}}

= GenMAPP User Milestones =

== Milestone 1: Startup from Original Microarray Paper and Data ==

# Download microarray data in its “rawest” form that you can find. (Consult with Dr. Dahlquist about this.)
# Verify that the gene IDs in the microarray data match the chosen species and '''''strain''''' that is being used to create the ''.gdb''. (Needs to be done in conjunction with the QA and Coder.)
# File management: on your team's home page
#* Link to the source of the microarray data
#* Upload the microarray data files to the wiki

== Milestone 2: Data Preparation ==

# Create a table or list that shows the correspondence between the samples in the experiment and the files you have downloaded.
# Determine how many biological or technical replicates, and which samples were labeled with Cy3 or Cy5.
# Create a Master Raw Data file that contains the IDs and columns of data required for further analysis.
# Consult with Dr. Dahlquist on how to process the data (normalization, statistics)

== Milestone 3: On-going Analysis Cycle ==

This milestone represents the on-going work of the GenMAPP user, as the project gradually converges toward a final version of its gene database and compares its own results to the results in the original microarray paper. (Note: These milestones may continue on to Week 13.)

# Perform the statistical analysis in Excel.
# Format the gene expression data for import into GenMAPP.
# Import data into GenMAPP, create ColorSets, and run MAPPFinder.
# Document and take notes on test runs with GenMAPP.
#* Use the ''EX.txt'' file to help the Coder/Quality Assurance team members to validate the ''.gdb''.
# Do a journal club outline of the paper so that you can use it in the Discussion section of your group report and your final presentation.
# Create a ''.mapp'' file showing one pathway that is changed in your data.

== Week 14 Notes ==
*Compiling Raw Data and Statistical Analysis
*Created a CompiledRawData sheet.
*Created a MasterSheet and deleted the rows whose ID was "Blank", "blank", or "gDNA" or started with "NC-" or "ORF":
*#Number of deletions: 705
*Found the cells containing the error message <code>#NUM!</code> and replaced them with nothing (an empty cell); 2118 cells were replaced.
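The two MasterSheet clean-up steps recorded above (deleting control spots and blanking error cells) can be sketched in Python as a way to double-check the counts. This is a hypothetical helper, not the spreadsheet workflow itself: it assumes each row is a list of [ID, MasterIndex, log2 values...] and that error cells arrive as the literal Excel error strings. The ID rules come from Dr. Dahlquist's instructions; the set of error strings is an assumption.

```python
# Control-spot rules taken from the Week 12 feedback instructions.
CONTROL_IDS = {"Blank", "blank", "gDNA"}
CONTROL_PREFIXES = ("NC-", "ORF")
# Excel error strings that might appear after the log-ratio formulas (assumed list).
EXCEL_ERRORS = {"#NUM!", "#DIV/0!", "#VALUE!", "#N/A"}


def is_control(gene_id):
    """True for spots that the MasterSheet step deletes."""
    return gene_id in CONTROL_IDS or gene_id.startswith(CONTROL_PREFIXES)


def clean_master_sheet(rows):
    """Drop control rows and blank out Excel error values.

    Each row is [ID, MasterIndex, value1, value2, ...]. Returns the cleaned
    rows plus the two counts the notes record: rows deleted, cells blanked.
    """
    kept = [row for row in rows if not is_control(row[0])]
    deleted = len(rows) - len(kept)
    blanked = 0
    cleaned = []
    for row in kept:
        new_row = list(row[:2])  # keep ID and MasterIndex untouched
        for value in row[2:]:
            if isinstance(value, str) and value in EXCEL_ERRORS:
                new_row.append(None)  # "replace with nothing"
                blanked += 1
            else:
                new_row.append(value)
        cleaned.append(new_row)
    return cleaned, deleted, blanked
```

Because each row is tested directly, the A-->Z sort is not needed here; in Excel the sort groups the control rows so they can be deleted together, and the MasterIndex column keeps the original order recoverable.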

== Week 12 Feedback ==

* I have reviewed the work that Emily and you did to compile your raw data this last week and want to acknowledge the hard work that you both put in to organize the files and perform the calculations. I'm putting the feedback here on your page and will put a note on Emily's talk page to refer her to this page.
* In light of the effort that it took to get to this point and to cut down on the additional workload that your particular dataset entails, from this point forward, let's cut out two of the timepoints for the depletion and repletion experiments (ignoring the 10 minute and 40 minute timepoints and keeping the 0, 5, 20, and 60 minute timepoints).
*# All of your calculations at this point exist as individual files; you need to compile all of the log ratios you computed into a single file. In a new workbook, name the first sheet "CompiledRawData". Name Column A "ID" and copy and paste in the list of IDs from one of your files.
*# Create a "MasterIndex" column as follows. Insert a new column to the right of the "ID" column and name it "MasterIndex". In this column you will create a numerical index of genes so that you can always sort them back into the same order that they started out in.
*#* Type a "1" in cell B2 and a "2" in cell B3.
*#* Select both cells. Hover your mouse over the bottom-right corner of the selection until it makes a thin black + sign. Double-click on the + sign to fill the entire column with a series of numbers from 1 to 11520 (the number of spots on the microarray).
*# Then you need to copy and paste (values) the "log2" column from your individual files. They should be in the order Log2FC-C0-rep1, Log2FC-C0-rep2, Log2FC-C0-rep3, Log2FC-C0-rep4, Log2FC-C5-rep1, etc.
*# The next set of manipulations should be performed in a new sheet called "MasterSheet".
*# Sort the data A-->Z based on the "ID" column. Delete all rows that have an ID of "Blank", "blank", or "gDNA", or that start with "NC-" or "ORF". Record how many rows were deleted.
*# Some of your cells are going to have error messages in them because of the previous calculations you did. Find and replace all of these with nothing, and record how many cells were replaced.
*# Create a new worksheet called "ScalingCentering" and copy and paste special all of your data into this new sheet. [http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae#Normalize_the_log_ratios_for_the_set_of_slides_in_the_experiment You will perform the scaling and centering operations like you did for the ''Vibrio cholerae'' data.]
*#* Once you have done this, e-mail Dr. Dahlquist and provide a link to your file. Your microarray data has duplicated spots. I have a script that will separate these out so that you can average them as technical replicates.
*# Create a new worksheet called "Statistics", copy and paste the values into this new sheet.
*# You will average the technical replicate spots for each sample to get one value for each sample.
*# You will average the biological replicates for each timepoint to get an average for each timepoint (C0, C5, C20, C60, F5, F20, F60).
*# You will take the ratio of the average log ratios so that you have values for the Average Log Ratio of C5/C0, C20/C0, C60/C0, F5/C60, F20/C60, and F60/C60. Since this is in log space, to take the ratio, you will actually subtract instead of divide.
*# You will perform a two-sample t-test between C5 and C0, C20 and C0, C60 and C0, F5 and C60, etc., and perform the Bonferroni and Benjamini and Hochberg corrections on these p values. This computation is not the same as what we did for ''Vibrio''. Instead you will use the <code>TTEST</code> function in Excel (see me when you are ready to do this). The corrections will be the same as what you did before.
*# After that, you will need to format the data for GenMAPP; then you'll be ready to import it into GenMAPP and run MAPPFinder.
* Let me know if you have any questions.

''— [[User:Kdahlquist|Kdahlquist]] ([[User talk:Kdahlquist|talk]]) 13:34, 24 November 2015 (PST)''
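The Bonferroni and Benjamini and Hochberg corrections named in the feedback above can be sketched in Python. This is a minimal sketch, assuming the input is a list of unadjusted p values (one per gene, blanks already removed); the function names are hypothetical. The Benjamini and Hochberg version uses the standard step-up form, which keeps a running minimum from the largest p value down so that adjusted values never decrease with rank; a spreadsheet that only computes p × n / rank may give slightly larger values for some genes.

```python
def bonferroni(p_values):
    """Bonferroni: multiply each p value by the number of tests, capped at 1."""
    n = len(p_values)
    return [min(p * n, 1.0) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the original gene order."""
    n = len(p_values)
    # Gene indices sorted from smallest to largest p value (rank 1 = smallest).
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest rank down so the adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

With p values of 0.01, 0.04, 0.03, and 0.20, the gene at 0.03 gets the same adjusted value as the gene at 0.04, because the running minimum carries the smaller value down.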

Rlegaspi Week 14

2015-12-01T23:36:06Z

Rlegaspi: Inserted a notes section and wrote down how many deletions were made on the MasterSheet.

{{Heavy Metal HaterZ}}

== Week 14 Notes ==
Compiling Raw Data and Statistical Analysis

Created a CompiledRawData sheet.

Created a MasterSheet and deleted the rows whose ID was "Blank", "blank", or "gDNA" or started with "NC-" or "ORF".
Number of deletions: 705

Rlegaspi Week 14

2015-12-01T22:50:40Z

Rlegaspi: Creation of Week 14 page with notes from Dr. Dahlquist

{{Heavy Metal HaterZ}}

Rlegaspi Week 12

2015-11-24T09:23:43Z

Rlegaspi: Linked the combined files to my assignment page.

{{Heavy Metal HaterZ}}

= Summary of Progress =

== Consultation for Data Processing ==

* During the Thursday class period (11/19/2015), Dr. Dahlquist explained how the authors of the paper acquired their microarray data and outlined the approach we would need to take to extract the raw data from the files and process it for normalization and statistical analysis.
*# Dr. Dahlquist Notes:
*## Page 1 - [[Media:DrDDataProcessNotes1 20151119 HMH.JPG]]
*## Page 2 - [[Media:DrDDataProcessNotes2 20151119 HMH.JPG]]
*Will work on all "C" files: microarray data from MR-1 cultures sampled at different time points after treatment with an iron chelator.
*52 chips total, giving 104 files (one Cy3 file and one Cy5 file per chip).
*The Sample and Data Relationships file has 208 entries, indicating the presence of duplicates.
*Needed columns: G (Gene ID), H (Flag), K (Signal Median), L (Background Median).
*Log equation to be used with the data: Log2[(Cy5 signal median - Cy5 background median) / (Cy3 signal median - Cy3 background median)]

== Data Processing ==

=== Raw Data Files (Cy3 and Cy5 files) ===

*Downloaded each raw data file and stored the files on a flash drive.
*Opened the raw data files using Microsoft Excel.
*Deleted the unnecessary information at the top of the raw data columns (the section labeled ''Header'').
*Created a new tab labeled "Needed Columns".
*Transferred columns G, H, K, and L to this tab.
*Saved the file as an .xlsx file and uploaded it to the wiki.
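Transferring columns G, H, K, and L can also be expressed in code, which may help when the same selection has to be repeated for all 104 files. This is a hypothetical sketch: it assumes each raw data row (below the ''Header'' section) is already available as a Python list, for example from <code>csv.reader</code> on a tab-delimited export, and the helper names are illustrative only.

```python
def column_index(letters):
    """Convert an Excel column label (A, B, ..., Z, AA, ...) to a zero-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


# Columns named in the consultation notes: Gene ID, Flag, Signal Median, Background Median.
NEEDED_COLUMNS = ("G", "H", "K", "L")


def select_needed_columns(rows, letters=NEEDED_COLUMNS):
    """Keep only the given spreadsheet columns from each data row."""
    indices = [column_index(letter) for letter in letters]
    return [[row[i] for i in indices] for row in rows]
```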


*C0_1_Cy3: [[File:GSM496728 MR-1 C0 1 Cy3.xlsx]]
*C0_1_Cy5: [[File:GSM496728 MR-1 C0 1 Cy5.xlsx]]
*C0_2_Cy3: [[File:GSM496729 MR-1 C0 2 Cy3.xlsx]]
*C0_2_Cy5: [[File:GSM496729 MR-1 C0 2 Cy5.xlsx]]
*C0_3_Cy3: [[File:GSM496730 MR-1 C0 3 Cy3.xlsx]]
*C0_3_Cy5: [[File:GSM496730 MR-1 C0 3 Cy5.xlsx]]
*C0_4_Cy3: [[File:GSM496731 MR-1 C0 4 Cy3.xlsx]]
*C0_4_Cy5: [[File:GSM496731 MR-1 C0 4 Cy5.xlsx]]
*C1_1_Cy3: [[File:GSM496732 MR-1 C1 1 Cy3.xlsx]]
*C1_1_Cy5: [[File:GSM496732 MR-1 C1 1 Cy5.xlsx]]
*C1_2_Cy3: [[File:GSM496733 MR-1 C1 2 Cy3.xlsx]]
*C1_2_Cy5: [[File:GSM496733 MR-1 C1 2 Cy5.xlsx]]
*C1_3_Cy3: [[File:GSM496734 MR-1 C1 3 Cy3.xlsx]]
*C1_3_Cy5: [[File:GSM496734 MR-1 C1 3 Cy5.xlsx]]
*C1_4_Cy3: [[File:GSM496735 MR-1 C1 4 Cy3.xlsx]]
*C1_4_Cy5: [[File:GSM496735 MR-1 C1 4 Cy5.xlsx]]
*C5_1_Cy3: [[File:GSM496736 MR-1 C5 1 Cy3.xlsx]]
*C5_1_Cy5: [[File:GSM496736 MR-1 C5 1 Cy5.xlsx]]
*C5_2_Cy3: [[File:GSM496737 MR-1 C5 2 Cy3.xlsx]]
*C5_2_Cy5: [[File:GSM496737 MR-1 C5 2 Cy5.xlsx]]
*C5_3_Cy3: [[File:GSM496738 MR-1 C5 3 Cy3.xlsx]]
*C5_3_Cy5: [[File:GSM496738 MR-1 C5 3 Cy5.xlsx]]
*C5_4_Cy3: [[File:GSM496739 MR-1 C5 4 Cy3.xlsx]]
*C5_4_Cy5: [[File:GSM496739 MR-1 C5 4 Cy5.xlsx]]
*C10_1_Cy3: [[File:GSM496740 MR-1 C10 1 Cy3.xlsx]]
*C10_1_Cy5: [[File:GSM496740 MR-1 C10 1 Cy5.xlsx]]
*C10_2_Cy3: [[File:GSM496741 MR-1 C10 2 Cy3.xlsx]]
*C10_2_Cy5: [[File:GSM496741 MR-1 C10 2 Cy5.xlsx]]
*C10_3_Cy3: [[File:GSM496742 MR-1 C10 3 Cy3.xlsx]]
*C10_3_Cy5: [[File:GSM496742 MR-1 C10 3 Cy5.xlsx]]
*C10_4_Cy3: [[File:GSM496743 MR-1 C10 4 Cy3.xlsx]]
*C10_4_Cy5: [[File:GSM496743 MR-1 C10 4 Cy5.xlsx]]
*C20_1_Cy3: [[File:GSM496744 MR-1 C20 1 Cy3.xlsx]]
*C20_1_Cy5: [[File:GSM496744 MR-1 C20 1 Cy5.xlsx]]
*C20_2_Cy3: [[File:GSM496745 MR-1 C20 2 Cy3.xlsx]]
*C20_2_Cy5: [[File:GSM496745 MR-1 C20 2 Cy5.xlsx]]
*C20_3_Cy3: [[File:GSM496746 MR-1 C20 3 Cy3.xlsx]]
*C20_3_Cy5: [[File:GSM496746 MR-1 C20 3 Cy5.xlsx]]
*C20_4_Cy3: [[File:GSM496747 MR-1 C20 4 Cy3.xlsx]]
*C20_4_Cy5: [[File:GSM496747 MR-1 C20 4 Cy5.xlsx]]
*C40_1_Cy3: [[File:GSM496748 MR-1 C40 1 Cy3.xlsx]]
*C40_1_Cy5: [[File:GSM496748 MR-1 C40 1 Cy5.xlsx]]
*C40_2_Cy3: [[File:GSM496749 MR-1 C40 2 Cy3.xlsx]]
*C40_2_Cy5: [[File:GSM496749 MR-1 C40 2 Cy5.xlsx]]
*C40_3_Cy3: [[File:GSM496750 MR-1 C40 3 Cy3.xlsx]]
*C40_3_Cy5: [[File:GSM496750 MR-1 C40 3 Cy5.xlsx]]
*C40_4_Cy3: [[File:GSM496751 MR-1 C40 4 Cy3.xlsx]]
*C40_4_Cy5: [[File:GSM496751 MR-1 C40 4 Cy5.xlsx]]
*C60_1_Cy3: [[File:GSM496752 MR-1 C60 1 Cy3.xlsx]]
*C60_1_Cy5: [[File:GSM496752 MR-1 C60 1 Cy5.xlsx]]
*C60_2_Cy3: [[File:GSM496753 MR-1 C60 2 Cy3.xlsx]]
*C60_2_Cy5: [[File:GSM496753 MR-1 C60 2 Cy5.xlsx]]
*C60_3_Cy3: [[File:GSM496754 MR-1 C60 3 Cy3.xlsx]]
*C60_3_Cy5: [[File:GSM496754 MR-1 C60 3 Cy5.xlsx]]
*C60_4_Cy3: [[File:GSM496755 MR-1 C60 4 Cy3.xlsx]]
*C60_4_Cy5: [[File:GSM496755 MR-1 C60 4 Cy5.xlsx]]

=== Consolidating Raw Data Files ===

*Combined the Cy3 and Cy5 files for each sample (time point and replicate) into a single spreadsheet
*In each new spreadsheet:
**Sheet 1 = Cy3
**Sheet 2 = Cy5
**Sheet 3 = Combined
***Appended _Cy3 to the headers of columns A-D and _Cy5 to the headers of columns E-H to distinguish the Cy3 and Cy5 data
*Calculated the Log2 ratio in Sheet 4 of each of the following files
**Copied over all of the data from the Combined sheet
**Column I labeled (Cy5 signal median - Cy5 background median)
***<code>I2=G2-H2</code>
**Column J labeled (Cy3 signal median - Cy3 background median)
***<code>J2=C2-D2</code>
**Column K labeled (Cy5/Cy3)
***<code>K2=I2/J2</code>
**Column L labeled Log2
***<code>L2=LOG(K2, 2)</code>
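The column I-L formulas above can be sketched for a single spot in Python. This is a minimal, hypothetical helper (not part of the uploaded workbooks) that returns <code>None</code> wherever the spreadsheet would show an error: a zero Cy3 difference gives <code>#DIV/0!</code> in column K, and a zero or negative ratio gives <code>#NUM!</code> from <code>LOG</code>. Spots whose background exceeds their signal are therefore a likely source of the <code>#NUM!</code> cells that were later blanked on the MasterSheet.

```python
import math


def spot_log2_ratio(cy3_signal, cy3_background, cy5_signal, cy5_background):
    """Background-subtracted log2(Cy5/Cy3) for one spot, mirroring columns I-L.

    Returns None where Excel would display an error value.
    """
    cy5_net = cy5_signal - cy5_background  # column I: G2-H2
    cy3_net = cy3_signal - cy3_background  # column J: C2-D2
    if cy3_net == 0:
        return None  # Excel: #DIV/0! in column K
    ratio = cy5_net / cy3_net  # column K: I2/J2
    if ratio <= 0:
        return None  # Excel: #NUM! from LOG in column L
    return math.log2(ratio)  # column L: LOG(K2, 2)
```

For example, a spot with a Cy3 signal/background of 300/100 and a Cy5 signal/background of 900/100 gives log2(800/200) = 2.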


*C0_1: [[File:C0 1 Combined.xlsx]]
*C0_2: [[File:C0 2 Combined.xlsx]]
*C0_3: [[File:C0 3 Combined.xlsx]]
*C0_4: [[File:C0 4 Combined.xlsx]]
*C1_1: [[File:C1 1 Combined.xlsx]]
*C1_2: [[File:C1 2 Combined.xlsx]]
*C1_3: [[File:C1 3 Combined.xlsx]]
*C1_4: [[File:C1 4 Combined.xlsx]]
*C5_1: [[File:C5 1 Combined.xlsx]]
*C5_2: [[File:C5 2 Combined.xlsx]]
*C5_3: [[File:C5 3 Combined.xlsx]]
*C5_4: [[File:C5 4 Combined.xlsx]]
*C10_1: [[File:C10 1 Combined.xlsx]]
*C10_2: [[File:C10 2 Combined.xlsx]]
*C10_3: [[File:C10 3 Combined.xlsx]]
*C10_4: [[File:C10 4 Combined.xlsx]]
*C20_1: [[File:C20 1 Combined.xlsx]]
*C20_2: [[File:C20 2 Combined.xlsx]]
*C20_3: [[File:C20 3 Combined.xlsx]]
*C20_4: [[File:C20 4 Combined.xlsx]]
*C40_1: [[File:C40 1 Combined.xlsx]]
*C40_2: [[File:C40 2 Combined.xlsx]]
*C40_3: [[File:C40 3 Combined.xlsx]]
*C40_4: [[File:C40 4 Combined.xlsx]]
*C60_1: [[File:C60 1 Combined.xlsx]]
*C60_2: [[File:C60 2 Combined.xlsx]]
*C60_3: [[File:C60 3 Combined.xlsx]]
*C60_4: [[File:C60 4 Combined.xlsx]]

= External Links =
{{Template:Rlegaspi}}

[[Category:Group Projects]]
[[Category:Heavy Metal HaterZ]]
[[Category:Journal Entry]]

File:C60 4 Combined.xlsx

2015-11-24T09:23:22Z

Rlegaspi:

File:C60 3 Combined.xlsx

2015-11-24T09:22:57Z

Rlegaspi:

File:C60 2 Combined.xlsx

2015-11-24T09:22:29Z

Rlegaspi:

File:C60 1 Combined.xlsx

2015-11-24T09:22:09Z

Rlegaspi:

File:C40 4 Combined.xlsx

2015-11-24T09:21:42Z

Rlegaspi:

File:C40 3 Combined.xlsx

2015-11-24T09:21:09Z

Rlegaspi:

File:C40 2 Combined.xlsx

2015-11-24T09:20:44Z

Rlegaspi:

File:C40 1 Combined.xlsx

2015-11-24T09:20:19Z

Rlegaspi:

File:C20 4 Combined.xlsx

2015-11-24T09:19:56Z

Rlegaspi:

File:C20 3 Combined.xlsx

2015-11-24T09:19:32Z

Rlegaspi:

File:C20 2 Combined.xlsx

2015-11-24T09:18:56Z

Rlegaspi:

File:C20 1 Combined.xlsx

2015-11-24T09:18:29Z

Rlegaspi:

File:C10 4 Combined.xlsx

2015-11-24T09:17:48Z

Rlegaspi:

File:C10 3 Combined.xlsx

2015-11-24T09:17:23Z

Rlegaspi:

File:C10 2 Combined.xlsx

2015-11-24T09:16:58Z

Rlegaspi:

File:C10 1 Combined.xlsx

2015-11-24T09:16:28Z

Rlegaspi: