Difference between revisions of "MSymond1 Week 9"

From LMU BioDB 2024

Revision as of 19:24, 20 March 2024

Purpose

This lab used Microsoft Excel to analyze a microarray dataset. Each group in the class was given a dataset for a different strain; the strain used in the present study was the wild type. ANOVA tests on the dataset identified which genes had a p-value of less than .05. Further analysis showed that an ANOVA test alone is not enough to determine whether there is a statistically significant change in a gene over time.

Methods & Results

ANOVA

The data were imported into Microsoft Excel, and an ANOVA test was run on each of the 6189 genes in the dataset. First, the average of the data points at each time point (15, 30, 60, 90, 120) was calculated for each gene. The sum of squares for each time point was then calculated using the syntax provided by the lab protocol. Next, the F statistic (Fstat) was calculated across the full dataset, which made it possible to calculate the p-value for each gene. Once all of the p-values were calculated, they were adjusted to correct for the multiple testing problem.
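The F statistic step can be sketched in Python for a single gene. The lab protocol's exact formula is not reproduced on this page, so this sketch makes an assumption: the null hypothesis is that the true mean is zero at every time point, and the full model gives each time point its own mean. The function name `f_statistic` and the sample values are hypothetical.

```python
def f_statistic(groups):
    """Return the F statistic for one gene.

    groups: one list of replicate values per time point.
    Assumption (not confirmed by the source): the null model is a
    mean of zero at every time point.
    """
    n = sum(len(g) for g in groups)  # total number of data points
    k = len(groups)                  # number of time points
    # Full model: squared deviations from each time point's own mean.
    ss_full = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    # Null model: squared deviations from zero.
    ss_null = sum(x ** 2 for g in groups for x in g)
    return ((n - k) / k) * (ss_null - ss_full) / ss_full


# Hypothetical gene with two time points and two replicates each.
fstat = f_statistic([[1.0, 3.0], [2.0, 4.0]])
```

Under this assumption, the p-value is the upper-tail probability of the F distribution with k and n - k degrees of freedom.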

Bonferroni p-value correction

To calculate the Bonferroni p-values, the original p-values were all multiplied by 6189, the number of tests. Then, in another column, any Bonferroni p-value greater than 1 was replaced with 1, and any value less than 1 was reported unchanged.
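The same two steps (multiply by the number of tests, then cap at 1) can be written as a short Python sketch. The function name `bonferroni` and the sample p-values are hypothetical; 6189 is the number of genes reported above.

```python
def bonferroni(p_values, n_tests=None):
    """Multiply each p-value by the number of tests and cap the result at 1."""
    m = len(p_values) if n_tests is None else n_tests
    return [min(p * m, 1.0) for p in p_values]


# Hypothetical raw p-values, corrected for 6189 tests.
corrected = bonferroni([0.00001, 0.5], n_tests=6189)
```

Here the first value stays below 1 (about 0.062), while the second exceeds 1 after multiplication and is reported as 1.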

Benjamini & Hochberg p-value correction

A new worksheet was created in Excel containing all of the raw p-values, and the p-values were ranked from least to greatest. To calculate the Benjamini & Hochberg p-values, each raw p-value was multiplied by 6189 and then divided by its rank. As in the Bonferroni calculation, any resulting value greater than 1 was replaced with 1, and any value less than 1 was reported unchanged.
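A minimal Python sketch of the ranking steps described above: sort the raw p-values, multiply each by the number of tests, divide by its rank, and cap at 1. The function name `bh_adjust` and the sample values are hypothetical, and ties are ranked in input order, which is an assumption.

```python
def bh_adjust(p_values, n_tests=None):
    """Benjamini & Hochberg style adjustment: p * m / rank, capped at 1."""
    m = len(p_values) if n_tests is None else n_tests
    # Indices of the p-values, ordered from least to greatest.
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    adjusted = [0.0] * len(p_values)
    for rank, i in enumerate(order, start=1):
        adjusted[i] = min(p_values[i] * m / rank, 1.0)
    return adjusted  # same order as the input


# Hypothetical raw p-values for three genes.
adjusted = bh_adjust([0.04, 0.01, 0.5])
```

This follows only the steps listed here. Library implementations of the procedure typically also force the adjusted values to be non-decreasing with rank, which this sketch does not do.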

Sanity Check

  1. 2528 genes have p<.05, 40.84%
  2. 1652 genes have p<.01, 26.59%
  3. 919 genes have p<.001, 14.84%
  4. 496 genes have p<.0001, 8.01%
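A count like the ones above can be reproduced with a small helper that tallies how many p-values fall below a cutoff and expresses that as a percentage of all genes. The function name `count_below` and the sample p-values are hypothetical; they are not taken from the dataset.

```python
def count_below(p_values, threshold):
    """Return (count, percent) of p-values strictly below the threshold."""
    count = sum(1 for p in p_values if p < threshold)
    return count, 100.0 * count / len(p_values)


# Hypothetical p-values for four genes.
p_values = [0.2, 0.03, 0.004, 0.00005]
for cutoff in (0.05, 0.01, 0.001, 0.0001):
    count, percent = count_below(p_values, cutoff)
    print(f"{count} genes have p<{cutoff}, {percent:.2f}%")
```

Because each cutoff is stricter than the last, the counts can only stay the same or decrease down the list, which is a quick way to check the numbers.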

  • Strain: Wild type
  • File name: Symonds_BIOL367_S24_microarrary-data_wt.xlsx
  • Number of replicates: 4
  • Times: 15, 30, 60, 90, 120

  • P value: probability that you would have seen a change of that size due to chance
  • P value of <.05 is considered significant: 5%, or 1 in 20
  • 5% of 6189 is roughly 300, so about that many genes could pass the .05 cutoff by chance alone
  • Multiple hypothesis problem: the more tests you do, the more likely you'll find significance by chance
  • Bonferroni correction: multiply the p value by the number of hypothesis tests
  • Here, multiply each p value by 6189
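The arithmetic behind the "roughly 300" figure can be checked in a couple of lines. This is a sketch that assumes the worst case in which no gene truly changes, so every p-value below the cutoff is a false positive; the variable names are hypothetical.

```python
n_tests = 6189  # number of genes tested
alpha = 0.05    # significance cutoff

# Expected number of genes below the cutoff by chance alone,
# assuming no gene truly changes.
expected_by_chance = alpha * n_tests
print(round(expected_by_chance, 2))
```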

Media:Symonds_BIOL367_S24_microarray-data_wt3-19-24.xlsx.zip
