Difference between revisions of "Taur.vil Week 8"

From LMU BioDB 2013

Revision as of 17:28, 17 October 2013

Uploaded Files:

Question Answers

Sanity Check

  1. 948 genes had a p-value <0.05.
  2. 235 genes had a p-value <0.01.
  3. 24 genes had a p-value <0.001.
  4. 2 genes had a p-value <0.0001.
  1. 352 of the genes with a p-value <0.05 had a positive Avg_LogFC_all value.
  2. 596 of the genes with a p-value <0.05 had a negative Avg_LogFC_all value.
  3. 339 of the genes with a p-value <0.05 had an Avg_LogFC_all greater than 0.25.
  4. 339 of the genes with a p-value <0.05 had an Avg_LogFC_all less than -0.25.
  1. In Merrell et al. (2002), the researchers used the SAM (Significance Analysis of Microarrays) program package to determine whether there were significant differences in gene expression. Their analysis required a 100% fold change, while we are using about a 20% fold change. Their stricter threshold decreases the chance of a false positive, but may also result in several false negatives.
  1. VC0028: p-value=0.047; Average Fold Change=1.653
  2. VC0941: p-value=0.676; Average Fold Change=0.093
  3. VC0869: p-value=0.017; Average Fold Change=1.499
  4. VC0051: p-value=0.014; Average Fold Change=1.922
  5. VC0647: p-value<0.001; Average Fold Change=-1.113
  6. VC0468: p-value=0.335; Average Fold Change=-0.169
  7. VC2350: p-value=0.013; Average Fold Change=-2.402
  8. VCA0583: p-value=0.101; Average Fold Change=1.063

The agreement between our results and those of Merrell et al. (2002) was mixed. Out of the eight genes, we found 5 to be significant, with a p-value less than 0.05 and an average fold change greater than 1 in magnitude. The other three were not significant (in particular, each had p>0.05).
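The tally above can be checked with a short sketch. The p-values and fold changes are copied from the list of eight genes; VC0647 is reported only as p<0.001, so 0.001 is used as an upper bound. The helper name `is_significant` and the use of the absolute fold change are assumptions made for illustration.

```python
# Values copied from the gene list above as (p-value, average fold change).
# VC0647 is reported only as p < 0.001, so 0.001 stands in as an upper bound.
genes = {
    "VC0028": (0.047, 1.653),
    "VC0941": (0.676, 0.093),
    "VC0869": (0.017, 1.499),
    "VC0051": (0.014, 1.922),
    "VC0647": (0.001, -1.113),
    "VC0468": (0.335, -0.169),
    "VC2350": (0.013, -2.402),
    "VCA0583": (0.101, 1.063),
}


def is_significant(p_value, fold_change, p_cutoff=0.05, fc_cutoff=1.0):
    """Hypothetical helper: p below the cutoff and |fold change| above 1."""
    return p_value < p_cutoff and abs(fold_change) > fc_cutoff


significant = [g for g, (p, fc) in genes.items() if is_significant(p, fc)]
not_significant = [g for g in genes if g not in significant]

print("significant:", significant)
print("not significant:", not_significant)
```

Note that VCA0583 passes the fold-change cutoff but fails on p-value alone, which is why the p-value is the deciding criterion for the three non-significant genes.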

Document Files:

text
excel
Exceptions
Expression Dataset
GO results file
GO w/ filter
MAPP not done yet
GO mapping file

Digital Notebook:

Th 10/10/2013

  1. Downloaded original data (Merrell_Compiled_Raw_Data_Vibrio.xls) from [[1]]
  2. Observed that the data collected had already been log transformed (there were negative numbers)
    • Meant we could begin at the normalization step
  3. Created a new sheet in Excel, copied in data from previous data sheet and titled it "scaled_centered".
  4. In scaled_centered, inserted two empty rows and calculated average and standard deviation for each replicate using the Excel AVERAGE and STDEV functions.
  5. Created a new column for each of the samples, relabeling them with a _sc (for scaled centered) after the name. Filled these columns with the scaled centered values calculated by taking the raw data minus the average for the sample (row 2) divided by the standard deviation (row 3).
    • ex: (B4-B$2)/B$3
    • This process served to normalize the data.
  6. Created a new worksheet called statistics and copied the ID column into the new worksheet.
  7. Pasted (using values only) the scaled and centered columns.
  8. Deleted the rows for average and standard deviation.
  9. Inserted columns to the right of the data for the average log fold change (FC) of each patient and calculated the value by taking the average of the three technical replicates.
  10. Calculated the t-stat for each gene in a new column by taking the average of the three biological replicates divided by (the standard deviation of the biological replicates divided by the square root of the sample size, which was three).
    • ex: AVERAGE(N2:P2)/(STDEV(N2:P2)/SQRT(3))
  11. Calculated the p-value in a new column by using Excel's TDIST function
    • ex: TDIST(ABS(R2),degrees of freedom,2)
      • R2 referred to the t-stat calculated earlier. There were 2 degrees of freedom (sample size of three minus one), and the final argument of 2 requests a two-tailed p-value.
  12. Took an average FC for each of the three biological replicates in a new column.
  13. Copied into a new page titled forGenMAPP and inserted column 2 (System Code) where N was entered for each row.
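The spreadsheet steps above (scaling and centering, then the t-stat and p-value) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and the sample numbers are made up, and the p-value uses a closed form that holds only for 2 degrees of freedom, which matches TDIST(ABS(t),2,2) for three replicates.

```python
import math
import statistics


def scale_and_center(column):
    """Mirror of (B4-B$2)/B$3: subtract the column mean, divide by its SD."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)  # sample SD, as Excel's STDEV computes
    return [(x - mean) / sd for x in column]


def one_sample_t(replicates):
    """Mirror of AVERAGE(range)/(STDEV(range)/SQRT(n)), tested against zero."""
    n = len(replicates)
    return statistics.mean(replicates) / (statistics.stdev(replicates) / math.sqrt(n))


def two_tailed_p_df2(t_stat):
    """Two-tailed p-value for a t distribution with exactly 2 degrees of freedom.

    Uses the closed form p = 1 - |t| / sqrt(t^2 + 2), valid only for df = 2.
    """
    return 1 - abs(t_stat) / math.sqrt(t_stat * t_stat + 2)


# Made-up log ratios for one chip (one spreadsheet column).
chip = [-1.2, 0.4, 0.9, -0.3, 1.7]
scaled = scale_and_center(chip)

# Made-up scaled and centered values for one gene across three replicates.
gene = [1.0, 2.0, 3.0]
t_stat = one_sample_t(gene)
p_value = two_tailed_p_df2(t_stat)

print(round(t_stat, 3), round(p_value, 4))
```

After scaling and centering, each column has mean 0 and standard deviation 1, which is what makes the chips comparable before the per-gene test.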

Tu 10/15/2013

  1. Downloaded GenMAPP, my text files, and the 2009 Vibrio cholerae Gene Database
  2. Set the database to the downloaded 2009 Vibrio cholerae Gene Database
  3. In Expression Dataset Manager, created a new dataset by importing my forGenMAPP text file
    • 772 errors reported, all of which were reported as gene not found (verified using Excel filters)
  4. Created new color set (Pathogenic vs Lab)
    • Selected increased items as those with an AvgLogFC greater than 0.25 and a p-value less than 0.05 ([AvgLogFC_all]>0.25 AND [Pvalue]<0.05)
    • Selected decreased items using the same p-value criterion with the sign of the AvgLogFC cutoff inverted ([AvgLogFC_all]<-0.25 AND [Pvalue]<0.05)
    • Colored increases red, decreases green, and no change yellow
  5. Used GenMAPP tool MAPPFinder to create a table for decreases in gene expression, exported as DecreasedVibrio_2009_TV.
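The color set criteria above amount to a three-way classification, sketched below. The function name `classify`, the color mapping as a dictionary, and the example rows are hypothetical; the cutoffs come from the criteria in the notebook. The fold-change conversion assumes the log values are base 2, which the notebook does not state explicitly.

```python
# Colors as assigned in the color set above.
COLORS = {"increased": "red", "decreased": "green", "no change": "yellow"}


def classify(avg_log_fc, p_value, fc_cutoff=0.25, p_cutoff=0.05):
    """Apply [AvgLogFC_all]>0.25 AND [Pvalue]<0.05 and its mirrored criterion."""
    if p_value < p_cutoff and avg_log_fc > fc_cutoff:
        return "increased"
    if p_value < p_cutoff and avg_log_fc < -fc_cutoff:
        return "decreased"
    return "no change"


# Hypothetical rows: (gene ID, AvgLogFC_all, Pvalue).
rows = [
    ("geneA", 0.40, 0.010),
    ("geneB", -0.60, 0.030),
    ("geneC", 0.90, 0.200),   # large change but not significant
    ("geneD", 0.10, 0.001),   # significant but change too small
]

for gene_id, avg_log_fc, p_value in rows:
    label = classify(avg_log_fc, p_value)
    print(gene_id, label, COLORS[label])

# Assuming base-2 logs, a cutoff of 0.25 is a fold change of about 1.19.
fold_at_cutoff = 2 ** 0.25
```

Under that base-2 assumption the 0.25 cutoff corresponds to roughly a 20% change, consistent with the threshold described in the Sanity Check answers.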

Want the z score to be high and positive; a z score higher than 2 means p is less than 0.05.
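The rule of thumb relating z to p can be checked numerically. This sketch assumes the z score is approximately standard normal and that the p-value is two-tailed; the function name is made up for illustration.

```python
import math


def two_tailed_p_from_z(z):
    """Two-tailed p-value for a standard normal z score, via the
    complementary error function: p = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))


for z in (1.0, 1.96, 2.0, 3.0):
    print(z, round(two_tailed_p_from_z(z), 4))
```

Under these assumptions the exact cutoff for p = 0.05 is close to z = 1.96, so "higher than 2" is a slightly conservative shorthand.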

The test compares to zero, which is the null hypothesis of no change. Significance depends on 1) the magnitude of the change observed and 2) the variation and number of replicates.
