Difference between revisions of "Malverso Week 8"

From LMU BioDB 2015
Jump to: navigation, search
(added comparison between results)
(inputted protocol for normalizing the log ratios)
Line 1: Line 1:
 +
==Electronic Lab Notebook==
 +
*Protocols came from [http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae part 1] and [http://www.openwetware.org/wiki/BIOL367/F10:GenMAPP_and_MAPPFinder_Protocols part 2]
 +
 +
==Part One==
 +
*The data from the Merrell et al. (2002) paper was accessed from the Stanford Microarray Database.
 +
*The Log2 of R/G Normalized Ratio (Median) has been copied from the raw data files downloaded from the Stanford Microarray Database.
 +
====Patient A====
 +
*Sample 1: 24047.xls (A1)
 +
*Sample 2: 24048.xls (A2)
 +
*Sample 3: 24213.xls (A3)
 +
*Sample 4: 24202.xls (A4)
 +
====Patient B====
 +
*Sample 5: 24049.xls (B1)
 +
*Sample 6: 24050.xls (B2)
 +
*Sample 7: 24203.xls (B3)
 +
*Sample 8: 24204.xls (B4)
 +
====Patient C====
 +
*Sample 9: 24053.xls (C1)
 +
*Sample 10: 24054.xls (C2)
 +
*Sample 11: 24205.xls (C3)
 +
*Sample 12: 24206.xls (C4)
 +
 +
*I downloaded the Merrell_Compiled_Raw_Data_Vibrio.xls file to my Desktop and saved it with my initials and the date.
 +
 +
===Normalizing the Log Ratios===
 +
*To scale and center the data I:
 +
**Inserted a new Worksheet into my Excel file, and named it "scaled_centered".
 +
**Going back to the "compiled_raw_data" worksheet, I clicked to select all and copy. I then went to the "scaled_centered" worksheet, click on the upper, left-hand cell (cell A1) and pasted the values.
 +
**I inserted two rows in between the top row of headers and the first data row.
 +
**In cell A2, I typed "Average" and in cell A3, "StdDev".
 +
*I then computed the Average log ratio for each chip (each column of data).
 +
**In cell B2, I typed =AVERAGE(B4:B5224)and then pressed Enter.
 +
*I then computed the Standard Deviation of the log ratios on each chip (each column of data).
 +
**In cell B3 I typed = STDEV(B4:B5224)and then pressed enter.
 +
**I then clicked on B2 and dragged it across all of the columns to copy the equation across all the data. I repeated this with B3 as well. Excel automatically changed the equation to match the cell designations for those columns.
 +
*I copied the column headings for all of my data columns and then pasted them to the right of the last data column so that there was a second set of headers above blank columns of cells. I Edited the names of the columns so that they read: A1_scaled_centered, A2_scaled_centered, etc.
 +
*In cell N4, I typed =(B4-B$2)/B$3 so that the data in cell B4 has the average subtracted from it (cell B2) and is divided by the standard deviation (cell B3). I used the dollar sign symbols in front of the "2" and "3" to tell Excel to always reference that row in the equation, even though I will pasted it for the entire column of 5221 genes.
 +
*Why is this important? '''This is important because if we didn’t use the dollar signs, Excel would assume that each cell should be subtracted by the cell two above it and divided by the cell directly above it instead of always by the average and standard deviation.'''
 +
*I copy and pasted this equation into the entire column by clicking on the original cell with my equation and position my cursor at the bottom right corner. When the curser changed to a thin black plus sign (not a chubby white one) I double clicked, and the formula magically copied to the entire column of genes.
 +
*I then copied and pasted the scaling and centering equation for each of the columns of data with the "_scaled_centered" column header, making sure to adjust the equations to pertain to their respective columns. 
 +
 +
 +
 +
 
==Files:==
 
==Files:==
  

Revision as of 05:54, 26 October 2015

Electronic Lab Notebook

Part One

  • The data from the Merrell et al. (2002) paper was accessed from the Stanford Microarray Database.
  • The Log2 of R/G Normalized Ratio (Median) has been copied from the raw data files downloaded from the Stanford Microarray Database.

Patient A

  • Sample 1: 24047.xls (A1)
  • Sample 2: 24048.xls (A2)
  • Sample 3: 24213.xls (A3)
  • Sample 4: 24202.xls (A4)

Patient B

  • Sample 5: 24049.xls (B1)
  • Sample 6: 24050.xls (B2)
  • Sample 7: 24203.xls (B3)
  • Sample 8: 24204.xls (B4)

Patient C

  • Sample 9: 24053.xls (C1)
  • Sample 10: 24054.xls (C2)
  • Sample 11: 24205.xls (C3)
  • Sample 12: 24206.xls (C4)
  • I downloaded the Merrell_Compiled_Raw_Data_Vibrio.xls file to my Desktop and saved it with my initials and the date.

Normalizing the Log Ratios

  • To scale and center the data I:
    • Inserted a new Worksheet into my Excel file, and named it "scaled_centered".
    • Going back to the "compiled_raw_data" worksheet, I clicked to select all and copy. I then went to the "scaled_centered" worksheet, click on the upper, left-hand cell (cell A1) and pasted the values.
    • I inserted two rows in between the top row of headers and the first data row.
    • In cell A2, I typed "Average" and in cell A3, "StdDev".
  • I then computed the Average log ratio for each chip (each column of data).
    • In cell B2, I typed =AVERAGE(B4:B5224)and then pressed Enter.
  • I then computed the Standard Deviation of the log ratios on each chip (each column of data).
    • In cell B3 I typed = STDEV(B4:B5224)and then pressed enter.
    • I then clicked on B2 and dragged it across all of the columns to copy the equation across all the data. I repeated this with B3 as well. Excel automatically changed the equation to match the cell designations for those columns.
  • I copied the column headings for all of my data columns and then pasted them to the right of the last data column so that there was a second set of headers above blank columns of cells. I Edited the names of the columns so that they read: A1_scaled_centered, A2_scaled_centered, etc.
  • In cell N4, I typed =(B4-B$2)/B$3 so that the data in cell B4 has the average subtracted from it (cell B2) and is divided by the standard deviation (cell B3). I used the dollar sign symbols in front of the "2" and "3" to tell Excel to always reference that row in the equation, even though I will pasted it for the entire column of 5221 genes.
  • Why is this important? This is important because if we didn’t use the dollar signs, Excel would assume that each cell should be subtracted by the cell two above it and divided by the cell directly above it instead of always by the average and standard deviation.
  • I copy and pasted this equation into the entire column by clicking on the original cell with my equation and position my cursor at the bottom right corner. When the curser changed to a thin black plus sign (not a chubby white one) I double clicked, and the formula magically copied to the entire column of genes.
  • I then copied and pasted the scaling and centering equation for each of the columns of data with the "_scaled_centered" column header, making sure to adjust the equations to pertain to their respective columns.



Files:

File:Merrell Compiled Raw Data Vibrio MA 20151015 (2).xls

File:Merrell Compiled Raw Data Vibrio MA 20151015 (2).txt

File:Merrell Compiled Raw Data Vibrio MA 20151015 (2).gex

File:Merrell Compiled Raw Data Vibrio MA 20151015 (2).EX.txt

Sanity Check: Number of Genes Significantly Changed

  • Number of genes with a p value less than .05 = 948. This is 18.16% of the genes.
  • Number of genes with a p value less than .01 = 235. This is 4.50% of the genes.
  • Number of genes with a p value less than .001 = 24. This is 0.46% of the genes.
  • Number of genes with a p value less than .0001 = 2. This is 0.04% of the genes.
  • Number of genes with a Bonferroni p value less than .05 = 0. This is 0% of the genes.
  • Number of genes with a B-H p value less than .05 = 0. This is 0% of the genes.
  • Avg_LogFC_all > 0 = 352. 6.7%
  • < 0 = 596. 11.34%
  • > .25 = 339. 17.83%
  • <-.25 = 579. 11.09%
  • "two-class SAM analysis was conducted" "with statistically significant changes in the level of expression - at least a twofold change" They used the actual level of expression change to figure out what was significant while we used p value (which is the probability that changes in expression are due to chance). Merrell et al. (2002) used a more stringent method because they found 237 genes that were significantly changed while we found 918. (-.25 < pvalue > .25)

Sanity Check: Compare Individual Genes with Known Data

  • VC0028 fold change: 1.65 pvalue:.0474 and 1.27 .0692
  • first is sig changed, second is not
  • VC0941 -.28 .1636 and .88 .0001
  • VC0869 Fold changes: 2.12, 1.50, 1.59, 1.95, 2.20

P value: 0.02, 0.0174, 0.0463, 0.0227, 0.002 All are significantly changed

  • VC0051 Fold change: 1.89, 1.92

P value: 0.016, 0.0139 Both are significantly changed

  • VC0647 fold change: -1.11 pvalue:.0003 and fold change:-0.94 pvalue:.0125 and fold change:-1.05 pvalue:.0051
  • VC0468 -.17 and pvalue:.3350
  • VC2350 fold change: -2.40 pvalue:.0130
  • VCA0583 fold change: 1.06 pvalue: .1011


Part Two

  • I used the 2010 database.
  • There were 121 errors.
  • Kristin used the 2009 database and 772 errors were detected.
  • My database is newer and therefore it makes sense that the number of errors has decreased from the previous year, because it makes sense that the change in the database from 2009 - 2010 was an improvement. The database I used probably has more entries and less bugs.
  • We were an increased pair, which I labeled red.
  • I labeled decreased with green.

Gene Ontology Results

  1. branched chain family amino acid metabolic process
  2. branched chain family amino acid biosynthetic process
  3. IMP metabolic process
  4. IMP biosynthetic process
  5. purine ribonucleoside monophosphate biosynthetic process
  6. purine ribonucleoside monophosphate metabolic process
  7. purine nucleoside monophosphate metabolic process
  8. purine nucleoside monophosphate biosynthetic process
  9. 'de novo' IMP biosynthetic process
  10. arginine metabolic process
  • Our results were completely different. There must have been some significant findings in the year 2009 that uncovered significant gene changes.


Team Page

Heavy Metal HaterZ

Assignments

Individual Journal Entries

Shared Journal Entries