Difference between revisions of "Malverso Week 8"

Revision as of 05:54, 26 October 2015

Electronic Lab Notebook

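Protocols came from part 1 (http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae) and part 2 (http://www.openwetware.org/wiki/BIOL367/F10:GenMAPP_and_MAPPFinder_Protocols).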

Part One

The data from the Merrell et al. (2002) paper was accessed from the Stanford Microarray Database.
The Log2 of R/G Normalized Ratio (Median) has been copied from the raw data files downloaded from the Stanford Microarray Database.

Patient A

Sample 1: 24047.xls (A1)
Sample 2: 24048.xls (A2)
Sample 3: 24213.xls (A3)
Sample 4: 24202.xls (A4)

Patient B

Sample 5: 24049.xls (B1)
Sample 6: 24050.xls (B2)
Sample 7: 24203.xls (B3)
Sample 8: 24204.xls (B4)

Patient C

Sample 9: 24053.xls (C1)
Sample 10: 24054.xls (C2)
Sample 11: 24205.xls (C3)
Sample 12: 24206.xls (C4)

I downloaded the Merrell_Compiled_Raw_Data_Vibrio.xls file to my Desktop and saved it with my initials and the date.

Normalizing the Log Ratios

To scale and center the data I:
- Inserted a new Worksheet into my Excel file, and named it "scaled_centered".
- Going back to the "compiled_raw_data" worksheet, I selected all of the cells and copied them. I then went to the "scaled_centered" worksheet, clicked on the upper left-hand cell (cell A1), and pasted the values.
- I inserted two rows in between the top row of headers and the first data row.
- In cell A2, I typed "Average" and in cell A3, "StdDev".
I then computed the Average log ratio for each chip (each column of data).
- In cell B2, I typed =AVERAGE(B4:B5224) and then pressed Enter.
I then computed the Standard Deviation of the log ratios on each chip (each column of data).
- In cell B3, I typed =STDEV(B4:B5224) and then pressed Enter.
- I then clicked on B2 and dragged it across all of the columns to copy the equation across all the data. I repeated this with B3 as well. Excel automatically changed the equation to match the cell designations for those columns.
I copied the column headings for all of my data columns and then pasted them to the right of the last data column so that there was a second set of headers above blank columns of cells. I edited the names of the columns so that they read: A1_scaled_centered, A2_scaled_centered, etc.
In cell N4, I typed =(B4-B$2)/B$3 so that the data in cell B4 has the average (cell B2) subtracted from it and is then divided by the standard deviation (cell B3). I used the dollar sign symbols in front of the "2" and "3" to tell Excel to always reference that row in the equation, even though I would paste it down the entire column of 5221 genes.
Why is this important? Without the dollar signs, Excel would shift the row references as the formula is filled down, so each value would have the cell two rows above it subtracted and would be divided by the cell directly above it, instead of always using the average and standard deviation.
I copied and pasted this equation into the entire column by clicking on the original cell with my equation and positioning my cursor at the bottom right corner. When the cursor changed to a thin black plus sign (not a chubby white one), I double-clicked, and the formula was copied down the entire column of genes.
I then copied and pasted the scaling and centering equation for each of the columns of data with the "_scaled_centered" column header, making sure to adjust the equations to pertain to their respective columns.
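The scaling and centering steps above can also be sketched outside of Excel. The following is a minimal Python illustration, not the protocol itself: the function names, the dictionary layout, and the example numbers are all made up for illustration. It uses the sample standard deviation, which is what Excel's STDEV computes.

```python
from statistics import mean, stdev


def scale_and_center(column):
    """Subtract the column average and divide by the sample standard deviation."""
    avg = mean(column)   # plays the role of =AVERAGE(B4:B5224)
    sd = stdev(column)   # sample standard deviation, like =STDEV(B4:B5224)
    # avg and sd are fixed for the whole column, which is what B$2 and B$3 achieve
    return [(x - avg) / sd for x in column]


def scale_and_center_all(chips):
    """Apply the same calculation to every chip, adding the _scaled_centered suffix."""
    return {name + "_scaled_centered": scale_and_center(values)
            for name, values in chips.items()}


# Made-up log ratios for two chips (NOT values from the Merrell data).
example_chips = {
    "A1": [0.5, -1.0, 1.5, 0.0, -1.0],
    "A2": [2.0, 1.0, 0.0, 3.0, 4.0],
}
scaled = scale_and_center_all(example_chips)
for name, values in scaled.items():
    print(name, [round(v, 3) for v in values])
```

After this step each column has an average of 0 and a standard deviation of 1, so the chips can be compared on the same scale.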

Files:

File:Merrell Compiled Raw Data Vibrio MA 20151015 (2).xls

File:Merrell Compiled Raw Data Vibrio MA 20151015 (2).txt

File:Merrell Compiled Raw Data Vibrio MA 20151015 (2).gex

File:Merrell Compiled Raw Data Vibrio MA 20151015 (2).EX.txt

Sanity Check: Number of Genes Significantly Changed

Number of genes with a p value less than .05 = 948. This is 18.16% of the genes.
Number of genes with a p value less than .01 = 235. This is 4.50% of the genes.
Number of genes with a p value less than .001 = 24. This is 0.46% of the genes.
Number of genes with a p value less than .0001 = 2. This is 0.04% of the genes.
Number of genes with a Bonferroni p value less than .05 = 0. This is 0% of the genes.
Number of genes with a B-H p value less than .05 = 0. This is 0% of the genes.
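Number of genes with p < .05 and Avg_LogFC_all > 0 = 352. This is 6.74% of the genes.
Number of genes with p < .05 and Avg_LogFC_all < 0 = 596. This is 11.42% of the genes.
Number of genes with p < .05 and Avg_LogFC_all > .25 = 339. This is 6.49% of the genes.
Number of genes with p < .05 and Avg_LogFC_all < -.25 = 579. This is 11.09% of the genes.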
Merrell et al. (2002) wrote that a "two-class SAM analysis was conducted" and reported genes "with statistically significant changes in the level of expression - at least a twofold change". They used the size of the expression change to decide what was significant, while we used the p value (the probability of seeing a change at least this large by chance alone) together with a small log fold change cutoff. Merrell et al. (2002) used a more stringent method: they found 237 genes that were significantly changed, while we found 918 (p value < .05 and Avg_LogFC_all > .25 or < -.25).
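The counts in this sanity check can be reproduced with a short script. This is a rough Python sketch under stated assumptions: the function names are hypothetical, the example p values are invented, and the Benjamini-Hochberg function uses the standard step-up form, which may differ slightly from how the spreadsheet protocol computes its B-H column.

```python
def count_below(pvalues, cutoff):
    """Number of p values strictly less than the cutoff."""
    return sum(1 for p in pvalues if p < cutoff)


def percent_of_genes(count, total):
    """Express a count as a percentage of all genes, rounded to 2 decimals."""
    return round(100 * count / total, 2)


def bonferroni(pvalues):
    """Multiply each p value by the number of tests, capped at 1."""
    n = len(pvalues)
    return [min(p * n, 1.0) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Standard step-up B-H adjustment: p * n / rank, kept monotone."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value to the smallest.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Invented p values for illustration only (NOT from the Merrell data).
example_p = [0.001, 0.008, 0.04, 0.2, 0.5]
print(count_below(example_p, 0.05))
print(count_below(bonferroni(example_p), 0.05))
print(count_below(benjamini_hochberg(example_p), 0.05))

# Percentage check using the counts reported above (948 of 5221 genes).
print(percent_of_genes(948, 5221))
```

Both corrections make the p values larger, which is why a data set can have hundreds of genes with p < .05 and still have none that pass after correction.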

Sanity Check: Compare Individual Genes with Known Data

VC0028 fold change: 1.65, p value: 0.0474; fold change: 1.27, p value: 0.0692
The first is significantly changed; the second is not.
VC0941 fold change: -0.28, p value: 0.1636; fold change: 0.88, p value: 0.0001
VC0869 Fold changes: 2.12, 1.50, 1.59, 1.95, 2.20

P value: 0.02, 0.0174, 0.0463, 0.0227, 0.002 All are significantly changed

VC0051 Fold change: 1.89, 1.92

P value: 0.016, 0.0139 Both are significantly changed

VC0647 fold change: -1.11, p value: 0.0003; fold change: -0.94, p value: 0.0125; fold change: -1.05, p value: 0.0051
VC0468 fold change: -0.17, p value: 0.3350
VC2350 fold change: -2.40, p value: 0.0130
VCA0583 fold change: 1.06, p value: 0.1011

Part Two

I used the 2010 database.
There were 121 errors.
Kristin used the 2009 database and 772 errors were detected.
My database is newer, so it makes sense that the number of errors decreased: the changes made to the database between 2009 and 2010 were presumably improvements. The database I used probably has more entries and fewer bugs.
We were the pair working with increased genes, which I labeled red.
I labeled the decreased genes green.

Gene Ontology Results

branched chain family amino acid metabolic process
branched chain family amino acid biosynthetic process
IMP metabolic process
IMP biosynthetic process
purine ribonucleoside monophosphate biosynthetic process
purine ribonucleoside monophosphate metabolic process
purine nucleoside monophosphate metabolic process
purine nucleoside monophosphate biosynthetic process
'de novo' IMP biosynthetic process
arginine metabolic process

Kristin's results and mine were completely different. There must have been substantial changes to the database after the 2009 version that affected which gene changes came out as significant.

Team Page

Heavy Metal HaterZ

Assignments

Individual Journal Entries

Shared Journal Entries

