Difference between revisions of "Laurmagee: Week 8"

From LMU BioDB 2013
Jump to: navigation, search
(Journal to First Link)
(Journal to Second Link)
Line 84: Line 84:
 
===Part 2===
 
===Part 2===
 
* Launch GenMapp and note that the lower left-hand corner should display the gene database you would like to select.  
 
* Launch GenMapp and note that the lower left-hand corner should display the gene database you would like to select.  
* To find the gene database you want to use, select Data > Choose Gene Database. I used the gene database Vc-Std_External_20101022.gdb that can be downloaded here: follow this link to the XMLPipeDB SourceForge Download page.
+
* To find the gene database you want to use, select Data > Choose Gene Database. I used the gene database Vc-Std_External_20101022.gdb that can be downloaded here: [http://downloads.sourceforge.net/project/xmlpipedb/V.%20cholerae%20Gene%20Database/V.%20cholerae%2020101022/Vc-Std_External_20101022.zip?r=&ts=1382406289&use_mirror=heanet].
 
* Select the Data menu from the main Drafting Board window and choose Expression Dataset Manager from the drop-down list. The Expression Dataset Manager window will open.
 
* Select the Data menu from the main Drafting Board window and choose Expression Dataset Manager from the drop-down list. The Expression Dataset Manager window will open.
 
* Select New Dataset from the Expression Datasets menu. Select the tab-delimited text file that you formatted for GenMAPP (.txt) in Part 1.  
 
* Select New Dataset from the Expression Datasets menu. Select the tab-delimited text file that you formatted for GenMAPP (.txt) in Part 1.  
 
* The Data Type Specification window will appear. GenMAPP is expecting that you are providing numerical data. If any of your columns has text (character) data, you would check the box next to the field (column) name. In this case there are not any text data, so the box does not need to be checked.  
 
* The Data Type Specification window will appear. GenMAPP is expecting that you are providing numerical data. If any of your columns has text (character) data, you would check the box next to the field (column) name. In this case there are not any text data, so the box does not need to be checked.  
 
* Once the data has been converted, save your new .gex file:
 
* Once the data has been converted, save your new .gex file:
* [[File:Merrell_Compiled_Raw_Data_Vibrio_LM_20131010.EX.txt]]
 
 
*[[File:Merrell_Compiled_Raw_Data_Vibrio_LM_20131010.gex]]
 
*[[File:Merrell_Compiled_Raw_Data_Vibrio_LM_20131010.gex]]
 +
*A message may appear saying that the Expression Dataset Manager could not convert one or more lines of data. Lines that generate an error during the conversion of a raw data file are not added to the Expression Dataset. Instead, an exception file is created. The exception file is given the same name as your raw data file with .EX before the extension. The exception file will contain all of your raw data, with the addition of a column named “Error”. This column contains either error messages or, if the program finds no errors, a single-space character.
 +
* [[File:Merrell_Compiled_Raw_Data_Vibrio_LM_20131010.EX.txt]]
 +
* The above file notes 121 errors all due to an error message of "Gene not found in OrderedLocusNames or any related system". Since I am working with the 2010 database, I have less errors than my partner. This is most likely due to the fact that more genes have been added to the system, so many are now identifiable with the 2010 update.
 +
 
*[[File:New Data File -Decreased-Criterion1-GO.txt]]
 
*[[File:New Data File -Decreased-Criterion1-GO.txt]]
 
*[[File:New Data File -Decreased-Criterion1-GO.xls]]
 
*[[File:New Data File -Decreased-Criterion1-GO.xls]]

Revision as of 01:54, 22 October 2013

Part 1

  • First download the following data file: File:Merrell Compiled Raw Data Vibrio.xls
  • Insert a new spreadsheet in the label it "scaled_centered".
  • Select all the data in your previous worksheet and copy it into the new spreadsheet
  • Insert two rows in between the top row of headers and the first data row.
  • In cell A2, type "Average" and in cell A3, type "StdDev".
  • Compute the Average log ratio for each chip (each column of data). In cell B2, type the following equation: =AVERAGE(B4:B5224)
  • Compute the Standard Deviation of the log ratios on each chip (each column of data). In cell B3, type the following equation: =STDEV(B4:B5224)
  • Drag these lines into the rest of the columns, because Excel will automatically change the equation to match your specific column.
  • Copy the column headings for all of your data columns and then paste them to the right of the last data column so that you have a second set of headers above blank colums of cells. Edit the names of the columns so that they now read: A1_scaled_centered, A2_scaled_centered, etc.
  • In cell N4, type the following equation: =(B4-B$2)/B$3 The dollar signs are important because we will be using the same mean (B2) and standard deviation (B3) in all the succeeding columns values as well.
  • Copy and paste the scaling and centering equation for each of the columns of data with the "_scaled_centered" column header. Be sure that your equation is correct for the column you are calculating (i.e. change the values to match the cells with the correct average and standard deviation).
  • Insert a new worksheet and name it "statistics".
  • Go back to the "scaling_centering" worksheet and copy the first column ("ID"). And paste these values into the first column of your new spreadsheet.
  • Go back to the "scaling_centering" worksheet and copy the columns that are designated "_scaled_centered". Click on the B1 cell and select "Paste Special" from the Edit menu. A window will open: click on the radio button for "Values" and click OK.
  • Go to a new column on the right of your worksheet. Type the header "Avg_LogFC_A", "Avg_LogFC_B", and "Avg_LogFC_C" into the top cell of the next three columns.
  • Compute the average log fold change for the replicates for each patient by typing the equation: =AVERAGE(B2:E2) into cell N2. Copy this equation and paste it into the rest of the column.
  • Create the equation for patients B and C and paste it into their respective columns.
  • Now you will compute the average of the averages. Type the header "Avg_LogFC_all" into the first cell in the next empty column.
  • Create the equation that will compute the average of the three previous averages you calculated and paste it into this entire column.
  • Insert a new column labeled "Tstat".
  • Enter the equation: =AVERAGE(N2:P2)/(STDEV(N2:P2)/SQRT(3)) and paste it into all rows in that column.
  • Label the top cell in the next column "Pvalue" and enter the equation: =TDIST(ABS(R2),2,2) paste it into all rows in that column.
  • Insert a new worksheet and name it "forGenMAPP".
  • Go back to the "statistics" worksheet and Select All and Copy.
  • Go to your new sheet and click on cell A1 and select Paste Special, click on the Values radio button, and click OK.
  • Select Columns B through Q (all the fold changes). Select the menu item Format > Cells. Under the number tab, select 2 decimal places. Click OK.
  • Select Columns R and S. Select the menu item Format > Cells. Under the number tab, select 4 decimal places. Click OK.
  • Select Columns N through S and Cut. Select Column B by left-clicking on the "B" at the top of the column. Then right-click on the Column B header and select "Insert Cut Cells". This will insert the data without writing over your existing columns.
  • Insert a column to the right of the "ID" column. Type the header "SystemCode" into the top cell of this column. Fill the entire column (each cell) with the letter "N".
  • Select the menu item File > Save As, and choose "Text (Tab-delimited) (*.txt)" from the file type drop-down menu.
  • To make sure that we performed our data analysis correctly, first open your spreadsheet and go to the "forGenMAPP" tab.
  • Click on cell A1 and select the menu item Data > Filter > Autofilter.
  • Click on the drop-down arrow on your "Pvalue" column. Select "Custom". In the window that appears, set a criterion that will filter your data so that the Pvalue has to be less than 0.05.
  • How many genes have p value < 0.05?
    • 5221 genes.
  • What about p < 0.01?
    • 169 genes.
  • What about p < 0.001?
    • 19 genes.
  • What about p < 0.0001?
    • 1 gene.
  • Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change greater than zero. How many are there?
    • 300 genes
  • Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change less than zero. How many are there?
    • 454 genes
  • What about an average log fold change of > 0.25 and p < 0.05?
    • 281 genes
  • Or an average log fold change of < -0.25 and p < 0.05? (These are more realistic values for the fold change cut-offs because it represents about a 20% fold change which is about the level of detection of this technology.)
    • 442 genes
  • What criteria did Merrell et al. (2002) use to determine a significant gene expression change? How does it compare to our method?
    • The Statistical Analysis for Microarrays (SAM) program was used to determine a significant gene expression change, whereas we used Excel.
  • Merrell et al. (2002) report that genes with IDs: VC0028, VC0941, VC0869, VC0051, VC0647, VC0468, VC2350, and VCA0583 were all significantly changed in their data. Look these genes up in your spreadsheet. What are their fold changes and p values? Are they significantly changed in our analysis?
    1. VC0028
    • Fold Change: 1.37
    • P-value: 0.0466
    1. VC0941
    • Fold Change: -.21
    • P-value: 0.2467
    1. VC0869
    • Fold Change: 1.63
    • P-value: 0.0464
    1. VC0051
    • Fold Change: 1.79
    • P-value: 0.0152
    1. VC0647
    • Fold Change: -.78
    • P-value: 0.0776
    1. VC0468
    • Fold Change: -.01
    • P-value: 0.9443
    1. VC235
    • Fold Change: -1.12
    • P-value: 0.2265
    1. VCA0583
    • Fold Change: .50
    • P-value: 0.2768
  • These values were not significantly changed in our analysis.

Part 2

  • Launch GenMapp and note that the lower left-hand corner should display the gene database you would like to select.
  • To find the gene database you want to use, select Data > Choose Gene Database. I used the gene database Vc-Std_External_20101022.gdb that can be downloaded here: [1].
  • Select the Data menu from the main Drafting Board window and choose Expression Dataset Manager from the drop-down list. The Expression Dataset Manager window will open.
  • Select New Dataset from the Expression Datasets menu. Select the tab-delimited text file that you formatted for GenMAPP (.txt) in Part 1.
  • The Data Type Specification window will appear. GenMAPP is expecting that you are providing numerical data. If any of your columns has text (character) data, you would check the box next to the field (column) name. In this case there are not any text data, so the box does not need to be checked.
  • Once the data has been converted, save your new .gex file:
  • File:Merrell Compiled Raw Data Vibrio LM 20131010.gex
  • A message may appear saying that the Expression Dataset Manager could not convert one or more lines of data. Lines that generate an error during the conversion of a raw data file are not added to the Expression Dataset. Instead, an exception file is created. The exception file is given the same name as your raw data file with .EX before the extension. The exception file will contain all of your raw data, with the addition of a column named “Error”. This column contains either error messages or, if the program finds no errors, a single-space character.
  • File:Merrell Compiled Raw Data Vibrio LM 20131010.EX.txt
  • The above file notes 121 errors all due to an error message of "Gene not found in OrderedLocusNames or any related system". Since I am working with the 2010 database, I have less errors than my partner. This is most likely due to the fact that more genes have been added to the system, so many are now identifiable with the 2010 update.

Laurmagee (talk) 22:56, 17 October 2013 (PDT)

Personal tools
Namespaces

Variants
Actions
Navigation
Toolbox