*Download the file '''Merrell_Compiled_Raw_Data_Vibrio.xls''' from the [http://www.openwetware.org/wiki/BIOL398-01/S10:Sample_Microarray_Analysis_Vibrio_cholerae Sample Microarray Analysis for ''Vibrio cholerae'' page].
**Save the file with the following format for the filename: Merrell_Compiled_Raw_Data_Vibrio_<Initials>_<Date>.xls. In my case, the filename is "Merrell_Compiled_Raw_Data_Vibrio_KS_20131010.xls".
*This file contains the Log<sub>2</sub> of Red Dye/Green Dye Normalized Ratio (Median) organized in the following manner:
**'''Patient A'''

*Go back to the "scaled_centered" worksheet and Select All and Copy the scaled and centered data.
:*To do so, click on the first cell of the data (cell O4). Then hold the Shift and Ctrl keys, hit the Right Arrow key, and then hit the Down Arrow key (making sure that you are still holding down the Shift and Ctrl keys). Then, Copy the selection.
*Go to the "statistics" worksheet and right click on cell B2. Highlight the "Paste Special..." option and then click on "Paste Special...". A window will open: click on the radio button for "Values" and click OK. This pastes the data as numerical values rather than equations.
*To the right of the data you just pasted into the worksheet, type the following headers into the first cell of the next three columns: "Avg_LogFC_A", "Avg_LogFC_B", and "Avg_LogFC_C".
*Compute the average log fold change for the replicates for Patient A by typing the following equation:
=AVERAGE(B2:E2)
:into cell N2 and hit Enter.
:*Double click on the lower right hand corner of cell N2 to compute the average of the replicates for Patient A for the remainder of the genes.
*Repeat the calculation for Patients B and C in their respective columns.
*Type the header "Avg_LogFC_all" into the first cell in the next empty column (column Q). Compute the average of the averages by typing the following equation into cell Q2:
=AVERAGE(N2:P2)
:and hit Enter.
:*Double click on the lower right hand corner of cell Q2 to compute the average of the averages for the rest of the genes.
*Now, compute a T statistic to determine how much the average log fold change of all the patients deviates from 0, which corresponds to no change. Type the header "Tstat" into the first cell in the next empty column (column R). Type the following equation into cell R2:
=Q2/(STDEV(N2:P2)/SQRT(COUNT(N2:P2)))
:and hit Enter. (The command COUNT() counts the number of patients in the experiment.)
:*Double click on the lower right hand corner of cell R2 to compute the T statistic for the remainder of the genes.
*Now, compute the P value to determine how significant the deviation of the average log fold change of all the patients from 0 is. Type the header "Pvalue" into the first cell in the next empty column (column S). Type the following equation into cell S2:
=TDIST(ABS(R2),COUNT(N2:P2)-1,2)
:and hit Enter. Here, the command COUNT(N2:P2)-1 computes the degrees of freedom, which is one less than the number of patient averages. The "2" specifies that a two-tailed distribution is used to compute the p value.
:*Double click on the lower right hand corner of cell S2 to compute the p value for the remainder of the genes.
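The spreadsheet calculation for a single gene can be cross-checked outside Excel. The following is a minimal Python sketch, not part of the original protocol. It assumes exactly three per-patient averages (as in columns N through P), so that the two-tailed p value for 2 degrees of freedom can use the closed form 1 - |t| / sqrt(t^2 + 2). The function name and the input values are hypothetical.

```python
import math
from statistics import mean, stdev


def one_sample_t(patient_avgs):
    """Return (t, two-tailed p) for the null hypothesis of zero mean log fold change.

    Mirrors the Tstat and Pvalue worksheet formulas for three values only.
    """
    n = len(patient_avgs)
    if n != 3:
        raise ValueError("the closed-form p value below only holds for df = 2")
    # statistics.stdev is the sample standard deviation, like Excel's STDEV.
    t = mean(patient_avgs) / (stdev(patient_avgs) / math.sqrt(n))
    # Closed form of the two-tailed t-distribution tail area for 2 degrees of freedom.
    p = 1.0 - abs(t) / math.sqrt(t * t + 2.0)
    return t, p


# Hypothetical per-patient averages, not taken from the Merrell data.
t_stat, p_value = one_sample_t([1.0, 2.0, 3.0])
print(round(t_stat, 4), round(p_value, 4))
```

For more than three patients the closed form no longer applies, and a statistics library's t distribution would be needed instead.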

===Format the data for GenMAPP===

*Insert a new worksheet and name it "forGenMAPP".
*Go back to the "statistics" worksheet and Select All and Copy.
*Go to the "forGenMAPP" worksheet and right click on cell A1. Highlight the "Paste Special..." option and then click on "Paste Special...". A window will open: click on the radio button for "Values" and click OK. This pastes the data as numerical values rather than equations.
*Insert a column to the left of column B. Label this column (type into the first cell of the column) "SystemCode".
*In cell B2, type "N". Double click on the lower right hand corner of cell B2 to fill the rest of the column with "N".
*Save this worksheet as a tab-delimited file.
:*Go to File > Save As.
:*In the drop-down menu next to "Save as type:" select "Text(Tab Delimited)". Click "Save".
:*Select "OK" or "Yes" for any error messages that may pop up.
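The same formatting steps can be sketched in Python for readers who prefer a script to manual editing. This is only an illustration under stated assumptions: the header "ID" for the first column and the helper name are hypothetical, and the two sample rows stand in for the full "statistics" worksheet.

```python
import csv
import io


def add_system_code(rows, code="N"):
    """Insert a SystemCode column as the second column of every row."""
    header, *data = rows
    out = [[header[0], "SystemCode", *header[1:]]]
    out.extend([row[0], code, *row[1:]] for row in data)
    return out


# Two-gene excerpt; the "ID" header is an assumed name for the first column.
rows = [
    ["ID", "Avg_LogFC_all", "Pvalue"],
    ["VC0028", "1.6526", "0.0474"],
    ["VC0941", "0.0934", "0.6759"],
]
formatted = add_system_code(rows)

# Write tab-delimited text, the format saved for GenMAPP.
buffer = io.StringIO()
csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(formatted)
tab_text = buffer.getvalue()
print(tab_text)
```

Writing to an in-memory buffer keeps the sketch self-contained; in practice the writer would target a .txt file on disk.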

==Sanity Check: Number of genes significantly changed==

*The number of genes with a
:*p value < 0.05 is 948.
:*p value < 0.01 is 235.
:*p value < 0.001 is 24.
:*p value < 0.0001 is 2.
*Keeping the filter p value < 0.05:
:*There are 352 genes with an average log fold change for all patients that is greater than 0.
:*There are 596 genes with an average log fold change for all patients that is less than 0.
:*There are 339 genes with an average log fold change for all patients that is greater than 0.25.
:*There are 579 genes with an average log fold change for all patients that is less than -0.25.
*To determine significant gene expression changes, Merrell et al. (2002) used the Significance Analysis of Microarrays (SAM) program to determine which genes had at least a twofold change in expression from the control.
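Counts like the ones above come from filtering the Pvalue and Avg_LogFC_all columns. A minimal Python sketch of the same tallies follows. The function names and the five (log fold change, p value) pairs are hypothetical and are not drawn from the Merrell data.

```python
def count_significant(genes, cutoffs=(0.05, 0.01, 0.001, 0.0001)):
    """Map each p value cutoff to the number of genes below it."""
    return {cutoff: sum(1 for _, p in genes if p < cutoff) for cutoff in cutoffs}


def split_by_direction(genes, p_cutoff=0.05, fc_cutoff=0.0):
    """Count significant genes above +fc_cutoff and below -fc_cutoff."""
    significant = [(fc, p) for fc, p in genes if p < p_cutoff]
    up = sum(1 for fc, _ in significant if fc > fc_cutoff)
    down = sum(1 for fc, _ in significant if fc < -fc_cutoff)
    return up, down


# Hypothetical (average log fold change, p value) pairs.
genes = [(0.8, 0.03), (-0.4, 0.004), (0.1, 0.2), (-0.1, 0.04), (1.2, 0.00005)]

counts = count_significant(genes)
any_change = split_by_direction(genes)
quarter_change = split_by_direction(genes, fc_cutoff=0.25)
print(counts, any_change, quarter_change)
```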

==Sanity Check: Compare individual genes with known data==

In the data that I normalized,
*'''VC0028''' has an average log fold change for all patients of 1.6526 and a p value of 0.0474.
*'''VC0941''' has an average log fold change for all patients of 0.0934 and a p value of 0.6759.
*'''VC0869''' has an average log fold change for all patients of 1.4990 and a p value of 0.0174.
*'''VC0051''' has an average log fold change for all patients of 1.9218 and a p value of 0.0139.
*'''VC0647''' has an average log fold change for all patients of -1.1126 and a p value of 0.0003.
*'''VC0468''' has an average log fold change for all patients of -0.1686 and a p value of 0.3350.
*'''VC2350''' has an average log fold change for all patients of -2.4029 and a p value of 0.0130.
*'''VCA0583''' has an average log fold change for all patients of 1.0628 and a p value of 0.1011.
Using a cutoff of p value < 0.05, VC0028, VC0869, VC0051, VC0647, and VC2350 are significantly changed in my analysis.
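As a quick check of that reading, the eight values listed above can be filtered in a few lines of Python. This sketch assumes the conventional 0.05 cutoff and labels the direction of change only by the sign of the average log fold change.

```python
# (average log fold change, p value) for the eight genes listed above.
GENES = {
    "VC0028": (1.6526, 0.0474),
    "VC0941": (0.0934, 0.6759),
    "VC0869": (1.4990, 0.0174),
    "VC0051": (1.9218, 0.0139),
    "VC0647": (-1.1126, 0.0003),
    "VC0468": (-0.1686, 0.3350),
    "VC2350": (-2.4029, 0.0130),
    "VCA0583": (1.0628, 0.1011),
}
P_CUTOFF = 0.05  # assumed significance threshold

# Keep only genes below the cutoff and record the sign of the change.
significant = {
    gene: ("up" if fc > 0 else "down")
    for gene, (fc, p) in GENES.items()
    if p < P_CUTOFF
}
print(significant)
```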

==Lab Journal: Analysis of ''Vibrio cholerae'' microarray data using GenMAPP and MAPPFinder==

*Installed GenMAPP Classic from this [http://www.genmapp.org/download_v2.1.php page] onto my computer.
*Downloaded the 2009 Gene Database for ''Vibrio cholerae'' [http://sourceforge.net/projects/xmlpipedb/files/V.%20cholerae%20Gene%20Database/V.%20cholerae%2020090622/Vc-Std_External_20090622.zip/download Vc-Std_External_20090622.gdb]
:*Saved the file to the folder C:\GenMAPP 2 Data\Gene Databases.

===Convert normalized microarray data using the GenMAPP Expression Dataset Manager===

*Launch GenMAPP 2.
:*Look at the lower-left hand corner to see what gene database is loaded. For this assignment, the gene database "Vc-Std_External_20090622.gdb" should appear in the corner.
:*If another database appears or if there is "No Gene Database", go to Data > Choose Gene Database and find the database you need to use.
[[Image:GenMAPP Screenshot Load Gene Database KS 20131017.jpg|thumb|none|upright=3]]
*Go to Data > Expression Dataset Manager.
*In the window that pops up, go to Expression Datasets > New Dataset and open the tab-delimited file you created for GenMAPP.
[[Image:GenMAPP Screenshot Load Expression Dataset KS 20131017.jpg|thumb|none|upright=3.5]]
*In the "Data Type Specification" window that pops up, only check the box next to a column header if that column has character data. For the Merrell data set, do not check any boxes because all the data is numerical.
[[Image:GenMAPP Screenshot Load Expression Dataset Data Type Specification KS 20131017.jpg|thumb|none|upright=3.5]]
*Give the Expression Dataset Manager time to convert your data into a GEX file.
:*An error message may appear that states that the Expression Dataset Manager was unable to convert some of the lines of the data. These lines of data are not incorporated into the Expression Dataset but are instead recorded in an exception file that contains all of your raw data and an additional column called ~Error~.
::*The exception file is a tab-delimited file with the suffix .EX appended to the name of the raw data file you loaded into the Expression Dataset Manager.
:*Open the exception file in Excel.
:*Go to Data > Filter.
:*To determine what the errors were for the rows that were not converted, locate the ~Error~ column, click on the down arrow in the cell, and select the "Sort Z to A" option.
::*'''Using the 2009 Gene Database, there were 772 errors, each of which was "Gene not found in OrderedLocusNames or any related system."'''
::*'''My buddy will likely have a different number of errors because she is using a newer gene database for ''Vibrio cholerae'' than I am. The newer database may include genes from the expression data that were missing from the older database.'''
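The error tally can also be done without Excel. The sketch below assumes the exception file is tab-delimited with an ~Error~ column, as described above. The other column names, the gene IDs, and the three sample rows are hypothetical stand-ins for a real .EX file.

```python
import csv
import io
from collections import Counter


def tally_errors(handle, error_column="~Error~"):
    """Count each distinct message in the error column, skipping blank cells."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row[error_column] for row in reader if row.get(error_column))


# Hypothetical three-row excerpt standing in for the exception file.
sample = io.StringIO(
    "ID\tSystemCode\tPvalue\t~Error~\n"
    "VC0001\tN\t0.20\t\n"
    "VC9991\tN\t0.03\tGene not found in OrderedLocusNames or any related system.\n"
    "VC9992\tN\t0.40\tGene not found in OrderedLocusNames or any related system.\n"
)
errors = tally_errors(sample)
print(errors.most_common())
```

With a real file, `open(path, newline="")` would replace the in-memory sample.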
*Customize the new Expression Dataset by creating Color Sets, which contain the instructions to GenMAPP for displaying data on MAPPs.
:*In the "Color Sets" section, type in "Pathogenic v lab" in the "Name" field.
:*To specify what value appears next to each gene on a MAPP, select "Avg_LogFC_all" in the drop down menu in the "Gene Value" field.
:*In the "Criteria Builder" section, click on the "New" button. Now, we will construct the criterion to query the data.
:*We will set the criterion to query for all the genes that have a significant (i.e. [Pvalue] < 0.05) decrease in the average log fold change (i.e. [Avg_LogFC_all] < -0.25).
::*In the menu under "Columns" in the "Criteria Builder" section, select "Avg_LogFC_all", which will then appear in the "Criterion" field.
::*Under "Ops", click on the "<" operator. Then, type -0.25 (this will appear in the "Criterion" field).
::*Under "Ops", click on the "AND" operator.
::*In the menu under "Columns" in the "Criteria Builder" section, select "Pvalue".
::*Under "Ops", click on the "<" operator. Then, type 0.05 (this will appear in the "Criterion" field).
::*Enter a name for the criterion in the "Label in Legend" field (ex. "Decreased").
::*Choose a color for the criterion by left-clicking on the box next to "Color". Choose a color from the Color window that appears and click OK.
::*Once you are done specifying the criterion (the screenshot below shows an example of what is in each of the fields), click on the "Add" button.
[[Image:GenMAPP Screenshot Construct Criterion to Query Dataset KS 20131017.jpg|thumb|none|upright=3.5]]
*To add more criteria, repeat the steps mentioned above to specify a new criterion.
:*To set a criterion to query for all the genes that have a significant increase in the average log fold change, the "Criterion" field should look like
[Avg_LogFC_all] > 0.25 AND [Pvalue] < 0.05
*Save the entire Expression Dataset by going to Expression Datasets > Save.
*Exit the Expression Dataset Manager to view the Color Sets on a MAPP.
[[Image:GenMAPP Screenshot View Color Set on MAPP KS 20131017.jpg|thumb|none|upright=3]]
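To make the logic of the two criteria concrete, here is a small Python sketch that applies bracketed expressions of this form to a row of data. It is not GenMAPP's own parser: it only understands "<", ">", and "AND", the label "Increased" is an assumed name for the second criterion, and the helper names are hypothetical.

```python
import operator
import re

OPS = {"<": operator.lt, ">": operator.gt}
# One clause looks like: [ColumnName] < -0.25
CLAUSE = re.compile(r"\[(\w+)\]\s*([<>])\s*(-?\d+(?:\.\d+)?)")


def matches(criterion, row):
    """Return True if every AND-joined clause of the criterion holds for row."""
    for clause in criterion.split(" AND "):
        found = CLAUSE.fullmatch(clause.strip())
        if found is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        column, op, value = found.groups()
        if not OPS[op](row[column], float(value)):
            return False
    return True


CRITERIA = {
    "Decreased": "[Avg_LogFC_all] < -0.25 AND [Pvalue] < 0.05",
    "Increased": "[Avg_LogFC_all] > 0.25 AND [Pvalue] < 0.05",  # assumed label
}


def label(row):
    """Return the first criterion label the row satisfies, or None."""
    for name, criterion in CRITERIA.items():
        if matches(criterion, row):
            return name
    return None


# Values reported for VC0647 and VC0941 in the sanity check section.
print(label({"Avg_LogFC_all": -1.1126, "Pvalue": 0.0003}))
print(label({"Avg_LogFC_all": 0.0934, "Pvalue": 0.6759}))
```

A gene that satisfies neither criterion gets no label, which corresponds to a gene that is left uncolored by these two criteria.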

===MAPPFinder Procedure===

*Launch MAPPFinder from within GenMAPP by selecting Tools > MAPPFinder.
[[Image:GenMAPP Screenshot Launch MAPPFinder KS 20131017.jpg|thumb|none|upright=3]]
*Click on the button "Calculate New Results".
[[Image:GenMAPP Screenshot Calculate New Results in MAPPFinder KS 20131017.jpg|thumb|none|upright=3.5]]
*Click on "Find File", choose the GEX file you created of your Expression Dataset, and click OK.
*Choose the Color Set and Criteria with which to filter the data. Click on the "Decreased" criterion in the right-hand box.
*Check the boxes next to "Gene Ontology" and "p value".
*Click the "Browse" button and create a meaningful filename for your results (ex. "Merrell_Vibrio_Data_MAPPFinder_Analysis_Decreased_KS_20131017").
*Click "Run MAPPFinder".
[[Image:GenMAPP Screenshot MAPPFinder Select Criterion to Calculate Results KS 20131017.jpg|thumb|none|upright=3.5]]
*When the results have been calculated, a Gene Ontology browser will open showing your results.
[[Image:MAPPFinder Screenshots GO Table KS 20131017.jpg|thumb|none|upright=5]]
:*To see a list of the most significant Gene Ontology terms, click on the menu item "Show Ranked List".
::*'''The screenshot below shows the top 10 Gene Ontology terms from the results:'''
[[Image:MAPPFinder Screenshots Top 10 Gene Ontology Terms KS 20131017.jpg|thumb|none|upright=4]]
::*My buddy and I did not have the same top 10 GO terms. After comparing our GO result text files (the process to obtain them is described in later steps), I believe that this is the result of a discrepancy in how we set the criterion for decreased expression in the Expression Dataset Manager.
*In the main MAPPFinder Browser window, click on the button "Collapse the Tree". Then, you can search for the genes that were mentioned by Merrell et al. (2002): VC0028, VC0941, VC0869, VC0051, VC0647, VC0468, VC2350, and VCA0583.
:*Type the identifier of one of the genes into the MAPPFinder browser gene ID search field.
:*Choose "OrderedLocusNames" from the drop-down menu to the right of the search field.
:*Click on the GeneID Search button. The GO term(s) that are associated with that gene will be highlighted in blue.
:*'''Below are the genes that were found and the GO terms associated with them:'''
::*'''VC0647''': mRNA catabolic process, RNA processing, cytoplasm, RNA binding, 3'-5' exonuclease activity, transferase activity, nucleotidyltransferase activity, polyribonucleotide nucleotidyltransferase activity
::*'''VCA0583''': transport, outer membrane-bounded periplasmic space, transporter activity
*Click on the '''RNA processing''' GO term, which is associated with the gene '''VC0647''', the expression of which changed significantly in the experiment (refer to the section ''Sanity Check: Compare individual genes with known data'' for the p value). A MAPP will open listing all of the genes (as boxes) associated with that GO term.
:*To match the gene of interest to its identification, go to the UniProt site and type the ID for your gene into the search bar.
:*In the MAPP, double click on the box '''PNP_VIBCH'''.
:*An Internet Explorer window will pop up that has links to different pages for the gene in public databases.
::*'''The VC0647 gene is involved in mRNA degradation. It hydrolyzes single-stranded polyribonucleotides in the 3'-5' direction. (As described in the gene entry in UniProt.)'''
*In Windows, make a copy of the results file (i.e. Merrell_Vibrio_Data_MAPPFinder_Analysis_Decreased_KS_20131017-Criterion0-GO.txt).
*Open the copy of the results file in Excel.
:*Comparing my buddy's results file with mine, there seems to be a discrepancy in the criterion for Avg_LogFC_all. This discrepancy resulted in different numbers of probes that satisfied the other criteria listed under the "Calculation Summary".
*Click on a cell in the row of headers. On the tool bar, select Sort & Filter > Filter. Set the following filters:
:*Z Score (in column N) greater than 2
:*PermuteP (in column O) less than 0.05
:*Number Changed (in column I) greater than or equal to 5 and less than 100
:*Percent Changed (in column L) greater than or equal to 25
*Save the file as a different Excel spreadsheet named, for example, "Merrell_Vibrio_Data_MAPPFinder_Analysis_Decreased_KS_20131017-Criterion0-GO_Filtered", by selecting File > Save As and selecting Excel workbook (.xls) from the drop-down menu.
*Use the MAPPFinder browser to determine which GO terms in the spreadsheet are closely related.
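The four filters set in Excel can be expressed as a single predicate. The Python sketch below is an illustration only. It assumes the result rows have been read into dictionaries keyed by the column headers named above, and the "GO Name" key and all sample values are hypothetical.

```python
def passes_filters(row):
    """Apply the four MAPPFinder result filters to one row of text values."""
    return (
        float(row["Z Score"]) > 2
        and float(row["PermuteP"]) < 0.05
        and 5 <= int(row["Number Changed"]) < 100
        and float(row["Percent Changed"]) >= 25
    )


# Hypothetical rows; the term names and numbers are made up for illustration.
rows = [
    {"GO Name": "term A", "Number Changed": "12", "Percent Changed": "40.0",
     "Z Score": "3.1", "PermuteP": "0.01"},
    {"GO Name": "term B", "Number Changed": "3", "Percent Changed": "60.0",
     "Z Score": "4.0", "PermuteP": "0.0"},    # too few genes changed
    {"GO Name": "term C", "Number Changed": "150", "Percent Changed": "30.0",
     "Z Score": "2.5", "PermuteP": "0.02"},   # too many genes changed
    {"GO Name": "term D", "Number Changed": "20", "Percent Changed": "26.0",
     "Z Score": "1.5", "PermuteP": "0.04"},   # Z score too low
]
kept = [row["GO Name"] for row in rows if passes_filters(row)]
print(kept)
```

Keeping the thresholds in one function makes it easier to compare filter settings with a partner when the filtered lists differ.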
The resulting Excel file can be downloaded [[Media:Merrell Compiled Raw Data Vibrio KS 20131010.xls|here]].
====Interpretation of GO Results====