Difference between revisions of "Running GenMAPP Builder"

From LMU BioDB 2013

Latest revision as of 17:42, 16 June 2015

This tutorial will take you through all of the steps for running GenMAPP Builder for the first time.


Pre-requisites

This tutorial assumes that you are working in a Windows environment. To run GenMAPP Builder under Mac OS X or Linux, you need to use a Windows virtual machine. The end product, a GenMAPP-compatible Gene Database (.gdb), can only be used with the GenMAPP program, which runs only on Windows.

The Windows machines in the Keck Lab Annex have all of the software below installed. If you wish to run GenMAPP Builder and perform the quality control tests on your own computer, you will need to set up your working environment with:

  1. Any tool that can unpack .gz and .zip files
    • We use 7-zip
    • Note that we have found that the native Windows utility cannot reliably unpack .gz files or .zip files containing .jar files.
  2. PostgreSQL on Windows (http://www.enterprisedb.com/products-services-training/pgdownload)
    • This tutorial was written using PostgreSQL 9.2.4.
  3. GenMAPP Builder (http://sourceforge.net/projects/xmlpipedb/files/GenMAPP%20Builder/)
    • Requires 32-bit Java JDK or JRE version 6 or higher (http://java.com/en/download/manual_v6.jsp)
    • GenMAPP Builder may be updated during the project if groups find issues with their specific datasets that require changes to the import/export process, so it is worth knowing how to download new versions as needed.
  4. GenMAPP 2 (http://genmapp.org)
    • GenMAPP 2 is now called “GenMAPP Classic” and can be downloaded from http://www.genmapp.org/download_v2.1.php.
  5. XMLPipeDB match utility (https://sourceforge.net/projects/xmlpipedb/files/) for counting IDs in XML files
  6. Microsoft Access or any other tool that can read .mdb files
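If you are setting up your own computer, you can use the following Python sketch to check the Java requirement above before you launch GenMAPP Builder. It assumes Python 3.7 or later and that the java command is on your PATH. The 64-bit test is only a heuristic: it looks for the "64-Bit" text that HotSpot and OpenJDK virtual machines usually print. It is not an official check, and the function names are hypothetical.

```python
import re
import subprocess


def parse_java_version(text):
    """Return (major_version, is_64bit) from the text printed by `java -version`."""
    match = re.search(r'version "([^"]+)"', text)
    if match is None:
        raise ValueError("no Java version string found")
    parts = match.group(1).split(".")
    if parts[0] == "1" and len(parts) > 1:
        # Legacy numbering: "1.6.0_45" means Java 6.
        major = int(parts[1])
    else:
        # Newer numbering: "11.0.2" means Java 11 (suffixes such as "-ea" are ignored).
        major = int(re.match(r"\d+", parts[0]).group())
    # Heuristic: 64-bit HotSpot/OpenJDK VMs usually print "64-Bit" in their banner.
    return major, "64-Bit" in text


def check_local_java():
    """Run `java -version` and report whether it looks like a 32-bit Java 6 or later."""
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    major, is_64bit = parse_java_version(result.stderr)
    ok = major >= 6 and not is_64bit
    bits = "64" if is_64bit else "32"
    status = "OK" if ok else "may not meet the requirement"
    print(f"Java {major}, {bits}-bit: {status}")
    return ok
```

Calling check_local_java() prints a one-line summary. You can also pass parse_java_version() text that you copied from a Command Prompt window.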

Download and Extract Data Source Files

Follow these instructions to download the UniProt XML, GOA, and GO OBO-XML files.

UniProt XML

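  1. Go to the UniProt Complete Proteomes page (http://www.uniprot.org/taxonomy/complete-proteomes).
  2. Browse to the complete proteome download page for your species of interest. For example, to get to the Vibrio cholerae page, first click on the link to "List all Bacteria" under the Complete Proteome heading.
  3. Click through the results until you reach the page that lists your organism (for Vibrio cholerae, this was http://www.uniprot.org/taxonomy/?query=complete%3ayes+ancestor%3a2&offset=1600).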
  4. Click on the link for “complete proteome set” or “complete reference set” for the organism of interest, e.g. Vibrio cholerae serotype O1 (strain ATCC 39315 / El Tor Inaba N16961).
  5. Click the orange Download link in the upper right-hand corner of the page.
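  6. Click to download the complete proteome set in XML format (make sure that you save it to your local hard drive).
    • Note: You also have the option to download the data in compressed .gz format. Click on the compressed link at the top of the page and then click to download the complete proteome set.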
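Sometimes a saved "download" turns out to be an HTML error page or a compressed file rather than the XML you expected. The Python sketch below is not part of GenMAPP Builder. It reads the first bytes of a saved file to tell these cases apart. The category names it returns are made up for illustration, and it assumes that a UniProt XML file has its <uniprot root element near the top of the file.

```python
def sniff_bytes(head):
    """Guess what kind of file starts with the given bytes."""
    if head[:2] == b"\x1f\x8b":
        return "gzip"  # gzip magic number: a compressed .gz download
    text = head.decode("utf-8", errors="replace").lstrip().lower()
    if text.startswith("<!doctype html") or text.startswith("<html"):
        return "html"  # probably an error or landing page, not data
    if "<uniprot" in text:
        return "uniprot-xml"
    return "unknown"


def sniff_file(path, size=2048):
    """Read the first `size` bytes of a saved download and classify them."""
    with open(path, "rb") as f:
        return sniff_bytes(f.read(size))
```

If sniff_file() returns "html", download the file again. If it returns "gzip", extract the file before importing it.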

GOA

  1. Go to the UniProt-GOA Downloads page.
  2. The current and previous UniProt-GOA files can be downloaded from the UniProt-GOA ftp site.
  3. In the directory that appears, click the link to the “proteomes” directory.
  4. Find your organism of interest, right-click on its link, select “Save target as” or “Save link as”, and save the GOA file. For example, the Vibrio cholerae file is at http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/46.V_cholerae_ATCC_39315.goa.
    • Note: Since the GOA file is a text file, your browser will not automatically download it when you left-click on the link. Instead, it will try to open the file in your browser window. Because it is a large file, this could take a long time if your internet connection is slow.
    • The version information is displayed in the ftp directory listing under the “Last modified” column. Record the version information for “GOA Proteome Sets” and the date they were released, as this is your original data source.
  • Note: The directions above are currently not working. Follow these instructions for your respective species instead:
  • From the Running GenMAPP Builder page, click on the UniProt-GOA Downloads link.
  • If you get an error message, change the beginning of the URL from "ftp" to "http".
  • This should take you to the directory listing "Index of /pub/databases/GO/goa".
  • Click on the "proteomes" folder.
  • In "Index of /pub/databases/GO/goa/proteomes", download the file for your species (for example, 58.R_meliloti.goa).
  • Note: R. meliloti is an alternative name for S. meliloti.


Connectivity issue 2013-10-21: Direct downloads to the above site are currently not working, for as yet unknown reasons. As a temporary workaround, the specific V. cholerae GOA file has been uploaded to this wiki. For now, download from here instead: 46.V_cholerae_ATCC_39315.goa
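The GOA file is a tab-separated text file in the Gene Association File (GAF) format. Lines that start with ! are headers or comments. Before you import the file, you may want a quick summary to compare against the quality control counts later. This Python sketch assumes the standard GAF column order: column 2 is the UniProt accession, column 5 is the GO ID, and column 9 is the aspect (P, F, or C). The function names are hypothetical.

```python
from collections import Counter


def summarize_goa_lines(lines):
    """Summarize GAF lines: annotation rows, distinct proteins and GO IDs, aspects."""
    annotations = 0
    proteins = set()
    go_ids = set()
    aspects = Counter()
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # header/comment lines start with "!"
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) < 9:
            continue  # skip malformed rows rather than failing
        annotations += 1
        proteins.add(cols[1])  # column 2: DB Object ID (UniProt accession)
        go_ids.add(cols[4])    # column 5: GO ID
        aspects[cols[8]] += 1  # column 9: aspect (P, F, or C)
    return {
        "annotations": annotations,
        "proteins": len(proteins),
        "go_ids": len(go_ids),
        "aspects": dict(aspects),
    }


def summarize_goa(path):
    """Summarize a GOA file saved from the UniProt-GOA proteomes directory."""
    with open(path, encoding="utf-8") as f:
        return summarize_goa_lines(f)
```

For example, summarize_goa("46.V_cholerae_ATCC_39315.goa") returns a small dictionary of counts that you can record alongside the file's version information.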

GO OBO-XML

  1. Download the GO OBO-XML formatted file from the Gene Ontology download page. Click on the link for obo-xml.gz.
    • Note that Gene Ontology has announced that they are making changes to the page listed above and that users should use the beta page at http://beta.geneontology.org/page/download-ontology instead.
    • This file is updated daily. The update time used to be listed near the top of the page under “Current ontology statistics” in Pacific Standard Time (PST), but it does not appear there at present. You can get the date and time the file was created from its file properties after you have unzipped it.
  2. Extract the UniProt XML and GO OBO-XML .gz files using 7-zip or another utility.
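If 7-zip is not available, Python's standard gzip module can also unpack the .gz files. This is a minimal sketch, not a tool shipped with GenMAPP Builder. By default it names the output file after the input file with the final .gz removed, and it keeps the original file.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, dest=None):
    """Decompress a .gz file and return the path of the decompressed copy."""
    src = Path(src)
    if dest is None:
        # "uniprot.xml.gz" -> "uniprot.xml"; the original .gz file is kept.
        dest = src.with_suffix("")
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        # Copy in chunks so a large file never has to fit in memory at once.
        shutil.copyfileobj(fin, fout)
    return Path(dest)
```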

Create New Database in PostgreSQL

Note: If you have already performed this step and want to use GenMAPP Builder functions with a database you previously created in PostgreSQL, you can skip this step.

These steps may feel familiar: you did very similar things for the Week 6 assignment.

  1. Launch pgAdmin III.
  2. Double-click on PostgreSQL 9.2 (localhost:5432) on the upper left-hand side of the window.
    • This connects you to the server, and you may be asked for a password at this point.
  3. Right-click on Databases and select New Database...
  4. Give the database a name in the Name field and click OK.
  5. Click on your new database name in the treeview on the left.
  6. Click on the SQL icon in the toolbar at the top of the window.
    • The SQL Editor tab will be open and there may be leftover query text in the upper pane. Delete this text. You are now going to use an XMLPipeDB query to create the tables in the database.
  7. Click on the Open File icon in the toolbar (the yellow folder with an arrow).
  8. Navigate to the folder in which you unzipped GenMAPP Builder.
  9. Open the sql folder and open the file gmbuilder.sql. You should see SQL code appear in the SQL Editor tab.
  10. Click the Execute Query icon which looks like a green “Play” triangle button.
  11. You should get a series of NOTICE messages in the Messages tab at the bottom of the window, ending with a message like “Query returned successfully with no result in 15583 ms”. The query has now created all of the tables in the database (although there is still no data in them).
  12. Close the query window (you don’t need to save the query because you have already run it).
  13. To double-check that all is OK, click the + sign for the database, then the + sign for Schemas, and finally the + sign for public. Under the Tables section, you should see a count of 159 in parentheses.
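You can also script the same database setup with PostgreSQL's command-line tools instead of pgAdmin III. The sketch below is an alternative, not part of the official instructions, and it rests on several assumptions: createdb and psql (installed with PostgreSQL) are on your PATH, you connect as the postgres user, and the database name you pass in is your own choice. psql prompts for the password unless the PGPASSWORD environment variable is set. The final query counts ordinary tables in the public schema. That count should match the one pgAdmin III shows, provided gmbuilder.sql creates only ordinary tables there.

```python
import subprocess


def setup_commands(dbname, sql_file, host="localhost", port=5432, user="postgres"):
    """Return createdb/psql argument lists mirroring the pgAdmin III steps."""
    conn = ["-h", host, "-p", str(port), "-U", user]
    create = ["createdb", *conn, dbname]
    # ON_ERROR_STOP=1 makes psql exit with an error status if any statement fails.
    load = ["psql", *conn, "-d", dbname, "-v", "ON_ERROR_STOP=1", "-f", sql_file]
    # -t (tuples only) and -A (unaligned) make psql print just the number.
    count = ["psql", *conn, "-d", dbname, "-t", "-A", "-c",
             "SELECT count(*) FROM information_schema.tables "
             "WHERE table_schema = 'public' AND table_type = 'BASE TABLE';"]
    return create, load, count


def run_setup(dbname, sql_file, **conn_options):
    """Create the database, load gmbuilder.sql, then print the table count."""
    for cmd in setup_commands(dbname, sql_file, **conn_options):
        subprocess.run(cmd, check=True)  # stop at the first failing step
```

For example, run_setup("vibrio_db", "sql/gmbuilder.sql") run from inside the GenMAPP Builder folder. Here vibrio_db is a hypothetical database name.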

Download or Update GenMAPP Builder

  1. Visit the GenMAPP Builder folder on SourceForge (http://sourceforge.net/projects/xmlpipedb/files/GenMAPP%20Builder/).
  2. If you do not yet have GenMAPP Builder, or if there is a more recent version of GenMAPP Builder than the one that you have, click on the release folder for that version.
  3. Download the .zip file for that version of GenMAPP Builder.
  4. Extract the GenMAPP Builder folder using 7-zip or another utility.
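The native Windows utility may not reliably unpack .zip files that contain .jar files. As another option, here is a minimal Python sketch that uses the standard zipfile module. The function name and destination folder are hypothetical. The function also lists the .jar files it extracted so you can confirm they are present.

```python
import zipfile
from pathlib import Path


def extract_release(zip_path, dest_dir):
    """Extract a GenMAPP Builder release .zip and return (folder, list of .jar members)."""
    dest_dir = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as zf:
        # testzip() returns the name of the first corrupt member, or None if all are fine.
        bad = zf.testzip()
        if bad is not None:
            raise zipfile.BadZipFile(f"corrupt member in archive: {bad}")
        zf.extractall(dest_dir)
        jars = [name for name in zf.namelist() if name.endswith(".jar")]
    return dest_dir, jars
```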

Configure GenMAPP Builder to Connect to your PostgreSQL Database

  1. Launch gmbuilder-32bit.bat
    • If the program does not detect a database configuration, you will see a message window to this effect, and the configuration dialog will open automatically once you close the message window. Otherwise:
  2. Select the menu item File > Configure Database...
  3. Under the Database Connections tab, the Database Driver defaults to PostgreSQL. Enter information in the following fields:
    • Host or address: localhost
    • Port number: 5432
    • Database name: <enter the name of the PostgreSQL database you created above>
    • Username: <enter the PostgreSQL username you used to create the database above, e.g. postgres>
    • Password: <enter the password for that PostgreSQL user>
  4. Click the OK button.
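If GenMAPP Builder cannot connect, check that PostgreSQL is actually listening on the host and port you entered. The Python sketch below does only that: it makes a plain TCP connection and does not log in. We assume that GenMAPP Builder, as a Java program, reaches PostgreSQL through the standard PostgreSQL JDBC driver. jdbc_url() only shows how the host, port, and database name fields combine in that driver's URL format. It is not taken from GenMAPP Builder's own code.

```python
import socket


def jdbc_url(database, host="localhost", port=5432):
    """Standard PostgreSQL JDBC URL built from the dialog's fields (illustration only)."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


def postgres_reachable(host="localhost", port=5432, timeout=3.0):
    """True if something accepts TCP connections on host:port (no login is attempted)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unknown host, ...
        return False
```

If postgres_reachable() returns False for localhost and 5432, start the PostgreSQL service before you retry the configuration dialog.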

Import Data into the PostgreSQL Database

  1. Select File > Import UniProt XML...
    • Navigate to the UniProt XML file that you extracted previously and click the Open button.
    • This should take about 5-10 minutes, but may take longer depending on the size of the file, processor speed, and available memory of the machine. When the process has completed, record the elapsed time from the message window that appears.
  2. Select File > Import GO OBO-XML...
    • Navigate to the GO OBO-XML file that you extracted previously. Click the Open button.
    • This should take about 5-10 minutes, but may take longer depending on the size of the file, processor speed, and available memory of the machine. When the process has completed, record the elapsed time from the message window that appears.
  3. Click OK to the message asking you to process the GO data.
    • This should take about 5-10 minutes, but may take longer depending on the size of the file, processor speed, and available memory of the machine. When the process has completed, record the elapsed time from the message window that appears.
  4. Select File > Import GOA...
    • Navigate to the GOA file that you downloaded previously and click the Import button. This process should only take a minute or so.

Export a GenMAPP Gene Database (.gdb)

  1. Select File > Export to GenMAPP Gene Database...
  2. Type a name in the Owner field (GenMAPP Builder will not let you export without one).
  3. GenMAPP Builder scans your PostgreSQL database to see what species are available. Click on the species that you would like to export, then click Next to continue.
  4. Create GenMAPP Database: click on the Save GenMAPP Database File As... button. A default folder and file name are provided; modify these as needed then click on Save.
  5. Click the Next button. This starts the export process.
    • Record the starting and ending times from the black console window. This will take 1-2 hours for a typical bacterial genome, depending on the size of the database, the processor speed, and available memory. Large eukaryotic genomes (like Arabidopsis thaliana) or genomes with many GO annotations (like Saccharomyces cerevisiae) can take much longer, in the range of 12-24 hours. Note: The progress bar is not accurate.
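Long exports can run past midnight, so subtracting the recorded clock times by hand is error-prone. This Python sketch assumes you copied the start and end times from the console window in 24-hour HH:MM:SS form. That console format is an assumption. The sketch is only correct for runs shorter than 24 hours unless you pass a format that includes the date.

```python
from datetime import datetime, timedelta


def elapsed(start, end, fmt="%H:%M:%S"):
    """Time between two clock readings copied from the console window."""
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(end, fmt)
    if t1 < t0:
        # The end reading is earlier on the clock, so the run crossed midnight.
        t1 += timedelta(days=1)
    return t1 - t0


print(elapsed("23:30:00", "01:15:00"))  # 1:45:00
```

For exports that may exceed a day, record the date as well. For example, use elapsed("2013-10-21 09:00", "2013-10-22 03:00", fmt="%Y-%m-%d %H:%M").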

Check the Quality of your Exported Gene Database

Now you need to check the quality of your exported Gene Database to make sure that all of the data from the XML files made it into the PostgreSQL database and was then exported to the GenMAPP Gene Database. We have created a Gene Database Testing Report Sample to help guide you through this process.
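Part of the quality check is counting IDs in the source XML, which is what the XMLPipeDB match utility listed above is for. As a rough cross-check, the Python sketch below streams through a UniProt XML file and counts <entry> and <accession> elements. It is not the match utility. It assumes the standard UniProt XML namespace (http://uniprot.org/uniprot), and it counts every accession, including secondary accessions, so keep that in mind when you compare numbers.

```python
import xml.etree.ElementTree as ET

UNIPROT_NS = "{http://uniprot.org/uniprot}"


def count_uniprot(source):
    """Count <entry> and <accession> elements in UniProt XML (a path or binary file object)."""
    entries = 0
    accessions = 0
    # iterparse reads the file incrementally instead of parsing it all up front.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == UNIPROT_NS + "accession":
            accessions += 1  # primary and secondary accessions alike
        elif elem.tag == UNIPROT_NS + "entry":
            entries += 1
            elem.clear()  # drop the finished entry's children to limit memory use
    return entries, accessions
```

For example, count_uniprot("uniprot.xml") returns a pair (entries, accessions). Here the file name is hypothetical; use the name of the file you extracted.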
