Difference between revisions of "Ksherbina Week 9"

From LMU BioDB 2013
Jump to: navigation, search
(Export a GenMAPP Gene Database Using GenMAPP Builder: Added screenshot)
(Export a GenMAPP Gene Database Using GenMAPP Builder: Added some screenshots.)
Line 147: Line 147:
 
:*Type a name in the Owner field (ex. LMU_Fall2013_BIOL367_KS).
 
:*Type a name in the Owner field (ex. LMU_Fall2013_BIOL367_KS).
 
:*GenMAPP Builder scans your PostgreSQL database to see what species are available. Click on the species that you would like to export, then click Next to continue.
 
:*GenMAPP Builder scans your PostgreSQL database to see what species are available. Click on the species that you would like to export, then click Next to continue.
 +
[[Image:Using the Export Wizard in GenMAPP Builder KS 20131024.jpg|thumb|none|upright=4]]
 
:*Click on the ''Save GenMAPP Database File As...'' button. Make sure that the ''Downloads'' folder appears. Modify the default file name to include your initials and then click on Save.
 
:*Click on the ''Save GenMAPP Database File As...'' button. Make sure that the ''Downloads'' folder appears. Modify the default file name to include your initials and then click on Save.
 +
[[Image:Using the Export Wizard in GenMAPP Builder Second Window KS 20131024.jpg|thumb|none|upright=4]]
 
*Click the Next button to start the export process.
 
*Click the Next button to start the export process.
  

Revision as of 06:04, 25 October 2013

Katrina Sherbina
Class Page    User Page
Assignment Description Week 1 Week 2 Week 3 Week 4 Week 5 Week 6 Week 7 Week 8 Week 9 Week 10 Week 11 Week 12 Week 13 Week 15
Class Journal Week 1 Week 2 Week 3 Week 4 Week 5 Week 6 Week 7 Week 8 Week 9
Individual Journal Week 2 Week 3 Week 4 Week 5 Week 6 Week 7 Week 8 Week 9 Week 10 Week 11
Other Week 5: Database Wiki
Final Project Team H(oo)KD Project Page Journal Club Presentation Project Individual Journal

Contents

Gene Database Testing Report

The following records the results of going through the procedure outlined in the Lab Notebook section below.

Export Information

Version of GenMAPP Builder: 2.0b70

Computer on which export was run: Personal computer

Postgres Database name: VC_KS_20131022_gmb2b70

UniProt XML filename: UniProt_V_cholerae_KSTV_20131022.xml

  • UniProt XML version (The version information can be found at the UniProt News Page): UniProt release 2013_10
  • Time taken to import: 4.20 minutes

GO OBO-XML filename: go_daily-termdb_KS_20131022.obo-xml

  • GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the GO Download page has been unzipped):
  • Time taken to import: 8.81 minutes
  • Time taken to process: 7.33 minutes

GOA filename: 46.V_cholerae_ATCC_39315_KS_20131022.goa

  • GOA version (News on this page records past releases; current information can be found in the Last modified field on the FTP site): Latest version released 10/16/2013
  • Time taken to import: 0.10 minutes

Name of .gdb file: Vc-Std_KS_20131022.gdb

  • Time taken to export .gdb:
  • Start Time: 6:50 pm
  • End Time: 9:22 pm

Note:

TallyEngine

Run the TallyEngine in GenMAPP Builder and record the number of records for UniProt and GO in the XML data and in the PostgreSQL databases (or you can upload and link to a screenshot of the results).

Using XMLPipeDB match to Validate the XML Results from the TallyEngine

Follow the instructions found on this page to run XMLPipeDB match.

Are your results the same as you got for the TallyEngine? Why or why not?

Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine

Follow the instructions on this page to query the PostgreSQL Database.

OriginalRowCounts Comparison

Within the .gdb file, look at the OriginalRowCounts table to see if the database has the expected tables with the expected number of records. Compare the tables and records with a benchmark .gdb file.

Benchmark .gdb file: (for the Week 9 Assignment, use the "Vc-Std_External_20101022.gdb" as your benchmark, downloadable from here.

Copy the OriginalRowCounts table and paste it here:

Note:

Visual Inspection

Perform visual inspection of individual tables to see if there are any problems.

  • Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?
  • Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?

Note:

.gdb Use in GenMAPP

Note:

Putting a gene on the MAPP using the GeneFinder window

  • Try a sample ID from each of the gene ID systems. Open the Backpage and see if all of the cross-referenced IDs that are supposed to be there are there.

Note:

Creating an Expression Dataset in the Expression Dataset Manager

  • How many of the IDs were imported out of the total IDs in the microarray dataset? How many exceptions were there? Look in the EX.txt file and look at the error codes for the records that were not imported into the Expression Dataset. Do these represent IDs that were present in the UniProt XML, but were somehow not imported? or were they not present in the UniProt XML?

Note:

Coloring a MAPP with expression data

Note:

Running MAPPFinder

Note:

Compare Gene Database to Outside Resource

The OrderedLocusNames IDs in the exported Gene Database are derived from the UniProt XML. It is a good idea to check your list of OrderedLocusNames IDs to see how complete it is using the original source of the data (the sequencing organization, the MOD, etc.) Because UniProt is a protein database, it does not reference any non-protein genome features such as genes that code for functional RNAs, centromeres, telomeres, etc.

Note:

Lab Notebook

  • When the zipped file has downloaded, unzip the folder with 7-zip.
  • When the zipped file has downloaded, unzip the folder with 7-zip.
  • Download the UniProt XML file for Vibrio cholerae
  • When the zipped file has downloaded, unzip the folder with 7-zip.

Import Data into the PostgreSQL Database

  • Launch pgAdmin III.
  • Double-click on PostgreSQL 9.2 (localhost:5432) on the upper left hand side of the window. Type in the password to connect.
  • Right click on Databases and select New Database...
  • Name the new database in the following format and click OK: VC_<your initials>_20131022_gmb2b70 (ex. VC_KS_20131022_gmb2b70).
  • Click on your new database name in the treeview on the left.
  • Click on the SQL icon in the toolbar at the top of the window.
  • Click on the Open File icon in the toolbar (the yellow folder with an arrow).
  • Navigate to the folder in which you unzipped GenMAPP Builder.
  • Open the sql folder and open the file gmbuilder.sql. You should see SQL code appear in the SQL Editor tab.
  • Click the Execute Query icon which looks like a green “Play” triangle button.
  • Close the query window.
  • To double check that all is OK, click the + sign for the database, then the + sign for Schemas, then finally the + sign for public. Under the Tables section, you should see a count of 159 in parentheses.

Export a GenMAPP Gene Database Using GenMAPP Builder

  • Keep pgAdminIII open. Go to the gmbuilder-2.0b70 folder in the Downloads folder and launch gmbuilder-32bit.bat
  • Select the menu item File > Configure Database...
Using GenMAPP Builder KS 20131024.jpg
  • Under the Database Connections tab,
  • The Database Driver defaults to PostgreSQL. Enter the following information into the corresponding fields:
  • Host or address: localhost
  • Port number: 5432
  • Database name: VC_KS_20131022_gmb2b70
  • Username: postgres
  • Password: <enter the password of the PostgreSQL database you created above>
  • Click the OK button.
  • Select File > Import UniProt XML...
  • Navigate to the UniProt XML file that you extracted previously and click the Open button.
  • Select File > Import GO OBO-XML...
  • Navigate to the GO OBO-XML file that you extracted previously. Click the Open button.
  • Click OK to the message asking you to process the GO data.
  • Select File > Import GOA...
  • Navigate to the GOA file that you downloaded previously and click the Import button.
  • Select File > Export to GenMAPP Gene Database...
  • Type a name in the Owner field (ex. LMU_Fall2013_BIOL367_KS).
  • GenMAPP Builder scans your PostgreSQL database to see what species are available. Click on the species that you would like to export, then click Next to continue.
Using the Export Wizard in GenMAPP Builder KS 20131024.jpg
  • Click on the Save GenMAPP Database File As... button. Make sure that the Downloads folder appears. Modify the default file name to include your initials and then click on Save.
Using the Export Wizard in GenMAPP Builder Second Window KS 20131024.jpg
  • Click the Next button to start the export process.

Check the Quality of the Exported Database

  • Use XMLPipeDB Match to count the number of unique gene ideas in the V. cholerae Uniprot XML file.
java -jar xmlpipedb-match-1.1.1.jar "VC_[0-9][0-9][0-9][0-9]" < UniProt_V_cholerae_KSTV_20131022

2738 unique matches were found.

java -jar xmlpipedb-match-1.1.1.jar "VC_(A|)[0-9][0-9][0-9][0-9]" < UniProt_V_cholerae_KSTV_20131022

3831 unique matches were found.

  • Count the number of unique gene IDs using an SQL query:
select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_(A|)[0-9][0-9][0-9][0-9]'

Count = 3831

Personal tools
Namespaces

Variants
Actions
Navigation
Toolbox