Difference between revisions of "Blitvak Week 12"

From LMU BioDB 2015
(added some major details)
(added more details about getting the files (GOA))
Line 27: Line 27:
 
*The result was clicked upon and, on the result page, ''UniProtKB'' was clicked upon in the "Map to" section (on left of the page)
 
 
*On the ''UniProtKB results'' page, Download was clicked; in the box that appeared, download all was selected, the format was set to XML, and the file was set to be compressed.
 
**Referencing the entry name that corresponds to J2315, it was noticed that J2315 is also known as: ATCC BAA-245, DSM 16553, LMG 16656, NCTC 13227, and CF5610
 
  
 
===Retrieving the GOA file, Performed on 11/19===
 
 
*The [http://ftp.ebi.ac.uk/pub/databases/GO/goa/ UniProt-GOA ftp site] was entered
 
 
*The link to the "proteomes" directory was clicked in the main directory
 
*In "proteomes", the GOA corresponding to the J2315 strain was not found.
+
*In "proteomes", the GOA corresponding to the J2315 strain was not found; the GOA files corresponding to other ''B.cenocepacia'' strains, however, were found
*The GO annotations were found using the EMBL Quick GO browser: [[http://www.ebi.ac.uk/QuickGO/GAnnotation?tax=216591]]
+
**By looking over the UniProt [http://www.uniprot.org/taxonomy/216591 Taxonomy] page for J2315, it was found that the Taxon Identifier is '''216591'''
 +
*The UniProt-GOA Proteome Sets page was accessed on the EBI [https://www.ebi.ac.uk/GOA/proteomes website]; it was noticed that there was a Tax ID column and control-F was utilized in order to find an entry that corresponded to 216591. It was found that the file ''31277.B_cepacia.goa'' was the correct GOA file
 +
*31277.B_cepacia.goa was found in the proteomes directory of the [http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ UniProt-GOA ftp site] and downloaded
 
===Retrieving the GO OBO-XML file, Performed on 11/19===
 
 
*The GO OBO-XML file was downloaded from the [http://geneontology.org/page/download-ontology#Legacy_Downloads Gene Ontology download page]
 
Line 42: Line 43:
 
**'''All of the downloaded files, if compressed, were extracted using 7-Zip. All required files were placed in one folder'''
 
 
*''Downloaded on 10/27, Summary''
 
**The complete proteome for ''V. cholerae'' was downloaded from [http://www.uniprot.org/uniprot/?query=organism:243277 UniProtKB] in the XML format
+
**The complete proteome for ''B. cenocepacia'' J2315 was downloaded from [http://www.uniprot.org/uniprot/?query=organism:216591 UniProtKB] in the XML format
**The GOA file for ''V. cholerae'' was downloaded from this [http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/46.V_cholerae_ATCC_39315.goa link]
+
**The GOA file for ''B. cenocepacia'' J2315 was downloaded from the [http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/31277.B_cepacia.goa UniProt GOA ftp site]
**The GO OBO-XML formatted file for ''V. cholerae'' was downloaded from the [http://geneontology.org/page/download-ontology#Legacy_Downloads GO website]
+
**The GO OBO-XML formatted file for ''B. cenocepacia'' J2315 was downloaded from the [http://geneontology.org/page/download-ontology#Legacy_Downloads GO website]
 
**The most recent version (3.0.0, build 5) of gmBuilder was downloaded from [https://github.com/lmu-bioinformatics/xmlpipedb/releases/download/gmbuilder-3.0.0-build-5/gmbuilder-3.0.0-build-5.zip GitHub]
 
  
Line 50: Line 51:
 
===Creating a New Database in PostgreSQL===
 
 
*Steps taken were sourced from the [[Running GenMAPP Builder | Running GenMAPP Builder page]]
 
*pgAdmin III was launched and a connection to the server was made. "Databases" was right clicked and select "New Database..." was chosen. The database was given a name, V.cholerae_20151027_gmb3build5, and OK was clicked.
+
*pgAdmin III was launched and a connection to the server was made. "Databases" was right-clicked and "New Database..." was chosen. The database was given the name B.cenocepacia_J2315_20151119_gmb3build5, and OK was clicked.
 
*The new database was selected and the Query Tool was launched. Open File was clicked in the Query Tool and ''gmbuilder.sql'' in the ''gmbuilder-3.0.0-build-5'' folder (within the ''sql'' folder) was selected. Upon selection of that file, a query was loaded into Query Tool and it was subsequently executed by clicking the green "Execute Query" arrow
 
 
*This query populates the created database with all of its tables. In order to ensure that the query properly worked, it was checked that 167 tables existed in the database
 
 
===Importing Data===
 
 
*gmbuilder.bat in the gmbuilder folder was launched
 
*Under file -> configure database, the host was left as localhost, the port number was left as 5432, database name was set to ''V.cholerae_20151027_gmb3build5'', Username was set to BL, Password was set to the password of the PostgreSQL database that was recently created. OK was clicked.
+
*Under file -> configure database, the host was left as localhost, the port number was left as 5432, database name was set to ''B.cenocepacia_J2315_20151119_gmb3build5'', Username was set to postgres, Password was set to the password of the PostgreSQL database that was recently created. OK was clicked.
===Data Import into ''V.cholerae_20151027_gmb3build5''===
+
===Data Import into ''B.cenocepacia_J2315_20151119_gmb3build5''===
 
*File -> Import UniProt XML was selected
 
 
**The UniProt XML file that was previously extracted was chosen, open was clicked. The import process was allowed to proceed uninterrupted.
 
Line 65: Line 66:
 
===Exporting a GenMAPP Gene Database (.gdb file)===
 
 
*File -> Export to GenMAPP Gene Database was selected
 
*BL was typed into the Owner field. The species of interest was selected for export (''V. cholerae'')
+
*BL was typed into the Owner field. The species of interest was selected for export (''B. cenocepacia J2315'')
 
*Next was clicked, the create GenMAPP database file/location was selected, and the boxes for the exporting of Molecular Function, Cellular Component, and Biological Process Gene Ontology Terms were left checked. The export process was initialized by clicking next; the windows were left open for the program to continue and finish with the export process (was estimated to take somewhere between 1-2 hrs).
 

Revision as of 05:13, 20 November 2015

  • J2315
  • TAXON ID: 623
  • UP000001006

B.cenocepacia_J2315_20151119_gmb3build5

312777.B_cepacia.goa


Initial Export/Import Cycle

Initial Preparations

In preparation for this assignment, it was ensured that these programs were installed on a Windows workstation:

Downloading the Required Files

Retrieving the UniProt XML file, Performed on 11/19

  • The UniProt Complete Proteomes page was entered
  • The Superkingdom Bacteria was selected as a Filter By option
  • "burkholderia cenocepacia J2315" was entered in the search bar and Search was clicked. One result, corresponding to J2315, was returned.
  • The result was clicked and, on the result page, UniProtKB was clicked in the "Map to" section (on the left of the page)
  • On the UniProtKB results page, Download was clicked; in the box that appeared, download all was selected, the format was set to XML, and the file was set to be compressed.
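
As a sanity check on the compressed download, the number of protein entries in the file can be counted without extracting it first. The following is a minimal sketch, not part of the original procedure; it assumes the file is gzip-compressed UniProt XML that uses the `http://uniprot.org/uniprot` namespace, and the function names are hypothetical.

```python
import gzip
import io
import xml.etree.ElementTree as ET

# Assumption: UniProt XML elements live in this namespace.
UNIPROT_NS = "{http://uniprot.org/uniprot}"


def count_uniprot_entries(stream):
    """Count <entry> elements in a UniProt XML stream, one element at a time."""
    count = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == UNIPROT_NS + "entry":
            count += 1
            elem.clear()  # drop the finished entry's children to limit memory use
    return count


def count_entries_in_gz(path):
    """Open a gzip-compressed UniProt XML file and count its entries."""
    with gzip.open(path, "rb") as handle:
        return count_uniprot_entries(handle)
```

The count returned by `count_entries_in_gz` on the downloaded file could then be compared with the number of results shown on the UniProtKB results page.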

Retrieving the GOA file, Performed on 11/19

  • The UniProt-GOA ftp site was entered
  • The link to the "proteomes" directory was clicked in the main directory
  • In "proteomes", the GOA file corresponding to the J2315 strain was not found by name; GOA files corresponding to other B. cenocepacia strains, however, were found
    • By looking over the UniProt Taxonomy page for J2315, it was found that the Taxon Identifier is 216591
  • The UniProt-GOA Proteome Sets page was accessed on the EBI website; it was noticed that there was a Tax ID column, and Ctrl+F was used to find the entry corresponding to 216591. It was found that the file 31277.B_cepacia.goa was the correct GOA file
  • 31277.B_cepacia.goa was found in the proteomes directory of the UniProt-GOA ftp site and downloaded
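
Because the GOA file name does not contain the strain name, it can be useful to confirm that the downloaded file really carries annotations for taxon 216591. The sketch below is an illustration only; it assumes the file follows the tab-delimited GAF layout, in which lines beginning with `!` are comments and the 13th column holds the taxon (for example `taxon:216591`). The function name `tally_taxa` is hypothetical.

```python
from collections import Counter


def tally_taxa(lines):
    """Tally the taxon column (13th tab-delimited field) of GAF-style lines."""
    counts = Counter()
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("!") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 13:
            continue  # not a complete annotation row
        counts[fields[12]] += 1
    return counts
```

Passing an open file handle for the downloaded GOA file to `tally_taxa` should, under these assumptions, yield a tally dominated by `taxon:216591` if the file matches the J2315 strain.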

Retrieving the GO OBO-XML file, Performed on 11/19

Downloading/Updating GenMAPP Builder, Performed on 10/27

Export Process

Creating a New Database in PostgreSQL

  • Steps taken were sourced from the Running GenMAPP Builder page
  • pgAdmin III was launched and a connection to the server was made. "Databases" was right-clicked and "New Database..." was chosen. The database was given the name B.cenocepacia_J2315_20151119_gmb3build5, and OK was clicked.
  • The new database was selected and the Query Tool was launched. Open File was clicked in the Query Tool and gmbuilder.sql in the gmbuilder-3.0.0-build-5 folder (within the sql folder) was selected. Upon selection of that file, a query was loaded into Query Tool and it was subsequently executed by clicking the green "Execute Query" arrow
  • This query populates the created database with all of its tables. To confirm that the query worked properly, it was verified that the database contained 167 tables
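
The table count can also be checked with a query instead of being counted in pgAdmin III. The sketch below is an assumption-laden illustration: it expects a DB-API cursor from a driver that uses `%s` placeholders (such as psycopg2), and it assumes the gmbuilder.sql tables are created in the `public` schema. The helper name `count_tables` is hypothetical.

```python
# Counts ordinary tables (not views) in one schema via the standard
# information_schema catalog.
TABLE_COUNT_SQL = (
    "SELECT count(*) FROM information_schema.tables "
    "WHERE table_schema = %s AND table_type = 'BASE TABLE'"
)


def count_tables(cursor, schema="public"):
    """Return the number of base tables in the given schema."""
    cursor.execute(TABLE_COUNT_SQL, (schema,))
    return cursor.fetchone()[0]
```

With a cursor connected to B.cenocepacia_J2315_20151119_gmb3build5, the result of `count_tables(cursor)` could be compared against the expected 167.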

Importing Data

  • gmbuilder.bat in the gmbuilder folder was launched
  • Under File -> Configure Database, the host was left as localhost, the port number was left as 5432, the database name was set to B.cenocepacia_J2315_20151119_gmb3build5, the username was set to postgres, and the password was set to the password of the PostgreSQL database that was recently created. OK was clicked.

Data Import into B.cenocepacia_J2315_20151119_gmb3build5

  • File -> Import UniProt XML was selected
    • The UniProt XML file that was previously extracted was chosen and Open was clicked. The import process was allowed to proceed uninterrupted.
  • File -> Import GO OBO-XML was selected
    • The GO OBO-XML file that was previously extracted was chosen and Open was clicked. The import process was allowed to proceed uninterrupted.
  • File -> Import GOA was selected
    • The GOA file that was downloaded previously was chosen, open was clicked, and the import process was allowed to proceed uninterrupted.

Exporting a GenMAPP Gene Database (.gdb file)

  • File -> Export to GenMAPP Gene Database was selected
  • BL was typed into the Owner field. The species of interest was selected for export (B. cenocepacia J2315)
  • Next was clicked, the file name and location for the new GenMAPP database were selected, and the boxes for exporting Molecular Function, Cellular Component, and Biological Process Gene Ontology terms were left checked. The export was started by clicking Next; the windows were left open so that the program could finish the export (estimated to take 1-2 hours).