Difference between revisions of "Bklein7 Week 12"

From LMU BioDB 2015
(updated gene database testing report)
(added presentation pdf)
 
(11 intermediate revisions by 2 users not shown)
==TO DO==
+
==Genome Sequencing Paper PowerPoint Presentation==
*Check Export!
+
Presentation File (PDF): [[File:Genomepaper cw20151116.pdf]]
*ZIP and upload files used in creating the GDB
+
==Quality Assurance Work==
*Create a new page for the gene testing report (_cw20151119) and link to it from this page
+
I created and tested our first ''Bordetella pertussis'' gene database, tagged '''cw20151119'''.
*Follow instructions on the Coder page and update electronic lab notebook
+
*Gene Database v1: [[File:Bpertussis-std cw20151119.zip]]
**Consider dividing this page into coder and QA sections
+
*[[Gene Database Testing Report- cw20151119]] (I authored sections 1-4.6. Lena authored sections 4.7 & 4.8)
 +
Work Log:
 +
*Thursday, November 19: I followed the import-export process for the creation of our first ''Bordetella pertussis'' gene database. My protocol was documented on the Gene Database Testing Report page for this database.
 +
*Monday, November 23: I accessed the exported database and went through counting protocols to evaluate its content. Results were posted on the Gene Database Testing Report page for this database.
  
==Files Asked for in the Gene Database Testing Report==
+
==Coder Work==
For convenience, all of the files explicitly asked for in the "Gene Database Testing Report" section were compressed together in this file:
+
Before proceeding, I designated my personal laptop as my development computer.
 +
=== GitHub Repository Clone Setup ===
 +
GitHub Information:
 +
*My GitHub account: [https://github.com/bklein7 bklein7]
 +
*Projects in which I am listed as a developer: [https://github.com/lmu-bioinformatics/xmlpipedb LMU Bioinformatics XMLPipeDB Project]
 +
**My Team: [https://github.com/orgs/lmu-bioinformatics/teams/the-class-whoopers The Class Whoopers]
 +
**My Branch: [https://github.com/lmu-bioinformatics/xmlpipedb/tree/b-pertussis b-pertussis]
 +
GitHub Clone Setup:
 +
#I designated a folder on my Desktop entitled "B. pertussis Project" as the location for my local copy of the [https://github.com/lmu-bioinformatics/xmlpipedb XMLPipeDB GitHub repository]. To enter this location, I opened Terminal and used the following command: <pre>cd /Users/brandonklein/Desktop/B.\ pertussis\ Project</pre>
 +
#I cloned the repository:<pre>git clone https://github.com/lmu-bioinformatics/xmlpipedb.git</pre>
 +
#I entered the clone folder: <pre>cd xmlpipedb</pre>
 +
#I switched to my branch:<pre>git checkout b-pertussis</pre>
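The clone steps above can be double-checked from Terminal. The following is a minimal sketch, assuming Git is installed; the helper name <code>current_branch</code> is hypothetical and is not part of the XMLPipeDB instructions. It prints the branch that a clone folder currently has checked out, which should read <code>b-pertussis</code> after the last step.

```shell
# Hypothetical helper: print the branch currently checked out in a clone folder.
# $1 is the path to the clone (for example, the xmlpipedb folder).
current_branch() {
    # -C runs git inside the given folder; symbolic-ref --short prints the
    # short name of the branch that HEAD points to.
    git -C "$1" symbolic-ref --short HEAD
}
```

For example, running <code>current_branch xmlpipedb</code> from the "B. pertussis Project" folder would be one way to confirm the checkout.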
  
==Pre-requisites==
+
=== “Developer Rig” Setup and Initial As-Is Build ===
The following set of software was used in the creation and testing of the ''Vibrio cholerae'' gene database:
+
Necessary software was downloaded:
 +
* Java developer tools: [http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html JDK 8] (which, at this writing, is ''JDK 8u65'')
 +
* Any tool that can unpack .gz and .zip files: [http://www.kekaosx.com/en/ Keka] (listed as equivalent software to [http://www.7-zip.org/ 7-zip] for Mac OS X)
 +
* ''XMLPipeDB Match'' utility
 +
* Development environment: [http://www.eclipse.org Eclipse IDE for Java EE Developers]
 +
==== Eclipse Workspace Setup ====
 +
# I ran Eclipse.
 +
# When prompted to specify a Workspace, I selected my "xmlpipedb" repository clone folder and clicked "Open".
 +
# I verified that my repository clone folder was listed as the ''Workspace:'' and clicked ''OK''.
 +
# When presented with the introductory display, I clicked the ''Workbench'' button.
 +
#* This took me to an empty developer area:
 +
#* [[File:Screen Shot 2015-11-23 at 5.10.28 PM.png]]
 +
==== Java Project Setup ====
 +
# I right-clicked within the empty ''Project Explorer'' tab and chose '''New > Project…''' from the menu that appeared.
 +
# I chose ''Java Project'' from the list of “wizards” and clicked on the ''Next >'' button.
 +
# On the next panel, I entered <code>gmbuilder</code> as the ''Project name:''.
 +
#* The ''JRE'' section showed Java 1.8, confirming that my version of Java was up to date.
 +
# I clicked the ''Finish'' button.
 +
#* When asked if I wanted to open the “Java perspective,” I responded with ''Yes''.
 +
# The ''gmbuilder'' project folder was now visible in the ''Project Explorer'' tab.
 +
#* The ''gmbuilder'' project folder did not show a red ''x'' icon.
  
# [http://www.7-zip.org/ 7-zip] tool for unpacking .gz and .zip files
+
=== Initial Build ===
# [http://www.postgresql.org PostgreSQL] on Windows (version 9.4.x)
+
# I opened the ''gmbuilder'' project by clicking on the gray triangle to the left of its name.
# [https://sourceforge.net/projects/xmlpipedb/files/ GenMAPP Builder]
+
# I right-clicked on the ''build.xml'' listing and chose '''Run As > Ant Build...''' from the menu that appeared.
# Java JDK 1.8 64-bit
+
# In the ''Edit Configuration'' dialog that appeared, I opened the ''Targets'' tab and checked the ''clean'' and ''dist'' items. The ''Target execution order'' section near the bottom of the dialog then displayed ''clean, dist''.
# [https://github.com/GenMAPPCS/genmapp GenMAPP 2]
+
# I clicked the ''Run'' button. After 3 seconds of processing, the build was successful.
# [https://sourceforge.net/projects/xmlpipedb/files/ XMLPipeDB match utility] for counting IDs in XML files
+
# When this was done, I right-clicked on the ''gmbuilder'' project folder and chose '''Refresh''' from the menu that appeared.
# Microsoft Access for reading .mdb files
+
# A ''dist'' folder was now present inside the ''gmbuilder'' project folder. This was my personally-built copy of ''GenMAPP Builder''. Its contents correspond to the extracted contents of the ''gmbuilder-3.0.0-build-5.zip'' file that was downloaded in class.
 
+
#*Screenshot of my copy of gmbuilder within the Eclipse working environment:
==Gene Database Creation==
+
#*[[File:Eclipse gmbuilder Ant Build.png]]
===Downloading Data Source Files and GenMAPP Builder===
+
 
+
*I downloaded the UniProt XML, GOA, and GO OBO-XML files for ''Bordetella pertussis'' along with the GenMAPP Builder program.
+
**All files were saved to the folder ''Bklein7_CW'' on my computer's ThawSpace.
+
**Files that required extraction were unzipped using [http://www.7-zip.org/ 7-zip].
+
**Data files that remained in a folder after unzipping were removed from their folders to facilitate organization and command line processing.
+
 
+
====UniProt XML====
+
 
+
* I went to the [http://www.uniprot.org/taxonomy/complete-proteomes UniProt Complete Proteomes] page.
+
**From there, I navigated to the complete proteome download page for [http://www.uniprot.org/proteomes/UP000002676 Bordetella pertussis (strain Tohama I / ATCC BAA-589 / NCTC 13251)].
+
** I clicked on the "Download" button at the top of the page above and selected the following options:
+
***"Download all"
+
***"XML" from the "Format" drop-down menu
+
***"Compressed" format
+
**I extracted the file using [http://www.7-zip.org/ 7-zip].
+
 
+
====GOA====
+
 
+
* UniProt-GOA files can be downloaded from the [http://ftp.ebi.ac.uk/pub/databases/GO/goa/ UniProt-GOA ftp site].
+
*Within the above site, I navigated to the [http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/145.B_pertussis_ATCC_BAA-589.goa GOA file for ''Bordetella pertussis'' strain Tohama I].
+
**This text file was automatically opened by my browser. Therefore, I had to manually download the file.
+
 
+
====GO OBO-XML====
+
 
+
* I downloaded the GO OBO-XML formatted file from the [http://geneontology.org/page/download-ontology#Legacy_Downloads Gene Ontology legacy download page].
+
*I extracted the file using [http://www.7-zip.org/ 7-zip].
+
 
+
====Downloaded GenMAPP Builder====
+
 
+
# I downloaded the GenMAPP Builder zip folder: [https://github.com/lmu-bioinformatics/xmlpipedb/releases/download/untagged-bd04fffc4da853fedf30/gmbuilder-3.0.0-build-5.zip Download gmbuilder-3.0.0-build-5.zip].
+
# I extracted the GenMAPP Builder folder using [http://www.7-zip.org/ 7-zip].
+
 
+
===Creating the New Database in PostgreSQL===
+
 
+
* I launched ''pgAdmin III'' and connected to the PostgreSQL 9.4 server (localhost:5432).
+
** On this server, I created a new database: ''bpertussis_cw20151119_gmb3build5''.
+
** I opened the SQL Editor tab to use an XMLPipeDB query to create the tables in the database.
+
*** I clicked on the Open File icon and selected the file ''gmbuilder.sql''. This imported a series of SQL commands into the editor tab.
+
*** I clicked on the Execute Query icon to run these commands.
+
***In viewing the schema for this database, I confirmed that there were 167 tables after running the above commands.
+
 
+
===Configuring GenMAPP Builder to Connect to the PostgreSQL Database===
+
 
+
* To begin, I launched gmbuilder.bat.
+
* I selected the "Configure Database" option and entered the following information into the fields below:
+
** Host or address: localhost
+
** Port number: 5432
+
** Database name: bpertussis_cw20151119_gmb3build5
+
** Username: postgres
+
** Password: Welcome1
+
 
+
===Importing Data into the PostgreSQL Database===
+
 
+
*The downloaded data files for ''Bordetella pertussis'' were specified and imported into the database by clicking on the following buttons:
+
** Selected File > Import UniProt XML...
+
** Selected File > Import GO OBO-XML...
+
** Clicked OK to the message asking to process the GO data.
+
** Selected File > Import GOA...
+
 
+
===Exporting a GenMAPP Gene Database (.gdb)===
+
 
+
* I selected File > Export to GenMAPP Gene Database... to begin the export process.
+
* I typed my name in the owner field (Brandon Klein).
+
* I selected "Bordetella pertussis (strain Tohama I/ATCC BAA-589/NCTC 13251), Taxon ID 257313" as the gene database species and then clicked ''Next''.
+
* The database was saved as ''bpertussis-std_cw20151119''.
+
* I checked the boxes for exporting all Molecular Function, Cellular Component, and Biological Process Gene Ontology Terms.
+
* Finally, I clicked the "Next" button to begin the export process.
+
 
+
==Gene Database Testing Report==
+
===Export Information===
+
 
+
Version of GenMAPP Builder: Version 3.0.0 Build 5
+
 
+
Computer on which export was run: Seaver 120- Last computer on the right in the row farthest from the front of the room
+
 
+
Postgres Database name: bpertussis_cw20151119_gmb3build5
+
 
+
UniProt XML filename (give filename and upload and link to compressed file):
+
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]):
+
* UniProt XML download link:
+
* Time taken to import: 2.60 minutes
+
** Note: The import time was similar to that of ''V. cholerae'' in Week 9 (2.92 minutes). No interruptions occurred during this process.
+
 
+
GO OBO-XML filename (give filename and upload and link to compressed file):
+
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
+
* GO OBO-XML download link:
+
* Time taken to import: 6.99 minutes
+
* Time taken to process: 4.48 minutes
+
** Note: The import and processing times were similar to those for ''V. cholerae'' in Week 9 (6.88 minutes and 4.49 minutes respectively). No interruptions occurred during these processes.
+
 
+
GOA filename (give filename and upload and link to compressed file):
+
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
+
* GOA download link:
+
* Time taken to import: 0.04 minutes
+
** Note: The import time was similar to that of ''V. cholerae'' in Week 9 (0.06 minutes). No interruptions occurred during this process.
+
 
+
Name of .gdb file (give filename and upload and link to compressed file):
+
* Time taken to export:
+
** Start time: 4:06 PM
+
** End time: 4:46 PM
+
** Elapsed time: 40 minutes
+
Note: All export windows remained open when I returned to check the export status. No interruptions occurred during the export process.
+
 
+
===TallyEngine===
+
 
+
* I ran the TallyEngine in GenMAPP Builder and specified the following files:
+
**XML- uniprot-proteome%3AUP000002676_cw20151119.xml
+
**GO- go_daily-termdb_cw20151119.obo-xml
+
*Results:
+
**[[File:TallyEngine cw20151119.png]]
+
**All tally results were consistent across both files.
+
=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine===
+
The following functions were performed using the Windows command line (cmd).
+
*I entered my project folder using the following command:
+
cd /d T:\Bklein7_CW
+
*I used XMLPipeDB match to identify matches of any ordered locus name following the pattern "BP####" in the UniProt XML file. The command used was as follows:
+
java -jar xmlpipedb-match-1.1.1.jar "BP[0-9][0-9][0-9][0-9]" < "uniprot-proteome%3AUP000002676_cw20151119.xml"
+
*Match Results:
+
**[[File:XMLPipeDBMatch cw20151119.png]]
+
**The total number of unique matches listed above, 3438, differs from the Ordered Locus Names count of 3435 produced by the TallyEngine. Thus, 3 gene IDs present in the original XML file were not imported into ''bpertussis-std_cw20151119.gdb''.
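As a rough cross-check of the count above on a system with standard Unix tools, a sketch like the following could be used. It assumes that <code>grep</code>, <code>sort</code>, and <code>wc</code> are available (they are not part of the Windows command line used here), and the function name is hypothetical. It counts the distinct strings that match the same pattern and is not a substitute for XMLPipeDB match.

```shell
# Hypothetical helper: count the distinct BP#### strings found in a file.
# $1 is the path to the file to scan (for example, the UniProt XML file).
count_unique_bp_ids() {
    # grep -o prints every match on its own line, sort -u drops duplicates,
    # wc -l counts what is left, and tr strips the padding some systems add.
    grep -o 'BP[0-9][0-9][0-9][0-9]' "$1" | sort -u | wc -l | tr -d '[:space:]'
}
```

Note that <code>grep</code> also matches the first six characters of a longer string such as BP12345, so the result is only an upper-level sanity check on the number of distinct IDs.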
+
 
+
=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine===
+
I ran a SQL query designed to match the pattern BP####:
+
 
+
select count (*) from genenametype where type = 'ordered locus' and value ~ 'BP[0-9][0-9][0-9][0-9]';
+
 
+
Results:
+
*[[File:SQLQuery cw20151119.png]]
+
* The number of unique matches yielded by this SQL query, 3435, matched that produced by the TallyEngine. However, this count was also 3 fewer than that yielded by the XMLPipeDB match result reported above. This further indicates that not all gene IDs from the original XML file were imported.
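To see which identifiers account for a discrepancy like this one, the two ID lists can be compared directly. The sketch below assumes both lists have already been saved as plain text files with one ID per line (for example, one taken from the match output and one from a query on the <code>genenametype</code> table). The function name and any file names are hypothetical.

```shell
# Hypothetical helper: print IDs that appear in the first list but not the second.
# $1 = file of IDs found in the XML, $2 = file of IDs found in the database.
missing_ids() {
    _xml_sorted=$(mktemp)
    _db_sorted=$(mktemp)
    # comm needs sorted input; sort -u also removes duplicate IDs.
    sort -u "$1" > "$_xml_sorted"
    sort -u "$2" > "$_db_sorted"
    # -23 hides lines unique to the second file and lines common to both,
    # leaving only the IDs that are unique to the first file.
    comm -23 "$_xml_sorted" "$_db_sorted"
    rm -f "$_xml_sorted" "$_db_sorted"
}
```

A call such as <code>missing_ids xml_ids.txt db_ids.txt</code> would then list the IDs that were counted in the XML but not in the database.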
+
 
+
===OriginalRowCounts Comparison===
+
 
+
Within the .gdb file, look at the OriginalRowCounts table to see if the database has the expected tables with the expected number of records. Compare the tables and records with a benchmark .gdb file.
+
 
+
Benchmark .gdb file:
+
 
+
Copy the OriginalRowCounts table from the benchmark and new gdb and paste them here:
+
 
+
Note:
+
 
+
===Visual Inspection===
+
 
+
Perform visual inspection of individual tables to see if there are any problems.
+
 
+
* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?
+
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?
+
 
+
Note:
+
 
+
===.gdb Use in GenMAPP===
+
 
+
<!--Need to add more instructions here.-->
+
 
+
While the above sections perform quality assurance on the exported Gene Database by verifying ID counts, the "proof in the pudding" is to actually use the Gene Database in GenMAPP. You can follow the instructions in [http://www.openwetware.org/wiki/BIOL367/F10:GenMAPP_and_MAPPFinder_Protocols Part 2 of the ''Vibrio cholerae'' Microarray Data Analysis] to verify that the Gene Database works in GenMAPP. In this case, the emphasis is not on the findings of the data analysis itself, but on whether the Gene Database functions appropriately in GenMAPP.
+
 
+
For assistance with using the GenMAPP program, the GenMAPP Help is very extensive.  To access it within GenMAPP, go to the menu item Help > GenMAPP Help and either browse or search for your topic of interest.
+
 
+
Note:
+
 
+
====Putting a gene on the MAPP using the GeneFinder window====
+
 
+
* In the main GenMAPP Drafting Board window, left-click on the icon for "Gene" in the upper left corner of the window. Click on the Drafting Board to place the Gene on the MAPP. Now, right-click on the gene to access the GeneFinder window.  Type or paste a gene ID into the Gene ID field.  Select the appropriate Gene ID system from the drop-down menu and click the Search button. For example, for ''Vibrio cholerae'', you could search for the ID "VC0028", which is an OrderedLocusNames ID.  Once the ID has been found, click the OK button to return to the Drafting Board window.
+
** For the Final Project, you will need to try a sample ID from each of the gene ID systems, not just OrderedLocusNames.
+
* Open the Backpage by left-clicking on the gene box on the Drafting Board to see if all of the cross-referenced IDs that are supposed to be there are there.
+
 
+
Note:
+
 
+
====Creating an Expression Dataset in the Expression Dataset Manager====
+
 
+
* How many of the IDs were imported out of the total IDs in the microarray dataset? How many exceptions were there? Look in the EX.txt file and look at the error codes for the records that were not imported into the Expression Dataset. Do these represent IDs that were present in the UniProt XML but were somehow not imported? Or were they not present in the UniProt XML?
+
 
+
Note:
+
 
+
====Coloring a MAPP with expression data====
+
 
+
Note:
+
 
+
====Running MAPPFinder====
+
 
+
Note:
+
 
+
=== Compare Gene Database to Outside Resource===
+
 
+
'''''Note:''''' This section applies to the Group Final Project and does not need to be completed for the [[Week 9]] assignment.  ''&mdash; [[User:Kdahlquist|Kdahlquist]] ([[User talk:Kdahlquist|talk]]) 15:46, 2 November 2015 (PST)''
+
 
+
The OrderedLocusNames IDs in the exported Gene Database are derived from the UniProt XML. It is a good idea to check your list of OrderedLocusNames IDs to see how complete it is using the original source of the data (the sequencing organization, the MOD, etc.)  Because UniProt is a protein database, it does not reference any non-protein genome features such as genes that code for functional RNAs, centromeres, telomeres, etc.
+
 
+
Note:
+
 
+
[[Category:Group Projects]]
+
  
 
==Links==
 

Latest revision as of 08:02, 24 November 2015

Genome Sequencing Paper PowerPoint Presentation

Presentation File (PDF): File:Genomepaper cw20151116.pdf

Quality Assurance Work

I created and tested our first Bordetella pertussis gene database, tagged cw20151119.

Work Log:

  • Thursday, November 19: I followed the import-export process for the creation of our first Bordetella pertussis gene database. My protocol was documented on the Gene Database Testing Report page for this database.
  • Monday, November 23: I accessed the exported database and went through counting protocols to evaluate its content. Results were posted on the Gene Database Testing Report page for this database.

Coder Work

Before proceeding, I designated my personal laptop as my development computer.

GitHub Repository Clone Setup

GitHub Information:

GitHub Clone Setup:

  1. I designated a folder on my Desktop entitled "B. pertussis Project" as the location for my local copy of the XMLPipeDB GitHub repository. To enter this location, I opened Terminal and used the following command:
    cd /Users/brandonklein/Desktop/B.\ pertussis\ Project
  2. I cloned the repository:
    git clone https://github.com/lmu-bioinformatics/xmlpipedb.git
  3. I entered the clone folder:
    cd xmlpipedb
  4. I switched to my branch:
    git checkout b-pertussis

“Developer Rig” Setup and Initial As-Is Build

Necessary software was downloaded:

  • Java developer tools: JDK 8 (which, at this writing, is JDK 8u65)
  • Any tool that can unpack .gz and .zip files: Keka (listed as equivalent software to 7-zip for Mac OS X)
  • XMLPipeDB Match utility
  • Development environment: Eclipse IDE for Java EE Developers

Eclipse Workspace Setup

  1. I ran Eclipse.
  2. When prompted to specify a Workspace, I selected my "xmlpipedb" repository clone folder and clicked "Open".
  3. I verified that my repository clone folder was listed as the Workspace: and clicked OK.
  4. When presented with the introductory display, I clicked the Workbench button.
    • This took me to an empty developer area:
    • Screen Shot 2015-11-23 at 5.10.28 PM.png

Java Project Setup

  1. I right-clicked within the empty Project Explorer tab and chose New > Project… from the menu that appeared.
  2. I chose Java Project from the list of “wizards” and clicked on the Next > button.
  3. On the next panel, I entered gmbuilder as the Project name:.
    • The JRE section showed Java 1.8, confirming that my version of Java was up to date.
  4. I clicked the Finish button.
    • When asked if I wanted to open the “Java perspective,” I responded with Yes.
  5. The gmbuilder project folder was now visible in the Project Explorer tab.
    • The gmbuilder project folder did not show a red x icon.

Initial Build

  1. I opened the gmbuilder project by clicking on the gray triangle to the left of its name.
  2. I right-clicked on the build.xml listing and chose Run As > Ant Build... from the menu that appeared.
  3. In the Edit Configuration dialog that appeared, I opened the Targets tab and checked the clean and dist items. The Target execution order section near the bottom of the dialog then displayed clean, dist.
  4. I clicked the Run button. After 3 seconds of processing, the build was successful.
  5. When this was done, I right-clicked on the gmbuilder project folder and chose Refresh from the menu that appeared.
  6. A dist folder was now present inside the gmbuilder project folder. This was my personally-built copy of GenMAPP Builder. Its contents correspond to the extracted contents of the gmbuilder-3.0.0-build-5.zip file that was downloaded in class.
    • Screenshot of my copy of gmbuilder within the Eclipse working environment:
    • Eclipse gmbuilder Ant Build.png

Links

Assignments Pages

Individual Journal Entries

Shared Journal Entries