Msaeedi23 Week 14
Link to the completed gene database testing report for the most recently created database (1203): Gene Database Testing Report- cw20151203
Bordetella Pertussis Species Profile Customization
ID inconsistencies
To assess the need for further customization of the Bordetella pertussis species profile, I worked directly with our coder Brandon to compare the gene IDs present in the cw20151201 .gdb file (File:Bpertussis-std cw20151201.zip) with those in the original .xml file (File:Uniprot-proteome-UP000002676 cw20151201.zip). We did this by copying the "OrderedLocusNames" table from the .gdb file into a new Excel document, alongside the list of Bordetella pertussis gene IDs that xmlpipedb match generated from the original .xml file. Further details can be found in Mahrad's Week 14 Journal Entry. Overall, we found four discrepant IDs. Together, we examined each gene whose ID differed between the original dataset and the exported one.
- BP3167.1- existed in the .gdb file but not in the xmlpipedb match result
- Additionally, we found that the expected variant of this ID, BP3167, existed in the xmlpipedb match result but not in the .gdb file.
- We hypothesized that the ID existed as BP3167.1 in the original .xml file but was retrieved incompletely in the xmlpipedb match result due to the use of an imprecise regex (BP####).
- BP0101- did not get exported into the .gdb file
- To investigate why this gene was not exported, we looked up this ID in the original .xml file.
- We found that the gene ID was actually listed as both BP0101A and BP0101B. Neither was retrieved in full in the xmlpipedb match result because the regex used (BP####) was imprecise and stopped before the trailing letter.
- The gene IDs BP0101A and BP0101B were both listed as open reading frame (ORF) genes in the .xml file as opposed to ordered locus names. This presented a new category of gene IDs that the custom Bordetella pertussis species profile was not designed to export.
- ORF investigation
- To see if there were other ORF genes with the pattern BP####A or BP####B, we ran a query to retrieve such IDs from the original PostgreSQL database: bpertussis_cw20151119_gmb3build5.
- This query retrieved a list of 11 ORF genes. Seven of these genes had numerical sequences (####) that were cross-listed as ordered locus names, whereas four had unique numerical sequences that were listed only as ORF genes. Therefore, only four of the ORF IDs should have been discrepant in our comparison of the .gdb and .xml IDs. Of these four, two were BP0101A and BP0101B.
- To evaluate the total number of genes that should be exported into the .gdb file, including both genes listed under the ordered locus names and ORF tables, we queried the original PostgreSQL database once more:
- BP0970- did not get exported into the .gdb file
- This was the third ORF gene with a unique numerical sequence. The gene was listed as BP0970A in the original .xml file.
- BP0684- did not get exported into the .gdb file
- This was the fourth ORF gene with a unique numerical sequence. The gene was listed as BP0684A in the original .xml file.
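The spreadsheet comparison and the ORF follow-up described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure we actually ran (that was done in Excel and with a PostgreSQL query). The function names compare_ids and split_orf_ids are hypothetical, and BP1234 and BP1234A are made-up placeholder IDs, included only so that the cross-listed case shows up in the example.

```python
import re


def compare_ids(gdb_ids, xml_ids):
    """Return (IDs only in the .gdb list, IDs only in the .xml match list)."""
    gdb, xml = set(gdb_ids), set(xml_ids)
    return sorted(gdb - xml), sorted(xml - gdb)


def split_orf_ids(orf_ids, ordered_locus_names):
    """Split ORF IDs by whether their BP#### base is also an ordered locus name."""
    locus = set(ordered_locus_names)
    cross_listed, unique = [], []
    for orf_id in orf_ids:
        # Drop a single trailing capital letter: BP0101A -> BP0101
        base = re.sub(r"[A-Z]$", "", orf_id)
        if base in locus:
            cross_listed.append(orf_id)
        else:
            unique.append(orf_id)
    return cross_listed, unique


# BP1234 and BP1234A are made-up placeholders, not real results.
gdb_ids = ["BP3167.1", "BP1234"]
xml_match_ids = ["BP3167", "BP0101", "BP0970", "BP0684", "BP1234"]

only_gdb, only_xml = compare_ids(gdb_ids, xml_match_ids)
print("only in .gdb:", only_gdb)
print("only in xml match:", only_xml)

orf_ids = ["BP0101A", "BP0101B", "BP0970A", "BP0684A", "BP1234A"]
cross_listed, unique = split_orf_ids(orf_ids, gdb_ids)
print("cross-listed ORF IDs:", cross_listed)
print("ORF-only IDs:", unique)
```

In this toy input, the IDs that appear on only one side are the same ones discussed in the list above, and the ORF IDs whose numerical sequence has no ordered locus name counterpart are the ones that would be missing from the export.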
Conclusion: The custom Bordetella pertussis species profile in GenMAPP Builder needs to be customized to export the 11 ORF genes. The gene BP3167.1 was already being exported and, therefore, does not warrant coding changes. Instead, this unique pattern should be incorporated into the regex used to count Bordetella pertussis gene IDs when running future gene database testing reports.
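To illustrate why the BP#### pattern undercounts, here is a minimal sketch comparing it with a broader pattern that also accepts a ".1"-style suffix or a single trailing letter. The broader pattern is an assumption about what a corrected regex could look like, not the one adopted for the testing report, and it is written in Python's re syntax, so it has not been checked against xmlpipedb match itself. BP1234 is a made-up placeholder for an ordinary ID, and the helper unique_ids is hypothetical.

```python
import re

# The imprecise pattern: "BP" followed by exactly four digits.
NARROW = re.compile(r"BP[0-9]{4}")

# Assumed broader pattern: optionally followed by ".<digits>" or one capital letter.
BROAD = re.compile(r"BP[0-9]{4}(?:\.[0-9]+|[A-Z])?")


def unique_ids(pattern, text):
    """Return the sorted set of distinct matches of pattern in text."""
    return sorted(set(pattern.findall(text)))


# IDs from this entry plus one placeholder (BP1234) standing in for a plain ID.
sample = "BP3167.1 BP0101A BP0101B BP0970A BP0684A BP1234"

narrow_ids = unique_ids(NARROW, sample)
broad_ids = unique_ids(BROAD, sample)

print(len(narrow_ids), narrow_ids)
print(len(broad_ids), broad_ids)
```

With the narrow pattern, BP3167.1 is truncated to BP3167, and BP0101A and BP0101B collapse into a single BP0101, so the unique count comes out one lower than with the broader pattern.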
Changes to the Code
To make changes to the GenMAPP Builder code, Brandon opened the gmbuilder project in the Java perspective of Eclipse. The details regarding the developer rig used can be found on Brandon's page: Week 12 Assignment Page.