Blitvak Individual Assessment and Reflection
From LMU BioDB 2015
Revision as of 07:03, 18 December 2015 by Blitvak (Talk | contribs) (draft 1 of reflection deliverable)
Statement of Work
- Describe exactly what you did on the project.
- I contributed to the Gene Database project by working out the gene ID patterns for our species (B. cenocepacia str. J2315), finding the MOD, exporting a gene database for each modified version of GenMAPP Builder, providing input toward the creation of the modified builds of GenMAPP Builder, and conducting quality assurance on every exported gene database. I diagnosed what was going wrong with the initial and second exports of the gene database by inspecting the UniProt XML file in an XML editor. These findings pinpointed a fault in the version of GenMAPP Builder we were using (it was grabbing "ordered locus" type gene IDs instead of "ORF" type, which led to exported databases that accounted for only 337 genes) and contributed to the creation of the final, comprehensive build. I also designed the final commands that were used with Match and Postgres. For Match:
java -jar xmlpipedb-match-1.1.1.jar "p?BCA[LMS]?[0-9][0-9][0-9][Aa]?[0-9]?[A-Z,a-z]?" < "uniprot-taxonomy%3A216591_GEN_BL12_20151119.xml"
For Postgres:
select count(*) from genenametype where type = 'ORF' and value ~ 'p?BCA[LMS]?[0-9][0-9][0-9][Aa]?[0-9]?[A-Z,a-z]?';
I also used Excel to work out why the Match count was 5 off from the number of IDs in the final exported database (the 5 discrepant hits were accidental matches of text unrelated to gene IDs). I think my most valuable contribution was the export and validation of all of the databases created in this project, which provided the information used to fix problems with GenMAPP Builder and led to a Gene Database that accounted for all of the desired genes.
- Provide references or links to artifacts of your work, such as: Wiki pages, Other files or documents, Code or scripts
Journals
- Week 11 Individual Journal - Exploration of the MOD and establishment of the gene IDs for J2315
- Week 12 Individual Journal - Initial Database Export, Background
- Week 14 Individual Journal - Discrepant Match ID analysis with Excel, UniProt XML file exploration that determined the data that should be captured by GenMAPP Builder, Exports of Builds 2, 3, and 4 Gene Databases
- Week 15 Individual Journal - Final project work and exploration of the 6993 UniProt entries, compared to the 7121 gene IDs, via PSQL
Testing Reports
- Initial Export Testing Report
- Build 2 Export Testing Report
- Build 3 Export Testing Report
- Build 4 Export Testing Report
Files
- Compressed Initial Export .gdb - (revealed that only 337 genes ended up in the exported database, all of "ordered locus" type)
- Compressed Build 2 Export .gdb - (by Anu, build 2 added a species profile for J2315)
- Compressed Build 3 Export .gdb - (by Anu, modifications that allowed the capture of ORF data)
- Compressed Build 4 Export .gdb - (by Anu, fixed a bug with TallyEngine that was not representing the ORF genes)
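To illustrate how the gene ID pattern from the Match and Postgres commands in the Statement of Work can pick up accidental hits, here is a minimal Python sketch. It is not the tooling used in the project (that was XMLPipeDB Match, Postgres, and Excel). The pattern is copied verbatim from those commands; the sample text, the ID values in it, and the helper names `unique_matches` and `discrepant_ids` are hypothetical and exist only for illustration.

```python
import re

# Pattern copied verbatim from the Match and Postgres commands.
GENE_ID_PATTERN = re.compile(r"p?BCA[LMS]?[0-9][0-9][0-9][Aa]?[0-9]?[A-Z,a-z]?")


def unique_matches(text):
    """Return the set of distinct substrings of `text` that match the pattern."""
    return set(GENE_ID_PATTERN.findall(text))


def discrepant_ids(match_hits, database_ids):
    """Return the hits found in the raw text that are absent from the database."""
    return sorted(set(match_hits) - set(database_ids))


# Hypothetical stand-in for a fragment of the UniProt XML file.
sample_xml = (
    '<name type="ORF">BCAL0001</name>'
    '<name type="ORF">BCAM0123</name>'
    '<name type="ORF">pBCA001</name>'
    '<comment>strain code BCA999 (not a gene)</comment>'
)

# Hypothetical stand-in for the ORF IDs stored in the exported database.
database_ids = {"BCAL0001", "BCAM0123", "pBCA001"}

hits = unique_matches(sample_xml)
extra = discrepant_ids(hits, database_ids)
print(len(hits), len(database_ids), extra)
```

In this made-up sample the raw-text search finds one more ID than the database holds, because the pattern also matches text outside the ORF name elements. Note as well that the comma inside `[A-Z,a-z]` is a literal member of the character class, so the pattern accepts a trailing comma after an ID.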
Assessment of Project
- What worked and what didn't work?
- What would you do differently if you could do it all over again?
- Content: What is the quality of the work?
- Organization: Comment on the organization of the project and of your group's wiki pages.
- Completeness: Did your team achieve all of the project objectives? Why or why not?
Reflection on the Process
What did you learn?
- With your head (biological or computer science principles)
- With your heart (personal qualities and teamwork qualities that make things work or not work)?
- I came to appreciate biology in light of computer science principles (DNA as biological code)
- With your hands (technical skills)?
- I learned many skills related to manipulating text on the command line and to creating and quality-checking databases, and I feel that I became much more fluent in Excel.
- What lesson will you take away from this project that you will still use a year from now?
- I really learned the importance of documentation and of research reproducibility. This course taught me a lot about proper data management and