Blitvak Individual Assessment and Reflection
From LMU BioDB 2015
Statement of Work
- Describe exactly what you did on the project.
- I contributed to the Gene Database project by determining the gene ID patterns for our species (B. cenocepacia str. J2315), finding the MOD, exporting a gene database for each modified version of GenMAPP Builder, providing input toward the creation of modified builds of GenMAPP Builder, and conducting quality assurance on every exported gene database. I determined what was going wrong with the initial and second exports of the gene database by examining the UniProt XML file in an XML editor; these findings contributed to the final build of GenMAPP Builder by pinpointing a fault in the previously used version (it was grabbing "ordered locus" type gene IDs instead of "ORF" type, which led to exported databases that accounted for only 337 genes). I also designed the final commands that were used with Postgres and Match (
java -jar xmlpipedb-match-1.1.1.jar "p?BCA[LMS]?[0-9][0-9][0-9][Aa]?[0-9]?[A-Z,a-z]?" < "uniprot-taxonomy%3A216591_GEN_BL12_20151119.xml"
for Match, and
select count(*) from genenametype where type = 'ORF' and value ~ 'p?BCA[LMS]?[0-9][0-9][0-9][Aa]?[0-9]?[A-Z,a-z]?';
for Postgres). I worked with Anu, via Q/A, to determine what the goals should be for the next modified build of GenMAPP Builder. I also used Excel to determine why the Match count differed by 5 from the number of IDs in the final exported database (the 5 discrepant counts were accidental matches of text unrelated to gene IDs). I think that my most valuable contribution was the export and validation of all of the databases created in this project (this provided information that was used to fix problems with GenMAPP Builder and led to a Gene Database that accounted for all of the desired genes). I also worked with Anu to develop the genome paper presentation and with the group to create the final presentation and final paper (I focused on the database exports, the database validation procedure, and the IDs).
- Provide references or links to artifacts of your work, such as: Wiki pages, Other files or documents, Code or scripts
Journals
- Week 11 Individual Journal - Exploration of the MOD and establishment of the gene IDs for J2315
- Week 12 Individual Journal - Initial Database Export, Background
- Week 14 Individual Journal - Discrepant Match ID analysis with Excel, UniProt XML file exploration that determined the data that should be captured by GenMAPP Builder, Exports of Builds 2, 3, and 4 Gene Databases
- Week 15 Individual Journal - Final project work and exploration of the 6993 UniProt entries, compared to the 7121 gene IDs, via PSQL
Testing Reports
- Initial Export Testing Report
- Build 2 Export Testing Report
- Build 3 Export Testing Report
- Build 4 Export Testing Report
Files
- Compressed Initial Export .gdb - (revealed that only 337 genes ended up in the exported database, all of "ordered locus" type)
- Compressed Build 2 Export .gdb - (by Anu, build 2 added a species profile for J2315)
- Compressed Build 3 Export .gdb - (by Anu, modifications that allowed the capture of ORF data)
- Compressed Build 4 Export .gdb - (by Anu, fixed a bug with TallyEngine that was not representing the ORF genes)
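The ID pattern used in the Match and Postgres commands above, and the kind of accidental match that explained the 5-count discrepancy, can be illustrated with a minimal Python sketch. The sample text, the sample IDs, and the helper names `find_ids` and `discrepant_ids` are hypothetical and exist only for illustration; the actual analysis was done with Match, Postgres, and Excel, not with this code.

```python
import re

# Pattern copied from the Match and Postgres commands above.
# Note: inside [A-Z,a-z] the comma is a literal character, so the final
# optional class also accepts "," in addition to letters.
GENE_ID_PATTERN = re.compile(
    r"p?BCA[LMS]?[0-9][0-9][0-9][Aa]?[0-9]?[A-Z,a-z]?"
)


def find_ids(text):
    """Return the set of distinct substrings that match the ID pattern."""
    return set(GENE_ID_PATTERN.findall(text))


def discrepant_ids(matched, exported):
    """Return pattern matches that are absent from the exported ID set."""
    return sorted(matched - exported)


# Hypothetical stand-in for a fragment of the UniProt XML file.
sample_xml = """
<gene><name type="ORF">BCAL0001</name></gene>
<gene><name type="ORF">BCAM0002A</name></gene>
<gene><name type="ORF">pBCA001</name></gene>
<comment>see also XBCA123 for details</comment>
"""

# Hypothetical stand-in for the IDs present in an exported gene database.
exported_ids = {"BCAL0001", "BCAM0002A", "pBCA001"}

matched_ids = find_ids(sample_xml)
print(discrepant_ids(matched_ids, exported_ids))
```

In this sketch the pattern is unanchored, so it also picks up `BCA123` from inside unrelated text, which is the same kind of accidental match that inflated the Match count. Comparing the matched set against the exported set isolates those extra hits.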
Assessment of Project
- What worked and what didn't work?
- I think that all aspects of this project worked very well, and I feel very good about our final exported database and our biological conclusions! However, it was fairly difficult to begin and finish the final paper during the last week of class. In retrospect, it would have been easier to start some of the earlier sections (like the introduction and methods) sooner rather than later.
- What would you do differently if you could do it all over again?
- I would have begun work on the final deliverables earlier. Additionally, I realize that it would have been better to open the UniProt XML immediately after the initial export, when most of the gene IDs were missing from the exported database and from TallyEngine; this would have led to the identification of the problem earlier in the project (and to a more functional build earlier).
- Content: What is the quality of the work?
- I think that the final exported gene database for J2315 is comprehensive and well-built. It exhibits a really good level of quality; however, the TallyEngine results could be slightly tweaked to remove the output of "ordered locus" type gene ID counts (since we focused solely on ORF data, the reference to ordered locus data is not necessary). In short, no major issues exist with the gene database. I also feel that our final presentation was well put together. Regarding the final paper, I feel that it is pretty good, but it was written under significant time constraints; I think that it would have shown a higher level of polish if it had been started earlier or if more time had been available. I feel that only minor issues exist with the final paper.
- Organization: Comment on the organization of the project and of your group's wiki pages.
- I think that our pages are well organized, but I do feel that the testing reports should be placed under their own section/heading. Having weekly summary tables really helped in organizing our workflow and in planning future assignments. With respect to the organization of the project, I would say that we were pretty well organized; all files are accounted for and (from what I saw) all work was documented.
- Completeness: Did your team achieve all of the project objectives? Why or why not?
- We achieved all of our objectives thanks to a lot of collaboration and some luck (we found some pre-existing GenMAPP Builder code that dealt with the same issue we were encountering; modifying it saved us a lot of time and gave us more time for Q/A and for biological analysis via GenMAPP). I really do feel that the whole team was very motivated and excited about this project; that made a lot of difference when we were completing goals/objectives.
Reflection on the Process
What did you learn?
- With your head (biological or computer science principles)
- I learned a lot about the functioning, maintenance, and development of biological databases; I also learned a lot about the peer review process through the review of the NAR paper (which was a really exciting project and a new experience). I learned a lot of CS concepts related to text analysis/modification (and database creation) and much about how code works/computers behave. I also learned a lot about the importance of reproducibility and documentation in research through the Baggerly and Coombes example regarding the Duke case (it really drove home the point that data needs to be properly maintained, formatted, and checked; the conclusions of research are as important as the steps that led to them). The Duke case was the first severe case of research fabrication, with serious effects on the health of numerous people, that I became aware of. Through the use of GenMAPP, I came to understand more about bioinformatics and about the value of analyzing gene data with a program like GenMAPP (it made the biological meaning of data much clearer and easier to visualize).
- With your heart (personal qualities and teamwork qualities that make things work or not work)?
- I came to appreciate biology in light of computer science principles (DNA as biological code). I also learned to communicate and collaborate better with teammates; I feel that teamwork was really crucial in this project (much more so than most "class" projects). Having defined roles for each team member made collaboration a necessity and, through this project, I feel that I have become a better team member. I have learned more about the importance of good communication and the value of dividing work (based on skill-set). This class also made me a bit more determined and more keen on independent exploration (through the weekly assignments and the somewhat open-ended final project). This class also made me realize that I can constructively criticize the work of researchers (with respect to content, statistics, and reproducibility). Seeing the weird statistics (strange significance criteria) related to our microarray paper made me realize that there is a significant amount of research that isn't flawless.
- With your hands (technical skills)?
- I learned a lot of skills related to manipulating text via the command line and to creating and quality-assuring databases, and I feel that I became a lot more fluent in Excel. I also learned how to process microarray data and how to analyze it, biologically, using a program like GenMAPP (and GO terms). I also learned how to manage and manipulate data via Postgres tables.
- What lesson will you take away from this project that you will still use a year from now?
- I really learned the importance of documentation and of research reproducibility (and of good habits related to the management of data). A year from now, I feel that I will still be reading/working with research papers and, using the skills and insights that this class provided, I think I will be able to consciously evaluate the work that was conducted (especially with respect to the provided "workflow"). I think that I will also continue to apply the skills in data management that this class taught (keeping earlier versions, noting dates, and utilizing clear labels). Regarding the group project, I feel that I learned the importance of group communication and of collaboration. A year from now, I will continue to work with other people and I feel that I will still use effective, and clear, group communication in those situations.
Brandon Litvak
BIOL 367, Fall 2015