LMU BioDB 2015 - User contributions [en]

Nanguiano Deliverables

2016-12-21T12:30:36Z

Nanguiano: /* Deliverables */ added report link

== Deliverables ==
* [[Media:NA_Final_Powerpoint.pptx|Powerpoint]]
* [[Media:NA_Final_Report.docx|Report]]
* [[Nanguiano_Individual_Assessement | Individual Assessment and Reflection]]

== Additional Files ==
* [[Media:NA_V_Cholerae_Read_Me_2016.pdf|V. cholerae Read Me]]
* [https://github.com/lmu-bioinformatics/xmlpipedb/blob/master/GenMAPP%20Gene%20Databases/V.%20cholerae/%23current/Vc-Std_External_20161009.zip?raw=true Zipped V. cholerae GDB] (GitHub link due to large file size)
** If the link is unavailable, the file can be found in the [https://github.com/lmu-bioinformatics/xmlpipedb XMLPipeDB GitHub repository] under GenMAPP Gene Databases > V. cholerae > #current (or under V. cholerae 20161009, if a newer version has since replaced it)

== Links ==
{{Template:Nanguiano}}

Nanguiano Individual Assessement

2016-12-21T12:30:00Z

Nanguiano: added report link

=== Statement of Work ===
On my project, I completed a successful export of the [https://github.com/lmu-bioinformatics/xmlpipedb/tree/master/GenMAPP%20Gene%20Databases/V.%20cholerae/%23current V. cholerae] gene database, which resolved the problems I had faced in my [[Nanguiano_Week_9|initial export attempt]]. To perform the export, I modified an existing [https://github.com/lmu-bioinformatics/xmlpipedb/blob/master/godb/xsd/go_daily-obo-xml-manual.dtd GO OBO-XML DTD schema] to work with [https://github.com/lmu-bioinformatics/xmlpipedb/blob/master/GenMAPP%20Gene%20Databases/V.%20cholerae/%23current/source_files/go_daily-termdb.obo-xml a new OBO-XML file] whose schema was no longer compatible with GenMAPP Builder. I ran [https://github.com/lmu-bioinformatics/xmlpipedb/tree/master/xsd2db xsd2db] on the manually edited DTD file to generate files that GenMAPP Builder could use. I then ran the [https://github.com/lmu-bioinformatics/xmlpipedb/tree/master/godb/tools GODB Post Processor] on the specified [https://github.com/lmu-bioinformatics/xmlpipedb/blob/master/godb/hbm/To.hbm.xml HBM] and [https://github.com/lmu-bioinformatics/xmlpipedb/blob/master/godb/sql/schema.sql SQL] files so that they could be properly added to GenMAPP Builder. I replaced the old files in GenMAPP Builder with the new files generated by xsd2db and the GODB Post Processor, and overwrote [https://github.com/lmu-bioinformatics/xmlpipedb/blob/master/gmbuilder/sql/gmbuilder.sql GMBuilder.sql] with the contents of the post-processed schema.sql file so that GMBuilder would build the proper table from the start. I also wrote [https://github.com/lmu-bioinformatics/xmlpipedb/wiki/Updating-GMBuilder-to-work-GO's-OBO-XML-Files documentation on how to update GMBuilder] in case the OBO-XML schema changes again in the future. With all of this in place, I performed an export of V. cholerae, which completed successfully.
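One way a schema change like this can show up is as elements in a new OBO-XML file that the old DTD does not declare. As a rough illustration only (this is not the procedure used in this project), the Python sketch below lists element names that appear in an XML document but have no <code><!ELEMENT ...></code> declaration in a given DTD. The sample DTD and document are made up, and the regular expression is a simplification that ignores parameter entities and XML namespaces.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET

# Simplified pattern for declarations such as <!ELEMENT term (id, name)>.
# It ignores parameter entities and other advanced DTD features.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([A-Za-z_][\w.:-]*)")


def declared_elements(dtd_text: str) -> set[str]:
    """Return the element names declared in a DTD."""
    return set(ELEMENT_DECL.findall(dtd_text))


def used_elements(xml_text: str) -> set[str]:
    """Return every element tag that appears in an XML document."""
    root = ET.fromstring(xml_text)
    return {el.tag for el in root.iter()}


def undeclared_elements(dtd_text: str, xml_text: str) -> set[str]:
    """Return elements used in the XML document but not declared in the DTD."""
    return used_elements(xml_text) - declared_elements(dtd_text)


# Hypothetical miniature DTD and document; the element names are illustrative
# and are NOT the actual change in the GO OBO-XML file.
old_dtd = """
<!ELEMENT obo (header, term*)>
<!ELEMENT header (format-version)>
<!ELEMENT format-version (#PCDATA)>
<!ELEMENT term (id, name)>
<!ELEMENT id (#PCDATA)>
<!ELEMENT name (#PCDATA)>
"""

new_xml = """<obo>
  <header><format-version>1.2</format-version></header>
  <term><id>GO:0000001</id><name>example term</name><new_element>x</new_element></term>
</obo>"""

print(sorted(undeclared_elements(old_dtd, new_xml)))  # ['new_element']
```

Any name the sketch reports would be a candidate for a new declaration in the manually edited DTD before rerunning xsd2db.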
I then performed quality assurance on the export, the details of which can be found in the [[Media:NA_Final_Report.docx|report]]. Next, I converted the [https://xmlpipedb.cs.lmu.edu XMLPipeDB website] to Jekyll and moved it to a [https://github.com/lmu-bioinformatics/xmlpipedb/tree/gh-pages gh-pages branch] on GitHub, giving it a new [http://lmu-bioinformatics.github.io/xmlpipedb/ GitHub domain]. The primary website was redirected to the GitHub site. Finally, I ran [http://schemaspy.sourceforge.net/ SchemaSpy] on the project's databases; the results can be found [http://lmu-bioinformatics.github.io/xmlpipedb/assets/schemaspy/uniprotdb/ here for uniprotdb], [http://lmu-bioinformatics.github.io/xmlpipedb/assets/schemaspy/godb/ here for godb], and [http://lmu-bioinformatics.github.io/xmlpipedb/assets/schemaspy/gmbuilderdb/ here for gmbuilderdb]. I also wrote [https://github.com/lmu-bioinformatics/xmlpipedb/wiki/Using-SchemaSpy documentation on how to use SchemaSpy] for future reference.
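For anyone regenerating the SchemaSpy reports, a minimal Python sketch of building the command lines follows. It assumes the three databases (uniprotdb, godb, gmbuilderdb, taken from the report URLs above) are PostgreSQL databases on a local server; the jar and JDBC driver file names, the <code>public</code> schema, and the credentials are hypothetical placeholders. The flags used (<code>-t</code>, <code>-host</code>, <code>-db</code>, <code>-s</code>, <code>-u</code>, <code>-p</code>, <code>-dp</code>, <code>-o</code>) are standard SchemaSpy command-line options, but they should be checked against the installed SchemaSpy version and the [https://github.com/lmu-bioinformatics/xmlpipedb/wiki/Using-SchemaSpy Using SchemaSpy] wiki page.

```python
from __future__ import annotations

import shlex

# Hypothetical placeholders: adjust to the local installation.
SCHEMASPY_JAR = "schemaSpy.jar"
POSTGRES_DRIVER = "postgresql.jar"
DATABASES = ["uniprotdb", "godb", "gmbuilderdb"]


def schemaspy_command(db: str, user: str, password: str,
                      host: str = "localhost",
                      out_root: str = "assets/schemaspy") -> list[str]:
    """Build a SchemaSpy argument list for one (assumed) PostgreSQL database."""
    return [
        "java", "-jar", SCHEMASPY_JAR,
        "-t", "pgsql",             # database type: PostgreSQL (assumed)
        "-host", host,
        "-db", db,
        "-s", "public",            # schema to document (assumed)
        "-u", user,
        "-p", password,
        "-dp", POSTGRES_DRIVER,    # path to the JDBC driver jar
        "-o", f"{out_root}/{db}",  # one HTML report directory per database
    ]


if __name__ == "__main__":
    for db in DATABASES:
        # Hypothetical credentials; this only prints the commands.
        print(shlex.join(schemaspy_command(db, "postgres", "secret")))
```

The sketch only prints the commands; passing each list to <code>subprocess.run(cmd, check=True)</code> would execute them. Building an argument list rather than one shell string avoids quoting problems with passwords and paths.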

=== Assessment of Project ===
Overall, I would say the project was a success. The initial goal for the semester was completed, and the subsequent goals of converting the XMLPipeDB website to Jekyll and moving it to GitHub were also completed. The most successful parts of the project were those that combined my knowledge with that of Dr. Dionisio and Dr. Dahlquist. Many of the tasks I completed this semester involved tools I had never used before, such as Ant, Hibernate, and XML. Attempting to work through these on my own was often fruitless, because not knowing what I was doing made it difficult to search for precisely what I needed. Working with the professors to learn what they knew, instead of trying to do it all myself, allowed things to progress much more quickly. If I were to do it again, I would ask them for help much sooner, to avoid spending days on errors that others could recognize easily.

I am very pleased with the results of my work on the V. cholerae export (and the corresponding edits to GMBuilder), the XMLPipeDB site, and the SchemaSpy documentation. The export and the quality assurance performed on it suggest that the edits to GMBuilder will allow future GO OBO-XML files to be processed without error, provided the DTD schema does not change again. Should it change in the future, the documentation I wrote should make the update much smoother for future developers. Thanks to GitHub, the project was very organized: tasks and accomplishments were clearly marked, and questions and concerns were generally kept in their relevant threads. The entire process, from beginning to end, is documented in the project's [https://github.com/lmu-bioinformatics/xmlpipedb/issues GitHub issues], both open and closed as of this writing. Additionally, all code written and changed can be found in the XMLPipeDB [https://github.com/lmu-bioinformatics/xmlpipedb GitHub repository]. I believe GitHub allowed this project to run very smoothly and kept a clear record of all correspondence.

I found writing the report very difficult, because the nature of this semester's work did not fit neatly within the existing report guidelines. Even so, I feel the report is complete for the most part. The only sections not completed were those requiring GenMAPP (which crashes when I run it on my computer) and the database schema diagram (although the SchemaSpy results on the website may serve as a substitute).

=== Reflection on the Process ===
I feel that this project was an excellent learning experience. Almost everything I worked with this semester was entirely new, which allowed me to expand my knowledge every week.
The main technical and computer science skills I learned were:
* XML structure and elements, and how to add a new element to an XML file
* How to use Ant and how to build files with it
* How to edit Ant build files to fix files that fail to build
* How to build a website with Maven
* How to convert a Maven-built website to Jekyll
* How to update GMBuilder to work with new OBO-XML files
* How to use SchemaSpy
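As a small illustration of the first item, adding a new element to an XML document, here is a minimal Python sketch using the standard library's <code>xml.etree.ElementTree</code>. The <code>add_child</code> helper and the sample <code><term></code> document are hypothetical and are not taken from the XMLPipeDB code.

```python
import xml.etree.ElementTree as ET


def add_child(xml_text: str, parent_tag: str, child_tag: str, text: str) -> str:
    """Append <child_tag>text</child_tag> to the first <parent_tag> element.

    Hypothetical helper for illustration; returns the modified XML as a string.
    """
    root = ET.fromstring(xml_text)
    # Use the root itself if it matches, otherwise search its descendants.
    parent = root if root.tag == parent_tag else root.find(f".//{parent_tag}")
    if parent is None:
        raise ValueError(f"no <{parent_tag}> element found")
    child = ET.SubElement(parent, child_tag)
    child.text = text
    return ET.tostring(root, encoding="unicode")


print(add_child("<term><id>GO:0000001</id></term>", "term", "name", "example"))
# <term><id>GO:0000001</id><name>example</name></term>
```

Note that adding an element to a document is only half of the job when the document is validated against a DTD: the DTD also needs a matching element declaration.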
On the personal side, I learned to ask questions when I have them rather than always trying to answer them on my own. I have faced this lesson many times throughout my college career, but it was especially relevant here. Days spent struggling to decode cryptic error messages or to fix code that simply wasn't working could have been resolved in minutes had I just asked. While I still value trying to answer questions on my own, and feel it is very important for learning new concepts in computer science, there are times when it is far more beneficial to ask for help. Asking for help does not always mean being given the answer, as I often fear it does; often it just means being pointed in a direction where I can discover the answer myself. This is a lesson I will carry with me for years to come.

File:NA Final Report.docx

2016-12-21T12:28:41Z

Nanguiano: There comes a time when you just have to stop working on something. ~~~~

There comes a time when you just have to stop working on something. [[User:Nanguiano|Nanguiano]] ([[User talk:Nanguiano|talk]]) 04:28, 21 December 2016 (PST)

Nanguiano Individual Assessement

2016-12-21T12:28:14Z

Nanguiano: added a bit more


Nanguiano Individual Assessement

2016-12-21T11:37:52Z

Nanguiano: finished this up. just need link to the report


Nanguiano Individual Assessement

2016-12-21T09:56:36Z

Nanguiano: Added reflection


Nanguiano Individual Assessement

2016-12-21T09:49:13Z

Nanguiano: Started assessment of project, will get back to it soon

=== Statement of Work ===
On my project, I completed a successful export of [https://github.com/lmu-bioinformatics/xmlpipedb/tree/master/GenMAPP%20Gene%20Databases/V.%20cholerae/%23current V. cholerae] which solved the problems I had faced in the [[Nanguiano_Week_9|initial export I had attempted]]. To perform the export, I modified an existing [https://github.com/lmu-bioinformatics/xmlpipedb/blob/master/godb/xsd/go_daily-obo-xml-manual.dtd GO OBO-XML DTD schema] to work with [https://github.com/lmu-bioinformatics/xmlpipedb/blob/master/GenMAPP%20Gene%20Databases/V.%20cholerae/%23current/source_files/go_daily-termdb.obo-xml a new OBO-XML file] that used a schema that no longer worked with GenMAPP Builder. I ran [https://github.com/lmu-bioinformatics/xmlpipedb/tree/master/xsd2db xsd2db] on the manually edited DTD file to obtain files that could be used in GenMAPP Builder. I ran [https://github.com/lmu-bioinformatics/xmlpipedb/tree/master/godb/tools GODB Post Processor] on the specified [https://github.com/lmu-bioinformatics/xmlpipedb/blob/master/godb/hbm/To.hbm.xml HBM] and [https://github.com/lmu-bioinformatics/xmlpipedb/blob/master/godb/sql/schema.sql SQL] files to allow them to be properly added to GenMAPP Builder. I replaced the old files in GenMAPP Builder with the new files generated by xsd2db and GODBPostProcessor, and overwrote [https://github.com/lmu-bioinformatics/xmlpipedb/blob/master/gmbuilder/sql/gmbuilder.sql GMBuilder.sql] with the contents of the schema.sql file that was edited by GODB Post Processor so that it could build the proper table initially. I wrote [https://github.com/lmu-bioinformatics/xmlpipedb/wiki/Updating-GMBuilder-to-work-GO's-OBO-XML-Files documentation on how to perform updates to GMBuilder] in the event of an OBO-XML schema change in the future. Following all of this, I performed an export of V. cholerae, which completed successfully. Then I performed quality assurance on the export, the details of which can be found in the

Report 

. Following this, I converted the [https://xmlpipedb.cs.lmu.edu XMLPipeDB website] to Jekyll, and moved it to a [https://github.com/lmu-bioinformatics/xmlpipedb/tree/gh-pages gh-pages branch] on github, giving it a new [http://lmu-bioinformatics.github.io/xmlpipedb/ github domain]. The primary website was redirected to the github site link. Following this, I ran [http://schemaspy.sourceforge.net/ SchemaSpy] on the database, the results of which can be found [http://lmu-bioinformatics.github.io/xmlpipedb/assets/schemaspy/uniprotdb/ here for uniprotdb], [http://lmu-bioinformatics.github.io/xmlpipedb/assets/schemaspy/godb/ here for godb], and [http://lmu-bioinformatics.github.io/xmlpipedb/assets/schemaspy/gmbuilderdb/ here for gmbuilder]. I also wrote up [https://github.com/lmu-bioinformatics/xmlpipedb/wiki/Using-SchemaSpy documentation on how to use SchemaSpy] for future reference.

=== Assessment of Project ===
Overall, I would say the project was a success. The initial goal for the semester was completed, as were the subsequent goals of converting the XMLPipeDB website to Jekyll and moving it to GitHub. The most successful parts of the project were those that combined my knowledge with that of Dr. Dionisio and Dr. Dahlquist. Many of my tasks this semester involved tools I had never used before, such as Ant, Hibernate, and XML. Working on my own was often fruitless, because my unfamiliarity with these tools made it hard to search for exactly what I needed. Learning from the professors, instead of trying to do everything myself, made things progress much more quickly. If I were to do it again, I would ask them for help much sooner, to avoid spending days struggling with errors that others recognized easily.

I am very pleased with the results of my work on the V. cholerae export (and the corresponding edits to GMBuilder), the XMLPipeDB site, and the SchemaSpy documentation. The export and the quality assurance performed on it indicate that the edits to GMBuilder should allow future GO OBO-XML files to run without error, provided the DTD schema does not change again. Should it change in the future, the documentation I wrote should make the update much smoother for future developers. Because we used GitHub, the project was very organized: tasks and accomplishments were clearly marked, and questions and concerns were generally kept in their relevant threads. The entire process, from beginning to end, is fully documented in the [https://github.com/lmu-bioinformatics/xmlpipedb/issues GitHub issues], both open and closed as of this writing. Additionally, all code written and changed can be found in the [https://github.com/lmu-bioinformatics/xmlpipedb GitHub repository] for XMLPipeDB. I believe GitHub allowed this project to run very smoothly and kept a clear record of all correspondence.

=== Reflection on the Process ===

Nanguiano Individual Assessement

2016-12-21T09:40:25Z

Nanguiano: Added statement of work

Nanguiano Deliverables

2016-12-21T09:21:09Z

Nanguiano: /* Deliverables */ Added links

== Deliverables ==
* [[Media:NA_Final_Powerpoint.pptx|Powerpoint]]
* Report 
* [[Nanguiano_Individual_Assessement | Individual Assessment and Reflection]]

== Additional Files ==
* [[Media:NA_V_Cholerae_Read_Me_2016.pdf|V. cholerae Read Me]]
* [https://github.com/lmu-bioinformatics/xmlpipedb/blob/master/GenMAPP%20Gene%20Databases/V.%20cholerae/%23current/Vc-Std_External_20161009.zip?raw=true Zipped V. cholerae GDB] (GitHub link due to large file size)
** If the link is unavailable, the file can be found on the [https://github.com/lmu-bioinformatics/xmlpipedb XMLPipeDB GitHub] under GenMAPP Gene Databases > V. cholerae > #current (or V. cholerae 20161009, should an update have occurred)

== Links ==
{{Template:Nanguiano}}

File:NA V Cholerae Read Me 2016.pdf

2016-12-21T09:16:01Z

Nanguiano: Read me for V. Cholerae ~~~~

Read me for V. Cholerae [[User:Nanguiano|Nanguiano]] ([[User talk:Nanguiano|talk]]) 01:16, 21 December 2016 (PST)

Nanguiano Deliverables

2016-12-21T09:11:19Z

Nanguiano: create page, add main links

Template:Nanguiano

2016-12-21T09:11:03Z

Nanguiano: /* Individual Journals */ removed report link

[[User:Nanguiano | Nicole Anguiano]]<br>
[[Main_Page | BIOL 367, Fall 2015]]

=====Assignment Links=====
*[[Week_1 | Week 1 Assignment]]
*[[Week_2 | Week 2 Assignment]]
*[[Week_3 | Week 3 Assignment]]
*[[Week_4 | Week 4 Assignment]]
*[[Week_5 | Week 5 Assignment]]
*[[Week_6 | Week 6 Assignment]]
*[[Week_7 | Week 7 Assignment]]
*[[Week_8 | Week 8 Assignment]]
*[[Week_9 | Week 9 Assignment]]
*[[Week_10 | Week 10 Assignment]]
*[[Week_11 | Week 11 Assignment]]
*[[Week_12 | Week 12 Assignment]]
*[[Week_14 | Week 14 Assignment]]
*[[Week_15 | Week 15 Assignment]]

=====Individual Journals=====
*[[Nanguiano_Week_2 | Individual Journal Week 2]]
*[[Nanguiano_Week_3 | Individual Journal Week 3]]
*[[Nanguiano_Week_4 | Individual Journal Week 4]]
*[[Nanguiano_Week_5 | Individual Journal Week 5]]
*[[Nanguiano_Week_6 | Individual Journal Week 6]]
*[[Nanguiano_Week_7 | Individual Journal Week 7]]
*[[Nanguiano_Week_8 | Individual Journal Week 8]]
*[[Nanguiano_Week_9 | Individual Journal Week 9]]
*[[Nanguiano_Week_10 | Individual Journal Week 10]]
*[[Nanguiano_Week_11 | Individual Journal Week 11]]
*[[Nanguiano_Individual_Assessement | Individual Assessment]]
*[[Nanguiano_Deliverables | Deliverables]]

=====Shared Journals=====
*[[Class_Journal_Week_1 | Class Journal Week 1]]
*[[Class_Journal_Week_2 | Class Journal Week 2]]
*[[Class_Journal_Week_3 | Class Journal Week 3]]
*[[Class_Journal_Week_4 | Class Journal Week 4]]
*[[Class_Journal_Week_5 | Class Journal Week 5]]
*[[Class_Journal_Week_6 | Class Journal Week 6]]
*[[Class_Journal_Week_7 | Class Journal Week 7]]
*[[Class_Journal_Week_8 | Class Journal Week 8]]
*[[Class_Journal_Week_9 | Class Journal Week 9]]
*[[Class_Journal_Week_10 | Class Journal Week 10]]
*[[Class_Journal_Week_11 | Class Journal Week 11]]
*[[Class_Journal_Week_12 | Class Journal Week 12]]
*[[Class_Journal_Week_14 | Class Journal Week 14]]
*[[Class_Journal_Week_15 | Class Journal Week 15]]

[[Category:Journal Entry]]

File:NA Final Powerpoint.pptx

2016-12-21T09:09:34Z

Nanguiano: Final powerpoint for the semester. ~~~~

Final powerpoint for the semester.

[[User:Nanguiano|Nanguiano]] ([[User talk:Nanguiano|talk]]) 01:09, 21 December 2016 (PST)

Template:Nanguiano

2016-12-21T08:48:29Z

Nanguiano: /* Individual Journals */ added deliverables

[[User:Nanguiano | Nicole Anguiano]]<br>
[[Main_Page | BIOL 367, Fall 2015]]

=====Assignment Links=====
*[[Week_1 | Week 1 Assignment]]
*[[Week_2 | Week 2 Assignment]]
*[[Week_3 | Week 3 Assignment]]
*[[Week_4 | Week 4 Assignment]]
*[[Week_5 | Week 5 Assignment]]
*[[Week_6 | Week 6 Assignment]]
*[[Week_7 | Week 7 Assignment]]
*[[Week_8 | Week 8 Assignment]]
*[[Week_9 | Week 9 Assignment]]
*[[Week_10 | Week 10 Assignment]]
*[[Week_11 | Week 11 Assignment]]
*[[Week_12 | Week 12 Assignment]]
*[[Week_14 | Week 14 Assignment]]
*[[Week_15 | Week 15 Assignment]]

=====Individual Journals=====
*[[Nanguiano_Week_2 | Individual Journal Week 2]]
*[[Nanguiano_Week_3 | Individual Journal Week 3]]
*[[Nanguiano_Week_4 | Individual Journal Week 4]]
*[[Nanguiano_Week_5 | Individual Journal Week 5]]
*[[Nanguiano_Week_6 | Individual Journal Week 6]]
*[[Nanguiano_Week_7 | Individual Journal Week 7]]
*[[Nanguiano_Week_8 | Individual Journal Week 8]]
*[[Nanguiano_Week_9 | Individual Journal Week 9]]
*[[Nanguiano_Week_10 | Individual Journal Week 10]]
*[[Nanguiano_Week_11 | Individual Journal Week 11]]
*[[Nanguiano_Individual_Assessement | Individual Assessment]]
*[[Nanguiano_Report | Report]]
*[[Nanguiano_Deliverables | Deliverables]]

=====Shared Journals=====
*[[Class_Journal_Week_1 | Class Journal Week 1]]
*[[Class_Journal_Week_2 | Class Journal Week 2]]
*[[Class_Journal_Week_3 | Class Journal Week 3]]
*[[Class_Journal_Week_4 | Class Journal Week 4]]
*[[Class_Journal_Week_5 | Class Journal Week 5]]
*[[Class_Journal_Week_6 | Class Journal Week 6]]
*[[Class_Journal_Week_7 | Class Journal Week 7]]
*[[Class_Journal_Week_8 | Class Journal Week 8]]
*[[Class_Journal_Week_9 | Class Journal Week 9]]
*[[Class_Journal_Week_10 | Class Journal Week 10]]
*[[Class_Journal_Week_11 | Class Journal Week 11]]
*[[Class_Journal_Week_12 | Class Journal Week 12]]
*[[Class_Journal_Week_14 | Class Journal Week 14]]
*[[Class_Journal_Week_15 | Class Journal Week 15]]

[[Category:Journal Entry]]

Template:Nanguiano

2016-12-21T08:47:55Z

Nanguiano: /* Individual Journals */ renamed statement of work to individual assessment

[[User:Nanguiano | Nicole Anguiano]]<br>
[[Main_Page | BIOL 367, Fall 2015]]

=====Assignment Links=====
*[[Week_1 | Week 1 Assignment]]
*[[Week_2 | Week 2 Assignment]]
*[[Week_3 | Week 3 Assignment]]
*[[Week_4 | Week 4 Assignment]]
*[[Week_5 | Week 5 Assignment]]
*[[Week_6 | Week 6 Assignment]]
*[[Week_7 | Week 7 Assignment]]
*[[Week_8 | Week 8 Assignment]]
*[[Week_9 | Week 9 Assignment]]
*[[Week_10 | Week 10 Assignment]]
*[[Week_11 | Week 11 Assignment]]
*[[Week_12 | Week 12 Assignment]]
*[[Week_14 | Week 14 Assignment]]
*[[Week_15 | Week 15 Assignment]]

=====Individual Journals=====
*[[Nanguiano_Week_2 | Individual Journal Week 2]]
*[[Nanguiano_Week_3 | Individual Journal Week 3]]
*[[Nanguiano_Week_4 | Individual Journal Week 4]]
*[[Nanguiano_Week_5 | Individual Journal Week 5]]
*[[Nanguiano_Week_6 | Individual Journal Week 6]]
*[[Nanguiano_Week_7 | Individual Journal Week 7]]
*[[Nanguiano_Week_8 | Individual Journal Week 8]]
*[[Nanguiano_Week_9 | Individual Journal Week 9]]
*[[Nanguiano_Week_10 | Individual Journal Week 10]]
*[[Nanguiano_Week_11 | Individual Journal Week 11]]
*[[Nanguiano_Individual_Assessement | Individual Assessment]]
*[[Nanguiano_Report | Report]]

=====Shared Journals=====
*[[Class_Journal_Week_1 | Class Journal Week 1]]
*[[Class_Journal_Week_2 | Class Journal Week 2]]
*[[Class_Journal_Week_3 | Class Journal Week 3]]
*[[Class_Journal_Week_4 | Class Journal Week 4]]
*[[Class_Journal_Week_5 | Class Journal Week 5]]
*[[Class_Journal_Week_6 | Class Journal Week 6]]
*[[Class_Journal_Week_7 | Class Journal Week 7]]
*[[Class_Journal_Week_8 | Class Journal Week 8]]
*[[Class_Journal_Week_9 | Class Journal Week 9]]
*[[Class_Journal_Week_10 | Class Journal Week 10]]
*[[Class_Journal_Week_11 | Class Journal Week 11]]
*[[Class_Journal_Week_12 | Class Journal Week 12]]
*[[Class_Journal_Week_14 | Class Journal Week 14]]
*[[Class_Journal_Week_15 | Class Journal Week 15]]

[[Category:Journal Entry]]

Template:Nanguiano

2016-12-21T08:47:23Z

Nanguiano: /* Individual Journals */ added statement of work/report links

[[User:Nanguiano | Nicole Anguiano]]<br>
[[Main_Page | BIOL 367, Fall 2015]]

=====Assignment Links=====
*[[Week_1 | Week 1 Assignment]]
*[[Week_2 | Week 2 Assignment]]
*[[Week_3 | Week 3 Assignment]]
*[[Week_4 | Week 4 Assignment]]
*[[Week_5 | Week 5 Assignment]]
*[[Week_6 | Week 6 Assignment]]
*[[Week_7 | Week 7 Assignment]]
*[[Week_8 | Week 8 Assignment]]
*[[Week_9 | Week 9 Assignment]]
*[[Week_10 | Week 10 Assignment]]
*[[Week_11 | Week 11 Assignment]]
*[[Week_12 | Week 12 Assignment]]
*[[Week_14 | Week 14 Assignment]]
*[[Week_15 | Week 15 Assignment]]

=====Individual Journals=====
*[[Nanguiano_Week_2 | Individual Journal Week 2]]
*[[Nanguiano_Week_3 | Individual Journal Week 3]]
*[[Nanguiano_Week_4 | Individual Journal Week 4]]
*[[Nanguiano_Week_5 | Individual Journal Week 5]]
*[[Nanguiano_Week_6 | Individual Journal Week 6]]
*[[Nanguiano_Week_7 | Individual Journal Week 7]]
*[[Nanguiano_Week_8 | Individual Journal Week 8]]
*[[Nanguiano_Week_9 | Individual Journal Week 9]]
*[[Nanguiano_Week_10 | Individual Journal Week 10]]
*[[Nanguiano_Week_11 | Individual Journal Week 11]]
*[[Nanguiano_Statement_Of_Work | Statement of Work]]
*[[Nanguiano_Report | Report]]

=====Shared Journals=====
*[[Class_Journal_Week_1 | Class Journal Week 1]]
*[[Class_Journal_Week_2 | Class Journal Week 2]]
*[[Class_Journal_Week_3 | Class Journal Week 3]]
*[[Class_Journal_Week_4 | Class Journal Week 4]]
*[[Class_Journal_Week_5 | Class Journal Week 5]]
*[[Class_Journal_Week_6 | Class Journal Week 6]]
*[[Class_Journal_Week_7 | Class Journal Week 7]]
*[[Class_Journal_Week_8 | Class Journal Week 8]]
*[[Class_Journal_Week_9 | Class Journal Week 9]]
*[[Class_Journal_Week_10 | Class Journal Week 10]]
*[[Class_Journal_Week_11 | Class Journal Week 11]]
*[[Class_Journal_Week_12 | Class Journal Week 12]]
*[[Class_Journal_Week_14 | Class Journal Week 14]]
*[[Class_Journal_Week_15 | Class Journal Week 15]]

[[Category:Journal Entry]]

Nanguiano Week 9

2016-09-17T04:43:05Z

Nanguiano: Add link to week 9 files

== Running GenMAPP Builder ==

* The following zip file contains all of the files from this export and QA: [[Media:Week9_Files_NA.zip|Week 9 Files]]

I followed the process from the [[Running_GenMAPP_Builder | Running GenMAPP Builder Tutorial Page]].

=== Software ===

The following software was utilized:
# Any tool that can unpack .gz and .zip files
#* We use [http://www.7-zip.org/ 7-zip]
# [http://www.postgresql.org PostgreSQL] on Windows (http://www.enterprisedb.com/products-services-training/pgdownload)
# GenMAPP Builder version 3.0.0 build 5 (https://github.com/lmu-bioinformatics/xmlpipedb/releases)
# Java JDK 1.8 64-bit
#* [http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html Download page]
#* File to download is: jdk-8u65-windows-x64.exe
# GenMAPP 2 can be downloaded [https://github.com/GenMAPPCS/genmapp here]. The file to download is "GenMAPPv2Setup.exe".
# XMLPipeDB ''match'' utility (https://sourceforge.net/projects/xmlpipedb/files/) for counting IDs in XML files
# Microsoft Access or any other tool that can read .mdb files

==Quality Assurance==

The template for this information was taken from [[Gene_Database_Testing_Report_Sample | the Gene Database Testing Report Sample Page]].

===Export Information===

Version of GenMAPP Builder: '''3.0.0 build 5'''

Computer on which export was run: '''Seaver 120-14, HP LV2311 Windows 7 Enterprise'''

Postgres Database name: '''V_Cholerae_20160916_gmbuilder03b5'''

UniProt XML filename (give filename and upload and link to compressed file): [[Media:Uniprot-organism-243277-NA.zip|uniprot-organism%3A243277.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): '''UniProt Release 2016_08'''
* UniProt XML download link: http://www.uniprot.org/uniprot/?query=organism:243277
* Time taken to import: '''2.92 minutes'''
** Note: ''None''

GO OBO-XML filename (give filename and upload and link to compressed file): [[Media:Go_daily-termdb-old-AV-NA.zip|go_daily-termdb.obo-xml]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): '''10/27/2015, 2:24am'''
* GO OBO-XML download link: http://geneontology.org/page/download-ontology#Legacy_Downloads
* Time taken to import: '''6.87 minutes'''
* Time taken to process: '''4.26 minutes'''
** Note: ''The GO OBO-XML file and version are from [[Anuvarsh_Week_9 | Anu Varshneya's Week 9 Journal]]. Currently available GO OBO-XML files do not run properly in GenMAPP Builder.''

GOA filename (give filename and upload and link to compressed file): [[Media:46.V_cholerae_ATCC_39315_NA.goa.zip|46.V_cholerae_ATCC_39315.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): '''07/05/16'''
* GOA download link: http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/46.V_cholerae_ATCC_39315.goa
* Time taken to import: '''0.06 minutes'''
** Note: ''None''

Name of .gdb file (give filename and upload and link to compressed file): [[Media:Vc-Std_20160916_NA.zip|Vc-Std_20160916_NA.gdb]]
* Time taken to export: '''about 5.45 hours''' of active export time (6.84 hours elapsed, minus 1.38 hours while the computer was locked)
** Start time: '''09/16/16, 12:26:08 PM'''
** End time: '''09/16/16, 7:16:23 PM'''
*** Note: The computer was locked and not running the export from 2:56 PM to 4:19 PM.
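The active export time can be checked against the start, end, and locked times listed above. This minimal Python sketch subtracts the locked interval from the elapsed time; it assumes the lock began and ended on the whole minute, since only minutes were recorded for it.

```python
from datetime import datetime

FMT = "%m/%d/%y %I:%M:%S %p"  # e.g. "09/16/16 12:26:08 PM"


def active_hours(start, end, pauses=()):
    """Elapsed time between start and end, minus any paused intervals, in hours."""
    total = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    for pause_start, pause_end in pauses:
        total -= datetime.strptime(pause_end, FMT) - datetime.strptime(pause_start, FMT)
    return total.total_seconds() / 3600


# Start/end times and the locked interval as recorded for this export.
hours = active_hours(
    "09/16/16 12:26:08 PM", "09/16/16 07:16:23 PM",
    pauses=[("09/16/16 02:56:00 PM", "09/16/16 04:19:00 PM")],
)
print(f"{hours:.2f} hours")  # 5.45 hours
```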

===TallyEngine===
* The procedure for Tally Engine and the rest of the checks is transcribed from [[How_Do_I_Count_Thee%3F_Let_Me_Count_The_Ways|How Do I Count Thee? Let me Count The Ways]].
* I ran TallyEngine in GenMAPP Builder to record the number of records for UniProt and GO in the XML data and in the Postgres databases.
** I chose the menu item Tallies > Run XML and Database Tallies for UniProt and GO...
[[Image:Tallyengine_Vcholerae_results_NA.png]]
** Count for Ordered Locus: 3,831

=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine===

* From the directory containing the match utility and the UniProt XML file, I ran the command <code>java -jar xmlpipedb-match-1.1.1.jar "VC_[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml</code>
* It returned a total of 2,738 unique matches.
[[Image:Xmlpipedb_match_results_NA.png]]
* These results differed from the Tally Engine results: Tally Engine returned 3,831 unique matches, while this command returned 2,738. This is likely due to differently formatted names in the XML file, since the pattern only matches names of the form VC_####.
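To see why a narrower pattern gives a lower count, the counting can be approximated in Python. This sketch assumes that ''match'' reports the number of distinct substrings matching the pattern anywhere in the file; it is an illustration of that assumption, not the utility's actual implementation, and the XML fragment is made up.

```python
import re


def count_unique_matches(text, pattern):
    """Count distinct substrings of `text` that match the regular expression `pattern`."""
    return len(set(re.findall(pattern, text)))


# Made-up fragment; for the real file, read it with
# open("uniprot-organism%3A243277.xml", encoding="utf-8").read()
sample = (
    '<name type="ordered locus">VC_0017</name>'
    '<name type="ordered locus">VC_0017</name>'
    '<name type="ordered locus">VC_A0595</name>'
)

print(count_unique_matches(sample, "VC_[0-9][0-9][0-9][0-9]"))   # 1
print(count_unique_matches(sample, "VC_A?[0-9][0-9][0-9][0-9]"))  # 2
```

The repeated VC_0017 is counted once, and VC_A0595 is only picked up once the pattern allows the optional "A".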

=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine===

* Next, I checked the values in SQL to see whether they would validate the results from Tally Engine or the match utility.
* In pgAdmin III, I ran the query <code>select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_[0-9][0-9][0-9][0-9]';</code>
[[Image:SQL_Query_Results_NA.png]]
* The query returned 2,737 matches, which agrees with neither Tally Engine nor the match utility; it is one fewer than the match utility's result. The gap from Tally Engine may again be due to a different naming convention, but I do not fully understand why the count is off by one from the match utility.
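One way to track down an off-by-one difference is to list the IDs each tool found and compare the two sets. This Python sketch assumes both ID lists have already been exported (for example, by selecting <code>value</code> instead of <code>count(*)</code> in the SQL query, and by saving the IDs found in the XML); the function name is hypothetical.

```python
def compare_id_lists(match_ids, sql_ids):
    """Report the IDs found by one tool but not by the other."""
    match_set, sql_set = set(match_ids), set(sql_ids)
    return {
        "only_in_match": sorted(match_set - sql_set),
        "only_in_sql": sorted(sql_set - match_set),
    }
```

Whatever appears in <code>only_in_match</code> is a candidate for the missing row, which can then be looked up directly in the XML and in the database.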

===OriginalRowCounts Comparison===

* Within the .gdb file, I looked at the OriginalRowCounts table to see if the database had the expected tables with the expected number of records. I compared it with a benchmark .gdb file.
* Benchmark .gdb file (2010): [[Media:Vc-Std_External_20101022.gdb|Vc-Std_External_20101022]]

*Benchmark GDB (2010):
<gallery>
Image:Week9_Benchmark_gdb_tables_1_NA.png
Image:Week9_Benchmark_gdb_tables_2_NA.png
</gallery>

*New GDB:
<gallery>
Image:Week9_New_gdb_tables_1_NA.png
Image:Week9_New_gdb_tables_2_NA.png
</gallery>

* The new .gdb had 10 additional tables.
* Both have the same number of OrderedLocusNames records (7,664), but most of the other values differ.
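The table-by-table comparison can be summarized with a small function over two mappings of table name to row count. This is a sketch that assumes the counts have been copied by hand (or otherwise exported) from each OriginalRowCounts table; it does not read .gdb files directly.

```python
def compare_row_counts(benchmark, new):
    """Compare two {table name: row count} mappings taken from OriginalRowCounts."""
    added = sorted(set(new) - set(benchmark))      # tables only in the new .gdb
    removed = sorted(set(benchmark) - set(new))    # tables only in the benchmark
    changed = {
        table: (benchmark[table], new[table])
        for table in sorted(set(benchmark) & set(new))
        if benchmark[table] != new[table]
    }
    return added, removed, changed
```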

===Visual Inspection===

Perform visual inspection of individual tables to see if there are any problems.

* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?
** No. Most systems do not have a date in their Date field.
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?
** The UniProt IDs all look correct.
** The RefSeq IDs also all look correct.
** The IDs in OrderedLocusNames take two forms: VC_#### and VC_A####. This supports the hypothesis that the difference between Tally Engine and the match/SQL results was due to the narrower pattern used in those searches.
*** To test this, the command <code>java -jar xmlpipedb-match-1.1.1.jar "VC_A?[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml</code> returns 3,831 unique matches, the same as Tally Engine. Running the query <code>select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_A?[0-9][0-9][0-9][0-9]';</code> in SQL also returns 3,831. I am therefore fairly confident that this was why the utilities returned different results. I am still uncertain why SQL initially returned one fewer value than the match utility, before this new filter.
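The "do all IDs take the correct form" check can also be done programmatically instead of by scrolling. This Python sketch assumes the two forms seen above (VC_#### and VC_A####); the optional underscore also accepts the underscore-free form (such as VC0017) used in the GeneFinder search, which is an assumption about how the IDs are stored in the .gdb.

```python
import re
from collections import Counter

# Assumed ID forms; "_?" makes the underscore optional.
ID_FORMS = {
    "VC####": re.compile(r"VC_?\d{4}"),
    "VCA####": re.compile(r"VC_?A\d{4}"),
}


def classify_ids(ids):
    """Count IDs per expected form; IDs matching no form are returned separately."""
    counts = Counter()
    unexpected = []
    for gene_id in ids:
        for form, pattern in ID_FORMS.items():
            if pattern.fullmatch(gene_id):
                counts[form] += 1
                break
        else:  # no form matched this ID
            unexpected.append(gene_id)
    return counts, unexpected
```

Using <code>fullmatch</code> rather than a substring search means an ID with extra characters is flagged instead of silently counted.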

===.gdb Use in GenMAPP===

====Putting a gene on the MAPP using the GeneFinder window====

* In the main GenMAPP Drafting Board window, left-click on the icon for "Gene" in the upper left corner of the window. Click on the Drafting Board to place the Gene on the MAPP. Now, right-click on the gene to access the GeneFinder window. Type or paste a gene ID into the Gene ID field. Select the appropriate Gene ID system from the drop-down menu and click the Search button.
** I chose the Gene ID "VC0017" from the "OrderedLocusNames" system.
* Double-clicking the gene to get the backpage returned the following webpage:
[[Image:Week9_VC0017_Backpage_NA.png]]
* All of the expected crosslinks were present.

====Creating an Expression Dataset in the Expression Dataset Manager====

* The following instructions came from [http://www.openwetware.org/wiki/BIOL367/F10:GenMAPP_and_MAPPFinder_Protocols The GenMAPP and MAPPFinder Protocols on OpenWetWare].

* I selected "Data" then "Expression Dataset Manager". I selected "Expression Dataset" then "New Dataset". I first attempted to use the tab-delimited text file [[Media:Merrell_Compiled_Raw_Data_Vibrio_NA_20151015_GenMAPP.txt| I had created last year]], but it did not work properly with GenMAPP. I instead used a file I was certain worked, the Merrell compiled raw data from [[Media:Week8HW-AV.zip|Anu Varshneya's Week 8 Files]].
* No columns contained text, so I selected "Ok", and waited for the raw expression data to be converted.
** '''121 errors were detected in the raw data'''.
** All of the error codes in the file stated ''Gene not found in OrderedLocusNames or any related system''.
** 5,221 total IDs were imported.
** I checked the UniProt-OrderedLocusNames table for four genes listed as errors: VC2209, VC2338, VCA0595, and VC0284. None of them were in the table, which suggests that the missing genes are not part of the UniProt XML.
* I typed in "AvgLogFCAll" as the Name of the Color Set, and selected "Avg_LogFC_All" as the gene value.
* I defined two criteria:
** Increased: <code>[Avg_LogFC_all] > 0.25 AND [Pvalue] < 0.05</code>, Color: Green
** Decreased: <code>[Avg_LogFC_all] < -0.25 AND [Pvalue] < 0.05</code>, Color: Red
* Then, I selected "Expression Datasets" -> "Save".
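The two color-set criteria can be restated as a small function, which makes the strict inequalities explicit (a gene with exactly 0.25 or a p value of exactly 0.05 meets neither criterion). This is a sketch of the criteria as written above, not GenMAPP's own evaluation code; the function name and defaults are illustrative.

```python
def classify(avg_logfc_all, pvalue, fc_cutoff=0.25, p_cutoff=0.05):
    """Apply the Increased/Decreased criteria to one gene's values."""
    if pvalue < p_cutoff:
        if avg_logfc_all > fc_cutoff:
            return "Increased"   # colored green in this color set
        if avg_logfc_all < -fc_cutoff:
            return "Decreased"   # colored red in this color set
    return None  # meets neither criterion
```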

====Coloring a MAPP with expression data====

* I launched the MAPPFinder tool from Tools -> MAPPFinder. Then I selected "Calculate New Results". The .gex file was correct, so I hit "Ok."
* I selected the color set "AvgLogFCAll", and the Criteria "Increased". I selected the box next to "Gene Ontology" and "Click here to calculate p values".
* I chose to save the results as "MAPPFinder-Increased-09162016-NA". Then I selected "Run MAPPFinder".

====Running MAPPFinder====

* After running MAPPFinder, these were my main results:
[[Image:Week9_MAPPFinder_01_Main_NA.png]]
* Opening the biological process dropdown gave this result:
[[Image:Week9_MAPPFinder_BioProc_NA.png]]
* I attempted to open one of the processes in MAPPFinder but obtained the following error:
[[Image:Week9_MAPPFinder_Error_NA.png]]
* Selecting Show Ranked List showed the following result:
[[Image:Week9_MAPPFinder_GO_results_NA.png]]
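MAPPFinder reports a z score for each GO term, and the ranked list can be sorted by it. As a hedged illustration, the sketch below uses the hypergeometric-based z score commonly described for MAPPFinder; treat the formula as an assumption to verify against the MAPPFinder documentation, not as a transcription of its code. The counts in the usage line are made up.

```python
import math


def mappfinder_z(r, n, R, N):
    """Standardized difference score for one GO term.

    r: genes in the term that meet the criterion
    n: genes in the term that were measured
    R: genes meeting the criterion overall
    N: genes measured overall
    """
    p = R / N                  # overall fraction meeting the criterion
    expected = n * p           # expected number of hits in this term
    variance = n * p * (1 - p) * (1 - (n - 1) / (N - 1))
    return (r - expected) / math.sqrt(variance)


print(round(mappfinder_z(5, 20, 10, 100), 2))  # 2.49
```

A positive z score means the term has more genes meeting the criterion than expected by chance.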

== Links ==
{{Template:Nanguiano}}

File:Week9 Files NA.zip

2016-09-17T04:41:27Z

Nanguiano: files for week 9 ~~~~

files for week 9 [[User:Nanguiano|Nanguiano]] ([[User talk:Nanguiano|talk]]) 21:41, 16 September 2016 (PDT)

Nanguiano Week 9

2016-09-17T04:37:20Z

Nanguiano: /* Running MAPPFinder */ Added images

== Running GenMAPP Builder ==

I followed the process from the [[Running_GenMAPP_Builder | Running GenMAPP Builder Tutorial Page]].

=== Software ===

The following software was utilized:
# Any tool that can unpack .gz and .zip files
#* We use [http://www.7-zip.org/ 7-zip]
# [http://www.postgresql.org PostgreSQL] on Windows (http://www.enterprisedb.com/products-services-training/pgdownload)
# GenMAPP Builder version 3.0.0 build 5 (https://github.com/lmu-bioinformatics/xmlpipedb/releases)
# Java JDK 1.8 64-bit
#* [http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html Download page]
#* File to download is: jdk-8u65-windows-x64.exe
# GenMAPP 2 can be downloaded [https://github.com/GenMAPPCS/genmapp here]. The file to download is "GenMAPPv2Setup.exe".
# XMLPipeDB ''match'' utility (https://sourceforge.net/projects/xmlpipedb/files/) for counting IDs in XML files
# Microsoft Access or any other tool that can read .mdb files

==Quality Assurance==

The template for this information was taken from [[Gene_Database_Testing_Report_Sample | the Gene Database Testing Report Sample Page]].

===Export Information===

Version of GenMAPP Builder: '''3.0.0 build 5'''

Computer on which export was run: '''Seaver 120-14, HP LV2311 Windows 7 Enterprise'''

Postgres Database name: '''V_Cholerae_20160916_gmbuilder03b5'''

UniProt XML filename (give filename and upload and link to compressed file): [[Media:Uniprot-organism-243277-NA.zip|uniprot-organism%3A243277.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): '''UniProt Release 2016_08'''
* UniProt XML download link: http://www.uniprot.org/uniprot/?query=organism:243277
* Time taken to import: '''2.92 minutes'''
** Note: ''None''

GO OBO-XML filename (give filename and upload and link to compressed file): [[Media:Go_daily-termdb-old-AV-NA.zip|go_daily-termdb.obo-xml]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): '''10/27/2015, 2:24am'''
* GO OBO-XML download link: http://geneontology.org/page/download-ontology#Legacy_Downloads
* Time taken to import: '''6.87 minutes'''
* Time taken to process: '''4.26 minutes'''
** Note: ''The GO OBO-XML file and version is from [[Anuvarsh_Week_9 | Anu Varshneya's Week 9 Journal]]. Currently available GO OBO-XML files do not run properly in GenMAPP builder.''

GOA filename (give filename and upload and link to compressed file): [[Media:46.V_cholerae_ATCC_39315_NA.goa.zip|46.V_cholerae_ATCC_39315.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): '''07/05/16'''
* GOA download link: http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/46.V_cholerae_ATCC_39315.goa
* Time taken to import: '''0.06 minutes'''
** Note: ''None''

Name of .gdb file (give filename and upload and link to compressed file): [[Media:Vc-Std_20160916_NA.zip|Vc-Std_20160916_NA.gdb]]
* Time taken to export: '''5.15 hours'''
** Start time: '''09/16/16, 12:26:08 PM'''
** End time: '''09/16/16, 7:16:23 PM'''
*** Note: There was a time during the export in which the computer was locked and not running the export, from 2:56pm - 4:19pm.

===TallyEngine===
* The procedure for Tally Engine and the rest of the checks is transcribed from [[How_Do_I_Count_Thee%3F_Let_Me_Count_The_Ways|How Do I Count Thee? Let me Count The Ways]].
* I ran TallyEngine in GenMAPP Builder to record the number of records for UniProt and GO in the XML data and in the Postgres databases.
** I chose the menu item Tallies > Run XML and Database Tallies for UniProt and GO...
[[Image:Tallyengine_Vcholerae_results_NA.png]]
** Count for Ordered Locus: 3,831

=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine===

* After entering into the directory containing the match utility and the uniprot xml file, I ran the command
java -jar xmlpipedb-match-1.1.1.jar "VC_[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml
* It returned a total of 2,738 unique matches.
[[Image:Xmlpipedb_match_results_NA.png]]
* My results were not the same as my results for Tally Engine. Tally Engine returned 3,789 unique matches, while this returned 2,738. This is likely due to potentially due to differently formatted names in the xml file, as we are only checking for names in the structure VC_####.

=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine===

*Next, I checked the values in SQL to see if they would validate the results from Tally Engine or the match utility.
* In PgAdminIII, I ran the command:
select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_[0-9][0-9][0-9][0-9]';
[[Image:SQL_Query_Results_NA.png]]
* The results were not the same as the ones reported by Tally Engine, nor were they the same as the ones reported by the match utility. It returned 2,737 matches, one less than the result returned by the match utility. While it is less than the Tally Engine perhaps due to a different naming convention, I don't fully understand why it would be off by one from the match utility.

===OriginalRowCounts Comparison===

* Within the .gdb file, I looked at the OriginalRowCounts table to see if the database had the expected tables with the expected number of records. I compared it with a benchmark .gdb file.
* Benchmark .gdb file (2010): [[Media:Vc-Std_External_20101022.gdb|Vc-Std_External_20101022]]

*Original GDB:
<gallery>
Image:Week9_Benchmark_gdb_tables_1_NA.png
Image:Week9_Benchmark_gdb_tables_2_NA.png
</gallery>

*New GDB:
<gallery>
Image:Week9_New_gdb_tables_1_NA.png
Image:Week9_New_gdb_tables_2_NA.png
</gallery>

* The new gdb had 10 additional tables.
* The have the same number of OrderedLocusNames, 7,664, but most of the other values are different.

===Visual Inspection===

Perform visual inspection of individual tables to see if there are any problems.

* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?
** No. The majority of systems to not have a date included in their date field.
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?
** The UniProt IDs all look correct.
** The RefSeq IDs also all look correct.
** The IDs in OrderedLocusNames appear to take two forms: one which is of the form VC_####, and one of the form VC_A####. This seems to prove the hypothesis that the difference between Tally Engine and the match/SQL results was due to a difference in searched names.
*** To test this, the command <code>java -jar xmlpipedb-match-1.1.1.jar "VC_A?[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml</code> returns 3,831 unique matches, the same as the Tally Engine. Running the command <code>select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_A?[0-9][0-9][0-9][0-9';</code> in SQL also returns 3,831. Therefore, I am decently confident that this was the reason the utilities were returning different results. I am still uncertain as to why SQL returned one less value than the match utility at first, before this new filter.

===.gdb Use in GenMAPP===

====Putting a gene on the MAPP using the GeneFinder window====

* In the main GenMAPP Drafting Board window, left-click on the icon for "Gene" in the upper left corner of the window. Click on the Drafting Board to place the Gene on the MAPP. Now, right-click on the gene to access the GeneFinder window. Type or paste a gene ID into the Gene ID field. Select the appropriate Gene ID system from the drop-down menu and click the Search button.
** I chose the Gene ID "VC0017" from the "OrderedLocusNames" system.
* Double-clicking the gene to get the backpage returned the following webpage:
[[Image:Week9_VC0017_Backpage_NA.png]]
* All crosslinks that were supposed to be present were present.

====Creating an Expression Dataset in the Expression Dataset Manager====

* The following instructions came from [http://www.openwetware.org/wiki/BIOL367/F10:GenMAPP_and_MAPPFinder_Protocols|The GenMAPP and MAPPFinder Protocols on OpenWetWare].

* I selected "Data" then "Expression Dataset Manager". I selected "Expression Dataset" then "New Dataset". I first attempted using the tab-delineated text file [[Media:Merrell_Compiled_Raw_Data_Vibrio_NA_20151015_GenMAPP.txt| I had created last year]], but it would not function properly with GenMAPP. I defaulted to using a file I was certain functioned properly, the Merrel compiled raw data from [[Media:Week8HW-AV.zip|Anu Varshneya's Week 8 Files]].
* No columns contained text, so I selected "Ok", and waited for the raw expression data to be converted.
** '''121 errors were detected in the raw data'''.
** All of the error codes in the file stated ''Gene not found in OrderedLocusNames or any related system''.
** 5,221 total IDs were imported.
** I checked the UniProt-OrderedLocusNames table for four genes listed as errors: VC2209, VC2338, VCA0595, and VC0284. None of them were found in the table, which suggests that the missing genes are not present in the UniProt XML.
* I typed in "AvgLogFCAll" as the Name of the Color Set, and selected "Avg_LogFC_All" as the gene value.
* I defined two criteria:
** Increased: <code>[Avg_LogFC_all] > 0.25 AND [Pvalue] < 0.05</code>, Color: Green
** Decreased: <code>[Avg_LogFC_all] < -0.25 AND [Pvalue] < 0.05</code>, Color: Red
* Then, I selected "Expression Datasets" -> "Save".
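The two color-set criteria amount to a simple rule applied to each gene. The sketch below restates them in Python to make the thresholds explicit; the function name and the sample values are hypothetical, and GenMAPP evaluates the criteria itself.

```python
def classify(avg_logfc, pvalue):
    """Return the criterion a gene satisfies in the "AvgLogFCAll" color set.

    Mirrors the two criteria defined above:
      Increased: [Avg_LogFC_all] > 0.25 AND [Pvalue] < 0.05  (green)
      Decreased: [Avg_LogFC_all] < -0.25 AND [Pvalue] < 0.05 (red)
    """
    if avg_logfc > 0.25 and pvalue < 0.05:
        return "Increased"
    if avg_logfc < -0.25 and pvalue < 0.05:
        return "Decreased"
    return None  # falls outside both criteria


# Hypothetical values, for illustration only.
print(classify(0.8, 0.01))    # Increased
print(classify(-0.5, 0.03))   # Decreased
print(classify(0.8, 0.2))     # None (fold change passes, p value does not)
print(classify(0.25, 0.001))  # None (thresholds are strict inequalities)
```

Both conditions must hold, so a large fold change with a non-significant p value is left uncolored.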

====Coloring a MAPP with expression data====

* I launched the MAPPFinder tool from Tools -> MAPPFinder. Then I selected "Calculate New Results". The .gex file was correct, so I hit "Ok."
* I selected the color set "AvgLogFCAll" and the criterion "Increased". I checked the boxes next to "Gene Ontology" and "Click here to calculate p values".
* I chose to save the results as "MAPPFinder-Increased-09162016-NA". Then I selected "Run MAPPFinder".

====Running MAPPFinder====

* After running MAPPFinder, these were my main results:
[[Image:Week9_MAPPFinder_01_Main_NA.png]]
* Opening the biological process dropdown gave this result:
[[Image:Week9_MAPPFinder_BioProc_NA.png]]
* I attempted to open one of the processes in MAPPFinder but obtained the following error:
[[Image:Week9_MAPPFinder_Error_NA.png]]
* Selecting Show Ranked List showed the following result:
[[Image:Week9_MAPPFinder_GO_results_NA.png]]

== Links ==
{{Template:Nanguiano}}

File:Week9 MAPPFinder GO results NA.png

2016-09-17T04:36:37Z

Nanguiano: the GO results sorted by z score and p value. ~~~~

the GO results sorted by z score and p value. [[User:Nanguiano|Nanguiano]] ([[User talk:Nanguiano|talk]]) 21:36, 16 September 2016 (PDT)

File:Week9 MAPPFinder Error NA.png

2016-09-17T04:34:47Z

Nanguiano: An error that occurred when attempting to run a GO Term in MAPPFinder. ~~~~

An error that occurred when attempting to run a GO Term in MAPPFinder. [[User:Nanguiano|Nanguiano]] ([[User talk:Nanguiano|talk]]) 21:34, 16 September 2016 (PDT)

File:Week9 MAPPFinder BioProc NA.png

2016-09-17T04:32:28Z

Nanguiano: Go terms under Biological Processes. ~~~~

GO terms under Biological Processes. [[User:Nanguiano|Nanguiano]] ([[User talk:Nanguiano|talk]]) 21:32, 16 September 2016 (PDT)

File:Week9 MAPPFinder 01 Main NA.png

2016-09-17T04:31:03Z

Nanguiano: The main GO terms in mappfinder. ~~~~

The main GO terms in MAPPFinder. [[User:Nanguiano|Nanguiano]] ([[User talk:Nanguiano|talk]]) 21:31, 16 September 2016 (PDT)

Nanguiano Week 9

2016-09-17T04:29:39Z

Nanguiano: /* Coloring a MAPP with expression data */ Added information

Nanguiano Week 9

2016-09-17T04:11:41Z

Nanguiano: /* Creating an Expression Dataset in the Expression Dataset Manager */ Added information

== Running GenMAPP Builder ==

I followed the process from the [[Running_GenMAPP_Builder | Running GenMAPP Builder Tutorial Page]].

=== Software ===

The following software was utilized:
# Any tool that can unpack .gz and .zip files
#* We use [http://www.7-zip.org/ 7-zip]
# [http://www.postgresql.org PostgreSQL] on Windows (http://www.enterprisedb.com/products-services-training/pgdownload)
# GenMAPP Builder version 3.0.0 build 5 (https://github.com/lmu-bioinformatics/xmlpipedb/releases)
# Java JDK 1.8 64-bit
#* [http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html Download page]
#* File to download is: jdk-8u65-windows-x64.exe
# GenMAPP 2 can be downloaded [https://github.com/GenMAPPCS/genmapp here]. The file to download is "GenMAPPv2Setup.exe".
# XMLPipeDB ''match'' utility (https://sourceforge.net/projects/xmlpipedb/files/) for counting IDs in XML files
# Microsoft Access or any other tool that can read .mdb files

==Quality Assurance==

The template for this information was taken from [[Gene_Database_Testing_Report_Sample | the Gene Database Testing Report Sample Page]].

===Export Information===

Version of GenMAPP Builder: '''3.0.0 build 5'''

Computer on which export was run: '''Seaver 120-14, HP LV2311 Windows 7 Enterprise'''

Postgres Database name: '''V_Cholerae_20160916_gmbuilder03b5'''

UniProt XML filename (give filename and upload and link to compressed file): [[Media:Uniprot-organism-243277-NA.zip|uniprot-organism%3A243277.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): '''UniProt Release 2016_08'''
* UniProt XML download link: http://www.uniprot.org/uniprot/?query=organism:243277
* Time taken to import: '''2.92 minutes'''
** Note: ''None''

GO OBO-XML filename (give filename and upload and link to compressed file): [[Media:Go_daily-termdb-old-AV-NA.zip|go_daily-termdb.obo-xml]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): '''10/27/2015, 2:24am'''
* GO OBO-XML download link: http://geneontology.org/page/download-ontology#Legacy_Downloads
* Time taken to import: '''6.87 minutes'''
* Time taken to process: '''4.26 minutes'''
** Note: ''The GO OBO-XML file and version are from [[Anuvarsh_Week_9 | Anu Varshneya's Week 9 Journal]]. Currently available GO OBO-XML files do not run properly in GenMAPP Builder.''

GOA filename (give filename and upload and link to compressed file): [[Media:46.V_cholerae_ATCC_39315_NA.goa.zip|46.V_cholerae_ATCC_39315.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): '''07/05/16'''
* GOA download link: http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/46.V_cholerae_ATCC_39315.goa
* Time taken to import: '''0.06 minutes'''
** Note: ''None''

Name of .gdb file (give filename and upload and link to compressed file): [[Media:Vc-Std_20160916_NA.zip|Vc-Std_20160916_NA.gdb]]
* Time taken to export: '''5.15 hours'''
** Start time: '''09/16/16, 12:26:08 PM'''
** End time: '''09/16/16, 7:16:23 PM'''
*** Note: The computer was locked and the export was not running from 2:56 PM to 4:19 PM.
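The active export time can be worked out from the recorded start, end, and locked times. This is a minimal sketch assuming the locked interval should be subtracted from the wall-clock time and that its start and end fall exactly on the minute (seconds were not recorded); the result can be compared with the time recorded above.

```python
from datetime import datetime

# Timestamps as recorded in this journal; seconds for the locked
# interval are assumed to be :00 because only minutes were noted.
FMT = "%m/%d/%y %I:%M:%S %p"
start = datetime.strptime("09/16/16 12:26:08 PM", FMT)
end = datetime.strptime("09/16/16 07:16:23 PM", FMT)
locked_from = datetime.strptime("09/16/16 02:56:00 PM", FMT)
locked_to = datetime.strptime("09/16/16 04:19:00 PM", FMT)

wall_clock = end - start                       # total elapsed time
active = wall_clock - (locked_to - locked_from)  # minus the locked period

print(wall_clock)                               # 6:50:15
print(active)                                   # 5:27:15
print(round(active.total_seconds() / 3600, 2))  # 5.45
```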

===TallyEngine===
* The procedure for Tally Engine and the rest of the checks is transcribed from [[How_Do_I_Count_Thee%3F_Let_Me_Count_The_Ways|How Do I Count Thee? Let me Count The Ways]].
* I ran TallyEngine in GenMAPP Builder to record the number of records for UniProt and GO in the XML data and in the Postgres databases.
** I chose the menu item Tallies > Run XML and Database Tallies for UniProt and GO...
[[Image:Tallyengine_Vcholerae_results_NA.png]]
** Count for Ordered Locus: 3,831

=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine===

* After navigating to the directory containing the match utility and the UniProt XML file, I ran the command <code>java -jar xmlpipedb-match-1.1.1.jar "VC_[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml</code>
* It returned a total of 2,738 unique matches.
[[Image:Xmlpipedb_match_results_NA.png]]
* My results did not match those from the Tally Engine: the Tally Engine returned 3,831 unique matches, while this command returned 2,738. This is likely due to differently formatted names in the XML file, since the pattern only checks for names of the form VC_####.
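The idea behind a "unique matches" count can be sketched in a few lines. This assumes the match utility reports the number of distinct strings in its input that match the pattern; its actual implementation may differ. The XML fragment below is hypothetical and only mimics the <code>&lt;name type="ordered locus"&gt;</code> elements of a UniProt entry.

```python
import re
import sys


def unique_matches(pattern, text):
    """Return the distinct substrings of `text` that match `pattern`.

    The patterns used in this journal have no capture groups, so
    re.findall returns the whole matched substring for each hit.
    """
    return set(re.findall(pattern, text))


# Hypothetical UniProt XML fragment: VC_0017 appears twice, VC_A0595 once.
SAMPLE = (
    '<gene><name type="ordered locus">VC_0017</name></gene>'
    '<gene><name type="ordered locus">VC_A0595</name></gene>'
    '<gene><name type="ordered locus">VC_0017</name></gene>'
)

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_matches.py "VC_A?[0-9][0-9][0-9][0-9]" < uniprot.xml
    print(len(unique_matches(sys.argv[1], sys.stdin.read())))
```

Because the whole file is searched as text, an ID that also appears outside an ordered-locus name (for example in a comment or cross-reference) would still be counted, which is one possible reason a text search and a database query restricted to ordered-locus names could disagree.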

=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine===

*Next, I checked the values in SQL to see if they would validate the results from Tally Engine or the match utility.
* In pgAdmin III, I ran the command <code>select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_[0-9][0-9][0-9][0-9]';</code>
[[Image:SQL_Query_Results_NA.png]]
* These results matched neither the Tally Engine nor the match utility. The query returned 2,737 matches, one fewer than the match utility. The gap from the Tally Engine may be due to a different naming convention, but I don't fully understand why the count is off by one from the match utility.
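One way to track down an off-by-one like this is to compare the two lists of IDs rather than their counts. A sketch, assuming both lists can be exported (for example with <code>select value from genenametype where type = 'ordered locus' and value ~ 'VC_[0-9][0-9][0-9][0-9]';</code> on the database side). Note also that <code>count(*)</code> counts rows, while the match utility reports unique matches, so <code>count(distinct value)</code> may be the closer comparison. The ID lists below are hypothetical.

```python
def compare_id_sets(xml_ids, db_ids):
    """Return (only_in_xml, only_in_db) as sorted lists of IDs.

    Comparing sets, not just counts, shows exactly which IDs
    account for a difference between two tallies.
    """
    xml_set, db_set = set(xml_ids), set(db_ids)
    return sorted(xml_set - db_set), sorted(db_set - xml_set)


# Hypothetical ID lists for illustration only.
xml_ids = ["VC_0017", "VC_0018", "VC_0019"]
db_rows = ["VC_0017", "VC_0018", "VC_0018"]  # one value stored twice

only_xml, only_db = compare_id_sets(xml_ids, db_rows)
print(only_xml, only_db)                # ['VC_0019'] []
print(len(db_rows), len(set(db_rows)))  # 3 2  (rows vs. distinct values)
```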

===OriginalRowCounts Comparison===

* Within the .gdb file, I looked at the OriginalRowCounts table to see if the database had the expected tables with the expected number of records. I compared it with a benchmark .gdb file.
* Benchmark .gdb file (2010): [[Media:Vc-Std_External_20101022.gdb|Vc-Std_External_20101022]]

*Original GDB:
<gallery>
Image:Week9_Benchmark_gdb_tables_1_NA.png
Image:Week9_Benchmark_gdb_tables_2_NA.png
</gallery>

*New GDB:
<gallery>
Image:Week9_New_gdb_tables_1_NA.png
Image:Week9_New_gdb_tables_2_NA.png
</gallery>

* The new gdb had 10 additional tables.
* They have the same number of OrderedLocusNames records, 7,664, but most of the other values are different.
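The OriginalRowCounts comparison can also be done programmatically once both tables are copied out of the .gdb files. In this sketch each table is a hypothetical <code>{table name: record count}</code> dictionary; only the OrderedLocusNames count of 7,664 comes from this journal, and the other names and counts are placeholders.

```python
def compare_row_counts(benchmark, new):
    """Compare two OriginalRowCounts tables given as {table: count} dicts.

    Returns (tables only in new, tables only in benchmark,
    {table: (benchmark_count, new_count)} for tables whose counts differ).
    """
    added = sorted(set(new) - set(benchmark))
    removed = sorted(set(benchmark) - set(new))
    changed = {
        table: (benchmark[table], new[table])
        for table in sorted(set(benchmark) & set(new))
        if benchmark[table] != new[table]
    }
    return added, removed, changed


# Only OrderedLocusNames (7,664 in both) is from this journal;
# the other entries are hypothetical placeholders.
benchmark = {"OrderedLocusNames": 7664, "UniProt": 3800}
new = {"OrderedLocusNames": 7664, "UniProt": 3900, "RefSeq": 3700}

print(compare_row_counts(benchmark, new))
# (['RefSeq'], [], {'UniProt': (3800, 3900)})
```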

===Visual Inspection===

Perform visual inspection of individual tables to see if there are any problems.

* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?
** No. Most systems do not have a date in their Date field.
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?
** The UniProt IDs all look correct.
** The RefSeq IDs also all look correct.
** The IDs in OrderedLocusNames take two forms: VC_#### and VC_A####. This supports the hypothesis that the difference between the Tally Engine and the match/SQL results was due to the searched pattern, which only covered the VC_#### form.
*** To test this, I ran <code>java -jar xmlpipedb-match-1.1.1.jar "VC_A?[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml</code>, which returned 3,831 unique matches, the same as the Tally Engine. Running <code>select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_A?[0-9][0-9][0-9][0-9]';</code> in SQL also returned 3,831. I am therefore fairly confident that the difference in ID formats explains why the utilities returned different results. I am still uncertain why SQL initially returned one fewer value than the match utility, before this new filter was applied.

===.gdb Use in GenMAPP===

====Putting a gene on the MAPP using the GeneFinder window====

* In the main GenMAPP Drafting Board window, left-click on the icon for "Gene" in the upper left corner of the window. Click on the Drafting Board to place the Gene on the MAPP. Now, right-click on the gene to access the GeneFinder window. Type or paste a gene ID into the Gene ID field. Select the appropriate Gene ID system from the drop-down menu and click the Search button.
** I chose the Gene ID "VC0017" from the "OrderedLocusNames" system.
* Double-clicking the gene to open its Backpage displayed the following:
[[Image:Week9_VC0017_Backpage_NA.png]]
* All of the expected crosslinks were present.

====Creating an Expression Dataset in the Expression Dataset Manager====

* The following instructions came from [http://www.openwetware.org/wiki/BIOL367/F10:GenMAPP_and_MAPPFinder_Protocols The GenMAPP and MAPPFinder Protocols on OpenWetWare].

* I selected "Data" then "Expression Dataset Manager", and then "Expression Dataset" then "New Dataset". I first attempted to use the tab-delimited text file [[Media:Merrell_Compiled_Raw_Data_Vibrio_NA_20151015_GenMAPP.txt| I had created last year]], but it did not work properly with GenMAPP. I instead used a file I knew worked properly: the Merrell compiled raw data from [[Media:Week8HW-AV.zip|Anu Varshneya's Week 8 Files]].
* No columns contained text, so I selected "Ok", and waited for the raw expression data to be converted.
** '''121 errors were detected in the raw data'''.
** All of the error codes in the file stated ''Gene not found in OrderedLocusNames or any related system''.
** 5,221 total IDs were imported.
** I checked the UniProt-OrderedLocusNames table for four genes listed as errors: VC2209, VC2338, VCA0595, and VC0284. None of them were found in the table, which suggests that the missing genes are not present in the UniProt XML.
* I typed in "AvgLogFCAll" as the Name of the Color Set, and selected "Avg_LogFC_All" as the gene value.
* I defined two criteria:
** Increased: <code>[Avg_LogFC_all] > 0.25 AND [Pvalue] < 0.05</code>, Color: Green
** Decreased: <code>[Avg_LogFC_all] < -0.25 AND [Pvalue] < 0.05</code>, Color: Red
* Then, I selected "Expression Datasets" -> "Save".

====Coloring a MAPP with expression data====

Note:

====Running MAPPFinder====

Note:

== Links ==
{{Template:Nanguiano}}

Nanguiano Week 9

2016-09-17T03:36:32Z

Nanguiano: /* OriginalRowCounts Comparison */ Personalized steps

== Running GenMAPP Builder ==

I followed the process from the [[Running_GenMAPP_Builder | Running GenMAPP Builder Tutorial Page]].

=== Software ===

The following software was utilized:
# Any tool that can unpack .gz and .zip files
#* We use [http://www.7-zip.org/ 7-zip]
# [http://www.postgresql.org PostgreSQL] on Windows (http://www.enterprisedb.com/products-services-training/pgdownload)
# GenMAPP Builder version 3.0.0 build 5 (https://github.com/lmu-bioinformatics/xmlpipedb/releases)
# Java JDK 1.8 64-bit
#* [http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html Download page]
#* File to download is: jdk-8u65-windows-x64.exe
# GenMAPP 2 can be downloaded [https://github.com/GenMAPPCS/genmapp here]. The file to download is "GenMAPPv2Setup.exe".
# XMLPipeDB ''match'' utility (https://sourceforge.net/projects/xmlpipedb/files/) for counting IDs in XML files
# Microsoft Access or any other tool that can read .mdb files

==Quality Assurance==

The template for this information was taken from [[Gene_Database_Testing_Report_Sample | the Gene Database Testing Report Sample Page]].

===Export Information===

Version of GenMAPP Builder: '''3.0.0 build 5'''

Computer on which export was run: '''Seaver 120-14, HP LV2311 Windows 7 Enterprise'''

Postgres Database name: '''V_Cholerae_20160916_gmbuilder03b5'''

UniProt XML filename (give filename and upload and link to compressed file): [[Media:Uniprot-organism-243277-NA.zip|uniprot-organism%3A243277.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): '''UniProt Release 2016_08'''
* UniProt XML download link: http://www.uniprot.org/uniprot/?query=organism:243277
* Time taken to import: '''2.92 minutes'''
** Note: ''None''

GO OBO-XML filename (give filename and upload and link to compressed file): [[Media:Go_daily-termdb-old-AV-NA.zip|go_daily-termdb.obo-xml]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): '''10/27/2015, 2:24am'''
* GO OBO-XML download link: http://geneontology.org/page/download-ontology#Legacy_Downloads
* Time taken to import: '''6.87 minutes'''
* Time taken to process: '''4.26 minutes'''
** Note: ''The GO OBO-XML file and version are from [[Anuvarsh_Week_9 | Anu Varshneya's Week 9 Journal]]. Currently available GO OBO-XML files do not run properly in GenMAPP Builder.''

GOA filename (give filename and upload and link to compressed file): [[Media:46.V_cholerae_ATCC_39315_NA.goa.zip|46.V_cholerae_ATCC_39315.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): '''07/05/16'''
* GOA download link: http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/46.V_cholerae_ATCC_39315.goa
* Time taken to import: '''0.06 minutes'''
** Note: ''None''

Name of .gdb file (give filename and upload and link to compressed file): [[Media:Vc-Std_20160916_NA.zip|Vc-Std_20160916_NA.gdb]]
* Time taken to export: '''5.15 hours'''
** Start time: '''09/16/16, 12:26:08 PM'''
** End time: '''09/16/16, 7:16:23 PM'''
*** Note: The computer was locked and the export was not running from 2:56 PM to 4:19 PM.

===TallyEngine===
* The procedure for Tally Engine and the rest of the checks is transcribed from [[How_Do_I_Count_Thee%3F_Let_Me_Count_The_Ways|How Do I Count Thee? Let me Count The Ways]].
* I ran TallyEngine in GenMAPP Builder to record the number of records for UniProt and GO in the XML data and in the Postgres databases.
** I chose the menu item Tallies > Run XML and Database Tallies for UniProt and GO...
[[Image:Tallyengine_Vcholerae_results_NA.png]]
** Count for Ordered Locus: 3,831

=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine===

* After navigating to the directory containing the match utility and the UniProt XML file, I ran the command <code>java -jar xmlpipedb-match-1.1.1.jar "VC_[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml</code>
* It returned a total of 2,738 unique matches.
[[Image:Xmlpipedb_match_results_NA.png]]
* My results did not match those from the Tally Engine: the Tally Engine returned 3,831 unique matches, while this command returned 2,738. This is likely due to differently formatted names in the XML file, since the pattern only checks for names of the form VC_####.

=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine===

*Next, I checked the values in SQL to see if they would validate the results from Tally Engine or the match utility.
* In pgAdmin III, I ran the command <code>select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_[0-9][0-9][0-9][0-9]';</code>
[[Image:SQL_Query_Results_NA.png]]
* These results matched neither the Tally Engine nor the match utility. The query returned 2,737 matches, one fewer than the match utility. The gap from the Tally Engine may be due to a different naming convention, but I don't fully understand why the count is off by one from the match utility.

===OriginalRowCounts Comparison===

* Within the .gdb file, I looked at the OriginalRowCounts table to see if the database had the expected tables with the expected number of records. I compared it with a benchmark .gdb file.
* Benchmark .gdb file (2010): [[Media:Vc-Std_External_20101022.gdb|Vc-Std_External_20101022]]

*Original GDB:
<gallery>
Image:Week9_Benchmark_gdb_tables_1_NA.png
Image:Week9_Benchmark_gdb_tables_2_NA.png
</gallery>

*New GDB:
<gallery>
Image:Week9_New_gdb_tables_1_NA.png
Image:Week9_New_gdb_tables_2_NA.png
</gallery>

* The new gdb had 10 additional tables.
* They have the same number of OrderedLocusNames records, 7,664, but most of the other values are different.

===Visual Inspection===

Perform visual inspection of individual tables to see if there are any problems.

* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?
** No. Most systems do not have a date in their Date field.
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?
** The UniProt IDs all look correct.
** The RefSeq IDs also all look correct.
** The IDs in OrderedLocusNames take two forms: VC_#### and VC_A####. This supports the hypothesis that the difference between the Tally Engine and the match/SQL results was due to the searched pattern, which only covered the VC_#### form.
*** To test this, I ran <code>java -jar xmlpipedb-match-1.1.1.jar "VC_A?[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml</code>, which returned 3,831 unique matches, the same as the Tally Engine. Running <code>select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_A?[0-9][0-9][0-9][0-9]';</code> in SQL also returned 3,831. I am therefore fairly confident that the difference in ID formats explains why the utilities returned different results. I am still uncertain why SQL initially returned one fewer value than the match utility, before this new filter was applied.

===.gdb Use in GenMAPP===

====Putting a gene on the MAPP using the GeneFinder window====

* In the main GenMAPP Drafting Board window, left-click on the icon for "Gene" in the upper left corner of the window. Click on the Drafting Board to place the Gene on the MAPP. Now, right-click on the gene to access the GeneFinder window. Type or paste a gene ID into the Gene ID field. Select the appropriate Gene ID system from the drop-down menu and click the Search button.
** I chose the Gene ID "VC0017" from the "OrderedLocusNames" system.
* Double-clicking the gene to open its Backpage displayed the following:
[[Image:Week9_VC0017_Backpage_NA.png]]
* All of the expected crosslinks were present.

====Creating an Expression Dataset in the Expression Dataset Manager====

* For my expression dataset, I downloaded the [[Media:Merrell_Compiled_Raw_Data_Vibrio_NA_20151015_GenMAPP.txt|text file]] I had created for GenMAPP last year from the Merrell et al. data.

How many of the IDs were imported out of the total IDs in the microarray dataset? How many exceptions were there? Look in the EX.txt file and look at the error codes for the records that were not imported into the Expression Dataset. Do these represent IDs that were present in the UniProt XML but were somehow not imported, or IDs that were not present in the UniProt XML?

Note:

====Coloring a MAPP with expression data====

Note:

====Running MAPPFinder====

Note:

== Links ==
{{Template:Nanguiano}}

Nanguiano Week 9

2016-09-17T03:35:49Z

Nanguiano: /* Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine */ removed question

== Running GenMAPP Builder ==

I followed the process from the [[Running_GenMAPP_Builder | Running GenMAPP Builder Tutorial Page]].

=== Software ===

The following software was utilized:
# Any tool that can unpack .gz and .zip files
#* We use [http://www.7-zip.org/ 7-zip]
# [http://www.postgresql.org PostgreSQL] on Windows (http://www.enterprisedb.com/products-services-training/pgdownload)
# GenMAPP Builder version 3.0.0 build 5 (https://github.com/lmu-bioinformatics/xmlpipedb/releases)
# Java JDK 1.8 64-bit
#* [http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html Download page]
#* File to download is: jdk-8u65-windows-x64.exe
# GenMAPP 2 can be downloaded [https://github.com/GenMAPPCS/genmapp here]. The file to download is "GenMAPPv2Setup.exe".
# XMLPipeDB ''match'' utility (https://sourceforge.net/projects/xmlpipedb/files/) for counting IDs in XML files
# Microsoft Access or any other tool that can read .mdb files

==Quality Assurance==

The template for this information was taken from [[Gene_Database_Testing_Report_Sample | the Gene Database Testing Report Sample Page]].

===Export Information===

Version of GenMAPP Builder: '''3.0.0 build 5'''

Computer on which export was run: '''Seaver 120-14, HP LV2311 Windows 7 Enterprise'''

Postgres Database name: '''V_Cholerae_20160916_gmbuilder03b5'''

UniProt XML filename (give filename and upload and link to compressed file): [[Media:Uniprot-organism-243277-NA.zip|uniprot-organism%3A243277.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): '''UniProt Release 2016_08'''
* UniProt XML download link: http://www.uniprot.org/uniprot/?query=organism:243277
* Time taken to import: '''2.92 minutes'''
** Note: ''None''

GO OBO-XML filename (give filename and upload and link to compressed file): [[Media:Go_daily-termdb-old-AV-NA.zip|go_daily-termdb.obo-xml]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): '''10/27/2015, 2:24am'''
* GO OBO-XML download link: http://geneontology.org/page/download-ontology#Legacy_Downloads
* Time taken to import: '''6.87 minutes'''
* Time taken to process: '''4.26 minutes'''
** Note: ''The GO OBO-XML file and version are from [[Anuvarsh_Week_9 | Anu Varshneya's Week 9 Journal]]. Currently available GO OBO-XML files do not run properly in GenMAPP Builder.''

GOA filename (give filename and upload and link to compressed file): [[Media:46.V_cholerae_ATCC_39315_NA.goa.zip|46.V_cholerae_ATCC_39315.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): '''07/05/16'''
* GOA download link: http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/46.V_cholerae_ATCC_39315.goa
* Time taken to import: '''0.06 minutes'''
** Note: ''None''

Name of .gdb file (give filename and upload and link to compressed file): [[Media:Vc-Std_20160916_NA.zip|Vc-Std_20160916_NA.gdb]]
* Time taken to export: '''5.15 hours'''
** Start time: '''09/16/16, 12:26:08 PM'''
** End time: '''09/16/16, 7:16:23 PM'''
*** Note: The computer was locked and the export was not running from 2:56 PM to 4:19 PM.

===TallyEngine===
* The procedure for Tally Engine and the rest of the checks is transcribed from [[How_Do_I_Count_Thee%3F_Let_Me_Count_The_Ways|How Do I Count Thee? Let me Count The Ways]].
* I ran TallyEngine in GenMAPP Builder to record the number of records for UniProt and GO in the XML data and in the Postgres databases.
** I chose the menu item Tallies > Run XML and Database Tallies for UniProt and GO...
[[Image:Tallyengine_Vcholerae_results_NA.png]]
** Count for Ordered Locus: 3,831

=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine===

* After navigating to the directory containing the match utility and the UniProt XML file, I ran the command <code>java -jar xmlpipedb-match-1.1.1.jar "VC_[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml</code>
* It returned a total of 2,738 unique matches.
[[Image:Xmlpipedb_match_results_NA.png]]
* My results did not match those from the Tally Engine: the Tally Engine returned 3,831 unique matches, while this command returned 2,738. This is likely due to differently formatted names in the XML file, since the pattern only checks for names of the form VC_####.

=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine===

*Next, I checked the values in SQL to see if they would validate the results from Tally Engine or the match utility.
* In pgAdmin III, I ran the command <code>select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_[0-9][0-9][0-9][0-9]';</code>
[[Image:SQL_Query_Results_NA.png]]
* These results matched neither the Tally Engine nor the match utility. The query returned 2,737 matches, one fewer than the match utility. The gap from the Tally Engine may be due to a different naming convention, but I don't fully understand why the count is off by one from the match utility.

===OriginalRowCounts Comparison===

* Within the .gdb file, look at the OriginalRowCounts table to see if the database has the expected tables with the expected number of records. Compare the tables and records with a benchmark .gdb file.
* Benchmark .gdb file (2010): [[Media:Vc-Std_External_20101022.gdb|Vc-Std_External_20101022]]

Copy the OriginalRowCounts table from the benchmark and new gdb and paste them here:

*Original GDB:
<gallery>
Image:Week9_Benchmark_gdb_tables_1_NA.png
Image:Week9_Benchmark_gdb_tables_2_NA.png
</gallery>

*New GDB:
<gallery>
Image:Week9_New_gdb_tables_1_NA.png
Image:Week9_New_gdb_tables_2_NA.png
</gallery>

* The new gdb had 10 additional tables.
* They have the same number of OrderedLocusNames records, 7,664, but most of the other values are different.

Note:

===Visual Inspection===

Perform visual inspection of individual tables to see if there are any problems.

* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?
** No. Most systems do not have a date in their Date field.
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?
** The UniProt IDs all look correct.
** The RefSeq IDs also all look correct.
** The IDs in OrderedLocusNames take two forms: VC_#### and VC_A####. This supports the hypothesis that the difference between the Tally Engine and the match/SQL results was due to the searched pattern, which only covered the VC_#### form.
*** To test this, I ran <code>java -jar xmlpipedb-match-1.1.1.jar "VC_A?[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml</code>, which returned 3,831 unique matches, the same as the Tally Engine. Running <code>select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_A?[0-9][0-9][0-9][0-9]';</code> in SQL also returned 3,831. I am therefore fairly confident that the difference in ID formats explains why the utilities returned different results. I am still uncertain why SQL initially returned one fewer value than the match utility, before this new filter was applied.

===.gdb Use in GenMAPP===

====Putting a gene on the MAPP using the GeneFinder window====

* In the main GenMAPP Drafting Board window, left-click on the icon for "Gene" in the upper left corner of the window. Click on the Drafting Board to place the Gene on the MAPP. Now, right-click on the gene to access the GeneFinder window. Type or paste a gene ID into the Gene ID field. Select the appropriate Gene ID system from the drop-down menu and click the Search button.
** I chose the Gene ID "VC0017" from the "OrderedLocusNames" system.
* Double-clicking the gene to open its Backpage displayed the following:
[[Image:Week9_VC0017_Backpage_NA.png]]
* All of the expected crosslinks were present.

====Creating an Expression Dataset in the Expression Dataset Manager====

* For my expression dataset, I downloaded the [[Media:Merrell_Compiled_Raw_Data_Vibrio_NA_20151015_GenMAPP.txt|text file]] I had created for GenMAPP last year from the Merrell et al. data.

How many of the IDs were imported out of the total IDs in the microarray dataset? How many exceptions were there? Look in the EX.txt file and look at the error codes for the records that were not imported into the Expression Dataset. Do these represent IDs that were present in the UniProt XML but were somehow not imported, or IDs that were not present in the UniProt XML?

Note:

====Coloring a MAPP with expression data====

Note:

====Running MAPPFinder====

Note:

== Links ==
{{Template:Nanguiano}}

Nanguiano Week 9

2016-09-17T03:35:30Z

Nanguiano: /* Using XMLPipeDB match to Validate the XML Results from the TallyEngine */ removed question

== Running GenMAPP Builder ==

I followed the process from the [[Running_GenMAPP_Builder | Running GenMAPP Builder Tutorial Page]].

=== Software ===

The following software was utilized:
# Any tool that can unpack .gz and .zip files
#* We use [http://www.7-zip.org/ 7-zip]
# [http://www.postgresql.org PostgreSQL] on Windows (http://www.enterprisedb.com/products-services-training/pgdownload)
# GenMAPP Builder version 3.0.0 build 5 (https://github.com/lmu-bioinformatics/xmlpipedb/releases)
# Java JDK 1.8 64-bit
#* [http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html Download page]
#* File to download is: jdk-8u65-windows-x64.exe
# GenMAPP 2 can be downloaded [https://github.com/GenMAPPCS/genmapp here]. The file to download is "GenMAPPv2Setup.exe".
# XMLPipeDB ''match'' utility (https://sourceforge.net/projects/xmlpipedb/files/) for counting IDs in XML files
# Microsoft Access or any other tool that can read .mdb files

==Quality Assurance==

The template for this information was taken from [[Gene_Database_Testing_Report_Sample | the Gene Database Testing Report Sample Page]].

===Export Information===

Version of GenMAPP Builder: '''3.0.0 build 5'''

Computer on which export was run: '''Seaver 120-14, HP LV2311 Windows 7 Enterprise'''

Postgres Database name: '''V_Cholerae_20160916_gmbuilder03b5'''

UniProt XML filename (give filename and upload and link to compressed file): [[Media:Uniprot-organism-243277-NA.zip|uniprot-organism%3A243277.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): '''UniProt Release 2016_08'''
* UniProt XML download link: http://www.uniprot.org/uniprot/?query=organism:243277
* Time taken to import: '''2.92 minutes'''
** Note: ''None''

GO OBO-XML filename (give filename and upload and link to compressed file): [[Media:Go_daily-termdb-old-AV-NA.zip|go_daily-termdb.obo-xml]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): '''10/27/2015, 2:24am'''
* GO OBO-XML download link: http://geneontology.org/page/download-ontology#Legacy_Downloads
* Time taken to import: '''6.87 minutes'''
* Time taken to process: '''4.26 minutes'''
** Note: ''The GO OBO-XML file and version are from [[Anuvarsh_Week_9 | Anu Varshneya's Week 9 Journal]]. Currently available GO OBO-XML files do not run properly in GenMAPP Builder.''

GOA filename (give filename and upload and link to compressed file): [[Media:46.V_cholerae_ATCC_39315_NA.goa.zip|46.V_cholerae_ATCC_39315.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): '''07/05/16'''
* GOA download link: http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/46.V_cholerae_ATCC_39315.goa
* Time taken to import: '''0.06 minutes'''
** Note: ''None''

Name of .gdb file (give filename and upload and link to compressed file): [[Media:Vc-Std_20160916_NA.zip|Vc-Std_20160916_NA.gdb]]
* Time taken to export: '''5.15 hours'''
** Start time: '''09/16/16, 12:26:08 PM'''
** End time: '''09/16/16, 7:16:23 PM'''
*** Note: The computer was locked, and the export was not running, from 2:56 PM to 4:19 PM.
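The start, end, and lock times above can be cross-checked with a little arithmetic. The sketch below assumes the 2:56 PM to 4:19 PM lock was the only pause and subtracts it from the wall-clock duration; `active_seconds` is a hypothetical helper written for this page, not part of GenMAPP Builder.

```python
from datetime import datetime

# Timestamps transcribed from the export record above (all on 09/16/16).
FMT = "%m/%d/%y %I:%M:%S %p"
start = datetime.strptime("09/16/16 12:26:08 PM", FMT)
end = datetime.strptime("09/16/16 07:16:23 PM", FMT)

# Interval during which the computer was locked and the export was not running.
locked_from = datetime.strptime("09/16/16 02:56:00 PM", FMT)
locked_to = datetime.strptime("09/16/16 04:19:00 PM", FMT)


def active_seconds(start, end, pauses):
    """Wall-clock duration in seconds minus the total length of (from, to) pauses."""
    total = (end - start).total_seconds()
    paused = sum((b - a).total_seconds() for a, b in pauses)
    return total - paused


wall = (end - start).total_seconds()
active = active_seconds(start, end, [(locked_from, locked_to)])
print(f"wall clock: {wall / 3600:.2f} h, excluding lock: {active / 3600:.2f} h")
```

The result can be compared against the reported export time when filling in this field.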

===TallyEngine===
* The procedure for the TallyEngine and the remaining checks is adapted from [[How_Do_I_Count_Thee%3F_Let_Me_Count_The_Ways|How Do I Count Thee? Let Me Count The Ways]].
* I ran the TallyEngine in GenMAPP Builder to count the UniProt and GO records in the XML data and in the PostgreSQL database.
** I chose the menu item Tallies > Run XML and Database Tallies for UniProt and GO...
[[Image:Tallyengine_Vcholerae_results_NA.png]]
** Count for Ordered Locus: 3,831

=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine===

* From the directory containing the match utility and the UniProt XML file, I ran the command <code>java -jar xmlpipedb-match-1.1.1.jar "VC_[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml</code>
* It returned a total of 2,738 unique matches.
[[Image:Xmlpipedb_match_results_NA.png]]
* My results were not the same as the TallyEngine results. The TallyEngine returned 3,831 unique matches, while the match utility returned 2,738. This is likely due to differently formatted names in the XML file, since the pattern only matches names of the form VC_####.
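Conceptually, this check counts the distinct strings in the XML file that fit a regular expression. The Python sketch below illustrates that idea; it is not the match utility's actual implementation, and the XML fragment is made up. It shows why the pattern `VC_[0-9][0-9][0-9][0-9]` skips names of the form VC_A####, while `VC_A?[0-9][0-9][0-9][0-9]` picks them up.

```python
import re


def count_unique_matches(text, pattern):
    """Return the number of distinct substrings of text that match pattern."""
    return len(set(re.findall(pattern, text)))


# Made-up fragment standing in for the UniProt XML.
SAMPLE = """
<name type="ordered locus">VC_0017</name>
<name type="ordered locus">VC_0017</name>
<name type="ordered locus">VC_A0123</name>
<name type="ORF">VC0017</name>
"""

narrow = r"VC_[0-9][0-9][0-9][0-9]"
broad = r"VC_A?[0-9][0-9][0-9][0-9]"

print(count_unique_matches(SAMPLE, narrow))  # 1 (VC_0017, counted once)
print(count_unique_matches(SAMPLE, broad))   # 2 (VC_0017 and VC_A0123)
```

Counting a set of matches rather than every occurrence is what makes repeated names, like the duplicated VC_0017 above, count only once.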

=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine===

* Next, I checked the values in SQL to see whether they would validate the results from the TallyEngine or the match utility.
* In pgAdmin III, I ran the query <code>select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_[0-9][0-9][0-9][0-9]';</code>
[[Image:SQL_Query_Results_NA.png]]

*Are your results the same as reported by the TallyEngine? Why or why not?
** The results matched neither the TallyEngine count nor the match utility count. The query returned 2,737 matches, one fewer than the match utility. The gap relative to the TallyEngine is probably due to a different naming convention, but I do not yet understand why the count is off by one from the match utility.
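One thing worth checking when the two counts disagree is that they measure different things: the SQL query counts rows of type 'ordered locus', while the match utility counts every matching string in the XML regardless of which element it sits in. The sketch below illustrates that difference with Python's built-in `sqlite3` standing in for PostgreSQL. SQLite has no `~` operator, so a `REGEXP` function is registered to emulate an unanchored regex search, and the table rows are invented. This is one hypothesis to investigate, not a confirmed explanation of the off-by-one.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Emulate PostgreSQL's unanchored `value ~ pattern` test."""
    return value is not None and re.search(pattern, value) is not None


conn = sqlite3.connect(":memory:")
# SQLite evaluates `X REGEXP Y` by calling the user function regexp(Y, X).
conn.create_function("REGEXP", 2, regexp)
conn.execute("create table genenametype (type text, value text)")
conn.executemany(
    "insert into genenametype values (?, ?)",
    [
        ("ordered locus", "VC_0017"),
        ("ordered locus", "VC_0018"),
        ("ORF", "VC_0999"),  # hypothetical name stored under a different type
    ],
)

PATTERN = "VC_[0-9][0-9][0-9][0-9]"
# Restricted to 'ordered locus' rows, like the query in this entry.
typed = conn.execute(
    "select count(*) from genenametype where type = 'ordered locus' and value REGEXP ?",
    (PATTERN,),
).fetchone()[0]
# Any matching value regardless of type, closer to what a text search of the XML sees.
untyped = conn.execute(
    "select count(distinct value) from genenametype where value REGEXP ?",
    (PATTERN,),
).fetchone()[0]
print(typed, untyped)  # 2 3
```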

===OriginalRowCounts Comparison===

* Within the .gdb file, look at the OriginalRowCounts table to see if the database has the expected tables with the expected number of records. Compare the tables and records with a benchmark .gdb file.
* Benchmark .gdb file (2010): [[Media:Vc-Std_External_20101022.gdb|Vc-Std_External_20101022]]

Copy the OriginalRowCounts table from the benchmark and new gdb and paste them here:

*Original GDB:
<gallery>
Image:Week9_Benchmark_gdb_tables_1_NA.png
Image:Week9_Benchmark_gdb_tables_2_NA.png
</gallery>

*New GDB:
<gallery>
Image:Week9_New_gdb_tables_1_NA.png
Image:Week9_New_gdb_tables_2_NA.png
</gallery>

* The new .gdb has 10 more tables than the benchmark.
* Both files have the same number of OrderedLocusNames records (7,664), but most of the other counts differ.
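Comparing the two OriginalRowCounts tables by eye is error-prone, so a small helper can list added, removed, and changed tables. This is a sketch that assumes both tables have already been exported from Access as (table name, row count) pairs; `compare_row_counts` is hypothetical, and apart from OrderedLocusNames = 7,664 every table name and count below is a placeholder, not a value from either .gdb.

```python
def compare_row_counts(benchmark, new):
    """Compare two {table_name: row_count} mappings from OriginalRowCounts."""
    added = sorted(set(new) - set(benchmark))    # tables only in the new .gdb
    removed = sorted(set(benchmark) - set(new))  # tables only in the benchmark
    changed = {
        table: (benchmark[table], new[table])
        for table in sorted(set(benchmark) & set(new))
        if benchmark[table] != new[table]
    }
    return added, removed, changed


# Only OrderedLocusNames = 7,664 comes from this page; every other
# table name and count here is a placeholder.
benchmark = {"OrderedLocusNames": 7664, "UniProt": 100, "RefSeq": 200}
new = {"OrderedLocusNames": 7664, "UniProt": 150, "RefSeq": 200, "ExampleNewTable": 50}

added, removed, changed = compare_row_counts(benchmark, new)
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```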

Note:

===Visual Inspection===

Perform visual inspection of individual tables to see if there are any problems.

* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?
** No. Most systems do not have a date in their Date field.
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?
** The UniProt IDs all look correct.
** The RefSeq IDs also all look correct.
** The IDs in OrderedLocusNames take two forms: VC_#### and VC_A####. This supports the hypothesis that the difference between the TallyEngine and the match/SQL results was due to the pattern used in the search.
*** To test this, I ran <code>java -jar xmlpipedb-match-1.1.1.jar "VC_A?[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml</code>, which returned 3,831 unique matches, the same as the TallyEngine. The SQL query <code>select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_A?[0-9][0-9][0-9][0-9]';</code> also returned 3,831. I am therefore reasonably confident that the narrower search pattern caused the different results. I am still uncertain why SQL initially returned one fewer value than the match utility, before the pattern was broadened.
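The visual scan of ID formats can also be automated with regular expressions. The sketch below assumes the IDs have been exported from the .gdb as plain lists. The UniProt pattern follows the accession format published in UniProt's documentation; the RefSeq pattern is a simplified assumption (two-letter prefix, underscore, digits, optional version); the OrderedLocusNames pattern covers only the two forms noted above, so it would flag variants such as "VC0017", which GeneFinder accepted, and may need extending. `malformed_ids` is a hypothetical helper, and the sample IDs are illustrative strings.

```python
import re

# Assumed ID patterns; adjust if a table legitimately holds other variants.
ID_PATTERNS = {
    # Accession format published in UniProt's documentation.
    "UniProt": re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
    ),
    # Simplified RefSeq accession: two-letter prefix, underscore, digits, optional version.
    "RefSeq": re.compile(r"[A-Z]{2}_[0-9]+(\.[0-9]+)?"),
    # The two OrderedLocusNames forms observed in this database.
    "OrderedLocusNames": re.compile(r"VC_A?[0-9]{4}"),
}


def malformed_ids(system, ids):
    """Return the IDs that do not fully match the expected pattern for system."""
    pattern = ID_PATTERNS[system]
    return [i for i in ids if not pattern.fullmatch(i)]


print(malformed_ids("UniProt", ["P69905", "A0A0H3AI12", "bogus"]))         # ['bogus']
print(malformed_ids("OrderedLocusNames", ["VC_0017", "VC_A0123", "VC0017"]))  # ['VC0017']
```

Using `fullmatch` rather than `search` matters here: an unanchored search would accept an ID with stray characters around a valid core.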

===.gdb Use in GenMAPP===

====Putting a gene on the MAPP using the GeneFinder window====

* In the main GenMAPP Drafting Board window, left-click on the icon for "Gene" in the upper left corner of the window. Click on the Drafting Board to place the Gene on the MAPP. Now, right-click on the gene to access the GeneFinder window. Type or paste a gene ID into the Gene ID field. Select the appropriate Gene ID system from the drop-down menu and click the Search button.
** I chose the Gene ID "VC0017" from the "OrderedLocusNames" system.
* Double-clicking the gene to open its backpage returned the following page:
[[Image:Week9_VC0017_Backpage_NA.png]]
* All expected crosslinks were present.

====Creating an Expression Dataset in the Expression Dataset Manager====

* For my expression dataset, I downloaded the [[Media:Merrell_Compiled_Raw_Data_Vibrio_NA_20151015_GenMAPP.txt|text file]] that I had created for GenMAPP last year from the Merrell et al. data.

How many of the IDs were imported out of the total IDs in the microarray dataset? How many exceptions were there? Look at the error codes in the EX.txt file for the records that were not imported into the Expression Dataset. Do these represent IDs that were present in the UniProt XML but were somehow not imported, or were they not present in the UniProt XML?

Note:

====Coloring a MAPP with expression data====

Note:

====Running MAPPFinder====

Note:

== Links ==
{{Template:Nanguiano}}

Nanguiano Week 9

2016-09-17T03:35:05Z

Nanguiano: /* TallyEngine */ personalized steps

== Running GenMAPP Builder ==

I followed the process from the [[Running_GenMAPP_Builder | Running GenMAPP Builder Tutorial Page]].

=== Software ===

The following software was utilized:
# Any tool that can unpack .gz and .zip files
#* We use [http://www.7-zip.org/ 7-zip]
# [http://www.postgresql.org PostgreSQL] on Windows (http://www.enterprisedb.com/products-services-training/pgdownload)
# GenMAPP Builder version 3.0.0 build 5 (https://github.com/lmu-bioinformatics/xmlpipedb/releases)
# Java JDK 1.8 64-bit
#* [http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html Download page]
#* File to download is: jdk-8u65-windows-x64.exe
# GenMAPP 2 can be downloaded [https://github.com/GenMAPPCS/genmapp here]. The file to download is "GenMAPPv2Setup.exe".
# XMLPipeDB ''match'' utility (https://sourceforge.net/projects/xmlpipedb/files/) for counting IDs in XML files
# Microsoft Access or any other tool that can read .mdb files

==Quality Assurance==

The template for this information was taken from [[Gene_Database_Testing_Report_Sample | the Gene Database Testing Report Sample Page]].

===Export Information===

Version of GenMAPP Builder: '''3.0.0 build 5'''

Computer on which export was run: '''Seaver 120-14, HP LV2311 Windows 7 Enterprise'''

Postgres Database name: '''V_Cholerae_20160916_gmbuilder03b5'''

UniProt XML filename (give filename and upload and link to compressed file): [[Media:Uniprot-organism-243277-NA.zip|uniprot-organism%3A243277.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): '''UniProt Release 2016_08'''
* UniProt XML download link: http://www.uniprot.org/uniprot/?query=organism:243277
* Time taken to import: '''2.92 minutes'''
** Note: ''None''

GO OBO-XML filename (give filename and upload and link to compressed file): [[Media:Go_daily-termdb-old-AV-NA.zip|go_daily-termdb.obo-xml]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): '''10/27/2015, 2:24am'''
* GO OBO-XML download link: http://geneontology.org/page/download-ontology#Legacy_Downloads
* Time taken to import: '''6.87 minutes'''
* Time taken to process: '''4.26 minutes'''
** Note: ''The GO OBO-XML file and version are from [[Anuvarsh_Week_9 | Anu Varshneya's Week 9 Journal]]. Currently available GO OBO-XML files do not run properly in GenMAPP Builder.''

GOA filename (give filename and upload and link to compressed file): [[Media:46.V_cholerae_ATCC_39315_NA.goa.zip|46.V_cholerae_ATCC_39315.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): '''07/05/16'''
* GOA download link: http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/46.V_cholerae_ATCC_39315.goa
* Time taken to import: '''0.06 minutes'''
** Note: ''None''

Name of .gdb file (give filename and upload and link to compressed file): [[Media:Vc-Std_20160916_NA.zip|Vc-Std_20160916_NA.gdb]]
* Time taken to export: '''5.15 hours'''
** Start time: '''09/16/16, 12:26:08 PM'''
** End time: '''09/16/16, 7:16:23 PM'''
*** Note: The computer was locked, and the export was not running, from 2:56 PM to 4:19 PM.

===TallyEngine===
* The procedure for the TallyEngine and the remaining checks is adapted from [[How_Do_I_Count_Thee%3F_Let_Me_Count_The_Ways|How Do I Count Thee? Let Me Count The Ways]].
* I ran TallyEngine in GenMAPP Builder to record the number of records for UniProt and GO in the XML data and in the Postgres databases.
** I chose the menu item Tallies > Run XML and Database Tallies for UniProt and GO...
[[Image:Tallyengine_Vcholerae_results_NA.png]]
** Count for Ordered Locus: 3,831

=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine===

* From the directory containing the match utility and the UniProt XML file, I ran the command <code>java -jar xmlpipedb-match-1.1.1.jar "VC_[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml</code>
* It returned a total of 2,738 unique matches.
[[Image:Xmlpipedb_match_results_NA.png]]

Are your results the same as you got for the TallyEngine? Why or why not?
* My results were not the same as the TallyEngine results. The TallyEngine returned 3,831 unique matches, while the match utility returned 2,738. This is likely due to differently formatted names in the XML file, since the pattern only matches names of the form VC_####.

=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine===

* Next, I checked the values in SQL to see whether they would validate the results from the TallyEngine or the match utility.
* In pgAdmin III, I ran the query <code>select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_[0-9][0-9][0-9][0-9]';</code>
[[Image:SQL_Query_Results_NA.png]]

*Are your results the same as reported by the TallyEngine? Why or why not?
** The results matched neither the TallyEngine count nor the match utility count. The query returned 2,737 matches, one fewer than the match utility. The gap relative to the TallyEngine is probably due to a different naming convention, but I do not yet understand why the count is off by one from the match utility.

===OriginalRowCounts Comparison===

* Within the .gdb file, look at the OriginalRowCounts table to see if the database has the expected tables with the expected number of records. Compare the tables and records with a benchmark .gdb file.
* Benchmark .gdb file (2010): [[Media:Vc-Std_External_20101022.gdb|Vc-Std_External_20101022]]

Copy the OriginalRowCounts table from the benchmark and new gdb and paste them here:

*Original GDB:
<gallery>
Image:Week9_Benchmark_gdb_tables_1_NA.png
Image:Week9_Benchmark_gdb_tables_2_NA.png
</gallery>

*New GDB:
<gallery>
Image:Week9_New_gdb_tables_1_NA.png
Image:Week9_New_gdb_tables_2_NA.png
</gallery>

* The new .gdb has 10 more tables than the benchmark.
* Both files have the same number of OrderedLocusNames records (7,664), but most of the other counts differ.

Note:

===Visual Inspection===

Perform visual inspection of individual tables to see if there are any problems.

* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?
** No. Most systems do not have a date in their Date field.
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?
** The UniProt IDs all look correct.
** The RefSeq IDs also all look correct.
** The IDs in OrderedLocusNames take two forms: VC_#### and VC_A####. This supports the hypothesis that the difference between the TallyEngine and the match/SQL results was due to the pattern used in the search.
*** To test this, I ran <code>java -jar xmlpipedb-match-1.1.1.jar "VC_A?[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml</code>, which returned 3,831 unique matches, the same as the TallyEngine. The SQL query <code>select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_A?[0-9][0-9][0-9][0-9]';</code> also returned 3,831. I am therefore reasonably confident that the narrower search pattern caused the different results. I am still uncertain why SQL initially returned one fewer value than the match utility, before the pattern was broadened.
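The Systems-table check (is there a date for every gene ID system?) can be scripted once the table is exported. This sketch assumes the rows are available as (system name, date) pairs, for example after exporting the Systems table from Access; the rows below are hypothetical, and `systems_missing_dates` is a helper written only for illustration.

```python
def systems_missing_dates(rows):
    """Return the names of systems whose Date field is empty or missing."""
    return [name for name, date in rows if date is None or not str(date).strip()]


# Hypothetical (System, Date) pairs standing in for the exported Systems table.
rows = [
    ("UniProt", "20160916"),
    ("RefSeq", ""),
    ("OrderedLocusNames", None),
]

print(systems_missing_dates(rows))  # ['RefSeq', 'OrderedLocusNames']
```

Treating both NULL and whitespace-only values as missing covers the two ways an empty Date field is likely to appear after export.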

===.gdb Use in GenMAPP===

====Putting a gene on the MAPP using the GeneFinder window====

* In the main GenMAPP Drafting Board window, left-click on the icon for "Gene" in the upper left corner of the window. Click on the Drafting Board to place the Gene on the MAPP. Now, right-click on the gene to access the GeneFinder window. Type or paste a gene ID into the Gene ID field. Select the appropriate Gene ID system from the drop-down menu and click the Search button.
** I chose the Gene ID "VC0017" from the "OrderedLocusNames" system.
* Double-clicking the gene to open its backpage returned the following page:
[[Image:Week9_VC0017_Backpage_NA.png]]
* All expected crosslinks were present.

====Creating an Expression Dataset in the Expression Dataset Manager====

* For my expression dataset, I downloaded the [[Media:Merrell_Compiled_Raw_Data_Vibrio_NA_20151015_GenMAPP.txt|text file]] that I had created for GenMAPP last year from the Merrell et al. data.

How many of the IDs were imported out of the total IDs in the microarray dataset? How many exceptions were there? Look at the error codes in the EX.txt file for the records that were not imported into the Expression Dataset. Do these represent IDs that were present in the UniProt XML but were somehow not imported, or were they not present in the UniProt XML?

Note:

====Coloring a MAPP with expression data====

Note:

====Running MAPPFinder====

Note:

== Links ==
{{Template:Nanguiano}}

Nanguiano Week 9

2016-09-17T03:32:44Z

Nanguiano: /* Quality Assurance */ added link to how do i count thee

== Running GenMAPP Builder ==

I followed the process from the [[Running_GenMAPP_Builder | Running GenMAPP Builder Tutorial Page]].

=== Software ===

The following software was utilized:
# Any tool that can unpack .gz and .zip files
#* We use [http://www.7-zip.org/ 7-zip]
# [http://www.postgresql.org PostgreSQL] on Windows (http://www.enterprisedb.com/products-services-training/pgdownload)
# GenMAPP Builder version 3.0.0 build 5 (https://github.com/lmu-bioinformatics/xmlpipedb/releases)
# Java JDK 1.8 64-bit
#* [http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html Download page]
#* File to download is: jdk-8u65-windows-x64.exe
# GenMAPP 2 can be downloaded [https://github.com/GenMAPPCS/genmapp here]. The file to download is "GenMAPPv2Setup.exe".
# XMLPipeDB ''match'' utility (https://sourceforge.net/projects/xmlpipedb/files/) for counting IDs in XML files
# Microsoft Access or any other tool that can read .mdb files

==Quality Assurance==

The template for this information was taken from [[Gene_Database_Testing_Report_Sample | the Gene Database Testing Report Sample Page]].

===Export Information===

Version of GenMAPP Builder: '''3.0.0 build 5'''

Computer on which export was run: '''Seaver 120-14, HP LV2311 Windows 7 Enterprise'''

Postgres Database name: '''V_Cholerae_20160916_gmbuilder03b5'''

UniProt XML filename (give filename and upload and link to compressed file): [[Media:Uniprot-organism-243277-NA.zip|uniprot-organism%3A243277.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): '''UniProt Release 2016_08'''
* UniProt XML download link: http://www.uniprot.org/uniprot/?query=organism:243277
* Time taken to import: '''2.92 minutes'''
** Note: ''None''

GO OBO-XML filename (give filename and upload and link to compressed file): [[Media:Go_daily-termdb-old-AV-NA.zip|go_daily-termdb.obo-xml]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): '''10/27/2015, 2:24am'''
* GO OBO-XML download link: http://geneontology.org/page/download-ontology#Legacy_Downloads
* Time taken to import: '''6.87 minutes'''
* Time taken to process: '''4.26 minutes'''
** Note: ''The GO OBO-XML file and version are from [[Anuvarsh_Week_9 | Anu Varshneya's Week 9 Journal]]. Currently available GO OBO-XML files do not run properly in GenMAPP Builder.''

GOA filename (give filename and upload and link to compressed file): [[Media:46.V_cholerae_ATCC_39315_NA.goa.zip|46.V_cholerae_ATCC_39315.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): '''07/05/16'''
* GOA download link: http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/46.V_cholerae_ATCC_39315.goa
* Time taken to import: '''0.06 minutes'''
** Note: ''None''

Name of .gdb file (give filename and upload and link to compressed file): [[Media:Vc-Std_20160916_NA.zip|Vc-Std_20160916_NA.gdb]]
* Time taken to export: '''5.15 hours'''
** Start time: '''09/16/16, 12:26:08 PM'''
** End time: '''09/16/16, 7:16:23 PM'''
*** Note: The computer was locked, and the export was not running, from 2:56 PM to 4:19 PM.

===TallyEngine===
* The procedure for the TallyEngine and the remaining checks is adapted from [[How_Do_I_Count_Thee%3F_Let_Me_Count_The_Ways|How Do I Count Thee? Let Me Count The Ways]].
* Run the TallyEngine in GenMAPP Builder and record the number of records for UniProt and GO in the XML data and in the Postgres databases.
** Choose the menu item Tallies > Run XML and Database Tallies for UniProt and GO...
** Take a screenshot of the results. Upload the image to the wiki and display it on this page.
[[Image:Tallyengine_Vcholerae_results_NA.png]]
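With several overlapping counts recorded in this entry, a small table of differences makes it easier to see which checks agree with the TallyEngine. The sketch below uses the ordered-locus counts reported on this page, taking the TallyEngine figure as 3,831; `disagreements` is a hypothetical helper, not part of GenMAPP Builder or XMLPipeDB.

```python
# Ordered locus name counts reported in this journal entry.
counts = {
    "TallyEngine": 3831,
    "match VC_####": 2738,
    "SQL VC_####": 2737,
    "match VC_A?####": 3831,
    "SQL VC_A?####": 3831,
}


def disagreements(counts, reference="TallyEngine"):
    """Map each check to its difference from the reference count, omitting exact agreements."""
    ref = counts[reference]
    return {name: n - ref for name, n in counts.items() if n != ref}


print(disagreements(counts))  # {'match VC_####': -1093, 'SQL VC_####': -1094}
```

The two narrow-pattern checks fall short by roughly the same amount, which is consistent with the VC_A#### names being the missing group.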

=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine===

* From the directory containing the match utility and the UniProt XML file, I ran the command <code>java -jar xmlpipedb-match-1.1.1.jar "VC_[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml</code>
* It returned a total of 2,738 unique matches.
[[Image:Xmlpipedb_match_results_NA.png]]

Are your results the same as you got for the TallyEngine? Why or why not?
* My results were not the same as the TallyEngine results. The TallyEngine returned 3,831 unique matches, while the match utility returned 2,738. This is likely due to differently formatted names in the XML file, since the pattern only matches names of the form VC_####.

=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine===

* Next, I checked the values in SQL to see whether they would validate the results from the TallyEngine or the match utility.
* In pgAdmin III, I ran the query <code>select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_[0-9][0-9][0-9][0-9]';</code>
[[Image:SQL_Query_Results_NA.png]]

*Are your results the same as reported by the TallyEngine? Why or why not?
** The results matched neither the TallyEngine count nor the match utility count. The query returned 2,737 matches, one fewer than the match utility. The gap relative to the TallyEngine is probably due to a different naming convention, but I do not yet understand why the count is off by one from the match utility.

===OriginalRowCounts Comparison===

* Within the .gdb file, look at the OriginalRowCounts table to see if the database has the expected tables with the expected number of records. Compare the tables and records with a benchmark .gdb file.
* Benchmark .gdb file (2010): [[Media:Vc-Std_External_20101022.gdb|Vc-Std_External_20101022]]

Copy the OriginalRowCounts table from the benchmark and new gdb and paste them here:

*Original GDB:
<gallery>
Image:Week9_Benchmark_gdb_tables_1_NA.png
Image:Week9_Benchmark_gdb_tables_2_NA.png
</gallery>

*New GDB:
<gallery>
Image:Week9_New_gdb_tables_1_NA.png
Image:Week9_New_gdb_tables_2_NA.png
</gallery>

* The new .gdb has 10 more tables than the benchmark.
* Both files have the same number of OrderedLocusNames records (7,664), but most of the other counts differ.

Note:

===Visual Inspection===

Perform visual inspection of individual tables to see if there are any problems.

* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?
** No. Most systems do not have a date in their Date field.
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?
** The UniProt IDs all look correct.
** The RefSeq IDs also all look correct.
** The IDs in OrderedLocusNames take two forms: VC_#### and VC_A####. This supports the hypothesis that the difference between the TallyEngine and the match/SQL results was due to the pattern used in the search.
*** To test this, I ran <code>java -jar xmlpipedb-match-1.1.1.jar "VC_A?[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml</code>, which returned 3,831 unique matches, the same as the TallyEngine. The SQL query <code>select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_A?[0-9][0-9][0-9][0-9]';</code> also returned 3,831. I am therefore reasonably confident that the narrower search pattern caused the different results. I am still uncertain why SQL initially returned one fewer value than the match utility, before the pattern was broadened.

===.gdb Use in GenMAPP===

====Putting a gene on the MAPP using the GeneFinder window====

* In the main GenMAPP Drafting Board window, left-click on the icon for "Gene" in the upper left corner of the window. Click on the Drafting Board to place the Gene on the MAPP. Now, right-click on the gene to access the GeneFinder window. Type or paste a gene ID into the Gene ID field. Select the appropriate Gene ID system from the drop-down menu and click the Search button.
** I chose the Gene ID "VC0017" from the "OrderedLocusNames" system.
* Double-clicking the gene to open its backpage returned the following page:
[[Image:Week9_VC0017_Backpage_NA.png]]
* All expected crosslinks were present.

====Creating an Expression Dataset in the Expression Dataset Manager====

* For my expression dataset, I downloaded the [[Media:Merrell_Compiled_Raw_Data_Vibrio_NA_20151015_GenMAPP.txt|text file]] that I had created for GenMAPP last year from the Merrell et al. data.

How many of the IDs were imported out of the total IDs in the microarray dataset? How many exceptions were there? Look at the error codes in the EX.txt file for the records that were not imported into the Expression Dataset. Do these represent IDs that were present in the UniProt XML but were somehow not imported, or were they not present in the UniProt XML?

Note:

====Coloring a MAPP with expression data====

Note:

====Running MAPPFinder====

Note:

== Links ==
{{Template:Nanguiano}}

Nanguiano Week 9

2016-09-17T03:29:30Z

Nanguiano: /* .gdb Use in GenMAPP */ removed unneeded information

== Running GenMAPP Builder ==

I followed the process from the [[Running_GenMAPP_Builder | Running GenMAPP Builder Tutorial Page]].

=== Software ===

The following software was utilized:
# Any tool that can unpack .gz and .zip files
#* We use [http://www.7-zip.org/ 7-zip]
# [http://www.postgresql.org PostgreSQL] on Windows (http://www.enterprisedb.com/products-services-training/pgdownload)
# GenMAPP Builder version 3.0.0 build 5 (https://github.com/lmu-bioinformatics/xmlpipedb/releases)
# Java JDK 1.8 64-bit
#* [http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html Download page]
#* File to download is: jdk-8u65-windows-x64.exe
# GenMAPP 2 can be downloaded [https://github.com/GenMAPPCS/genmapp here]. The file to download is "GenMAPPv2Setup.exe".
# XMLPipeDB ''match'' utility (https://sourceforge.net/projects/xmlpipedb/files/) for counting IDs in XML files
# Microsoft Access or any other tool that can read .mdb files

==Quality Assurance==

The template for this information was taken from [[Gene_Database_Testing_Report_Sample | the Gene Database Testing Report Sample Page]].

===Export Information===

Version of GenMAPP Builder: '''3.0.0 build 5'''

Computer on which export was run: '''Seaver 120-14, HP LV2311 Windows 7 Enterprise'''

Postgres Database name: '''V_Cholerae_20160916_gmbuilder03b5'''

UniProt XML filename (give filename and upload and link to compressed file): [[Media:Uniprot-organism-243277-NA.zip|uniprot-organism%3A243277.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): '''UniProt Release 2016_08'''
* UniProt XML download link: http://www.uniprot.org/uniprot/?query=organism:243277
* Time taken to import: '''2.92 minutes'''
** Note: ''None''

GO OBO-XML filename (give filename and upload and link to compressed file): [[Media:Go_daily-termdb-old-AV-NA.zip|go_daily-termdb.obo-xml]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): '''10/27/2015, 2:24am'''
* GO OBO-XML download link: http://geneontology.org/page/download-ontology#Legacy_Downloads
* Time taken to import: '''6.87 minutes'''
* Time taken to process: '''4.26 minutes'''
** Note: ''The GO OBO-XML file and version are from [[Anuvarsh_Week_9 | Anu Varshneya's Week 9 Journal]]. Currently available GO OBO-XML files do not run properly in GenMAPP Builder.''

GOA filename (give filename and upload and link to compressed file): [[Media:46.V_cholerae_ATCC_39315_NA.goa.zip|46.V_cholerae_ATCC_39315.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): '''07/05/16'''
* GOA download link: http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/46.V_cholerae_ATCC_39315.goa
* Time taken to import: '''0.06 minutes'''
** Note: ''None''

Name of .gdb file (give filename and upload and link to compressed file): [[Media:Vc-Std_20160916_NA.zip|Vc-Std_20160916_NA.gdb]]
* Time taken to export: '''5.15 hours'''
** Start time: '''09/16/16, 12:26:08 PM'''
** End time: '''09/16/16, 7:16:23 PM'''
*** Note: The computer was locked, and the export was not running, from 2:56 PM to 4:19 PM.

===TallyEngine===

* Run the TallyEngine in GenMAPP Builder and record the number of records for UniProt and GO in the XML data and in the Postgres databases.
** Choose the menu item Tallies > Run XML and Database Tallies for UniProt and GO...
** Take a screenshot of the results. Upload the image to the wiki and display it on this page.
[[Image:Tallyengine_Vcholerae_results_NA.png]]

=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine===

* From the directory containing the match utility and the UniProt XML file, I ran the command <code>java -jar xmlpipedb-match-1.1.1.jar "VC_[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml</code>
* It returned a total of 2,738 unique matches.
[[Image:Xmlpipedb_match_results_NA.png]]

Are your results the same as you got for the TallyEngine? Why or why not?
* My results were not the same as the TallyEngine results. The TallyEngine returned 3,831 unique matches, while the match utility returned 2,738. This is likely due to differently formatted names in the XML file, since the pattern only matches names of the form VC_####.

=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine===

* Next, I checked the values in SQL to see whether they would validate the results from the TallyEngine or the match utility.
* In pgAdmin III, I ran the query <code>select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_[0-9][0-9][0-9][0-9]';</code>
[[Image:SQL_Query_Results_NA.png]]

* Are your results the same as reported by the TallyEngine? Why or why not?
** The results matched neither the TallyEngine nor the match utility. The query returned 2,737 matches, one fewer than the match utility. The gap from the TallyEngine may be due to a different naming convention, but I do not fully understand why the query is off by one from the match utility.
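One way to pin down a one-ID gap is to save both ID lists and take set differences. This is a hypothetical sketch. It assumes each tool's output has been saved as one ID per line, and neither tool is known to produce exactly that format. One possible cause, not confirmed here, is a VC_#### string that appears in the XML outside an ordered-locus name. The match utility would count such a string, but the SQL filter <code>type = 'ordered locus'</code> would exclude it.

```python
def load_ids(lines):
    """Collect non-empty, whitespace-trimmed IDs from an iterable of lines."""
    return {line.strip() for line in lines if line.strip()}

def compare_id_sets(match_ids, sql_ids):
    """Return (IDs only in the match output, IDs only in the SQL result), each sorted."""
    return sorted(match_ids - sql_ids), sorted(sql_ids - match_ids)
```

With real data, the first list would hold whatever the match utility found but the query did not.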

===OriginalRowCounts Comparison===

* Within the .gdb file, look at the OriginalRowCounts table to see if the database has the expected tables with the expected number of records. Compare the tables and records with a benchmark .gdb file.
* Benchmark .gdb file (2010): [[Media:Vc-Std_External_20101022.gdb|Vc-Std_External_20101022]]

Copy the OriginalRowCounts table from the benchmark and new gdb and paste them here:

*Original GDB:
<gallery>
Image:Week9_Benchmark_gdb_tables_1_NA.png
Image:Week9_Benchmark_gdb_tables_2_NA.png
</gallery>

*New GDB:
<gallery>
Image:Week9_New_gdb_tables_1_NA.png
Image:Week9_New_gdb_tables_2_NA.png
</gallery>

* The new gdb had 10 additional tables.
* Both databases have the same number of OrderedLocusNames records (7,664), but most of the other values differ.
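The comparison above can be made systematic by loading each OriginalRowCounts table as a mapping from table name to row count. This is a minimal sketch that assumes the two tables have already been read into dictionaries, for example by copying them out of Access. Apart from the OrderedLocusNames count of 7,664, the numbers in the example are hypothetical.

```python
def compare_row_counts(benchmark, new):
    """Compare two {table_name: row_count} mappings taken from OriginalRowCounts."""
    shared = set(benchmark) & set(new)
    return {
        "added": sorted(set(new) - set(benchmark)),    # tables only in the new .gdb
        "removed": sorted(set(benchmark) - set(new)),  # tables only in the benchmark
        "same": sorted(t for t in shared if new[t] == benchmark[t]),
        "changed": sorted(t for t in shared if new[t] != benchmark[t]),
    }

# Hypothetical counts; only the OrderedLocusNames value comes from this page.
BENCHMARK = {"OrderedLocusNames": 7664, "UniProt": 3800, "RefSeq": 3700}
NEW = {"OrderedLocusNames": 7664, "UniProt": 3900, "RefSeq": 3700, "GeneOntology": 4000}
```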

Note:

===Visual Inspection===

Perform visual inspection of individual tables to see if there are any problems.

* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?
** No. Most systems do not have a date in their Date field.
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?
** The UniProt IDs all look correct.
** The RefSeq IDs also all look correct.
** The IDs in OrderedLocusNames take two forms: VC_#### and VC_A####. This supports the hypothesis that the difference between the TallyEngine and the match/SQL results was due to the pattern used to search for names.
*** To test this, the command <code>java -jar xmlpipedb-match-1.1.1.jar "VC_A?[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml</code> returns 3,831 unique matches, the same as the TallyEngine. Running the query <code>select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_A?[0-9][0-9][0-9][0-9]';</code> in SQL also returns 3,831. I am therefore fairly confident that this was why the utilities returned different results. I am still unsure why, before this new filter, SQL returned one fewer value than the match utility.
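Scrolling through tables by eye can be backed up with a pattern check on each ID column. This is a sketch under stated assumptions. The UniProt pattern follows UniProt's published accession format. The RefSeq pattern is a simplified protein-accession check covering a few common prefixes. The OrderedLocusNames pattern covers only the two forms noted above. The example strings are illustrative.

```python
import re

# Assumed ID formats (see lead-in); adjust if a system uses other prefixes.
ID_PATTERNS = {
    "UniProt": re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
    ),
    "RefSeq": re.compile(r"(?:AP|NP|XP|YP|WP)_[0-9]+(?:\.[0-9]+)?"),
    "OrderedLocusNames": re.compile(r"VC_A?[0-9]{4}"),
}

def malformed_ids(system, ids):
    """Return the IDs that do not fully match the assumed format for `system`."""
    pattern = ID_PATTERNS[system]
    return [i for i in ids if not pattern.fullmatch(i)]
```

An empty result for every table would be consistent with the visual inspection above. Any non-empty result points to specific rows worth checking.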

===.gdb Use in GenMAPP===

====Putting a gene on the MAPP using the GeneFinder window====

* In the main GenMAPP Drafting Board window, left-click on the icon for "Gene" in the upper left corner of the window. Click on the Drafting Board to place the Gene on the MAPP. Now, right-click on the gene to access the GeneFinder window. Type or paste a gene ID into the Gene ID field. Select the appropriate Gene ID system from the drop-down menu and click the Search button.
** I chose the Gene ID "VC0017" from the "OrderedLocusNames" system.
* Double-clicking the gene opened the following Backpage:
[[Image:Week9_VC0017_Backpage_NA.png]]
* All of the cross-references that were supposed to be present were present.

====Creating an Expression Dataset in the Expression Dataset Manager====

* For my expression dataset, I downloaded the [[Media:Merrell_Compiled_Raw_Data_Vibrio_NA_20151015_GenMAPP.txt|text file]] I had created for GenMAPP last year from the Merrell et al. data.

How many of the IDs were imported out of the total IDs in the microarray dataset? How many exceptions were there? Look in the EX.txt file at the error codes for the records that were not imported into the Expression Dataset. Do these represent IDs that were present in the UniProt XML but were somehow not imported, or were they not present in the UniProt XML?
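The first two questions reduce to set arithmetic once the dataset IDs and the Gene Database IDs are available as lists. This hypothetical sketch assumes the expression file is tab-delimited with the ID in the first column and a single header row. It also assumes the .gdb IDs have been exported separately. It does not model GenMAPP's actual import logic or the error codes written to EX.txt.

```python
def ids_from_tab_delimited(lines, has_header=True):
    """Take the first column of a tab-delimited file, skipping an optional header and blank lines."""
    rows = iter(lines)
    if has_header:
        next(rows, None)
    return [line.split("\t", 1)[0].strip() for line in rows if line.strip()]

def import_summary(dataset_ids, gdb_ids):
    """Return (IDs found in the .gdb, distinct dataset IDs, sorted list of missing IDs)."""
    dataset = set(dataset_ids)
    found = dataset & set(gdb_ids)
    return len(found), len(dataset), sorted(dataset - found)
```

The missing IDs could then be searched for in the UniProt XML to tell apart "present but not imported" from "never present".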

Note:

====Coloring a MAPP with expression data====

Note:

====Running MAPPFinder====

Note:

== Links ==
{{Template:Nanguiano}}

Nanguiano Week 9

2016-09-17T03:29:10Z

Nanguiano: /* Creating an Expression Dataset in the Expression Dataset Manager */ added text file link

== Running GenMAPP Builder ==

I followed the process from the [[Running_GenMAPP_Builder | Running GenMAPP Builder Tutorial Page]].

=== Software ===

The following software was utilized:
# Any tool that can unpack .gz and .zip files
#* We use [http://www.7-zip.org/ 7-zip]
# [http://www.postgresql.org PostgreSQL] on Windows (http://www.enterprisedb.com/products-services-training/pgdownload)
# GenMAPP Builder version 3.0.0 build 5 (https://github.com/lmu-bioinformatics/xmlpipedb/releases)
# Java JDK 1.8 64-bit
#* [http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html Download page]
#* File to download is: jdk-8u65-windows-x64.exe
# GenMAPP 2 can be downloaded [https://github.com/GenMAPPCS/genmapp here]. The file to download is "GenMAPPv2Setup.exe".
# XMLPipeDB ''match'' utility (https://sourceforge.net/projects/xmlpipedb/files/) for counting IDs in XML files
# Microsoft Access or any other tool that can read .mdb files

==Quality Assurance==

The template for this information was taken from [[Gene_Database_Testing_Report_Sample | the Gene Database Testing Report Sample Page]].

===Export Information===

Version of GenMAPP Builder: '''3.0.0 build 5'''

Computer on which export was run: '''Seaver 120-14, HP LV2311 Windows 7 Enterprise'''

Postgres Database name: '''V_Cholerae_20160916_gmbuilder03b5'''

UniProt XML filename (give filename and upload and link to compressed file): [[Media:Uniprot-organism-243277-NA.zip|uniprot-organism%3A243277.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): '''UniProt Release 2016_08'''
* UniProt XML download link: http://www.uniprot.org/uniprot/?query=organism:243277
* Time taken to import: '''2.92 minutes'''
** Note: ''None''

GO OBO-XML filename (give filename and upload and link to compressed file): [[Media:Go_daily-termdb-old-AV-NA.zip|go_daily-termdb.obo-xml]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): '''10/27/2015, 2:24am'''
* GO OBO-XML download link: http://geneontology.org/page/download-ontology#Legacy_Downloads
* Time taken to import: '''6.87 minutes'''
* Time taken to process: '''4.26 minutes'''
** Note: ''The GO OBO-XML file and version are from [[Anuvarsh_Week_9 | Anu Varshneya's Week 9 Journal]]. Currently available GO OBO-XML files do not run properly in GenMAPP Builder.''

GOA filename (give filename and upload and link to compressed file): [[Media:46.V_cholerae_ATCC_39315_NA.goa.zip|46.V_cholerae_ATCC_39315.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): '''07/05/16'''
* GOA download link: http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/46.V_cholerae_ATCC_39315.goa
* Time taken to import: '''0.06 minutes'''
** Note: ''None''

Name of .gdb file (give filename and upload and link to compressed file): [[Media:Vc-Std_20160916_NA.zip|Vc-Std_20160916_NA.gdb]]
* Time taken to export: '''5.15 hours'''
** Start time: '''09/16/16, 12:26:08 PM'''
** End time: '''09/16/16, 7:16:23 PM'''
*** Note: The computer was locked, and the export was not running, from 2:56 PM to 4:19 PM.

===TallyEngine===

* Run the TallyEngine in GenMAPP Builder and record the number of records for UniProt and GO in the XML data and in the Postgres databases.
** Choose the menu item Tallies > Run XML and Database Tallies for UniProt and GO...
** Take a screenshot of the results. Upload the image to the wiki and display it on this page.
[[Image:Tallyengine_Vcholerae_results_NA.png]]

=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine===

* From the directory containing the match utility and the UniProt XML file, I ran the command <code>java -jar xmlpipedb-match-1.1.1.jar "VC_[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml</code>
* It returned a total of 2,738 unique matches.
[[Image:Xmlpipedb_match_results_NA.png]]

Are your results the same as you got for the TallyEngine? Why or why not?
* No. The TallyEngine returned 3,789 unique matches, while the match utility returned 2,738. The difference is likely due to differently formatted names in the XML file, since this pattern only matches names of the form VC_####.

=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine===

* Next, I queried the PostgreSQL database to see whether it would validate the results from the TallyEngine or the match utility.
* In pgAdmin III, I ran the query <code>select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_[0-9][0-9][0-9][0-9]';</code>
[[Image:SQL_Query_Results_NA.png]]

* Are your results the same as reported by the TallyEngine? Why or why not?
** The results matched neither the TallyEngine nor the match utility. The query returned 2,737 matches, one fewer than the match utility. The gap from the TallyEngine may be due to a different naming convention, but I do not fully understand why the query is off by one from the match utility.

===OriginalRowCounts Comparison===

* Within the .gdb file, look at the OriginalRowCounts table to see if the database has the expected tables with the expected number of records. Compare the tables and records with a benchmark .gdb file.
* Benchmark .gdb file (2010): [[Media:Vc-Std_External_20101022.gdb|Vc-Std_External_20101022]]

Copy the OriginalRowCounts table from the benchmark and new gdb and paste them here:

*Original GDB:
<gallery>
Image:Week9_Benchmark_gdb_tables_1_NA.png
Image:Week9_Benchmark_gdb_tables_2_NA.png
</gallery>

*New GDB:
<gallery>
Image:Week9_New_gdb_tables_1_NA.png
Image:Week9_New_gdb_tables_2_NA.png
</gallery>

* The new gdb had 10 additional tables.
* Both databases have the same number of OrderedLocusNames records (7,664), but most of the other values differ.

Note:

===Visual Inspection===

Perform visual inspection of individual tables to see if there are any problems.

* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?
** No. Most systems do not have a date in their Date field.
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?
** The UniProt IDs all look correct.
** The RefSeq IDs also all look correct.
** The IDs in OrderedLocusNames take two forms: VC_#### and VC_A####. This supports the hypothesis that the difference between the TallyEngine and the match/SQL results was due to the pattern used to search for names.
*** To test this, the command <code>java -jar xmlpipedb-match-1.1.1.jar "VC_A?[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml</code> returns 3,831 unique matches, the same as the TallyEngine. Running the query <code>select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_A?[0-9][0-9][0-9][0-9]';</code> in SQL also returns 3,831. I am therefore fairly confident that this was why the utilities returned different results. I am still unsure why, before this new filter, SQL returned one fewer value than the match utility.
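The Date-field check on the Systems table can also be scripted once the table's rows are available. This is a minimal sketch. It assumes the rows have been read out of the .gdb as (system name, date) pairs, for example by copying them from Microsoft Access. The system names and dates in the example are hypothetical.

```python
def systems_missing_date(rows):
    """Return the names of systems whose Date value is empty or missing."""
    return [name for name, date in rows if date is None or str(date).strip() == ""]

# Hypothetical rows; a real check would use the Systems table from the exported .gdb.
SYSTEMS = [("UniProt", "09/16/2016"), ("RefSeq", None), ("PDB", ""), ("OrderedLocusNames", "09/16/2016")]
```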

===.gdb Use in GenMAPP===



While the above sections perform quality assurance on the exported Gene Database by verifying ID counts, the "proof in the pudding" is to actually use the Gene Database in GenMAPP. You can follow the instructions in [http://www.openwetware.org/wiki/BIOL367/F10:GenMAPP_and_MAPPFinder_Protocols Part 2 of the ''Vibrio cholerae'' Microarray Data Analysis] to verify that the Gene Database works in GenMAPP. In this case, the emphasis is not on the findings of the data analysis itself, but on whether the Gene Database functions appropriately in GenMAPP.

For assistance with using the GenMAPP program, the GenMAPP Help is very extensive. To access it within GenMAPP, go to the menu item Help > GenMAPP Help and either browse or search for your topic of interest.

Note: The text file I used to import into GenMAPP was the [[Media:Merrell_Compiled_Raw_Data_Vibrio_NA_20151015_GenMAPP.txt|text file]] I created last year from the Merrell et al. data.

====Putting a gene on the MAPP using the GeneFinder window====

* In the main GenMAPP Drafting Board window, left-click on the icon for "Gene" in the upper left corner of the window. Click on the Drafting Board to place the Gene on the MAPP. Now, right-click on the gene to access the GeneFinder window. Type or paste a gene ID into the Gene ID field. Select the appropriate Gene ID system from the drop-down menu and click the Search button.
** I chose the Gene ID "VC0017" from the "OrderedLocusNames" system.
* Double-clicking the gene opened the following Backpage:
[[Image:Week9_VC0017_Backpage_NA.png]]
* All of the cross-references that were supposed to be present were present.

====Creating an Expression Dataset in the Expression Dataset Manager====

* For my expression dataset, I downloaded the [[Media:Merrell_Compiled_Raw_Data_Vibrio_NA_20151015_GenMAPP.txt|text file]] I had created for GenMAPP last year from the Merrell et al. data.

How many of the IDs were imported out of the total IDs in the microarray dataset? How many exceptions were there? Look in the EX.txt file at the error codes for the records that were not imported into the Expression Dataset. Do these represent IDs that were present in the UniProt XML but were somehow not imported, or were they not present in the UniProt XML?

Note:

====Coloring a MAPP with expression data====

Note:

====Running MAPPFinder====

Note:

== Links ==
{{Template:Nanguiano}}

Nanguiano Week 9

2016-09-17T03:25:21Z

Nanguiano: /* .gdb Use in GenMAPP */ Added putting a gene on the MAPP

== Running GenMAPP Builder ==

I followed the process from the [[Running_GenMAPP_Builder | Running GenMAPP Builder Tutorial Page]].

=== Software ===

The following software was utilized:
# Any tool that can unpack .gz and .zip files
#* We use [http://www.7-zip.org/ 7-zip]
# [http://www.postgresql.org PostgreSQL] on Windows (http://www.enterprisedb.com/products-services-training/pgdownload)
# GenMAPP Builder version 3.0.0 build 5 (https://github.com/lmu-bioinformatics/xmlpipedb/releases)
# Java JDK 1.8 64-bit
#* [http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html Download page]
#* File to download is: jdk-8u65-windows-x64.exe
# GenMAPP 2 can be downloaded [https://github.com/GenMAPPCS/genmapp here]. The file to download is "GenMAPPv2Setup.exe".
# XMLPipeDB ''match'' utility (https://sourceforge.net/projects/xmlpipedb/files/) for counting IDs in XML files
# Microsoft Access or any other tool that can read .mdb files

==Quality Assurance==

The template for this information was taken from [[Gene_Database_Testing_Report_Sample | the Gene Database Testing Report Sample Page]].

===Export Information===

Version of GenMAPP Builder: '''3.0.0 build 5'''

Computer on which export was run: '''Seaver 120-14, HP LV2311 Windows 7 Enterprise'''

Postgres Database name: '''V_Cholerae_20160916_gmbuilder03b5'''

UniProt XML filename (give filename and upload and link to compressed file): [[Media:Uniprot-organism-243277-NA.zip|uniprot-organism%3A243277.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): '''UniProt Release 2016_08'''
* UniProt XML download link: http://www.uniprot.org/uniprot/?query=organism:243277
* Time taken to import: '''2.92 minutes'''
** Note: ''None''

GO OBO-XML filename (give filename and upload and link to compressed file): [[Media:Go_daily-termdb-old-AV-NA.zip|go_daily-termdb.obo-xml]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): '''10/27/2015, 2:24am'''
* GO OBO-XML download link: http://geneontology.org/page/download-ontology#Legacy_Downloads
* Time taken to import: '''6.87 minutes'''
* Time taken to process: '''4.26 minutes'''
** Note: ''The GO OBO-XML file and version are from [[Anuvarsh_Week_9 | Anu Varshneya's Week 9 Journal]]. Currently available GO OBO-XML files do not run properly in GenMAPP Builder.''

GOA filename (give filename and upload and link to compressed file): [[Media:46.V_cholerae_ATCC_39315_NA.goa.zip|46.V_cholerae_ATCC_39315.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): '''07/05/16'''
* GOA download link: http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/46.V_cholerae_ATCC_39315.goa
* Time taken to import: '''0.06 minutes'''
** Note: ''None''

Name of .gdb file (give filename and upload and link to compressed file): [[Media:Vc-Std_20160916_NA.zip|Vc-Std_20160916_NA.gdb]]
* Time taken to export: '''5.15 hours'''
** Start time: '''09/16/16, 12:26:08 PM'''
** End time: '''09/16/16, 7:16:23 PM'''
*** Note: The computer was locked, and the export was not running, from 2:56 PM to 4:19 PM.

===TallyEngine===

* Run the TallyEngine in GenMAPP Builder and record the number of records for UniProt and GO in the XML data and in the Postgres databases.
** Choose the menu item Tallies > Run XML and Database Tallies for UniProt and GO...
** Take a screenshot of the results. Upload the image to the wiki and display it on this page.
[[Image:Tallyengine_Vcholerae_results_NA.png]]

=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine===

* From the directory containing the match utility and the UniProt XML file, I ran the command <code>java -jar xmlpipedb-match-1.1.1.jar "VC_[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml</code>
* It returned a total of 2,738 unique matches.
[[Image:Xmlpipedb_match_results_NA.png]]

Are your results the same as you got for the TallyEngine? Why or why not?
* No. The TallyEngine returned 3,789 unique matches, while the match utility returned 2,738. The difference is likely due to differently formatted names in the XML file, since this pattern only matches names of the form VC_####.

=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine===

* Next, I queried the PostgreSQL database to see whether it would validate the results from the TallyEngine or the match utility.
* In pgAdmin III, I ran the query <code>select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_[0-9][0-9][0-9][0-9]';</code>
[[Image:SQL_Query_Results_NA.png]]

* Are your results the same as reported by the TallyEngine? Why or why not?
** The results matched neither the TallyEngine nor the match utility. The query returned 2,737 matches, one fewer than the match utility. The gap from the TallyEngine may be due to a different naming convention, but I do not fully understand why the query is off by one from the match utility.

===OriginalRowCounts Comparison===

* Within the .gdb file, look at the OriginalRowCounts table to see if the database has the expected tables with the expected number of records. Compare the tables and records with a benchmark .gdb file.
* Benchmark .gdb file (2010): [[Media:Vc-Std_External_20101022.gdb|Vc-Std_External_20101022]]

Copy the OriginalRowCounts table from the benchmark and new gdb and paste them here:

*Original GDB:
<gallery>
Image:Week9_Benchmark_gdb_tables_1_NA.png
Image:Week9_Benchmark_gdb_tables_2_NA.png
</gallery>

*New GDB:
<gallery>
Image:Week9_New_gdb_tables_1_NA.png
Image:Week9_New_gdb_tables_2_NA.png
</gallery>

* The new gdb had 10 additional tables.
* Both databases have the same number of OrderedLocusNames records (7,664), but most of the other values differ.

Note:

===Visual Inspection===

Perform visual inspection of individual tables to see if there are any problems.

* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?
** No. Most systems do not have a date in their Date field.
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?
** The UniProt IDs all look correct.
** The RefSeq IDs also all look correct.
** The IDs in OrderedLocusNames take two forms: VC_#### and VC_A####. This supports the hypothesis that the difference between the TallyEngine and the match/SQL results was due to the pattern used to search for names.
*** To test this, the command <code>java -jar xmlpipedb-match-1.1.1.jar "VC_A?[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml</code> returns 3,831 unique matches, the same as the TallyEngine. Running the query <code>select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_A?[0-9][0-9][0-9][0-9]';</code> in SQL also returns 3,831. I am therefore fairly confident that this was why the utilities returned different results. I am still unsure why, before this new filter, SQL returned one fewer value than the match utility.

===.gdb Use in GenMAPP===



While the above sections perform quality assurance on the exported Gene Database by verifying ID counts, the "proof in the pudding" is to actually use the Gene Database in GenMAPP. You can follow the instructions in [http://www.openwetware.org/wiki/BIOL367/F10:GenMAPP_and_MAPPFinder_Protocols Part 2 of the ''Vibrio cholerae'' Microarray Data Analysis] to verify that the Gene Database works in GenMAPP. In this case, the emphasis is not on the findings of the data analysis itself, but on whether the Gene Database functions appropriately in GenMAPP.

For assistance with using the GenMAPP program, the GenMAPP Help is very extensive. To access it within GenMAPP, go to the menu item Help > GenMAPP Help and either browse or search for your topic of interest.

Note: The text file I used to import into GenMAPP was the [[Media:Merrell_Compiled_Raw_Data_Vibrio_NA_20151015_GenMAPP.txt|text file]] I created last year from the Merrell et al. data.

====Putting a gene on the MAPP using the GeneFinder window====

* In the main GenMAPP Drafting Board window, left-click on the icon for "Gene" in the upper left corner of the window. Click on the Drafting Board to place the Gene on the MAPP. Now, right-click on the gene to access the GeneFinder window. Type or paste a gene ID into the Gene ID field. Select the appropriate Gene ID system from the drop-down menu and click the Search button.
** I chose the Gene ID "VC0017" from the "OrderedLocusNames" system.
* Double-clicking the gene opened the following Backpage:
[[Image:Week9_VC0017_Backpage_NA.png]]
* All of the cross-references that were supposed to be present were present.

====Creating an Expression Dataset in the Expression Dataset Manager====

* How many of the IDs were imported out of the total IDs in the microarray dataset? How many exceptions were there? Look in the EX.txt file at the error codes for the records that were not imported into the Expression Dataset. Do these represent IDs that were present in the UniProt XML but were somehow not imported, or were they not present in the UniProt XML?

Note:

====Coloring a MAPP with expression data====

Note:

====Running MAPPFinder====

Note:

== Links ==
{{Template:Nanguiano}}

File:Week9 VC0017 Backpage NA.png

2016-09-17T03:24:37Z

Nanguiano: VC0017 Backpage from GenMAPP. ~~~~

VC0017 Backpage from GenMAPP. [[User:Nanguiano|Nanguiano]] ([[User talk:Nanguiano|talk]]) 20:24, 16 September 2016 (PDT)

Nanguiano Week 9

2016-09-17T03:06:22Z

Nanguiano: /* Visual Inspection */ Added information

== Running GenMAPP Builder ==

I followed the process from the [[Running_GenMAPP_Builder | Running GenMAPP Builder Tutorial Page]].

=== Software ===

The following software was utilized:
# Any tool that can unpack .gz and .zip files
#* We use [http://www.7-zip.org/ 7-zip]
# [http://www.postgresql.org PostgreSQL] on Windows (http://www.enterprisedb.com/products-services-training/pgdownload)
# GenMAPP Builder version 3.0.0 build 5 (https://github.com/lmu-bioinformatics/xmlpipedb/releases)
# Java JDK 1.8 64-bit
#* [http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html Download page]
#* File to download is: jdk-8u65-windows-x64.exe
# GenMAPP 2 can be downloaded [https://github.com/GenMAPPCS/genmapp here]. The file to download is "GenMAPPv2Setup.exe".
# XMLPipeDB ''match'' utility (https://sourceforge.net/projects/xmlpipedb/files/) for counting IDs in XML files
# Microsoft Access or any other tool that can read .mdb files

==Quality Assurance==

The template for this information was taken from [[Gene_Database_Testing_Report_Sample | the Gene Database Testing Report Sample Page]].

===Export Information===

Version of GenMAPP Builder: '''3.0.0 build 5'''

Computer on which export was run: '''Seaver 120-14, HP LV2311 Windows 7 Enterprise'''

Postgres Database name: '''V_Cholerae_20160916_gmbuilder03b5'''

UniProt XML filename (give filename and upload and link to compressed file): [[Media:Uniprot-organism-243277-NA.zip|uniprot-organism%3A243277.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): '''UniProt Release 2016_08'''
* UniProt XML download link: http://www.uniprot.org/uniprot/?query=organism:243277
* Time taken to import: '''2.92 minutes'''
** Note: ''None''

GO OBO-XML filename (give filename and upload and link to compressed file): [[Media:Go_daily-termdb-old-AV-NA.zip|go_daily-termdb.obo-xml]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): '''10/27/2015, 2:24am'''
* GO OBO-XML download link: http://geneontology.org/page/download-ontology#Legacy_Downloads
* Time taken to import: '''6.87 minutes'''
* Time taken to process: '''4.26 minutes'''
** Note: ''The GO OBO-XML file and version are from [[Anuvarsh_Week_9 | Anu Varshneya's Week 9 Journal]]. Currently available GO OBO-XML files do not run properly in GenMAPP Builder.''

GOA filename (give filename and upload and link to compressed file): [[Media:46.V_cholerae_ATCC_39315_NA.goa.zip|46.V_cholerae_ATCC_39315.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): '''07/05/16'''
* GOA download link: http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/46.V_cholerae_ATCC_39315.goa
* Time taken to import: '''0.06 minutes'''
** Note: ''None''

Name of .gdb file (give filename and upload and link to compressed file): [[Media:Vc-Std_20160916_NA.zip|Vc-Std_20160916_NA.gdb]]
* Time taken to export: '''5.15 hours'''
** Start time: '''09/16/16, 12:26:08 PM'''
** End time: '''09/16/16, 7:16:23 PM'''
*** Note: The computer was locked, and the export was not running, from 2:56 PM to 4:19 PM.

===TallyEngine===

* Run the TallyEngine in GenMAPP Builder and record the number of records for UniProt and GO in the XML data and in the Postgres databases.
** Choose the menu item Tallies > Run XML and Database Tallies for UniProt and GO...
** Take a screenshot of the results. Upload the image to the wiki and display it on this page.
[[Image:Tallyengine_Vcholerae_results_NA.png]]

=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine===

* From the directory containing the match utility and the UniProt XML file, I ran the command <code>java -jar xmlpipedb-match-1.1.1.jar "VC_[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml</code>
* It returned a total of 2,738 unique matches.
[[Image:Xmlpipedb_match_results_NA.png]]

Are your results the same as you got for the TallyEngine? Why or why not?
* No. The TallyEngine returned 3,789 unique matches, while the match utility returned 2,738. The difference is likely due to differently formatted names in the XML file, since this pattern only matches names of the form VC_####.

=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine===

* Next, I queried the PostgreSQL database to see whether it would validate the results from the TallyEngine or the match utility.
* In pgAdmin III, I ran the query <code>select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_[0-9][0-9][0-9][0-9]';</code>
[[Image:SQL_Query_Results_NA.png]]

* Are your results the same as reported by the TallyEngine? Why or why not?
** The results matched neither the TallyEngine nor the match utility. The query returned 2,737 matches, one fewer than the match utility. The gap from the TallyEngine may be due to a different naming convention, but I do not fully understand why the query is off by one from the match utility.

===OriginalRowCounts Comparison===

* Within the .gdb file, look at the OriginalRowCounts table to see if the database has the expected tables with the expected number of records. Compare the tables and records with a benchmark .gdb file.
* Benchmark .gdb file (2010): [[Media:Vc-Std_External_20101022.gdb|Vc-Std_External_20101022]]

Copy the OriginalRowCounts table from the benchmark and new gdb and paste them here:

*Original GDB:
<gallery>
Image:Week9_Benchmark_gdb_tables_1_NA.png
Image:Week9_Benchmark_gdb_tables_2_NA.png
</gallery>

*New GDB:
<gallery>
Image:Week9_New_gdb_tables_1_NA.png
Image:Week9_New_gdb_tables_2_NA.png
</gallery>

* The new gdb had 10 additional tables.
* Both databases have the same number of OrderedLocusNames records (7,664), but most of the other values differ.

Note:

===Visual Inspection===

Perform visual inspection of individual tables to see if there are any problems.

* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?
** No. Most systems do not have a date in their Date field.
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?
** The UniProt IDs all look correct.
** The RefSeq IDs also all look correct.
** The IDs in OrderedLocusNames take two forms: VC_#### and VC_A####. This supports the hypothesis that the difference between the TallyEngine and the match/SQL results was due to the pattern used to search for names.
*** To test this, the command <code>java -jar xmlpipedb-match-1.1.1.jar "VC_A?[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml</code> returns 3,831 unique matches, the same as the TallyEngine. Running the query <code>select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_A?[0-9][0-9][0-9][0-9]';</code> in SQL also returns 3,831. I am therefore fairly confident that this was why the utilities returned different results. I am still unsure why, before this new filter, SQL returned one fewer value than the match utility.

===.gdb Use in GenMAPP===



While the above sections perform quality assurance on the exported Gene Database by verifying ID counts, the "proof in the pudding" is to actually use the Gene Database in GenMAPP. You can follow the instructions in [http://www.openwetware.org/wiki/BIOL367/F10:GenMAPP_and_MAPPFinder_Protocols Part 2 of the ''Vibrio cholerae'' Microarray Data Analysis] to verify that the Gene Database works in GenMAPP. In this case, the emphasis is not on the findings of the data analysis itself, but on whether the Gene Database functions appropriately in GenMAPP.

For assistance with using the GenMAPP program, the GenMAPP Help is very extensive. To access it within GenMAPP, go to the menu item Help > GenMAPP Help and either browse or search for your topic of interest.

Note:

====Putting a gene on the MAPP using the GeneFinder window====

* In the main GenMAPP Drafting Board window, left-click on the icon for "Gene" in the upper left corner of the window. Click on the Drafting Board to place the Gene on the MAPP. Now, right-click on the gene to access the GeneFinder window. Type or paste a gene ID into the Gene ID field. Select the appropriate Gene ID system from the drop-down menu and click the Search button. For example, for ''Vibrio cholerae'', you could search for the ID "VC0028", which is an OrderedLocusNames ID. Once the ID has been found, click the OK button to return to the Drafting Board window.
** For the Final Project, you will need to try a sample ID from each of the gene ID systems, not just OrderedLocusNames.
* Open the Backpage by left-clicking on the gene box on the Drafting Board to check that all of the expected cross-referenced IDs are present.

Note:

====Creating an Expression Dataset in the Expression Dataset Manager====

* How many of the IDs were imported out of the total IDs in the microarray dataset? How many exceptions were there? Look in the EX.txt file at the error codes for the records that were not imported into the Expression Dataset. Do these represent IDs that were present in the UniProt XML but were somehow not imported, or IDs that were not present in the UniProt XML at all?

Note:

====Coloring a MAPP with expression data====

Note:

====Running MAPPFinder====

Note:

== Links ==
{{Template:Nanguiano}}

Nanguiano Week 9

2016-09-17T02:54:03Z

Nanguiano: /* OriginalRowCounts Comparison */ add more info

== Running GenMAPP Builder ==

I followed the process from the [[Running_GenMAPP_Builder | Running GenMAPP Builder Tutorial Page]].

=== Software ===

The following software was utilized:
# Any tool that can unpack .gz and .zip files
#* We use [http://www.7-zip.org/ 7-zip]
# [http://www.postgresql.org PostgreSQL] on Windows (http://www.enterprisedb.com/products-services-training/pgdownload)
# GenMAPP Builder version 3.0.0 build 5 (https://github.com/lmu-bioinformatics/xmlpipedb/releases)
# Java JDK 1.8 64-bit
#* [http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html Download page]
#* File to download is: jdk-8u65-windows-x64.exe
# GenMAPP 2 can be downloaded [https://github.com/GenMAPPCS/genmapp here]. The file to download is "GenMAPPv2Setup.exe".
# XMLPipeDB ''match'' utility (https://sourceforge.net/projects/xmlpipedb/files/) for counting IDs in XML files
# Microsoft Access or any other tool that can read .mdb files

==Quality Assurance==

The template for this information was taken from [[Gene_Database_Testing_Report_Sample | the Gene Database Testing Report Sample Page]].

===Export Information===

Version of GenMAPP Builder: '''3.0.0 build 5'''

Computer on which export was run: '''Seaver 120-14, HP LV2311 Windows 7 Enterprise'''

Postgres Database name: '''V_Cholerae_20160916_gmbuilder03b5'''

UniProt XML filename (give filename and upload and link to compressed file): [[Media:Uniprot-organism-243277-NA.zip|uniprot-organism%3A243277.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): '''UniProt Release 2016_08'''
* UniProt XML download link: http://www.uniprot.org/uniprot/?query=organism:243277
* Time taken to import: '''2.92 minutes'''
** Note: ''None''

GO OBO-XML filename (give filename and upload and link to compressed file): [[Media:Go_daily-termdb-old-AV-NA.zip|go_daily-termdb.obo-xml]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): '''10/27/2015, 2:24am'''
* GO OBO-XML download link: http://geneontology.org/page/download-ontology#Legacy_Downloads
* Time taken to import: '''6.87 minutes'''
* Time taken to process: '''4.26 minutes'''
** Note: ''The GO OBO-XML file and version are from [[Anuvarsh_Week_9 | Anu Varshneya's Week 9 Journal]]. Currently available GO OBO-XML files do not run properly in GenMAPP Builder.''

GOA filename (give filename and upload and link to compressed file): [[Media:46.V_cholerae_ATCC_39315_NA.goa.zip|46.V_cholerae_ATCC_39315.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): '''07/05/16'''
* GOA download link: http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/46.V_cholerae_ATCC_39315.goa
* Time taken to import: '''0.06 minutes'''
** Note: ''None''

Name of .gdb file (give filename and upload and link to compressed file): [[Media:Vc-Std_20160916_NA.zip|Vc-Std_20160916_NA.gdb]]
* Time taken to export: '''5.15 hours'''
** Start time: '''09/16/16, 12:26:08 PM'''
** End time: '''09/16/16, 7:16:23 PM'''
*** Note: The computer was locked and the export was not running from 2:56 PM to 4:19 PM.
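Here is a minimal Python sketch for working out the active export time from the start, end, and locked times recorded above. It assumes that the lock and the export both happened on 09/16/16, as the entries indicate, and that the lock times are exact to the minute.

```python
from datetime import datetime

FMT = "%m/%d/%y %I:%M:%S %p"

# Timestamps recorded in the Export Information section.
start = datetime.strptime("09/16/16 12:26:08 PM", FMT)
end = datetime.strptime("09/16/16 07:16:23 PM", FMT)

# Interval during which the computer was locked and the export was not running.
locked_start = datetime.strptime("09/16/16 02:56:00 PM", FMT)
locked_end = datetime.strptime("09/16/16 04:19:00 PM", FMT)

wall_clock = end - start
active = wall_clock - (locked_end - locked_start)

print(wall_clock)                               # 6:50:15
print(active)                                   # 5:27:15
print(round(active.total_seconds() / 3600, 2))  # 5.45
```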

===TallyEngine===

* Run the TallyEngine in GenMAPP Builder and record the number of records for UniProt and GO in the XML data and in the Postgres databases.
** Choose the menu item Tallies > Run XML and Database Tallies for UniProt and GO...
** Take a screenshot of the results. Upload the image to the wiki and display it on this page.
[[Image:Tallyengine_Vcholerae_results_NA.png]]

=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine===

* After navigating to the directory containing the match utility and the UniProt XML file, I ran the command:
<code>java -jar xmlpipedb-match-1.1.1.jar "VC_[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml</code>
* It returned a total of 2,738 unique matches.
[[Image:Xmlpipedb_match_results_NA.png]]
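If you do not have the match jar, you can approximate the counting step in Python. This is a minimal sketch. It assumes that match reports the number of distinct strings that match the pattern anywhere in the input. It is not the utility's source code, and the function name <code>count_unique_matches</code> is hypothetical.

```python
import re


def count_unique_matches(text, pattern):
    """Count the distinct substrings of text that match pattern.

    The pattern should not contain capturing groups, because re.findall
    returns group contents instead of whole matches when groups are present.
    """
    return len(set(re.findall(pattern, text)))


sample = "VC_0001 VC_0001 VC_0002 VC_A0003"
print(count_unique_matches(sample, r"VC_[0-9][0-9][0-9][0-9]"))   # 2
print(count_unique_matches(sample, r"VC_A?[0-9][0-9][0-9][0-9]"))  # 3
```

To run it on the real download, you could read the text with <code>open("uniprot-organism%3A243277.xml", encoding="utf-8").read()</code> and pass that text in instead of the sample string.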

Are your results the same as you got for the TallyEngine? Why or why not?
* My results were not the same as the TallyEngine results. The TallyEngine returned 3,789 unique matches, while match returned 2,738. This is likely due to differently formatted names in the XML file, since the pattern only checks for names of the form VC_####.

=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine===

* Next, I ran a SQL query to see whether it would validate the results from the TallyEngine or the match utility.
* In pgAdmin III, I ran the query:
<code>select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_[0-9][0-9][0-9][0-9]';</code>
[[Image:SQL_Query_Results_NA.png]]

*Are your results the same as reported by the TallyEngine? Why or why not?
** The results matched neither the TallyEngine nor the match utility. The query returned 2,737 matches, one fewer than the match utility. The difference from the TallyEngine is probably due to a different naming convention, but I don't fully understand why the count is off by one from the match utility.
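Note that the two tools do not count the same thing. The query above counts matching rows in ordered-locus names only. The match command counts unique strings anywhere in the XML text. The sketch below makes the rows-versus-distinct-values difference concrete on a tiny in-memory SQLite table, using Python's built-in <code>sqlite3</code> module. SQLite has no <code>~</code> operator, so the sketch registers a <code>REGEXP</code> function as a stand-in for PostgreSQL's regex match. The table contents are hypothetical. Because duplicate rows would push the SQL count above match's unique count, running the same query with <code>count(DISTINCT value)</code> is one quick way to rule that factor in or out.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite rewrites "value REGEXP pattern" as regexp(pattern, value).
    # re.search is unanchored, like PostgreSQL's ~ operator.
    return value is not None and re.search(pattern, value) is not None


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE genenametype (type TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO genenametype VALUES (?, ?)",
    [
        ("ordered locus", "VC_0028"),
        ("ordered locus", "VC_0028"),  # hypothetical duplicate row
        ("ordered locus", "VC_A0001"),
        ("primary", "exampleGene"),
    ],
)

pattern = "VC_[0-9][0-9][0-9][0-9]"
where = "WHERE type = 'ordered locus' AND value REGEXP ?"
rows = conn.execute(f"SELECT count(*) FROM genenametype {where}", (pattern,)).fetchone()[0]
distinct = conn.execute(
    f"SELECT count(DISTINCT value) FROM genenametype {where}", (pattern,)
).fetchone()[0]
print(rows, distinct)  # 2 1
```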

===OriginalRowCounts Comparison===

* Within the .gdb file, look at the OriginalRowCounts table to see if the database has the expected tables with the expected number of records. Compare the tables and records with a benchmark .gdb file.
* Benchmark .gdb file (2010): [[Media:Vc-Std_External_20101022.gdb|Vc-Std_External_20101022]]

Copy the OriginalRowCounts table from the benchmark and new gdb and paste them here:

*Original GDB:
<gallery>
Image:Week9_Benchmark_gdb_tables_1_NA.png
Image:Week9_Benchmark_gdb_tables_2_NA.png
</gallery>

*New GDB:
<gallery>
Image:Week9_New_gdb_tables_1_NA.png
Image:Week9_New_gdb_tables_2_NA.png
</gallery>

* The new gdb had 10 additional tables.
* They have the same number of OrderedLocusNames records (7,664), but most of the other values differ.

Note:

===Visual Inspection===

Perform visual inspection of individual tables to see if there are any problems.

* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?

Note:

===.gdb Use in GenMAPP===



While the above sections perform quality assurance on the exported Gene Database by verifying ID counts, the "proof in the pudding" is to actually use the Gene Database in GenMAPP. You can follow the instructions in [http://www.openwetware.org/wiki/BIOL367/F10:GenMAPP_and_MAPPFinder_Protocols Part 2 of the ''Vibrio cholerae'' Microarray Data Analysis] to verify that the Gene Database works in GenMAPP. In this case, the emphasis is not on the findings of the data analysis itself, but on whether the Gene Database functions appropriately in GenMAPP.

For assistance with using the GenMAPP program, the GenMAPP Help is very extensive. To access it within GenMAPP, go to the menu item Help > GenMAPP Help and either browse or search for your topic of interest.

Note:

====Putting a gene on the MAPP using the GeneFinder window====

* In the main GenMAPP Drafting Board window, left-click on the icon for "Gene" in the upper left corner of the window. Click on the Drafting Board to place the Gene on the MAPP. Now, right-click on the gene to access the GeneFinder window. Type or paste a gene ID into the Gene ID field. Select the appropriate Gene ID system from the drop-down menu and click the Search button. For example, for ''Vibrio cholerae'', you could search for the ID "VC0028", which is an OrderedLocusNames ID. Once the ID has been found, click the OK button to return to the Drafting Board window.
** For the Final Project, you will need to try a sample ID from each of the gene ID systems, not just OrderedLocusNames.
* Open the Backpage by left-clicking on the gene box on the Drafting Board to check that all of the expected cross-referenced IDs are present.

Note:

====Creating an Expression Dataset in the Expression Dataset Manager====

* How many of the IDs were imported out of the total IDs in the microarray dataset? How many exceptions were there? Look in the EX.txt file at the error codes for the records that were not imported into the Expression Dataset. Do these represent IDs that were present in the UniProt XML but were somehow not imported, or IDs that were not present in the UniProt XML at all?

Note:

====Coloring a MAPP with expression data====

Note:

====Running MAPPFinder====

Note:

== Links ==
{{Template:Nanguiano}}

Nanguiano Week 9

2016-09-17T02:52:01Z

Nanguiano: /* OriginalRowCounts Comparison */ Added images

== Running GenMAPP Builder ==

I followed the process from the [[Running_GenMAPP_Builder | Running GenMAPP Builder Tutorial Page]].

=== Software ===

The following software was utilized:
# Any tool that can unpack .gz and .zip files
#* We use [http://www.7-zip.org/ 7-zip]
# [http://www.postgresql.org PostgreSQL] on Windows (http://www.enterprisedb.com/products-services-training/pgdownload)
# GenMAPP Builder version 3.0.0 build 5 (https://github.com/lmu-bioinformatics/xmlpipedb/releases)
# Java JDK 1.8 64-bit
#* [http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html Download page]
#* File to download is: jdk-8u65-windows-x64.exe
# GenMAPP 2 can be downloaded [https://github.com/GenMAPPCS/genmapp here]. The file to download is "GenMAPPv2Setup.exe".
# XMLPipeDB ''match'' utility (https://sourceforge.net/projects/xmlpipedb/files/) for counting IDs in XML files
# Microsoft Access or any other tool that can read .mdb files

==Quality Assurance==

The template for this information was taken from [[Gene_Database_Testing_Report_Sample | the Gene Database Testing Report Sample Page]].

===Export Information===

Version of GenMAPP Builder: '''3.0.0 build 5'''

Computer on which export was run: '''Seaver 120-14, HP LV2311 Windows 7 Enterprise'''

Postgres Database name: '''V_Cholerae_20160916_gmbuilder03b5'''

UniProt XML filename (give filename and upload and link to compressed file): [[Media:Uniprot-organism-243277-NA.zip|uniprot-organism%3A243277.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): '''UniProt Release 2016_08'''
* UniProt XML download link: http://www.uniprot.org/uniprot/?query=organism:243277
* Time taken to import: '''2.92 minutes'''
** Note: ''None''

GO OBO-XML filename (give filename and upload and link to compressed file): [[Media:Go_daily-termdb-old-AV-NA.zip|go_daily-termdb.obo-xml]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): '''10/27/2015, 2:24am'''
* GO OBO-XML download link: http://geneontology.org/page/download-ontology#Legacy_Downloads
* Time taken to import: '''6.87 minutes'''
* Time taken to process: '''4.26 minutes'''
** Note: ''The GO OBO-XML file and version are from [[Anuvarsh_Week_9 | Anu Varshneya's Week 9 Journal]]. Currently available GO OBO-XML files do not run properly in GenMAPP Builder.''

GOA filename (give filename and upload and link to compressed file): [[Media:46.V_cholerae_ATCC_39315_NA.goa.zip|46.V_cholerae_ATCC_39315.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): '''07/05/16'''
* GOA download link: http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/46.V_cholerae_ATCC_39315.goa
* Time taken to import: '''0.06 minutes'''
** Note: ''None''

Name of .gdb file (give filename and upload and link to compressed file): [[Media:Vc-Std_20160916_NA.zip|Vc-Std_20160916_NA.gdb]]
* Time taken to export: '''5.15 hours'''
** Start time: '''09/16/16, 12:26:08 PM'''
** End time: '''09/16/16, 7:16:23 PM'''
*** Note: The computer was locked and the export was not running from 2:56 PM to 4:19 PM.

===TallyEngine===

* Run the TallyEngine in GenMAPP Builder and record the number of records for UniProt and GO in the XML data and in the Postgres databases.
** Choose the menu item Tallies > Run XML and Database Tallies for UniProt and GO...
** Take a screenshot of the results. Upload the image to the wiki and display it on this page.
[[Image:Tallyengine_Vcholerae_results_NA.png]]

=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine===

* After navigating to the directory containing the match utility and the UniProt XML file, I ran the command:
<code>java -jar xmlpipedb-match-1.1.1.jar "VC_[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml</code>
* It returned a total of 2,738 unique matches.
[[Image:Xmlpipedb_match_results_NA.png]]

Are your results the same as you got for the TallyEngine? Why or why not?
* My results were not the same as the TallyEngine results. The TallyEngine returned 3,789 unique matches, while match returned 2,738. This is likely due to differently formatted names in the XML file, since the pattern only checks for names of the form VC_####.

=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine===

* Next, I ran a SQL query to see whether it would validate the results from the TallyEngine or the match utility.
* In pgAdmin III, I ran the query:
<code>select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_[0-9][0-9][0-9][0-9]';</code>
[[Image:SQL_Query_Results_NA.png]]

*Are your results the same as reported by the TallyEngine? Why or why not?
** The results matched neither the TallyEngine nor the match utility. The query returned 2,737 matches, one fewer than the match utility. The difference from the TallyEngine is probably due to a different naming convention, but I don't fully understand why the count is off by one from the match utility.

===OriginalRowCounts Comparison===

* Within the .gdb file, look at the OriginalRowCounts table to see if the database has the expected tables with the expected number of records. Compare the tables and records with a benchmark .gdb file.
* Benchmark .gdb file (2010): [[Media:Vc-Std_External_20101022.gdb|Vc-Std_External_20101022]]

Copy the OriginalRowCounts table from the benchmark and new gdb and paste them here:

*Original GDB:
<gallery>
Image:Week9_Benchmark_gdb_tables_1_NA.png
Image:Week9_Benchmark_gdb_tables_2_NA.png
</gallery>

*New GDB:
<gallery>
Image:Week9_New_gdb_tables_1_NA.png
Image:Week9_New_gdb_tables_2_NA.png
</gallery>

* The new gdb had 10 additional tables.

Note:

===Visual Inspection===

Perform visual inspection of individual tables to see if there are any problems.

* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?

Note:

===.gdb Use in GenMAPP===



While the above sections perform quality assurance on the exported Gene Database by verifying ID counts, the "proof in the pudding" is to actually use the Gene Database in GenMAPP. You can follow the instructions in [http://www.openwetware.org/wiki/BIOL367/F10:GenMAPP_and_MAPPFinder_Protocols Part 2 of the ''Vibrio cholerae'' Microarray Data Analysis] to verify that the Gene Database works in GenMAPP. In this case, the emphasis is not on the findings of the data analysis itself, but on whether the Gene Database functions appropriately in GenMAPP.

For assistance with using the GenMAPP program, the GenMAPP Help is very extensive. To access it within GenMAPP, go to the menu item Help > GenMAPP Help and either browse or search for your topic of interest.

Note:

====Putting a gene on the MAPP using the GeneFinder window====

* In the main GenMAPP Drafting Board window, left-click on the icon for "Gene" in the upper left corner of the window. Click on the Drafting Board to place the Gene on the MAPP. Now, right-click on the gene to access the GeneFinder window. Type or paste a gene ID into the Gene ID field. Select the appropriate Gene ID system from the drop-down menu and click the Search button. For example, for ''Vibrio cholerae'', you could search for the ID "VC0028", which is an OrderedLocusNames ID. Once the ID has been found, click the OK button to return to the Drafting Board window.
** For the Final Project, you will need to try a sample ID from each of the gene ID systems, not just OrderedLocusNames.
* Open the Backpage by left-clicking on the gene box on the Drafting Board to check that all of the expected cross-referenced IDs are present.

Note:

====Creating an Expression Dataset in the Expression Dataset Manager====

* How many of the IDs were imported out of the total IDs in the microarray dataset? How many exceptions were there? Look in the EX.txt file at the error codes for the records that were not imported into the Expression Dataset. Do these represent IDs that were present in the UniProt XML but were somehow not imported, or IDs that were not present in the UniProt XML at all?

Note:

====Coloring a MAPP with expression data====

Note:

====Running MAPPFinder====

Note:

== Links ==
{{Template:Nanguiano}}

File:Week9 New gdb tables 2 NA.png

2016-09-17T02:51:25Z

Nanguiano: The new gdb tables, part 2. ~~~~

The new gdb tables, part 2. [[User:Nanguiano|Nanguiano]] ([[User talk:Nanguiano|talk]]) 19:51, 16 September 2016 (PDT)

File:Week9 New gdb tables 1 NA.png

2016-09-17T02:50:48Z

Nanguiano: The tables from the new GDB, part 1. ~~~~

The tables from the new GDB, part 1. [[User:Nanguiano|Nanguiano]] ([[User talk:Nanguiano|talk]]) 19:50, 16 September 2016 (PDT)

File:Week9 Benchmark gdb tables 2 NA.png

2016-09-17T02:49:56Z

Nanguiano: The benchmark gdb tables, part 2. ~~~~

The benchmark gdb tables, part 2. [[User:Nanguiano|Nanguiano]] ([[User talk:Nanguiano|talk]]) 19:49, 16 September 2016 (PDT)

File:Week9 Benchmark gdb tables 1 NA.png

2016-09-17T02:49:12Z

Nanguiano: The benchmark gdb tables, part 1. ~~~~

The benchmark gdb tables, part 1. [[User:Nanguiano|Nanguiano]] ([[User talk:Nanguiano|talk]]) 19:49, 16 September 2016 (PDT)

Nanguiano Week 9

2016-09-17T02:34:47Z

Nanguiano: /* Export Information */ Add link to gdb file

== Running GenMAPP Builder ==

I followed the process from the [[Running_GenMAPP_Builder | Running GenMAPP Builder Tutorial Page]].

=== Software ===

The following software was utilized:
# Any tool that can unpack .gz and .zip files
#* We use [http://www.7-zip.org/ 7-zip]
# [http://www.postgresql.org PostgreSQL] on Windows (http://www.enterprisedb.com/products-services-training/pgdownload)
# GenMAPP Builder version 3.0.0 build 5 (https://github.com/lmu-bioinformatics/xmlpipedb/releases)
# Java JDK 1.8 64-bit
#* [http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html Download page]
#* File to download is: jdk-8u65-windows-x64.exe
# GenMAPP 2 can be downloaded [https://github.com/GenMAPPCS/genmapp here]. The file to download is "GenMAPPv2Setup.exe".
# XMLPipeDB ''match'' utility (https://sourceforge.net/projects/xmlpipedb/files/) for counting IDs in XML files
# Microsoft Access or any other tool that can read .mdb files

==Quality Assurance==

The template for this information was taken from [[Gene_Database_Testing_Report_Sample | the Gene Database Testing Report Sample Page]].

===Export Information===

Version of GenMAPP Builder: '''3.0.0 build 5'''

Computer on which export was run: '''Seaver 120-14, HP LV2311 Windows 7 Enterprise'''

Postgres Database name: '''V_Cholerae_20160916_gmbuilder03b5'''

UniProt XML filename (give filename and upload and link to compressed file): [[Media:Uniprot-organism-243277-NA.zip|uniprot-organism%3A243277.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): '''UniProt Release 2016_08'''
* UniProt XML download link: http://www.uniprot.org/uniprot/?query=organism:243277
* Time taken to import: '''2.92 minutes'''
** Note: ''None''

GO OBO-XML filename (give filename and upload and link to compressed file): [[Media:Go_daily-termdb-old-AV-NA.zip|go_daily-termdb.obo-xml]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): '''10/27/2015, 2:24am'''
* GO OBO-XML download link: http://geneontology.org/page/download-ontology#Legacy_Downloads
* Time taken to import: '''6.87 minutes'''
* Time taken to process: '''4.26 minutes'''
** Note: ''The GO OBO-XML file and version are from [[Anuvarsh_Week_9 | Anu Varshneya's Week 9 Journal]]. Currently available GO OBO-XML files do not run properly in GenMAPP Builder.''

GOA filename (give filename and upload and link to compressed file): [[Media:46.V_cholerae_ATCC_39315_NA.goa.zip|46.V_cholerae_ATCC_39315.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): '''07/05/16'''
* GOA download link: http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/46.V_cholerae_ATCC_39315.goa
* Time taken to import: '''0.06 minutes'''
** Note: ''None''

Name of .gdb file (give filename and upload and link to compressed file): [[Media:Vc-Std_20160916_NA.zip|Vc-Std_20160916_NA.gdb]]
* Time taken to export: '''5.15 hours'''
** Start time: '''09/16/16, 12:26:08 PM'''
** End time: '''09/16/16, 7:16:23 PM'''
*** Note: The computer was locked and the export was not running from 2:56 PM to 4:19 PM.

===TallyEngine===

* Run the TallyEngine in GenMAPP Builder and record the number of records for UniProt and GO in the XML data and in the Postgres databases.
** Choose the menu item Tallies > Run XML and Database Tallies for UniProt and GO...
** Take a screenshot of the results. Upload the image to the wiki and display it on this page.
[[Image:Tallyengine_Vcholerae_results_NA.png]]

=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine===

* After navigating to the directory containing the match utility and the UniProt XML file, I ran the command:
<code>java -jar xmlpipedb-match-1.1.1.jar "VC_[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml</code>
* It returned a total of 2,738 unique matches.
[[Image:Xmlpipedb_match_results_NA.png]]

Are your results the same as you got for the TallyEngine? Why or why not?
* My results were not the same as the TallyEngine results. The TallyEngine returned 3,789 unique matches, while match returned 2,738. This is likely due to differently formatted names in the XML file, since the pattern only checks for names of the form VC_####.

=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine===

* Next, I ran a SQL query to see whether it would validate the results from the TallyEngine or the match utility.
* In pgAdmin III, I ran the query:
<code>select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_[0-9][0-9][0-9][0-9]';</code>
[[Image:SQL_Query_Results_NA.png]]

*Are your results the same as reported by the TallyEngine? Why or why not?
** The results matched neither the TallyEngine nor the match utility. The query returned 2,737 matches, one fewer than the match utility. The difference from the TallyEngine is probably due to a different naming convention, but I don't fully understand why the count is off by one from the match utility.

===OriginalRowCounts Comparison===

Within the .gdb file, look at the OriginalRowCounts table to see if the database has the expected tables with the expected number of records. Compare the tables and records with a benchmark .gdb file.

Benchmark .gdb file:

Copy the OriginalRowCounts table from the benchmark and new gdb and paste them here:

Note:

===Visual Inspection===

Perform visual inspection of individual tables to see if there are any problems.

* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?

Note:

===.gdb Use in GenMAPP===



While the above sections perform quality assurance on the exported Gene Database by verifying ID counts, the "proof in the pudding" is to actually use the Gene Database in GenMAPP. You can follow the instructions in [http://www.openwetware.org/wiki/BIOL367/F10:GenMAPP_and_MAPPFinder_Protocols Part 2 of the ''Vibrio cholerae'' Microarray Data Analysis] to verify that the Gene Database works in GenMAPP. In this case, the emphasis is not on the findings of the data analysis itself, but on whether the Gene Database functions appropriately in GenMAPP.

For assistance with using the GenMAPP program, the GenMAPP Help is very extensive. To access it within GenMAPP, go to the menu item Help > GenMAPP Help and either browse or search for your topic of interest.

Note:

====Putting a gene on the MAPP using the GeneFinder window====

* In the main GenMAPP Drafting Board window, left-click on the icon for "Gene" in the upper left corner of the window. Click on the Drafting Board to place the Gene on the MAPP. Now, right-click on the gene to access the GeneFinder window. Type or paste a gene ID into the Gene ID field. Select the appropriate Gene ID system from the drop-down menu and click the Search button. For example, for ''Vibrio cholerae'', you could search for the ID "VC0028", which is an OrderedLocusNames ID. Once the ID has been found, click the OK button to return to the Drafting Board window.
** For the Final Project, you will need to try a sample ID from each of the gene ID systems, not just OrderedLocusNames.
* Open the Backpage by left-clicking on the gene box on the Drafting Board to check that all of the expected cross-referenced IDs are present.

Note:

====Creating an Expression Dataset in the Expression Dataset Manager====

* How many of the IDs were imported out of the total IDs in the microarray dataset? How many exceptions were there? Look in the EX.txt file at the error codes for the records that were not imported into the Expression Dataset. Do these represent IDs that were present in the UniProt XML but were somehow not imported, or IDs that were not present in the UniProt XML at all?

Note:

====Coloring a MAPP with expression data====

Note:

====Running MAPPFinder====

Note:

== Links ==
{{Template:Nanguiano}}

File:Vc-Std 20160916 NA.zip

2016-09-17T02:33:52Z

Nanguiano: The GDB output from the V. cholerae export performed on 9/16/16. ~~~~

The GDB output from the V. cholerae export performed on 9/16/16. [[User:Nanguiano|Nanguiano]] ([[User talk:Nanguiano|talk]]) 19:33, 16 September 2016 (PDT)

Nanguiano Week 9

2016-09-17T02:32:39Z

Nanguiano: /* Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine */ spelling is hard

== Running GenMAPP Builder ==

I followed the process from the [[Running_GenMAPP_Builder | Running GenMAPP Builder Tutorial Page]].

=== Software ===

The following software was utilized:
# Any tool that can unpack .gz and .zip files
#* We use [http://www.7-zip.org/ 7-zip]
# [http://www.postgresql.org PostgreSQL] on Windows (http://www.enterprisedb.com/products-services-training/pgdownload)
# GenMAPP Builder version 3.0.0 build 5 (https://github.com/lmu-bioinformatics/xmlpipedb/releases)
# Java JDK 1.8 64-bit
#* [http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html Download page]
#* File to download is: jdk-8u65-windows-x64.exe
# GenMAPP 2 can be downloaded [https://github.com/GenMAPPCS/genmapp here]. The file to download is "GenMAPPv2Setup.exe".
# XMLPipeDB ''match'' utility (https://sourceforge.net/projects/xmlpipedb/files/) for counting IDs in XML files
# Microsoft Access or any other tool that can read .mdb files

==Quality Assurance==

The template for this information was taken from [[Gene_Database_Testing_Report_Sample | the Gene Database Testing Report Sample Page]].

===Export Information===

Version of GenMAPP Builder: '''3.0.0 build 5'''

Computer on which export was run: '''Seaver 120-14, HP LV2311 Windows 7 Enterprise'''

Postgres Database name: '''V_Cholerae_20160916_gmbuilder03b5'''

UniProt XML filename (give filename and upload and link to compressed file): [[Media:Uniprot-organism-243277-NA.zip|uniprot-organism%3A243277.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): '''UniProt Release 2016_08'''
* UniProt XML download link: http://www.uniprot.org/uniprot/?query=organism:243277
* Time taken to import: '''2.92 minutes'''
** Note: ''None''

GO OBO-XML filename (give filename and upload and link to compressed file): [[Media:Go_daily-termdb-old-AV-NA.zip|go_daily-termdb.obo-xml]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): '''10/27/2015, 2:24am'''
* GO OBO-XML download link: http://geneontology.org/page/download-ontology#Legacy_Downloads
* Time taken to import: '''6.87 minutes'''
* Time taken to process: '''4.26 minutes'''
** Note: ''The GO OBO-XML file and version are from [[Anuvarsh_Week_9 | Anu Varshneya's Week 9 Journal]]. The currently available GO OBO-XML files do not run properly in GenMAPP Builder.''

GOA filename (give filename and upload and link to compressed file): [[Media:46.V_cholerae_ATCC_39315_NA.goa.zip|46.V_cholerae_ATCC_39315.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): '''07/05/16'''
* GOA download link: http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/46.V_cholerae_ATCC_39315.goa
* Time taken to import: '''0.06 minutes'''
** Note: ''None''

Name of .gdb file (give filename and upload and link to compressed file):
* Time taken to export: '''5.15 hours'''
** Start time: '''09/16/16, 12:26:08 PM'''
** End time: '''09/16/16, 7:16:23 PM'''
*** Note: The computer was locked from 2:56 PM to 4:19 PM, and the export was not running during that time.
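The active export time can be cross-checked from the recorded start, end, and lock times. A minimal Python sketch, assuming all four times fall on 09/16/16 and that the lock (2:56 PM to 4:19 PM, taken to the whole minute) was the only pause:

```python
from datetime import datetime

# Format of the times recorded in the Export Information section.
FMT = "%m/%d/%y %I:%M:%S %p"

start = datetime.strptime("09/16/16 12:26:08 PM", FMT)
end = datetime.strptime("09/16/16 07:16:23 PM", FMT)

# Assumed pause: the lock from 2:56 PM to 4:19 PM, taken to the whole minute.
lock_start = datetime.strptime("09/16/16 02:56:00 PM", FMT)
lock_end = datetime.strptime("09/16/16 04:19:00 PM", FMT)

wall_clock = end - start          # timedelta for the whole run
paused = lock_end - lock_start    # timedelta while the computer was locked
active = wall_clock - paused      # time the export was actually running

for label, span in [("Wall-clock", wall_clock), ("Paused", paused), ("Active", active)]:
    print(f"{label:<10} {span}  ({span.total_seconds() / 3600:.2f} hours)")
```

With the times recorded above, this gives 6 h 50 min 15 s of wall-clock time minus 1 h 23 min of pause, or about 5.45 hours of active export time. That is higher than the 5.15 hours recorded, so the recorded figure may be worth re-checking.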

===TallyEngine===

* Run the TallyEngine in GenMAPP Builder and record the number of records for UniProt and GO in the XML data and in the Postgres databases.
** Choose the menu item Tallies > Run XML and Database Tallies for UniProt and GO...
** Take a screenshot of the results. Upload the image to the wiki and display it on this page.
[[Image:Tallyengine_Vcholerae_results_NA.png]]

=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine===

* From the directory containing the match utility and the UniProt XML file, I ran the following command:
java -jar xmlpipedb-match-1.1.1.jar "VC_[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml
* It returned a total of 2,738 unique matches.
[[Image:Xmlpipedb_match_results_NA.png]]

Are your results the same as you got for the TallyEngine? Why or why not?
* My results did not match the TallyEngine results. The TallyEngine returned 3,789 unique matches, while the match utility returned 2,738. The difference is likely due to differently formatted names in the XML file, since the pattern only matches names of the form VC_####.
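The match utility's exact behavior is not documented on this page, so the following Python sketch only approximates it, under the assumption that it reports the distinct substrings matching the pattern. It shows how a pattern of the form VC_#### can miss IDs written differently; the VC_A#### form used below is a hypothetical example of such a name, not something confirmed in this file.

```python
import re

def unique_matches(text, pattern):
    """Return the distinct substrings of `text` that match the regex `pattern`."""
    return set(re.findall(pattern, text))

# Made-up excerpt for illustration; the real input is the UniProt XML file.
sample_xml = """
<gene><name type="ordered locus">VC_0028</name></gene>
<gene><name type="ordered locus">VC_A0001</name></gene>
<gene><name type="ordered locus">VC_0028</name></gene>
"""

narrow = r"VC_[0-9][0-9][0-9][0-9]"   # the pattern used in the command above
broad = r"VC_A?[0-9][0-9][0-9][0-9]"  # hypothetical pattern that also allows VC_A####

print(len(unique_matches(sample_xml, narrow)))  # VC_0028 only
print(len(unique_matches(sample_xml, broad)))   # VC_0028 and VC_A0001
```

To run it on the real data, the file contents can be read with ''open("uniprot-organism%3A243277.xml", encoding="utf-8").read()'' and passed in place of ''sample_xml''.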

=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine===

* Next, I queried the PostgreSQL database to see whether the counts there would validate the results from the TallyEngine or the match utility.
* In pgAdmin III, I ran the command:
select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_[0-9][0-9][0-9][0-9]';
[[Image:SQL_Query_Results_NA.png]]

* Are your results the same as reported by the TallyEngine? Why or why not?
** No. The query returned 2,737 matches, which agrees with neither the TallyEngine nor the match utility; it is one fewer than the 2,738 returned by the match utility. The gap from the TallyEngine is probably due to the same difference in naming conventions noted above, but I do not fully understand why the query is off by one from the match utility.
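One possible, unconfirmed explanation for the off-by-one difference is that the match utility scans the raw XML text, so any VC_#### string counts whether or not it sits in a gene name marked ''ordered locus'', while the SQL query counts only rows with that type. The Python sketch below illustrates the effect on a simplified stand-in for ''genenametype'' (only the ''type'' and ''value'' columns, with made-up rows), using SQLite in place of PostgreSQL. SQLite has no built-in regex operator, so a ''REGEXP'' function is registered to play the role of PostgreSQL's ''~''.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite rewrites "X REGEXP Y" as a call to regexp(Y, X); search is unanchored like ~.
conn.create_function(
    "REGEXP", 2,
    lambda pattern, value: value is not None and re.search(pattern, value) is not None,
)

# Simplified stand-in for genenametype with made-up rows.
conn.execute("CREATE TABLE genenametype (type TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO genenametype VALUES (?, ?)",
    [
        ("ordered locus", "VC_0028"),
        ("ordered locus", "VC_0029"),
        ("ORF", "VC_0030"),     # matching string stored under a different name type
        ("primary", "abcD"),
    ],
)

pattern = "VC_[0-9][0-9][0-9][0-9]"
# Equivalent of the query used above: rows typed 'ordered locus' that match.
by_type = conn.execute(
    "SELECT count(*) FROM genenametype WHERE type = 'ordered locus' AND value REGEXP ?",
    (pattern,),
).fetchone()[0]
# Closer to a raw-text count: distinct matching values of any type.
any_type = conn.execute(
    "SELECT count(DISTINCT value) FROM genenametype WHERE value REGEXP ?",
    (pattern,),
).fetchone()[0]

print(by_type, any_type)
```

In PostgreSQL itself, replacing ''count(*)'' with ''count(distinct value)'', or dropping the ''type'' condition, would be a quick way to test whether either effect applies to the real data.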

===OriginalRowCounts Comparison===

Within the .gdb file, look at the OriginalRowCounts table to see if the database has the expected tables with the expected number of records. Compare the tables and records with a benchmark .gdb file.

Benchmark .gdb file:

Copy the OriginalRowCounts table from the benchmark and new gdb and paste them here:

Note:

===Visual Inspection===

Perform visual inspection of individual tables to see if there are any problems.

* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?

Note:

===.gdb Use in GenMAPP===



While the above sections perform quality assurance on the exported Gene Database by verifying ID counts, the "proof of the pudding" is to actually use the Gene Database in GenMAPP. You can follow the instructions in [http://www.openwetware.org/wiki/BIOL367/F10:GenMAPP_and_MAPPFinder_Protocols Part 2 of the ''Vibrio cholerae'' Microarray Data Analysis] to verify that the Gene Database works in GenMAPP. In this case, the emphasis is not on the findings of the data analysis itself, but on whether the Gene Database functions appropriately in GenMAPP.

For assistance with using the GenMAPP program, the GenMAPP Help is very extensive. To access it within GenMAPP, go to the menu item Help > GenMAPP Help and either browse or search for your topic of interest.

Note:

====Putting a gene on the MAPP using the GeneFinder window====

* In the main GenMAPP Drafting Board window, left-click on the icon for "Gene" in the upper left corner of the window. Click on the Drafting Board to place the Gene on the MAPP. Now, right-click on the gene to access the GeneFinder window. Type or paste a gene ID into the Gene ID field. Select the appropriate Gene ID system from the drop-down menu and click the Search button. For example, for ''Vibrio cholerae'', you could search for the ID "VC0028", which is an OrderedLocusNames ID. Once the ID has been found, click the OK button to return to the Drafting Board window.
** For the Final Project, you will need to try a sample ID from each of the gene ID systems, not just OrderedLocusNames.
* Open the Backpage by left-clicking on the gene box on the Drafting Board, and check that all of the expected cross-referenced IDs are present.

Note:

====Creating an Expression Dataset in the Expression Dataset Manager====

* How many of the IDs were imported out of the total IDs in the microarray dataset? How many exceptions were there? Look in the EX.txt file and look at the error codes for the records that were not imported into the Expression Dataset. Do these represent IDs that were present in the UniProt XML but were somehow not imported, or were they not present in the UniProt XML at all?

Note:

====Coloring a MAPP with expression data====

Note:

====Running MAPPFinder====

Note:

== Links ==
{{Template:Nanguiano}}

Nanguiano Week 9

2016-09-17T02:32:14Z

Nanguiano: /* Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine */ Add results

== Running GenMAPP Builder ==

I followed the process from the [[Running_GenMAPP_Builder | Running GenMAPP Builder Tutorial Page]].

=== Software ===

The following software was utilized:
# Any tool that can unpack .gz and .zip files
#* We use [http://www.7-zip.org/ 7-zip]
# [http://www.postgresql.org PostgreSQL] on Windows (http://www.enterprisedb.com/products-services-training/pgdownload)
# GenMAPP Builder version 3.0.0 build 5 (https://github.com/lmu-bioinformatics/xmlpipedb/releases)
# Java JDK 1.8 64-bit
#* [http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html Download page]
#* File to download is: jdk-8u65-windows-x64.exe
# GenMAPP 2 can be downloaded [https://github.com/GenMAPPCS/genmapp here]. The file to download is "GenMAPPv2Setup.exe".
# XMLPipeDB ''match'' utility (https://sourceforge.net/projects/xmlpipedb/files/) for counting IDs in XML files
# Microsoft Access or any other tool that can read .mdb files

==Quality Assurance==

The template for this information was taken from [[Gene_Database_Testing_Report_Sample | the Gene Database Testing Report Sample Page]].

===Export Information===

Version of GenMAPP Builder: '''3.0.0 build 5'''

Computer on which export was run: '''Seaver 120-14, HP LV2311 Windows 7 Enterprise'''

Postgres Database name: '''V_Cholerae_20160916_gmbuilder03b5'''

UniProt XML filename (give filename and upload and link to compressed file): [[Media:Uniprot-organism-243277-NA.zip|uniprot-organism%3A243277.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): '''UniProt Release 2016_08'''
* UniProt XML download link: http://www.uniprot.org/uniprot/?query=organism:243277
* Time taken to import: '''2.92 minutes'''
** Note: ''None''

GO OBO-XML filename (give filename and upload and link to compressed file): [[Media:Go_daily-termdb-old-AV-NA.zip|go_daily-termdb.obo-xml]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): '''10/27/2015, 2:24am'''
* GO OBO-XML download link: http://geneontology.org/page/download-ontology#Legacy_Downloads
* Time taken to import: '''6.87 minutes'''
* Time taken to process: '''4.26 minutes'''
** Note: ''The GO OBO-XML file and version are from [[Anuvarsh_Week_9 | Anu Varshneya's Week 9 Journal]]. The currently available GO OBO-XML files do not run properly in GenMAPP Builder.''

GOA filename (give filename and upload and link to compressed file): [[Media:46.V_cholerae_ATCC_39315_NA.goa.zip|46.V_cholerae_ATCC_39315.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): '''07/05/16'''
* GOA download link: http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/46.V_cholerae_ATCC_39315.goa
* Time taken to import: '''0.06 minutes'''
** Note: ''None''

Name of .gdb file (give filename and upload and link to compressed file):
* Time taken to export: '''5.15 hours'''
** Start time: '''09/16/16, 12:26:08 PM'''
** End time: '''09/16/16, 7:16:23 PM'''
*** Note: The computer was locked from 2:56 PM to 4:19 PM, and the export was not running during that time.

===TallyEngine===

* Run the TallyEngine in GenMAPP Builder and record the number of records for UniProt and GO in the XML data and in the Postgres databases.
** Choose the menu item Tallies > Run XML and Database Tallies for UniProt and GO...
** Take a screenshot of the results. Upload the image to the wiki and display it on this page.
[[Image:Tallyengine_Vcholerae_results_NA.png]]

=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine===

* From the directory containing the match utility and the UniProt XML file, I ran the following command:
java -jar xmlpipedb-match-1.1.1.jar "VC_[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml
* It returned a total of 2,738 unique matches.
[[Image:Xmlpipedb_match_results_NA.png]]

Are your results the same as you got for the TallyEngine? Why or why not?
* My results did not match the TallyEngine results. The TallyEngine returned 3,789 unique matches, while the match utility returned 2,738. The difference is likely due to differently formatted names in the XML file, since the pattern only matches names of the form VC_####.

=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine===

* Next, I queried the PostgreSQL database to see whether the counts there would validate the results from the TallyEngine or the match utility.
* In pgAdmin III, I ran the command:
select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_[0-9][0-9][0-9][0-9]';
[[Image:SQL_Query_Results_NA.png]]

* Are your results the same as reported by the TallyEngine? Why or why not?
** No. The query returned 2,737 matches, which agrees with neither the TallyEngine nor the match utility; it is one fewer than the 2,738 returned by the match utility. The gap from the TallyEngine is probably due to the same difference in naming conventions noted above, but I do not fully understand why the query is off by one from the match utility.

===OriginalRowCounts Comparison===

Within the .gdb file, look at the OriginalRowCounts table to see if the database has the expected tables with the expected number of records. Compare the tables and records with a benchmark .gdb file.

Benchmark .gdb file:

Copy the OriginalRowCounts table from the benchmark and new gdb and paste them here:

Note:

===Visual Inspection===

Perform visual inspection of individual tables to see if there are any problems.

* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?
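The visual check can be backed up by testing each ID against the form expected for its system. A minimal Python sketch, assuming the IDs have been copied out of the UniProt, RefSeq, and OrderedLocusNames tables into Python lists. The UniProt pattern follows UniProt's published accession format, while the RefSeq prefix list and the ''VCA####'' locus form are assumptions to adjust as needed.

```python
import re

# Expected ID forms. The UniProt pattern is UniProt's published accession format;
# the RefSeq prefixes and the VCA#### locus form are assumptions.
ID_PATTERNS = {
    "UniProt": re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
    ),
    "RefSeq": re.compile(r"(?:NP|XP|YP|WP|AP)_[0-9]+(?:\.[0-9]+)?"),
    "OrderedLocusNames": re.compile(r"VCA?[0-9]{4}"),
}

def malformed_ids(system, ids):
    """Return the IDs that do not fully match the expected form for `system`."""
    pattern = ID_PATTERNS[system]
    return [i for i in ids if not pattern.fullmatch(i.strip())]

# Example IDs (made up for illustration).
print(malformed_ids("UniProt", ["P69905", "A0A024R161", "VC0028"]))
print(malformed_ids("OrderedLocusNames", ["VC0028", "VCA0001", "VC_0028"]))
```

An empty list for a table means every ID in it has the expected shape; anything returned is worth a closer look in Access.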

Note:

===.gdb Use in GenMAPP===



While the above sections perform quality assurance on the exported Gene Database by verifying ID counts, the "proof of the pudding" is to actually use the Gene Database in GenMAPP. You can follow the instructions in [http://www.openwetware.org/wiki/BIOL367/F10:GenMAPP_and_MAPPFinder_Protocols Part 2 of the ''Vibrio cholerae'' Microarray Data Analysis] to verify that the Gene Database works in GenMAPP. In this case, the emphasis is not on the findings of the data analysis itself, but on whether the Gene Database functions appropriately in GenMAPP.

For assistance with using the GenMAPP program, the GenMAPP Help is very extensive. To access it within GenMAPP, go to the menu item Help > GenMAPP Help and either browse or search for your topic of interest.

Note:

====Putting a gene on the MAPP using the GeneFinder window====

* In the main GenMAPP Drafting Board window, left-click on the icon for "Gene" in the upper left corner of the window. Click on the Drafting Board to place the Gene on the MAPP. Now, right-click on the gene to access the GeneFinder window. Type or paste a gene ID into the Gene ID field. Select the appropriate Gene ID system from the drop-down menu and click the Search button. For example, for ''Vibrio cholerae'', you could search for the ID "VC0028", which is an OrderedLocusNames ID. Once the ID has been found, click the OK button to return to the Drafting Board window.
** For the Final Project, you will need to try a sample ID from each of the gene ID systems, not just OrderedLocusNames.
* Open the Backpage by left-clicking on the gene box on the Drafting Board, and check that all of the expected cross-referenced IDs are present.

Note:

====Creating an Expression Dataset in the Expression Dataset Manager====

* How many of the IDs were imported out of the total IDs in the microarray dataset? How many exceptions were there? Look in the EX.txt file and look at the error codes for the records that were not imported into the Expression Dataset. Do these represent IDs that were present in the UniProt XML but were somehow not imported, or were they not present in the UniProt XML at all?

Note:

====Coloring a MAPP with expression data====

Note:

====Running MAPPFinder====

Note:

== Links ==
{{Template:Nanguiano}}

File:SQL Query Results NA.png

2016-09-17T02:29:43Z

Nanguiano: The SQL query results from the database. ~~~~

The SQL query results from the database. [[User:Nanguiano|Nanguiano]] ([[User talk:Nanguiano|talk]]) 19:29, 16 September 2016 (PDT)

Nanguiano Week 9

2016-09-17T02:24:27Z

Nanguiano: /* Using XMLPipeDB match to Validate the XML Results from the TallyEngine */ Added some detail and fixed typos.

== Running GenMAPP Builder ==

I followed the process from the [[Running_GenMAPP_Builder | Running GenMAPP Builder Tutorial Page]].

=== Software ===

The following software was utilized:
# Any tool that can unpack .gz and .zip files
#* We use [http://www.7-zip.org/ 7-zip]
# [http://www.postgresql.org PostgreSQL] on Windows (http://www.enterprisedb.com/products-services-training/pgdownload)
# GenMAPP Builder version 3.0.0 build 5 (https://github.com/lmu-bioinformatics/xmlpipedb/releases)
# Java JDK 1.8 64-bit
#* [http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html Download page]
#* File to download is: jdk-8u65-windows-x64.exe
# GenMAPP 2 can be downloaded [https://github.com/GenMAPPCS/genmapp here]. The file to download is "GenMAPPv2Setup.exe".
# XMLPipeDB ''match'' utility (https://sourceforge.net/projects/xmlpipedb/files/) for counting IDs in XML files
# Microsoft Access or any other tool that can read .mdb files

==Quality Assurance==

The template for this information was taken from [[Gene_Database_Testing_Report_Sample | the Gene Database Testing Report Sample Page]].

===Export Information===

Version of GenMAPP Builder: '''3.0.0 build 5'''

Computer on which export was run: '''Seaver 120-14, HP LV2311 Windows 7 Enterprise'''

Postgres Database name: '''V_Cholerae_20160916_gmbuilder03b5'''

UniProt XML filename (give filename and upload and link to compressed file): [[Media:Uniprot-organism-243277-NA.zip|uniprot-organism%3A243277.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): '''UniProt Release 2016_08'''
* UniProt XML download link: http://www.uniprot.org/uniprot/?query=organism:243277
* Time taken to import: '''2.92 minutes'''
** Note: ''None''

GO OBO-XML filename (give filename and upload and link to compressed file): [[Media:Go_daily-termdb-old-AV-NA.zip|go_daily-termdb.obo-xml]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): '''10/27/2015, 2:24am'''
* GO OBO-XML download link: http://geneontology.org/page/download-ontology#Legacy_Downloads
* Time taken to import: '''6.87 minutes'''
* Time taken to process: '''4.26 minutes'''
** Note: ''The GO OBO-XML file and version are from [[Anuvarsh_Week_9 | Anu Varshneya's Week 9 Journal]]. The currently available GO OBO-XML files do not run properly in GenMAPP Builder.''

GOA filename (give filename and upload and link to compressed file): [[Media:46.V_cholerae_ATCC_39315_NA.goa.zip|46.V_cholerae_ATCC_39315.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): '''07/05/16'''
* GOA download link: http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/46.V_cholerae_ATCC_39315.goa
* Time taken to import: '''0.06 minutes'''
** Note: ''None''

Name of .gdb file (give filename and upload and link to compressed file):
* Time taken to export: '''5.15 hours'''
** Start time: '''09/16/16, 12:26:08 PM'''
** End time: '''09/16/16, 7:16:23 PM'''
*** Note: The computer was locked from 2:56 PM to 4:19 PM, and the export was not running during that time.

===TallyEngine===

* Run the TallyEngine in GenMAPP Builder and record the number of records for UniProt and GO in the XML data and in the Postgres databases.
** Choose the menu item Tallies > Run XML and Database Tallies for UniProt and GO...
** Take a screenshot of the results. Upload the image to the wiki and display it on this page.
[[Image:Tallyengine_Vcholerae_results_NA.png]]

=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine===

* From the directory containing the match utility and the UniProt XML file, I ran the following command:
java -jar xmlpipedb-match-1.1.1.jar "VC_[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml
* It returned a total of 2,738 unique matches.
[[Image:Xmlpipedb_match_results_NA.png]]

Are your results the same as you got for the TallyEngine? Why or why not?
* My results did not match the TallyEngine results. The TallyEngine returned 3,789 unique matches, while the match utility returned 2,738. The difference is likely due to differently formatted names in the XML file, since the pattern only matches names of the form VC_####.

=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine===

For more information, [[How_Do_I_Count_Thee%3F_Let_Me_Count_The_Ways | see this page.]]

You can also look for counts at the SQL level, using some variation of a ''select count(*)'' query. This requires some knowledge of which table received what data. Here’s an initial tip: the ''gene/name'' tags in the XML file land in the ''genenametype'' table. A query that counts the values in this table that were marked as ''ordered locus'' in the XML file and that match the pattern ''VC_[0-9][0-9][0-9][0-9]'' would look like this:

select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_[0-9][0-9][0-9][0-9]';

In ''pgAdmin III'', you can issue these queries by clicking on the pencil/SQL icon in the toolbar, typing the query into the ''SQL Editor'' tab, then clicking on the green triangular ''Play'' button to run.

[[Image:Pgadminiii-query.png]]
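To see which ''gene/name'' types the XML contains before writing a query, the names can also be counted directly from the XML. A minimal Python sketch using the standard library's ElementTree, assuming the UniProt namespace ''http://uniprot.org/uniprot'' and an entry/gene/name nesting; the sample document is made up:

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"up": "http://uniprot.org/uniprot"}

# Made-up two-entry document in the assumed UniProt XML layout.
sample = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <gene>
      <name type="primary">abcD</name>
      <name type="ordered locus">VC_0028</name>
    </gene>
  </entry>
  <entry>
    <gene>
      <name type="ordered locus">VC_A0001</name>
    </gene>
  </entry>
</uniprot>"""

root = ET.fromstring(sample)
names = root.findall("./up:entry/up:gene/up:name", NS)

# How many gene/name elements carry each type attribute.
type_counts = Counter(n.get("type") for n in names)

# Same filter as the SQL query: type 'ordered locus' and an unanchored pattern match.
pattern = re.compile(r"VC_[0-9][0-9][0-9][0-9]")
ordered_locus_matches = sum(
    1 for n in names
    if n.get("type") == "ordered locus" and pattern.search(n.text or "")
)

print(type_counts)
print(ordered_locus_matches)
```

For the full file, ''ET.parse("uniprot-organism%3A243277.xml").getroot()'' could replace ''ET.fromstring(sample)''; the ordered-locus count should then be directly comparable with the SQL query above, which applies the same type and pattern filter.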

Are your results the same as reported by the TallyEngine? Why or why not?

===OriginalRowCounts Comparison===

Within the .gdb file, look at the OriginalRowCounts table to see if the database has the expected tables with the expected number of records. Compare the tables and records with a benchmark .gdb file.

Benchmark .gdb file:

Copy the OriginalRowCounts table from the benchmark and new gdb and paste them here:

Note:

===Visual Inspection===

Perform visual inspection of individual tables to see if there are any problems.

* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?

Note:

===.gdb Use in GenMAPP===



While the above sections perform quality assurance on the exported Gene Database by verifying ID counts, the "proof of the pudding" is to actually use the Gene Database in GenMAPP. You can follow the instructions in [http://www.openwetware.org/wiki/BIOL367/F10:GenMAPP_and_MAPPFinder_Protocols Part 2 of the ''Vibrio cholerae'' Microarray Data Analysis] to verify that the Gene Database works in GenMAPP. In this case, the emphasis is not on the findings of the data analysis itself, but on whether the Gene Database functions appropriately in GenMAPP.

For assistance with using the GenMAPP program, the GenMAPP Help is very extensive. To access it within GenMAPP, go to the menu item Help > GenMAPP Help and either browse or search for your topic of interest.

Note:

====Putting a gene on the MAPP using the GeneFinder window====

* In the main GenMAPP Drafting Board window, left-click on the icon for "Gene" in the upper left corner of the window. Click on the Drafting Board to place the Gene on the MAPP. Now, right-click on the gene to access the GeneFinder window. Type or paste a gene ID into the Gene ID field. Select the appropriate Gene ID system from the drop-down menu and click the Search button. For example, for ''Vibrio cholerae'', you could search for the ID "VC0028", which is an OrderedLocusNames ID. Once the ID has been found, click the OK button to return to the Drafting Board window.
** For the Final Project, you will need to try a sample ID from each of the gene ID systems, not just OrderedLocusNames.
* Open the Backpage by left-clicking on the gene box on the Drafting Board, and check that all of the expected cross-referenced IDs are present.

Note:

====Creating an Expression Dataset in the Expression Dataset Manager====

* How many of the IDs were imported out of the total IDs in the microarray dataset? How many exceptions were there? Look in the EX.txt file and look at the error codes for the records that were not imported into the Expression Dataset. Do these represent IDs that were present in the UniProt XML but were somehow not imported, or were they not present in the UniProt XML at all?

Note:

====Coloring a MAPP with expression data====

Note:

====Running MAPPFinder====

Note:

== Links ==
{{Template:Nanguiano}}

Nanguiano Week 9

2016-09-17T02:23:21Z

Nanguiano: /* Using XMLPipeDB match to Validate the XML Results from the TallyEngine */ Add image and results.

== Running GenMAPP Builder ==

I followed the process from the [[Running_GenMAPP_Builder | Running GenMAPP Builder Tutorial Page]].

=== Software ===

The following software was utilized:
# Any tool that can unpack .gz and .zip files
#* We use [http://www.7-zip.org/ 7-zip]
# [http://www.postgresql.org PostgreSQL] on Windows (http://www.enterprisedb.com/products-services-training/pgdownload)
# GenMAPP Builder version 3.0.0 build 5 (https://github.com/lmu-bioinformatics/xmlpipedb/releases)
# Java JDK 1.8 64-bit
#* [http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html Download page]
#* File to download is: jdk-8u65-windows-x64.exe
# GenMAPP 2 can be downloaded [https://github.com/GenMAPPCS/genmapp here]. The file to download is "GenMAPPv2Setup.exe".
# XMLPipeDB ''match'' utility (https://sourceforge.net/projects/xmlpipedb/files/) for counting IDs in XML files
# Microsoft Access or any other tool that can read .mdb files

==Quality Assurance==

The template for this information was taken from [[Gene_Database_Testing_Report_Sample | the Gene Database Testing Report Sample Page]].

===Export Information===

Version of GenMAPP Builder: '''3.0.0 build 5'''

Computer on which export was run: '''Seaver 120-14, HP LV2311 Windows 7 Enterprise'''

Postgres Database name: '''V_Cholerae_20160916_gmbuilder03b5'''

UniProt XML filename (give filename and upload and link to compressed file): [[Media:Uniprot-organism-243277-NA.zip|uniprot-organism%3A243277.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): '''UniProt Release 2016_08'''
* UniProt XML download link: http://www.uniprot.org/uniprot/?query=organism:243277
* Time taken to import: '''2.92 minutes'''
** Note: ''None''

GO OBO-XML filename (give filename and upload and link to compressed file): [[Media:Go_daily-termdb-old-AV-NA.zip|go_daily-termdb.obo-xml]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): '''10/27/2015, 2:24am'''
* GO OBO-XML download link: http://geneontology.org/page/download-ontology#Legacy_Downloads
* Time taken to import: '''6.87 minutes'''
* Time taken to process: '''4.26 minutes'''
** Note: ''The GO OBO-XML file and version are from [[Anuvarsh_Week_9 | Anu Varshneya's Week 9 Journal]]. The currently available GO OBO-XML files do not run properly in GenMAPP Builder.''

GOA filename (give filename and upload and link to compressed file): [[Media:46.V_cholerae_ATCC_39315_NA.goa.zip|46.V_cholerae_ATCC_39315.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): '''07/05/16'''
* GOA download link: http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/46.V_cholerae_ATCC_39315.goa
* Time taken to import: '''0.06 minutes'''
** Note: ''None''

Name of .gdb file (give filename and upload and link to compressed file):
* Time taken to export: '''5.15 hours'''
** Start time: '''09/16/16, 12:26:08 PM'''
** End time: '''09/16/16, 7:16:23 PM'''
*** Note: The computer was locked from 2:56 PM to 4:19 PM, and the export was not running during that time.

===TallyEngine===

* Run the TallyEngine in GenMAPP Builder and record the number of records for UniProt and GO in the XML data and in the Postgres databases.
** Choose the menu item Tallies > Run XML and Database Tallies for UniProt and GO...
** Take a screenshot of the results. Upload the image to the wiki and display it on this page.
[[Image:Tallyengine_Vcholerae_results_NA.png]]

=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine===

[[How_Do_I_Count_Thee%3F_Let_Me_Count_The_Ways | Follow the instructions found on this page to run XMLPipeDB match.]]

* I ran the following command:
java -jar xmlpipedb-match-1.1.1.jar "VC_[0-9][0-9][0-9][0-9]" < uniprot-organism%3A243277.xml
* It returned a total of 2,738 unique matches.
[[Image:Xmlpipedb_match_results_NA.png]]

Are your results the same as you got for the TallyEngine? Why or why not?
* My results did not match the TallyEngine results. The TallyEngine returned 3,789 unique matches, while the match utility returned 2,738. The difference is likely due to differently formatted names in the XML file, since the pattern only matches names of the form VC_####.

=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine===

For more information, [[How_Do_I_Count_Thee%3F_Let_Me_Count_The_Ways | see this page.]]

You can also look for counts at the SQL level, using some variation of a ''select count(*)'' query. This requires some knowledge of which table received what data. Here’s an initial tip: the ''gene/name'' tags in the XML file land in the ''genenametype'' table. A query that counts the values in this table that were marked as ''ordered locus'' in the XML file and that match the pattern ''VC_[0-9][0-9][0-9][0-9]'' would look like this:

select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_[0-9][0-9][0-9][0-9]';

In ''pgAdmin III'', you can issue these queries by clicking on the pencil/SQL icon in the toolbar, typing the query into the ''SQL Editor'' tab, then clicking on the green triangular ''Play'' button to run.

[[Image:Pgadminiii-query.png]]

Are your results the same as reported by the TallyEngine? Why or why not?

===OriginalRowCounts Comparison===

Within the .gdb file, look at the OriginalRowCounts table to see if the database has the expected tables with the expected number of records. Compare the tables and records with a benchmark .gdb file.
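Once the two OriginalRowCounts tables have been copied, the comparison can be made systematic. A minimal Python sketch, assuming each table has been turned into a mapping from table name to row count; the counts below are placeholders, not results from either .gdb file:

```python
def compare_row_counts(benchmark, new):
    """Compare two {table_name: row_count} mappings copied from OriginalRowCounts."""
    report = []
    for table in sorted(set(benchmark) | set(new)):
        if table not in new:
            report.append((table, benchmark[table], None, "missing from new .gdb"))
        elif table not in benchmark:
            report.append((table, None, new[table], "not in benchmark"))
        else:
            # Signed difference, e.g. "+10" or "-3".
            report.append((table, benchmark[table], new[table], f"{new[table] - benchmark[table]:+d}"))
    return report

# Placeholder counts, not real results.
benchmark = {"UniProt": 100, "RefSeq": 80, "InterPro": 50}
new = {"UniProt": 110, "RefSeq": 80, "OrderedLocusNames": 95}

for row in compare_row_counts(benchmark, new):
    print(*row, sep="\t")
```

Tables missing from either side and large count differences are the rows to explain in the Note below.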

Benchmark .gdb file:

Copy the OriginalRowCounts table from the benchmark and new gdb and paste them here:

Note:

===Visual Inspection===

Perform visual inspection of individual tables to see if there are any problems.

* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?

Note:

===.gdb Use in GenMAPP===



While the above sections perform quality assurance on the exported Gene Database by verifying ID counts, the "proof of the pudding" is to actually use the Gene Database in GenMAPP. You can follow the instructions in [http://www.openwetware.org/wiki/BIOL367/F10:GenMAPP_and_MAPPFinder_Protocols Part 2 of the ''Vibrio cholerae'' Microarray Data Analysis] to verify that the Gene Database works in GenMAPP. In this case, the emphasis is not on the findings of the data analysis itself, but on whether the Gene Database functions appropriately in GenMAPP.

For assistance with using the GenMAPP program, the GenMAPP Help is very extensive. To access it within GenMAPP, go to the menu item Help > GenMAPP Help and either browse or search for your topic of interest.

Note:

====Putting a gene on the MAPP using the GeneFinder window====

* In the main GenMAPP Drafting Board window, left-click on the icon for "Gene" in the upper left corner of the window. Click on the Drafting Board to place the Gene on the MAPP. Now, right-click on the gene to access the GeneFinder window. Type or paste a gene ID into the Gene ID field. Select the appropriate Gene ID system from the drop-down menu and click the Search button. For example, for ''Vibrio cholerae'', you could search for the ID "VC0028", which is an OrderedLocusNames ID. Once the ID has been found, click the OK button to return to the Drafting Board window.
** For the Final Project, you will need to try a sample ID from each of the gene ID systems, not just OrderedLocusNames.
* Open the Backpage by left-clicking on the gene box on the Drafting Board and check that all of the expected cross-referenced IDs are present.

Note:

====Creating an Expression Dataset in the Expression Dataset Manager====

* How many of the IDs were imported out of the total IDs in the microarray dataset? How many exceptions were there? Look in the EX.txt file at the error codes for the records that were not imported into the Expression Dataset. Do these represent IDs that were present in the UniProt XML but were somehow not imported, or IDs that were not present in the UniProt XML at all?

Note:

====Coloring a MAPP with expression data====

Note:

====Running MAPPFinder====

Note:

== Links ==
{{Template:Nanguiano}}

File:Xmlpipedb match results NA.png

2016-09-17T02:20:33Z

Nanguiano: XMLPipeDB Match utility results.

XMLPipeDB Match utility results. [[User:Nanguiano|Nanguiano]] ([[User talk:Nanguiano|talk]]) 19:20, 16 September 2016 (PDT)

Nanguiano Week 9

2016-09-17T02:11:40Z

Nanguiano: /* TallyEngine */ Added TallyEngine image

== Running GenMAPP Builder ==

I followed the process from the [[Running_GenMAPP_Builder | Running GenMAPP Builder Tutorial Page]].

=== Software ===

The following software was utilized:
# Any tool that can unpack .gz and .zip files
#* We use [http://www.7-zip.org/ 7-zip]
# [http://www.postgresql.org PostgreSQL] on Windows (http://www.enterprisedb.com/products-services-training/pgdownload)
# GenMAPP Builder version 3.0.0 build 5 (https://github.com/lmu-bioinformatics/xmlpipedb/releases)
# Java JDK 1.8 64-bit
#* [http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html Download page]
#* File to download is: jdk-8u65-windows-x64.exe
# GenMAPP 2 can be downloaded [https://github.com/GenMAPPCS/genmapp here]. The file to download is "GenMAPPv2Setup.exe".
# XMLPipeDB ''match'' utility (https://sourceforge.net/projects/xmlpipedb/files/) for counting IDs in XML files
# Microsoft Access or any other tool that can read .mdb files
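
Any decompression tool works for the .gz and .zip files listed above. As a hedged alternative to 7-zip, the sketch below uses only the Python standard library (''gzip'', ''shutil'', ''zipfile''); the function names ''gunzip'' and ''unzip'' are illustrative and are not part of GenMAPP Builder or XMLPipeDB.

```python
import gzip
import shutil
import zipfile
from pathlib import Path


def gunzip(src: str, dest: str) -> Path:
    """Decompress a single .gz file (e.g. a downloaded UniProt XML) to dest."""
    with gzip.open(src, "rb") as f_in, open(dest, "wb") as f_out:
        # Stream in chunks so a large XML file is never held in memory at once.
        shutil.copyfileobj(f_in, f_out)
    return Path(dest)


def unzip(src: str, dest_dir: str) -> list[str]:
    """Extract every member of a .zip archive into dest_dir and return the member names."""
    with zipfile.ZipFile(src) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

Both functions take placeholder paths; substitute the names of the files you actually downloaded.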

==Quality Assurance==

The template for this information was taken from [[Gene_Database_Testing_Report_Sample | the Gene Database Testing Report Sample Page]].

===Export Information===

Version of GenMAPP Builder: '''3.0.0 build 5'''

Computer on which export was run: '''Seaver 120-14, HP LV2311 Windows 7 Enterprise'''

Postgres Database name: '''V_Cholerae_20160916_gmbuilder03b5'''

UniProt XML filename (give filename and upload and link to compressed file): [[Media:Uniprot-organism-243277-NA.zip|uniprot-organism%3A243277.xml]]
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): '''UniProt Release 2016_08'''
* UniProt XML download link: http://www.uniprot.org/uniprot/?query=organism:243277
* Time taken to import: '''2.92 minutes'''
** Note: ''None''

GO OBO-XML filename (give filename and upload and link to compressed file): [[Media:Go_daily-termdb-old-AV-NA.zip|go_daily-termdb.obo-xml]]
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): '''10/27/2015, 2:24am'''
* GO OBO-XML download link: http://geneontology.org/page/download-ontology#Legacy_Downloads
* Time taken to import: '''6.87 minutes'''
* Time taken to process: '''4.26 minutes'''
** Note: ''The GO OBO-XML file and version are from [[Anuvarsh_Week_9 | Anu Varshneya's Week 9 Journal]]. Currently available GO OBO-XML files do not run properly in GenMAPP Builder.''

GOA filename (give filename and upload and link to compressed file): [[Media:46.V_cholerae_ATCC_39315_NA.goa.zip|46.V_cholerae_ATCC_39315.goa]]
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): '''07/05/16'''
* GOA download link: http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/46.V_cholerae_ATCC_39315.goa
* Time taken to import: '''0.06 minutes'''
** Note: ''None''

Name of .gdb file (give filename and upload and link to compressed file):
* Time taken to export: '''5.15 hours'''
** Start time: '''09/16/16, 12:26:08 PM'''
** End time: '''09/16/16, 7:16:23 PM'''
*** Note: From 2:56 PM to 4:19 PM the computer was locked and the export was not running.
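
The export time can be cross-checked from the recorded start and end times. The Python sketch below (standard library only; ''active_hours'' is a hypothetical helper) subtracts any interval during which the export was not running from the wall-clock duration. The timestamp format mirrors the one used above, and the seconds for the locked interval are assumed to be :00.

```python
from datetime import datetime, timedelta

# Matches timestamps written like "09/16/16, 12:26:08 PM".
FMT = "%m/%d/%y, %I:%M:%S %p"


def active_hours(start: str, end: str, paused: list[tuple[str, str]]) -> float:
    """Return the hours between start and end, minus every paused interval."""
    elapsed = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    for pause_start, pause_end in paused:
        elapsed -= datetime.strptime(pause_end, FMT) - datetime.strptime(pause_start, FMT)
    # Dividing one timedelta by another yields a float.
    return elapsed / timedelta(hours=1)


# Start, end, and locked interval recorded for this export.
hours = active_hours(
    "09/16/16, 12:26:08 PM",
    "09/16/16, 7:16:23 PM",
    [("09/16/16, 2:56:00 PM", "09/16/16, 4:19:00 PM")],
)
print(f"{hours:.2f} hours")  # prints "5.45 hours"
```

With the times recorded above this works out to about 5.45 hours of active export time (about 6.84 hours wall-clock), so the 5.15-hour figure may be worth double-checking.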

===TallyEngine===

* Run the TallyEngine in GenMAPP Builder and record the number of records for UniProt and GO in the XML data and in the Postgres databases.
** Choose the menu item Tallies > Run XML and Database Tallies for UniProt and GO...
** Take a screenshot of the results. Upload the image to the wiki and display it on this page.
[[Image:Tallyengine_Vcholerae_results_NA.png]]

=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine===

[[How_Do_I_Count_Thee%3F_Let_Me_Count_The_Ways | Follow the instructions found on this page to run XMLPipeDB match.]]
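
When the XML counts and the TallyEngine counts disagree, an independent pattern count can help locate the difference. The Python sketch below is a rough stand-in for that kind of count, not the XMLPipeDB match utility itself: it reports both the total number of regex matches and the number of distinct matched IDs. The ''count_ids'' name is hypothetical.

```python
import re
from collections.abc import Iterable


def count_ids(lines: Iterable[str], pattern: str) -> tuple[int, int]:
    """Return (total matches, distinct matches) of a regex across lines of text."""
    regex = re.compile(pattern)
    total = 0
    distinct: set[str] = set()
    for line in lines:
        # The pattern has no capture groups, so findall returns whole matched strings.
        for match in regex.findall(line):
            total += 1
            distinct.add(match)
    return total, len(distinct)
```

Passing an open file handle for the UniProt XML together with a pattern such as ''VC_[0-9]{4}'' streams the file line by line. If the total is larger than the distinct count, some IDs appear more than once in the file, which is one way raw pattern counts can differ from other tallies.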

Are your results the same as you got for the TallyEngine? Why or why not?

=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine===

For more information, [[How_Do_I_Count_Thee%3F_Let_Me_Count_The_Ways | see this page.]]

You can also look for counts at the SQL level, using some variation of a ''select count(*)'' query. This requires some knowledge of which table received what data. Here’s an initial tip: the ''gene/name'' tags in the XML file land in the ''genenametype'' table. A query that counts the values in this table that were marked as ''ordered locus'' in the XML file and that match the pattern ''VC_[0-9][0-9][0-9][0-9]'' would look like this:

<pre>
select count(*) from genenametype where type = 'ordered locus' and value ~ 'VC_[0-9][0-9][0-9][0-9]';
</pre>

In ''pgAdmin III'', you can issue these queries by clicking on the pencil/SQL icon in the toolbar, typing the query into the ''SQL Editor'' tab, then clicking on the green triangular ''Play'' button to run.

[[Image:Pgadminiii-query.png]]

Are your results the same as reported by the TallyEngine? Why or why not?

===OriginalRowCounts Comparison===

Within the .gdb file, look at the OriginalRowCounts table to see if the database has the expected tables with the expected number of records. Compare the tables and records with a benchmark .gdb file.
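
Once both OriginalRowCounts tables have been copied out (for example from Microsoft Access), the comparison itself is a per-table diff. A minimal Python sketch, assuming each table has been transcribed into a dictionary mapping table name to row count; ''compare_row_counts'' is a hypothetical helper:

```python
def compare_row_counts(benchmark: dict[str, int], new: dict[str, int]) -> list[str]:
    """List every table whose presence or row count differs between two .gdb files."""
    report = []
    for table in sorted(set(benchmark) | set(new)):
        old = benchmark.get(table)
        cur = new.get(table)
        if old is None:
            report.append(f"{table}: only in new gdb ({cur} rows)")
        elif cur is None:
            report.append(f"{table}: missing from new gdb (benchmark has {old} rows)")
        elif old != cur:
            # Signed difference, e.g. "+2" or "-5".
            report.append(f"{table}: {old} -> {cur} ({cur - old:+d})")
    return report
```

An empty result means every table is present in both files with the same row count.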

Benchmark .gdb file:

Copy the OriginalRowCounts tables from the benchmark and new .gdb files and paste them here:

Note:

===Visual Inspection===

Perform visual inspection of individual tables to see if there are any problems.

* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?
* Open the UniProt, RefSeq, and OrderedLocusNames tables and scroll through each one. Do all of the IDs take the correct form for that type of ID?
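
Scrolling catches obvious problems, but a pattern check can flag every malformed ID in an exported column. The Python sketch below is hedged: the UniProt pattern follows the accession format documented by UniProt, while the RefSeq and OrderedLocusNames patterns are loose assumptions (''VC0028''-style locus tags, optionally with an underscore or a chromosome II ''A'') that should be adjusted to what your tables actually contain. ''malformed_ids'' is a hypothetical helper.

```python
import re
from collections.abc import Iterable

# Assumed ID shapes; adjust these to the conventions in your own .gdb.
ID_PATTERNS = {
    # UniProt accession format (6- or 10-character accessions).
    "UniProt": re.compile(r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"),
    # Loose RefSeq shape: two-letter prefix, underscore, digits, optional version.
    "RefSeq": re.compile(r"[A-Z]{2}_[0-9]+(\.[0-9]+)?"),
    # V. cholerae locus tags such as VC0028; underscore and "A" are optional (assumption).
    "OrderedLocusNames": re.compile(r"VC_?A?[0-9]{4}"),
}


def malformed_ids(system: str, ids: Iterable[str]) -> list[str]:
    """Return the IDs that do not fully match the assumed pattern for the given system."""
    pattern = ID_PATTERNS[system]
    return [i for i in ids if not pattern.fullmatch(i.strip())]
```

Using ''fullmatch'' rather than ''search'' means an ID with extra characters on either side is reported instead of silently passing.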

Note:

===.gdb Use in GenMAPP===



While the above sections perform quality assurance on the exported Gene Database by verifying ID counts, the "proof of the pudding" is to actually use the Gene Database in GenMAPP. You can follow the instructions in [http://www.openwetware.org/wiki/BIOL367/F10:GenMAPP_and_MAPPFinder_Protocols Part 2 of the ''Vibrio cholerae'' Microarray Data Analysis] to verify that the Gene Database works in GenMAPP. Here the emphasis is not on the findings of the data analysis itself, but on whether the Gene Database functions appropriately in GenMAPP.

For assistance with using the GenMAPP program, the GenMAPP Help is very extensive. To access it within GenMAPP, go to the menu item Help > GenMAPP Help and either browse or search for your topic of interest.

Note:

====Putting a gene on the MAPP using the GeneFinder window====

* In the main GenMAPP Drafting Board window, left-click on the icon for "Gene" in the upper left corner of the window. Click on the Drafting Board to place the Gene on the MAPP. Now, right-click on the gene to access the GeneFinder window. Type or paste a gene ID into the Gene ID field. Select the appropriate Gene ID system from the drop-down menu and click the Search button. For example, for ''Vibrio cholerae'', you could search for the ID "VC0028", which is an OrderedLocusNames ID. Once the ID has been found, click the OK button to return to the Drafting Board window.
** For the Final Project, you will need to try a sample ID from each of the gene ID systems, not just OrderedLocusNames.
* Open the Backpage by left-clicking on the gene box on the Drafting Board and check that all of the expected cross-referenced IDs are present.

Note:

====Creating an Expression Dataset in the Expression Dataset Manager====

* How many of the IDs were imported out of the total IDs in the microarray dataset? How many exceptions were there? Look in the EX.txt file at the error codes for the records that were not imported into the Expression Dataset. Do these represent IDs that were present in the UniProt XML but were somehow not imported, or IDs that were not present in the UniProt XML at all?
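
GenMAPP reports the import totals itself, but tallying the exception file makes it easier to see which kinds of errors dominate. This Python sketch assumes EX.txt is tab-delimited with a header row and that you know which column holds the error code; ''error_col'' defaults to the first column purely as an assumption, so check your own file. Both function names are hypothetical.

```python
import csv
from collections import Counter
from collections.abc import Iterable


def tally_exceptions(lines: Iterable[str], error_col: int = 0, has_header: bool = True) -> Counter[str]:
    """Count rows in a tab-delimited exception file by the value in error_col."""
    reader = csv.reader(lines, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the column-name row
    # Rows too short to contain the error column (e.g. blank lines) are ignored.
    return Counter(row[error_col] for row in reader if len(row) > error_col)


def import_summary(total_ids: int, exceptions: int) -> tuple[int, float]:
    """Return (imported IDs, percent imported), assuming every non-imported row is an exception."""
    imported = total_ids - exceptions
    return imported, 100.0 * imported / total_ids
```

When reading EX.txt from disk, open it with ''newline=""'' as the csv module recommends; ''sum(tally.values())'' then gives the number of exceptions to pass to ''import_summary''.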

Note:

====Coloring a MAPP with expression data====

Note:

====Running MAPPFinder====

Note:

== Links ==
{{Template:Nanguiano}}