<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Troque</id>
		<title>LMU BioDB 2015 - User contributions [en]</title>
		<link rel="self" type="application/atom+xml" href="https://xmlpipedb.lmucs.io/biodb/fall2015/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Troque"/>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php/Special:Contributions/Troque"/>
		<updated>2026-05-03T11:37:38Z</updated>
		<subtitle>User contributions</subtitle>
		<generator>MediaWiki 1.25.1</generator>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=OTS_Deliverables&amp;diff=8189</id>
		<title>OTS Deliverables</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=OTS_Deliverables&amp;diff=8189"/>
				<updated>2015-12-19T00:23:33Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: /* Individual Reflections */ Linking pdf&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Template:Oregon Trail Survivors}}&lt;br /&gt;
==OTS Group Files and Datasets==&lt;br /&gt;
&lt;br /&gt;
[[Media:GenMAPP Builder 12 14 2015 Number 2.zip | Gene Database .gdb]]&lt;br /&gt;
&lt;br /&gt;
[[Media:ReadMe Sf-Std External 20151214.pdf | ReadMe]]&lt;br /&gt;
&lt;br /&gt;
[[Media:ShigellaGeneDatabaseSchema.pdf | Gene Database Schema]]&lt;br /&gt;
&lt;br /&gt;
[[Media:Gene Database Testing Report for Shigella flexneri 2a str 301.pdf | Gene Database Testing Report (.pdf)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:FINAL CompiledRawData RXRP EYKZ20151210.xlsx | Compiled Raw Microarray Dataset (.xlsx)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:FINAL CompiledRawData RXRP EYKZ20151210.txt | Data Used for Import into GenMAPP (.txt)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:FINAL CompiledRawData RXRP EYKZ20151210 (1).gex | GenMAPP Expression Dataset File (.gex)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:FINAL CompiledRawData RXRP EYKZ20151210.EX.txt | Exceptions file (.EX.txt)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:Criterion.GOfiles.zip | Raw MAPPFinder results files (-GO.txt)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.gmf | .gmf file]]&lt;br /&gt;
&lt;br /&gt;
[[Media:Filtered MAPPFinder Results.xlsx | Filtered MAPPFinder Results .xlsx]]&lt;br /&gt;
&lt;br /&gt;
[[Media:MAPPFinderResults.zip | Filtered MAPPFinder Results (common GO terms highlighted) .png]]&lt;br /&gt;
&lt;br /&gt;
[[Media:RPRX MAPPs.zip | .zip of .mapp s of relevant genes]]&lt;br /&gt;
&lt;br /&gt;
[[Media:OTSDeliverables.docx | Final Group Report .docx]]&lt;br /&gt;
&lt;br /&gt;
[[Media:FinalOTSPresentation.pptx | Final PowerPoint Presentation]]&lt;br /&gt;
&lt;br /&gt;
==Individual Reflections==&lt;br /&gt;
&lt;br /&gt;
[[Kzebrows Individual Reflection | Kristin Zebrowski]]&lt;br /&gt;
&lt;br /&gt;
[[Eyanosch Individual Reflection | Erich Yanoschik]]&lt;br /&gt;
&lt;br /&gt;
[[Jwoodlee Individual Reflection | Jake Woodlee]]&lt;br /&gt;
&lt;br /&gt;
[[Media:Final Project Reflection OTS TR 20151218.pdf | Trixie Roque]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
==OTS Files==&lt;br /&gt;
&lt;br /&gt;
[[Media:Micro Array Shigella Flexneri 20151011.pdf | Shigella Flexneri Microarray Paper (PDF)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:Shigellamicroarray.pptx | Microarray Journal Club Power Point]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ20152211.xlsx | Microarray Compiled Raw Data RP/RX IDLR]]&lt;br /&gt;
&lt;br /&gt;
[[Media:SamplesFilesCorrespondanceTable SF301a EYKZ201522111.xls | Microarray Corresponding Files Table]]&lt;br /&gt;
&lt;br /&gt;
[[Media: GMBuilder Shigella flexneri.zip]]&lt;br /&gt;
&lt;br /&gt;
[[Media: QA Files.zip | Download  QA files]]&lt;br /&gt;
&lt;br /&gt;
[[Media:GMBuilder December7 2015 build 2.zip]]&lt;br /&gt;
&lt;br /&gt;
[[Media:GenMAPP Builder 12 14 2015.zip]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==GenMAPP User Files==&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015121.xlsx | ScalingCentering file 12/1 .xlsx]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Media:Flagellum Ribosomal Mapp 1 60min 20151012.jpg | RP vs RX 1 MIC @ 60 minutes MAPP 12/12 .jpg]]&lt;br /&gt;
&lt;br /&gt;
[[Media:Flagellum Ribosomal Mapp 0pt5 10min 20151012.jpg | RP vs RX 0.5 MIC @ 10 minutes MAPP 12/12 .jpg]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====RP (Erich)====&lt;br /&gt;
[[Media:CompiledRaw data GenMAPP Final RP IDLR EYKZ2015126.xlsx | RP Compiled Raw Data Final 12/10]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data GenMAPP ready RP IDLR EYKZ2015126.txt | RP .txt format GenMAPP ready 12/6]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data Errors RP EYKZ2015126.EX.xlsx | RP Exceptions file in Excel format (filtered)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data Errors RP EYKZ2015126.EX.txt | RP Exceptions (txt)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data GenMAPP ready RP IDLR EYKZ2015126.gex | RP .gex file]]&lt;br /&gt;
&lt;br /&gt;
====RX (Kristin)====&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.xlsx | RX Compiled Raw Data 12/6]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data statistics BonferroniPvalue RP IDLR EYKZ2015126.txt | RX .txt format GenMAPP ready 12/6]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.xlsx | RX Compiled Raw Data as of 12/6]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.txt | RX .txt format updated as of 12/6]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.EX.txt | RX Exceptions file]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.gex | RX .gex file]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RX IDLR KZ2015126.EX.xlsx | RX Exceptions file in Excel format (filtered)]]&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Final_Project_Reflection_OTS_TR_20151218.pdf&amp;diff=8188</id>
		<title>File:Final Project Reflection OTS TR 20151218.pdf</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Final_Project_Reflection_OTS_TR_20151218.pdf&amp;diff=8188"/>
				<updated>2015-12-19T00:23:14Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: Uploading pdf form&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Uploading pdf form&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=OTS_Deliverables&amp;diff=8187</id>
		<title>OTS Deliverables</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=OTS_Deliverables&amp;diff=8187"/>
				<updated>2015-12-19T00:22:09Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: /* Individual Reflections */ Linking word document&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Template:Oregon Trail Survivors}}&lt;br /&gt;
==OTS Group Files and Datasets==&lt;br /&gt;
&lt;br /&gt;
[[Media:GenMAPP Builder 12 14 2015 Number 2.zip | Gene Database .gdb]]&lt;br /&gt;
&lt;br /&gt;
[[Media:ReadMe Sf-Std External 20151214.pdf | ReadMe]]&lt;br /&gt;
&lt;br /&gt;
[[Media:ShigellaGeneDatabaseSchema.pdf | Gene Database Schema]]&lt;br /&gt;
&lt;br /&gt;
[[Media:Gene Database Testing Report for Shigella flexneri 2a str 301.pdf | Gene Database Testing Report (.pdf)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:FINAL CompiledRawData RXRP EYKZ20151210.xlsx | Compiled Raw Microarray Dataset (.xlsx)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:FINAL CompiledRawData RXRP EYKZ20151210.txt | Data Used for Import into GenMAPP (.txt)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:FINAL CompiledRawData RXRP EYKZ20151210 (1).gex | GenMAPP Expression Dataset File (.gex)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:FINAL CompiledRawData RXRP EYKZ20151210.EX.txt | Exceptions file (.EX.txt)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:Criterion.GOfiles.zip | Raw MAPPFinder results files (-GO.txt)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.gmf | .gmf file]]&lt;br /&gt;
&lt;br /&gt;
[[Media:Filtered MAPPFinder Results.xlsx | Filtered MAPPFinder Results .xlsx]]&lt;br /&gt;
&lt;br /&gt;
[[Media:MAPPFinderResults.zip | Filtered MAPPFinder Results (common GO terms highlighted) .png]]&lt;br /&gt;
&lt;br /&gt;
[[Media:RPRX MAPPs.zip | .zip of .mapp s of relevant genes]]&lt;br /&gt;
&lt;br /&gt;
[[Media:OTSDeliverables.docx | Final Group Report .docx]]&lt;br /&gt;
&lt;br /&gt;
[[Media:FinalOTSPresentation.pptx | Final PowerPoint Presentation]]&lt;br /&gt;
&lt;br /&gt;
==Individual Reflections==&lt;br /&gt;
&lt;br /&gt;
[[Kzebrows Individual Reflection | Kristin Zebrowski]]&lt;br /&gt;
&lt;br /&gt;
[[Eyanosch Individual Reflection | Erich Yanoschik]]&lt;br /&gt;
&lt;br /&gt;
[[Jwoodlee Individual Reflection | Jake Woodlee]]&lt;br /&gt;
&lt;br /&gt;
[[Media:Final Project Reflection OTS TR 20151218.docx | Trixie Roque]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
==OTS Files==&lt;br /&gt;
&lt;br /&gt;
[[Media:Micro Array Shigella Flexneri 20151011.pdf | Shigella Flexneri Microarray Paper (PDF)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:Shigellamicroarray.pptx | Microarray Journal Club Power Point]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ20152211.xlsx | Microarray Compiled Raw Data RP/RX IDLR]]&lt;br /&gt;
&lt;br /&gt;
[[Media:SamplesFilesCorrespondanceTable SF301a EYKZ201522111.xls | Microarray Corresponding Files Table]]&lt;br /&gt;
&lt;br /&gt;
[[Media: GMBuilder Shigella flexneri.zip]]&lt;br /&gt;
&lt;br /&gt;
[[Media: QA Files.zip | Download  QA files]]&lt;br /&gt;
&lt;br /&gt;
[[Media:GMBuilder December7 2015 build 2.zip]]&lt;br /&gt;
&lt;br /&gt;
[[Media:GenMAPP Builder 12 14 2015.zip]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==GenMAPP User Files==&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015121.xlsx | ScalingCentering file 12/1 .xlsx]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Media:Flagellum Ribosomal Mapp 1 60min 20151012.jpg | RP vs RX 1 MIC @ 60 minutes MAPP 12/12 .jpg]]&lt;br /&gt;
&lt;br /&gt;
[[Media:Flagellum Ribosomal Mapp 0pt5 10min 20151012.jpg | RP vs RX 0.5 MIC @ 10 minutes MAPP 12/12 .jpg]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====RP (Erich)====&lt;br /&gt;
[[Media:CompiledRaw data GenMAPP Final RP IDLR EYKZ2015126.xlsx | RP Compiled Raw Data Final 12/10]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data GenMAPP ready RP IDLR EYKZ2015126.txt | RP .txt format GenMAPP ready 12/6]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data Errors RP EYKZ2015126.EX.xlsx | RP Exceptions file in Excel format (filtered)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data Errors RP EYKZ2015126.EX.txt | RP Exceptions (txt)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data GenMAPP ready RP IDLR EYKZ2015126.gex | RP .gex file]]&lt;br /&gt;
&lt;br /&gt;
====RX (Kristin)====&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.xlsx | RX Compiled Raw Data 12/6]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data statistics BonferroniPvalue RP IDLR EYKZ2015126.txt | RX .txt format GenMAPP ready 12/6]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.xlsx | RX Compiled Raw Data as of 12/6]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.txt | RX .txt format updated as of 12/6]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.EX.txt | RX Exceptions file]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.gex | RX .gex file]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RX IDLR KZ2015126.EX.xlsx | RX Exceptions file in Excel format (filtered)]]&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Final_Project_Reflection_OTS_TR_20151218.docx&amp;diff=8186</id>
		<title>File:Final Project Reflection OTS TR 20151218.docx</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Final_Project_Reflection_OTS_TR_20151218.docx&amp;diff=8186"/>
				<updated>2015-12-19T00:21:39Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: Uploading final stuff :)&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Uploading final stuff :)&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Gene_Database_Project_Deliverables&amp;diff=8174</id>
		<title>Gene Database Project Deliverables</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Gene_Database_Project_Deliverables&amp;diff=8174"/>
				<updated>2015-12-18T23:22:15Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: /* Individual Assessment and Reflection */ Removed accidentally added entry&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Gene Database Project Links}}&lt;br /&gt;
&lt;br /&gt;
== Group Report ==&lt;br /&gt;
&lt;br /&gt;
These guidelines are based on the [https://peerj.com/about/author-instructions/ Instructions for Authors] issued by the [https://peerj.com/computer-science/ PeerJ Computer Science] journal. We have made this choice so that, if a group report is considered to be of sufficient quality, we can pursue publication of this report in &amp;#039;&amp;#039;PeerJ Computer Science&amp;#039;&amp;#039; as smoothly as possible. If there are formatting or detail questions that are not covered here, visit the [https://peerj.com/about/author-instructions/ Instructions for Authors] and follow their guidance.&lt;br /&gt;
&lt;br /&gt;
* The report should be written with contributions from all group members.&lt;br /&gt;
* Submit as &amp;#039;&amp;#039;.doc&amp;#039;&amp;#039;, &amp;#039;&amp;#039;.docx&amp;#039;&amp;#039; or &amp;#039;&amp;#039;.pdf&amp;#039;&amp;#039; file.&lt;br /&gt;
&lt;br /&gt;
=== Style Sheet ===&lt;br /&gt;
&lt;br /&gt;
Use the following guidelines when formatting your report:&lt;br /&gt;
* 2.54 cm (1 in) margins on all sides&lt;br /&gt;
* Double-spaced&lt;br /&gt;
* 12 point Times/Times New Roman font&lt;br /&gt;
* Number the pages on the lower-right corner&lt;br /&gt;
* Use left justification (“jagged” on the right side)&lt;br /&gt;
&lt;br /&gt;
=== Cover Page ===&lt;br /&gt;
&lt;br /&gt;
Include the following information in a standalone cover page:&lt;br /&gt;
* A descriptive title for your project&lt;br /&gt;
** The function of the title is to identify the main result or take-home message of the paper.  It should be as specific as possible and name the organism.  It can be a phrase or a sentence.  What is the main result of your paper that you want to convey with the title?&lt;br /&gt;
* The names of the team members (with middle initials)&lt;br /&gt;
* The course number and title of the class&lt;br /&gt;
* The date of submission&lt;br /&gt;
&lt;br /&gt;
=== Abstract ===&lt;br /&gt;
&lt;br /&gt;
Provide an abstract of no more than 500 words.&lt;br /&gt;
&lt;br /&gt;
=== Introduction ===&lt;br /&gt;
&lt;br /&gt;
The introduction gives the background information necessary to understand your report. The introduction should be in the form of a logical argument that “funnels” from broad to narrow:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=&amp;quot;nolines&amp;quot; widths=322px heights=256px&amp;gt;&lt;br /&gt;
Funnel.png&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* States importance of the problem&lt;br /&gt;
 Why is this species important?&lt;br /&gt;
* States what is known about the problem &lt;br /&gt;
 - Give an overview of what is known about your species&amp;#039; genome from your [[Week 11|journal club outline and presentation]].&lt;br /&gt;
 - Introduce the DNA microarray experiment that was performed on your species from your [[Week 11|journal club outline and presentation]].&lt;br /&gt;
* States what is unknown about the problem&lt;br /&gt;
 You want to analyze the data with GenMAPP/MAPPFinder, but can&amp;#039;t because there is no Gene Database for your species.&lt;br /&gt;
* States clues that suggest how to approach the unknown&lt;br /&gt;
 Introduce XMLPipeDB and GenMAPP Builder as the answer to this problem.&lt;br /&gt;
* States the question the paper is trying to address&lt;br /&gt;
 In this case you want to discover new information about the microarray data using GenMAPP.&lt;br /&gt;
&lt;br /&gt;
=== Materials &amp;amp; Methods ===&lt;br /&gt;
&lt;br /&gt;
This section will summarize the entire workflow for the project.  This needs to be a &amp;#039;&amp;#039;&amp;#039;&amp;#039;&amp;#039;narrative description&amp;#039;&amp;#039;&amp;#039;&amp;#039;&amp;#039; of what your team actually did, but not a step-by-step protocol.  We are following the standards of reproducible research such that someone else with the appropriate expertise could reproduce what you did given the information in your Materials and Methods section.  You can consider your audience to be the fellow members of your class.&lt;br /&gt;
# Download the UniProt XML proteome set and GOA (GO association) files for your species.&lt;br /&gt;
#* Note the date of download and the version of the files.&lt;br /&gt;
# Download GO terms in the OBO-XML format.&lt;br /&gt;
#* Note the date of download and the version of the files.&lt;br /&gt;
# Create the GenMAPP Builder tables in PostgreSQL.&lt;br /&gt;
# Load files into PostgreSQL database via GenMAPP Builder.&lt;br /&gt;
# Export into a GenMAPP Gene Database.&lt;br /&gt;
# Inspect/vet/validate Gene Database.&lt;br /&gt;
# Prepare microarray data (organize, normalize, perform statistical analysis)&lt;br /&gt;
# Run GenMAPP using the Gene Database.&lt;br /&gt;
#* Microarray data (import using Expression Dataset Manager)&lt;br /&gt;
#* Run MAPPFinder analysis&lt;br /&gt;
#* Place genes on MAPP and draw pathway&lt;br /&gt;
&lt;br /&gt;
=== Results ===&lt;br /&gt;
&lt;br /&gt;
This section will summarize the results of the project.  This section will include figures, tables, and a &amp;#039;&amp;#039;&amp;#039;&amp;#039;&amp;#039;narrative description&amp;#039;&amp;#039;&amp;#039;&amp;#039;&amp;#039; of the results shown in those figures and tables.  You should:&lt;br /&gt;
* Number each of the figures sequentially and number each of the tables sequentially in order from first mention in the text.  You can either embed your figures and tables in the appropriate place in the text or put them all at the end.  Do not mix both styles, however.&lt;br /&gt;
* Write a descriptive legend for each figure and table that briefly states what the figure/table is and gives a brief key to any labels and abbreviations.&lt;br /&gt;
* Gene Database Schema figure&lt;br /&gt;
* Gene Database Testing Report on final version of Gene Database (can be put at the end of the report as an Appendix)&lt;br /&gt;
* A table that summarizes how many OrderedLocusNames IDs were found&lt;br /&gt;
** by XMLPipeDB match in the UniProt XML file&lt;br /&gt;
** by TallyEngine in the UniProt XML file&lt;br /&gt;
** by TallyEngine in the PostgreSQL database&lt;br /&gt;
** in the OriginalRowCounts table in the gdb&lt;br /&gt;
** in your external model organism database source&lt;br /&gt;
* Give the command used in match to generate these results&lt;br /&gt;
* Give the query used in PGAdmin III to generate these results&lt;br /&gt;
* Include a screenshot of the TallyEngine results as a figure&lt;br /&gt;
* Report on quantity and identity of gene IDs that did not make it into the database&lt;br /&gt;
*# OrderedLocusNames IDs that were not in the XML source at all&lt;br /&gt;
*# OrderedLocusNames IDs that were in the XML source but did not get imported into Postgres&lt;br /&gt;
*# OrderedLocusNames IDs that were in Postgres but did not get exported to the GenMAPP Gene Database&lt;br /&gt;
* Report on what changes were made to the GenMAPP Builder code in order to accommodate the second and third types of missing gene IDs and the result of those changes&lt;br /&gt;
* Report results of the DNA microarray analysis&lt;br /&gt;
** Include a table that shows the results of your &amp;quot;Sanity Check&amp;quot;, i.e., how many genes were significantly increased and decreased at different p value cut-offs in the dataset?&lt;br /&gt;
** Include the criteria you used for a significant increase and decrease in expression for your GenMAPP Expression Dataset&lt;br /&gt;
** Table of filtered MAPPFinder results (from &amp;#039;&amp;#039;.xls&amp;#039;&amp;#039; or &amp;#039;&amp;#039;.xlsx&amp;#039;&amp;#039;)&lt;br /&gt;
*** Show a list of 15-20 non-redundant GO terms.&lt;br /&gt;
*** Include in your table the GO ID, the name of the GO term, the number changed/number present and the percent (e.g., 10/20 (50%)), the number present/number in GO and the percent, the regular p value and adjusted p value.&lt;br /&gt;
*** Write a paragraph interpreting the GO results in light of the experiment performed in the published paper. &lt;br /&gt;
** GenMAPP MAPP of a pathway relevant to your results&lt;br /&gt;
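&lt;br /&gt;
The three categories of missing gene IDs in the list above can be thought of as successive set differences between ID lists. The following is only a minimal sketch in Python, not part of GenMAPP Builder or TallyEngine: the function name, the argument names, and the example IDs are all hypothetical, and it assumes each set of OrderedLocusNames IDs has already been exported to a plain Python list.&lt;br /&gt;
&lt;br /&gt;
```python
def classify_missing_ids(external_ids, xml_ids, postgres_ids, gdb_ids):
    """Sort missing gene IDs into three categories using set differences.

    All four arguments are hypothetical inputs: iterables of ID strings
    taken from the external source, the XML file, PostgreSQL, and the
    exported Gene Database, respectively.
    """
    external, xml = set(external_ids), set(xml_ids)
    postgres, gdb = set(postgres_ids), set(gdb_ids)
    return {
        # Category 1: known externally but absent from the XML source
        "not_in_xml": sorted(external - xml),
        # Category 2: present in the XML source but not imported
        "in_xml_not_in_postgres": sorted(xml - postgres),
        # Category 3: imported but not exported to the Gene Database
        "in_postgres_not_in_gdb": sorted(postgres - gdb),
    }


# Made-up IDs, purely for illustration.
example = classify_missing_ids(
    external_ids=["ID0001", "ID0002", "ID0003", "ID0004"],
    xml_ids=["ID0001", "ID0002", "ID0003"],
    postgres_ids=["ID0001", "ID0002"],
    gdb_ids=["ID0001"],
)
for category, ids in example.items():
    print(category, len(ids), ids)
```
&lt;br /&gt;
Sorting each result keeps the output stable, so the quantity and identity of the IDs in each category can be copied straight into a report table.&lt;br /&gt;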
&lt;br /&gt;
=== Discussion ===&lt;br /&gt;
&lt;br /&gt;
* How well did the GenMAPP Builder process work for your species? (Just comment on the technical aspects here; you will discuss the teamwork/process aspects in your individual assessment.)&lt;br /&gt;
* Discuss the statistical analysis and MAPPFinder results for your microarray dataset.  Compare it to what was reported in the original paper from which you got the microarray data.  &lt;br /&gt;
** In particular, compare directly the log fold change value of a couple of key genes mentioned in the paper with what you found for those genes. &lt;br /&gt;
** Compare the criteria the journal article used for a significant expression change to the criteria that you used.  How many genes met the criterion for the article vs. how many met the criterion for your analysis?&lt;br /&gt;
&lt;br /&gt;
=== Conclusions ===&lt;br /&gt;
&lt;br /&gt;
Write a concluding paragraph that summarizes the overall project and your findings. &lt;br /&gt;
* How closely do your findings correspond to the original study? &lt;br /&gt;
* Are there significant differences? &lt;br /&gt;
* Did you discover anything new?&lt;br /&gt;
* What future directions would you take if you were to continue this project?&lt;br /&gt;
&lt;br /&gt;
=== Acknowledgments ===&lt;br /&gt;
&lt;br /&gt;
Write a short paragraph acknowledging the assistance of anyone who is not a member of your team.&lt;br /&gt;
&lt;br /&gt;
=== References ===&lt;br /&gt;
&lt;br /&gt;
* This section lists all of the references cited in the text of the report (and only those references cited in the paper).  Follow the [[Media:BIOL367_Fall2015_GuidelinesforLiteratureCitations.pdf | Guidelines for Literature Citations in a Scientific Paper]] handout for general principles.&lt;br /&gt;
* Remember that you need to cite anything for which you are not the original source.  Generally, in the introduction, you should aim for a minimum of two in-text citations per paragraph.  You may reference the course web site using the appropriate format for a web reference.&lt;br /&gt;
* List your references in alphabetical order by first author using [https://peerj.com/about/author-instructions/#reference-format PeerJ’s recommended reference format]. This format is very similar to APA style and should feel familiar if you have written research papers before.&lt;br /&gt;
* To minimize busy work, the PeerJ website includes links to downloadable style files for [https://www.zotero.org/styles/?q=peerj Zotero] and [http://endnote.com/downloads/style/peerj EndNote], if you use either system for managing and rendering references.&lt;br /&gt;
&lt;br /&gt;
== PowerPoint Presentation ==&lt;br /&gt;
&lt;br /&gt;
Each team of students will prepare and give a 20 minute PowerPoint presentation to report the results of their project on Tuesday, December 18 at 2:00-4:00 PM.  &lt;br /&gt;
* Please follow the [[Media:PresentationGuidelines.ppt | Presentation Guidelines]] for how to format your slides.&lt;br /&gt;
* You will need to prepare ~20 slides (assume 1 slide per minute of presentation) and include the following content:&lt;br /&gt;
# Background on your species and your species&amp;#039; genome from the genome paper presentation.&lt;br /&gt;
# The results of the Gene Database creation&lt;br /&gt;
#* Gene Database Schema figure&lt;br /&gt;
#* A table that summarizes how many OrderedLocusNames IDs were found&lt;br /&gt;
#** by XMLPipeDB match in the UniProt XML file&lt;br /&gt;
#** by TallyEngine in the UniProt XML file&lt;br /&gt;
#** by TallyEngine in the PostgreSQL database&lt;br /&gt;
#** in the OriginalRowCounts table in the gdb&lt;br /&gt;
#** in your external model organism database source&lt;br /&gt;
#* Give the command used in match to generate these results&lt;br /&gt;
#* Give the query used in PGAdmin III to generate these results&lt;br /&gt;
#* Include a screenshot of the TallyEngine results as a figure&lt;br /&gt;
#* Report on quantity and identity of gene IDs that did not make it into the database&lt;br /&gt;
#*# OrderedLocusNames IDs that were not in the XML source at all&lt;br /&gt;
#*# OrderedLocusNames IDs that were in the XML source but did not get imported into Postgres&lt;br /&gt;
#*# OrderedLocusNames IDs that were in Postgres but did not get exported to the GenMAPP Gene Database&lt;br /&gt;
#* Report on what changes were made to the GenMAPP Builder code in order to accommodate the second and third types of missing gene IDs and the result of those changes&lt;br /&gt;
# Introduce the experiment performed in the microarray paper, including the experimental design flow chart&lt;br /&gt;
# Report results of the DNA microarray analysis&lt;br /&gt;
#* Include a table that shows the results of your &amp;quot;Sanity Check&amp;quot;, i.e., how many genes were significantly increased and decreased at different p value cut-offs in the dataset?&lt;br /&gt;
#* Include the criteria you used for a significant increase and decrease in expression for your GenMAPP Expression Dataset&lt;br /&gt;
#* Table of filtered MAPPFinder results (from &amp;#039;&amp;#039;.xls&amp;#039;&amp;#039; or &amp;#039;&amp;#039;.xlsx&amp;#039;&amp;#039;)&lt;br /&gt;
#** Show a list of 15-20 non-redundant GO terms.&lt;br /&gt;
#** Include in your table the GO ID, the name of the GO term, the number changed/number present and the percent (e.g., 10/20 (50%)), the number present/number in GO and the percent, the regular p value and adjusted p value.&lt;br /&gt;
#* GenMAPP MAPP of a pathway relevant to your results&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;&amp;#039;&amp;#039;Your PowerPoint slides must be uploaded to the wiki and linked to from your individual journal page and your team page by midnight, Tuesday, December 15.&amp;#039;&amp;#039;&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** You can update your slides before your presentation, but we will be grading the ones you upload by the deadline.&lt;br /&gt;
* Your presentation (both the slides and the oral presentation) will be evaluated by the instructors using the [[Presentation Rubric]].&lt;br /&gt;
* Your presentation will also be evaluated by your fellow classmates (anonymously) who will answer the following questions:&lt;br /&gt;
*# What is the speaker&amp;#039;s take-home message (one short sentence)?&lt;br /&gt;
*# What are the best points about the presentation&amp;#039;s organization, visuals, and delivery?  Please give at least 2 specific examples.&lt;br /&gt;
*# What points need improvement? Please give at least 2 specific examples.&lt;br /&gt;
* We expect that you will take the feedback from your previous presentation into account when doing this presentation.&lt;br /&gt;
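&lt;br /&gt;
The GO-term table described above asks for each ratio to be written together with its percent, as in the 10/20 (50%) example. A small helper can keep that formatting consistent across rows. This is a sketch in Python under stated assumptions, not something MAPPFinder provides: the name format_ratio is hypothetical, and rounding to a whole percent is an assumption based on the example.&lt;br /&gt;
&lt;br /&gt;
```python
def format_ratio(numerator, denominator):
    """Format a count ratio as "numerator/denominator (percent%)".

    Hypothetical helper; the percent is rounded to a whole number, which
    is an assumption drawn from the 10/20 (50%) example in the guidelines.
    """
    if denominator == 0:
        # Avoid dividing by zero when a GO term has no genes present.
        return f"{numerator}/{denominator} (n/a)"
    percent = 100 * numerator / denominator
    return f"{numerator}/{denominator} ({percent:.0f}%)"


# The example from the guidelines: 10 changed out of 20 present.
print(format_ratio(10, 20))
```
&lt;br /&gt;
The same helper can be applied to both the number changed/number present column and the number present/number in GO column.&lt;br /&gt;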
&lt;br /&gt;
== Group Files and Datasets ==&lt;br /&gt;
&lt;br /&gt;
* GenMAPP Gene Database for assigned species (&amp;#039;&amp;#039;.gdb&amp;#039;&amp;#039;)&lt;br /&gt;
* ReadMe file to accompany the Gene Database (&amp;#039;&amp;#039;.pdf&amp;#039;&amp;#039;)&lt;br /&gt;
** Sample ReadMe in Word format:  [[Media:ReadMe_Vc-Std_External_20131122.zip | ReadMe_Vc-Std_External_20131122.zip]]&lt;br /&gt;
** [https://github.com/lmu-bioinformatics/xmlpipedb/blob/readme/GenMAPP%20Gene%20Databases/V.%20cholerae/V.%20cholerae%2020101022/ReadMe.md Sample ReadMe in markdown (a work in progress)] &lt;br /&gt;
** Include Gene Database Schema diagram in ReadMe&lt;br /&gt;
*** Sample schema in Adobe Illustrator format:  [[Media:GenMAPP_schema_generic_bacteria_20151210.zip | GenMAPP_schema_generic_bacteria_20151210.zip]] &amp;lt;!--[[Media:Vibrio_schema_20101022.zip | Vibrio_schema_20101022.zip]]--&amp;gt;&lt;br /&gt;
*** Sample schema in jpeg format: [[Media:GenMAPP_schema_generic_bacteria_20151210.jpg | GenMAPP_schema_generic_bacteria_20151210.jpg]]&lt;br /&gt;
* Gene Database Testing Report for final submitted Gene Database (print from wiki to &amp;#039;&amp;#039;.pdf&amp;#039;&amp;#039; file)&lt;br /&gt;
* Processed and analyzed DNA microarray dataset (&amp;#039;&amp;#039;.xls&amp;#039;&amp;#039; or &amp;#039;&amp;#039;.xlsx&amp;#039;&amp;#039;)&lt;br /&gt;
* Data file used for import into GenMAPP (&amp;#039;&amp;#039;.txt&amp;#039;&amp;#039; or &amp;#039;&amp;#039;.csv&amp;#039;&amp;#039;)&lt;br /&gt;
* GenMAPP Expression Dataset file (&amp;#039;&amp;#039;.gex&amp;#039;&amp;#039;)&lt;br /&gt;
* Exceptions file of data imported into GenMAPP (&amp;#039;&amp;#039;.EX.txt&amp;#039;&amp;#039;)&lt;br /&gt;
* Raw MAPPFinder results files (&amp;#039;&amp;#039;-GO.txt&amp;#039;&amp;#039;)&lt;br /&gt;
* &amp;#039;&amp;#039;.gmf&amp;#039;&amp;#039; file&lt;br /&gt;
* Filtered MAPPFinder Results (&amp;#039;&amp;#039;.xls&amp;#039;&amp;#039; or &amp;#039;&amp;#039;.xlsx&amp;#039;&amp;#039;)&lt;br /&gt;
* Sample MAPP file of a relevant biological pathway for your species (&amp;#039;&amp;#039;.mapp&amp;#039;&amp;#039;)&lt;br /&gt;
* [[Gene Database Project Deliverables#Group Report | Group Report]] describing the creation of the Gene Database and the biological analysis of the data (&amp;#039;&amp;#039;.doc&amp;#039;&amp;#039;, &amp;#039;&amp;#039;.docx&amp;#039;&amp;#039;, or &amp;#039;&amp;#039;.pdf&amp;#039;&amp;#039;)&lt;br /&gt;
* PowerPoint presentation (&amp;#039;&amp;#039;.ppt&amp;#039;&amp;#039;, &amp;#039;&amp;#039;.pptx&amp;#039;&amp;#039;, or &amp;#039;&amp;#039;.pdf&amp;#039;&amp;#039;, given on Tuesday, December 15)&lt;br /&gt;
&lt;br /&gt;
== Individual Assessment and Reflection ==&lt;br /&gt;
&lt;br /&gt;
Each person on the team will complete an assessment and reflection &amp;#039;&amp;#039;individually&amp;#039;&amp;#039;.  If you are comfortable with making this assessment publicly available, you may write it up as a wiki page or as a Word document uploaded to your group deliverables page.  If you prefer to communicate your assessment privately, then email this to both Drs. Dahlquist and Dionisio.&lt;br /&gt;
&lt;br /&gt;
=== Statement of Work ===&lt;br /&gt;
&lt;br /&gt;
* Describe exactly what you did on the project.&lt;br /&gt;
* Provide references or links to artifacts of your work, such as:&lt;br /&gt;
** Wiki pages&lt;br /&gt;
** Other files or documents&lt;br /&gt;
** Code or scripts&lt;br /&gt;
&lt;br /&gt;
=== Assessment of Project ===&lt;br /&gt;
&lt;br /&gt;
* Give an objective assessment of the success of your project workflow and teamwork.  &lt;br /&gt;
* What worked and what didn&amp;#039;t work?  &lt;br /&gt;
* What would you do differently if you could do it all over again?&lt;br /&gt;
* Evaluate the Gene Database Project and Group Report in the following areas:&lt;br /&gt;
*# Content: What is the quality of the work? &lt;br /&gt;
*# Organization: Comment on the organization of the project and of your group&amp;#039;s wiki pages.&lt;br /&gt;
*# Completeness:  Did your team achieve all of the project objectives?  Why or why not?&lt;br /&gt;
&lt;br /&gt;
=== Reflection on the Process ===&lt;br /&gt;
&lt;br /&gt;
* What did you learn?&lt;br /&gt;
** With your head (biological or computer science principles)?&lt;br /&gt;
** With your heart (personal qualities and teamwork qualities that make things work or not work)?&lt;br /&gt;
** With your hands (technical skills)?&lt;br /&gt;
* What lesson will you take away from this project that you will still use a year from now?&lt;br /&gt;
&lt;br /&gt;
{{Gene Database Project Links}}&lt;br /&gt;
&lt;br /&gt;
[[Category:Group Projects]]&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Gene_Database_Project_Deliverables&amp;diff=8173</id>
		<title>Gene Database Project Deliverables</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Gene_Database_Project_Deliverables&amp;diff=8173"/>
				<updated>2015-12-18T23:21:27Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: /* Statement of Work */ Added some entry&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Gene Database Project Links}}&lt;br /&gt;
&lt;br /&gt;
== Group Report ==&lt;br /&gt;
&lt;br /&gt;
These guidelines are based on the [https://peerj.com/about/author-instructions/ Instructions for Authors] issued by the [https://peerj.com/computer-science/ PeerJ Computer Science] journal. We have made this choice so that, if a group report is considered to be of sufficient quality, we can pursue publication of this report in &amp;#039;&amp;#039;PeerJ Computer Science&amp;#039;&amp;#039; as smoothly as possible. If there are formatting or detail questions that are not covered here, visit the [https://peerj.com/about/author-instructions/ Instructions for Authors] and follow their guidance.&lt;br /&gt;
&lt;br /&gt;
* The report should be written with contributions from all group members.&lt;br /&gt;
* Submit as &amp;#039;&amp;#039;.doc&amp;#039;&amp;#039;, &amp;#039;&amp;#039;.docx&amp;#039;&amp;#039; or &amp;#039;&amp;#039;.pdf&amp;#039;&amp;#039; file.&lt;br /&gt;
&lt;br /&gt;
=== Style Sheet ===&lt;br /&gt;
&lt;br /&gt;
Use the following guidelines when formatting your report:&lt;br /&gt;
* 2.54 cm (1 in) margins on all sides&lt;br /&gt;
* Double-spaced&lt;br /&gt;
* 12 point Times/Times New Roman font&lt;br /&gt;
* Number the pages on the lower-right corner&lt;br /&gt;
* Use left justification (“jagged” on the right side)&lt;br /&gt;
&lt;br /&gt;
=== Cover Page ===&lt;br /&gt;
&lt;br /&gt;
Include the following information in a standalone cover page:&lt;br /&gt;
* A descriptive title for your project&lt;br /&gt;
** The function of the title is to identify the main result or take-home message of the paper.  It should be as specific as possible and name the organism.  It can be a phrase or a sentence.  What is the main result of your paper that you want to convey with the title?&lt;br /&gt;
* The names of the team members (with middle initials)&lt;br /&gt;
* The course number and title of the class&lt;br /&gt;
* The date of submission&lt;br /&gt;
&lt;br /&gt;
=== Abstract ===&lt;br /&gt;
&lt;br /&gt;
Provide an abstract of no more than 500 words.&lt;br /&gt;
&lt;br /&gt;
=== Introduction ===&lt;br /&gt;
&lt;br /&gt;
The introduction gives the background information necessary to understand your report. The introduction should be in the form of a logical argument that “funnels” from broad to narrow:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;gallery mode=&amp;quot;nolines&amp;quot; widths=322px heights=256px&amp;gt;&lt;br /&gt;
Funnel.png&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* States importance of the problem&lt;br /&gt;
 Why is this species important?&lt;br /&gt;
* States what is known about the problem &lt;br /&gt;
 - Give an overview of what is known about your species&amp;#039; genome from your [[Week 11|journal club outline and presentation]].&lt;br /&gt;
 - Introduce the DNA microarray experiment that was performed on your species from your [[Week 11|journal club outline and presentation]].&lt;br /&gt;
* States what is unknown about the problem&lt;br /&gt;
 You want to analyze the data with GenMAPP/MAPPFinder, but can&amp;#039;t because there is no Gene Database for your species.&lt;br /&gt;
* States clues that suggest how to approach the unknown&lt;br /&gt;
 Introduce XMLPipeDB and GenMAPP Builder as the answer to this problem.&lt;br /&gt;
* States the question the paper is trying to address&lt;br /&gt;
 In this case you want to discover new information about the microarray data using GenMAPP.&lt;br /&gt;
&lt;br /&gt;
=== Materials &amp;amp; Methods ===&lt;br /&gt;
&lt;br /&gt;
This section will summarize the entire workflow for the project.  This needs to be a &amp;#039;&amp;#039;&amp;#039;&amp;#039;&amp;#039;narrative description&amp;#039;&amp;#039;&amp;#039;&amp;#039;&amp;#039; of what your team actually did, but not a step-by-step protocol.  We are following the standards of reproducible research such that someone else with the appropriate expertise could reproduce what you did given the information in your Materials and Methods section.  You can consider your audience to be the fellow members of your class.&lt;br /&gt;
# Download the UniProt XML proteome set and GOA (GO association) files for your species.&lt;br /&gt;
#* Note the date of download and the version of the files.&lt;br /&gt;
# Download GO terms in the OBO-XML format.&lt;br /&gt;
#* Note the date of download and the version of the files.&lt;br /&gt;
# Create the GenMAPP Builder tables in PostgreSQL.&lt;br /&gt;
# Load files into PostgreSQL database via GenMAPP Builder.&lt;br /&gt;
# Export into a GenMAPP Gene Database.&lt;br /&gt;
# Inspect/vet/validate Gene Database.&lt;br /&gt;
# Prepare microarray data (organize, normalize, perform statistical analysis).&lt;br /&gt;
# Run GenMAPP using the Gene Database.&lt;br /&gt;
#* Microarray data (import using Expression Dataset Manager)&lt;br /&gt;
#* Run MAPPFinder analysis&lt;br /&gt;
#* Place genes on MAPP and draw pathway&lt;br /&gt;
&lt;br /&gt;
=== Results ===&lt;br /&gt;
&lt;br /&gt;
This section will summarize the results of the project.  This section will include figures, tables, and a &amp;#039;&amp;#039;&amp;#039;&amp;#039;&amp;#039;narrative description&amp;#039;&amp;#039;&amp;#039;&amp;#039;&amp;#039; of the results shown in those figures and tables.  You should:&lt;br /&gt;
* Number each of the figures sequentially and number each of the tables sequentially in order from first mention in the text.  You can either embed your figures and tables in the appropriate place in the text or put them all at the end.  Do not mix both styles, however.&lt;br /&gt;
* Write a descriptive legend for each figure and table that briefly states what the figure/table is and gives a brief key to any labels and abbreviations.&lt;br /&gt;
* Gene Database Schema figure&lt;br /&gt;
* Gene Database Testing Report on final version of Gene Database (can be put at the end of the report as an Appendix)&lt;br /&gt;
* A table that summarizes how many OrderedLocusNames IDs were found&lt;br /&gt;
** by XMLPipeDB match in the UniProt XML file&lt;br /&gt;
** by TallyEngine in the UniProt XML file&lt;br /&gt;
** by TallyEngine in the PostgreSQL database&lt;br /&gt;
** in the OriginalRowCounts table in the gdb&lt;br /&gt;
** in your external model organism database source&lt;br /&gt;
* Give the command used in match to generate these results&lt;br /&gt;
* Give the query used in PGAdmin III to generate these results&lt;br /&gt;
* Include a screenshot of the TallyEngine results as a figure&lt;br /&gt;
* Report on quantity and identity of gene IDs that did not make it into the database&lt;br /&gt;
*# OrderedLocusNames IDs that were not in the XML source at all&lt;br /&gt;
*# OrderedLocusNames IDs that were in the XML source but did not get imported into Postgres&lt;br /&gt;
*# OrderedLocusNames IDs that were in Postgres but did not get exported to the GenMAPP Gene Database&lt;br /&gt;
* Report on what changes were made to the GenMAPP Builder code in order to accommodate the second and third type of missing gene IDs and the result of those changes&lt;br /&gt;
* Report results of the DNA microarray analysis&lt;br /&gt;
** Include a table that shows the results of your &amp;quot;Sanity Check&amp;quot;, i.e., how many genes were significantly increased and decreased at different p value cut-offs in the dataset?&lt;br /&gt;
** Include the criteria you used for a significant increase and decrease in expression for your GenMAPP Expression Dataset&lt;br /&gt;
** Table of filtered MAPPFinder results (from &amp;#039;&amp;#039;.xls&amp;#039;&amp;#039; or &amp;#039;&amp;#039;.xlsx&amp;#039;&amp;#039;)&lt;br /&gt;
*** Show a list of 15-20 non-redundant GO terms.&lt;br /&gt;
*** Include in your table the GO ID, the name of the GO term, the number changed/number present and the percent (e.g., 10/20 (50%)), the number present/number in GO and the percent, the regular p value and adjusted p value.&lt;br /&gt;
*** Write a paragraph interpreting the GO results in light of the experiment performed in the published paper. &lt;br /&gt;
** GenMAPP MAPP of a pathway relevant to your results&lt;br /&gt;
&lt;br /&gt;
=== Discussion ===&lt;br /&gt;
&lt;br /&gt;
* How well did the GenMAPP Builder process work for your species?  Comment only on the technical aspects here; you will discuss the teamwork/process aspects in your individual assessment.&lt;br /&gt;
* Discuss the statistical analysis and MAPPFinder results for your microarray dataset.  Compare it to what was reported in the original paper from which you got the microarray data.  &lt;br /&gt;
** In particular, compare directly the log fold change value of a couple of key genes mentioned in the paper with what you found for those genes. &lt;br /&gt;
** Compare the criteria the journal article used for a significant expression change to the criteria that you used.  How many genes met the criterion for the article vs. how many met the criterion for your analysis?&lt;br /&gt;
&lt;br /&gt;
=== Conclusions ===&lt;br /&gt;
&lt;br /&gt;
Write a concluding paragraph that summarizes the overall project and your findings. &lt;br /&gt;
* How closely do your findings correspond to the original study? &lt;br /&gt;
* Are there significant differences? &lt;br /&gt;
* Did you discover anything new?&lt;br /&gt;
* What future directions would you take if you were to continue this project?&lt;br /&gt;
&lt;br /&gt;
=== Acknowledgments ===&lt;br /&gt;
&lt;br /&gt;
Write a short paragraph acknowledging the assistance of anyone who is not a member of your team.&lt;br /&gt;
&lt;br /&gt;
=== References ===&lt;br /&gt;
&lt;br /&gt;
* This section lists all of the references cited in the text of the report (and only those references cited in the paper).  Follow the [[Media:BIOL367_Fall2015_GuidelinesforLiteratureCitations.pdf | Guidelines for Literature Citations in a Scientific Paper]] handout for general principles.&lt;br /&gt;
* Remember that you need to cite anything for which you are not the original source.  Generally, in the introduction, you should aim for a minimum of two in-text citations per paragraph.  You may reference the course web site using the appropriate format for a web reference.&lt;br /&gt;
* List your references in alphabetical order by first author using [https://peerj.com/about/author-instructions/#reference-format PeerJ’s recommended reference format]. This format is very similar to APA style and should feel familiar if you have written research papers before.&lt;br /&gt;
* To minimize busy work, the PeerJ website includes links to downloadable style files for [https://www.zotero.org/styles/?q=peerj Zotero] and [http://endnote.com/downloads/style/peerj EndNote], if you use either system for managing and rendering references.&lt;br /&gt;
&lt;br /&gt;
== PowerPoint Presentation ==&lt;br /&gt;
&lt;br /&gt;
Each team of students will prepare and give a 20-minute PowerPoint presentation to report the results of their project on Tuesday, December 15 at 2:00-4:00 PM.  &lt;br /&gt;
* Please follow the [[Media:PresentationGuidelines.ppt | Presentation Guidelines]] for how to format your slides.&lt;br /&gt;
* You will need to prepare ~20 slides (assume 1 slide per minute of presentation) and include the following content:&lt;br /&gt;
# Background on your species and your species&amp;#039; genome from the genome paper presentation.&lt;br /&gt;
# The results of the Gene Database creation&lt;br /&gt;
#* Gene Database Schema figure&lt;br /&gt;
#* A table that summarizes how many OrderedLocusNames IDs were found&lt;br /&gt;
#** by XMLPipeDB match in the UniProt XML file&lt;br /&gt;
#** by TallyEngine in the UniProt XML file&lt;br /&gt;
#** by TallyEngine in the PostgreSQL database&lt;br /&gt;
#** in the OriginalRowCounts table in the gdb&lt;br /&gt;
#** in your external model organism database source&lt;br /&gt;
#* Give the command used in match to generate these results&lt;br /&gt;
#* Give the query used in PGAdmin III to generate these results&lt;br /&gt;
#* Include a screenshot of the TallyEngine results as a figure&lt;br /&gt;
#* Report on quantity and identity of gene IDs that did not make it into the database&lt;br /&gt;
#*# OrderedLocusNames IDs that were not in the XML source at all&lt;br /&gt;
#*# OrderedLocusNames IDs that were in the XML source but did not get imported into Postgres&lt;br /&gt;
#*# OrderedLocusNames IDs that were in Postgres but did not get exported to the GenMAPP Gene Database&lt;br /&gt;
#* Report on what changes were made to the GenMAPP Builder code in order to accommodate the second and third type of missing gene IDs and the result of those changes&lt;br /&gt;
# Introduce the experiment performed in the microarray paper, including the experimental design flow chart&lt;br /&gt;
# Report results of the DNA microarray analysis&lt;br /&gt;
#* Include a table that shows the results of your &amp;quot;Sanity Check&amp;quot;, i.e., how many genes were significantly increased and decreased at different p value cut-offs in the dataset?&lt;br /&gt;
#* Include the criteria you used for a significant increase and decrease in expression for your GenMAPP Expression Dataset&lt;br /&gt;
#* Table of filtered MAPPFinder results (from &amp;#039;&amp;#039;.xls&amp;#039;&amp;#039; or &amp;#039;&amp;#039;.xlsx&amp;#039;&amp;#039;)&lt;br /&gt;
#** Show a list of 15-20 non-redundant GO terms.&lt;br /&gt;
#** Include in your table the GO ID, the name of the GO term, the number changed/number present and the percent (e.g., 10/20 (50%)), the number present/number in GO and the percent, the regular p value and adjusted p value.&lt;br /&gt;
#* GenMAPP MAPP of a pathway relevant to your results&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;&amp;#039;&amp;#039;Your PowerPoint slides must be uploaded to the wiki and linked to from your individual journal page and your team page by midnight, Tuesday, December 15.&amp;#039;&amp;#039;&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** You can update your slides before your presentation, but we will be grading the ones you upload by the deadline.&lt;br /&gt;
* Your presentation (both the slides and the oral presentation) will be evaluated by the instructors using the [[Presentation Rubric]].&lt;br /&gt;
* Your presentation will also be evaluated by your fellow classmates (anonymously) who will answer the following questions:&lt;br /&gt;
*# What is the speaker&amp;#039;s take-home message (one short sentence)?&lt;br /&gt;
*# What are the best points about the presentation&amp;#039;s organization, visuals, and delivery?  Please give at least 2 specific examples.&lt;br /&gt;
*# What points need improvement? Please give at least 2 specific examples.&lt;br /&gt;
* We expect that you will take the feedback from your previous presentation into account when doing this presentation.&lt;br /&gt;
&lt;br /&gt;
== Group Files and Datasets ==&lt;br /&gt;
&lt;br /&gt;
* GenMAPP Gene Database for assigned species (&amp;#039;&amp;#039;.gdb&amp;#039;&amp;#039;)&lt;br /&gt;
* ReadMe file to accompany the Gene Database (&amp;#039;&amp;#039;.pdf&amp;#039;&amp;#039;)&lt;br /&gt;
** Sample ReadMe in Word format:  [[Media:ReadMe_Vc-Std_External_20131122.zip | ReadMe_Vc-Std_External_20131122.zip]]&lt;br /&gt;
** [https://github.com/lmu-bioinformatics/xmlpipedb/blob/readme/GenMAPP%20Gene%20Databases/V.%20cholerae/V.%20cholerae%2020101022/ReadMe.md Sample ReadMe in markdown (a work in progress)] &lt;br /&gt;
** Include Gene Database Schema diagram in ReadMe&lt;br /&gt;
*** Sample schema in Adobe Illustrator format:  [[Media:GenMAPP_schema_generic_bacteria_20151210.zip | GenMAPP_schema_generic_bacteria_20151210.zip]] &amp;lt;!--[[Media:Vibrio_schema_20101022.zip | Vibrio_schema_20101022.zip]]--&amp;gt;&lt;br /&gt;
*** Sample schema in jpeg format: [[Media:GenMAPP_schema_generic_bacteria_20151210.jpg | GenMAPP_schema_generic_bacteria_20151210.jpg]]&lt;br /&gt;
* Gene Database Testing Report for final submitted Gene Database (print from wiki to &amp;#039;&amp;#039;.pdf&amp;#039;&amp;#039; file)&lt;br /&gt;
* Processed and analyzed DNA microarray dataset (&amp;#039;&amp;#039;.xls&amp;#039;&amp;#039; or &amp;#039;&amp;#039;.xlsx&amp;#039;&amp;#039;)&lt;br /&gt;
* Data file used for import into GenMAPP (&amp;#039;&amp;#039;.txt&amp;#039;&amp;#039; or &amp;#039;&amp;#039;.csv&amp;#039;&amp;#039;)&lt;br /&gt;
* GenMAPP Expression Dataset file (&amp;#039;&amp;#039;.gex&amp;#039;&amp;#039;)&lt;br /&gt;
* Exceptions file of data imported into GenMAPP (&amp;#039;&amp;#039;.EX.txt&amp;#039;&amp;#039;)&lt;br /&gt;
* Raw MAPPFinder results files (&amp;#039;&amp;#039;-GO.txt&amp;#039;&amp;#039;)&lt;br /&gt;
* &amp;#039;&amp;#039;.gmf&amp;#039;&amp;#039; file&lt;br /&gt;
* Filtered MAPPFinder Results (&amp;#039;&amp;#039;.xls&amp;#039;&amp;#039; or &amp;#039;&amp;#039;.xlsx&amp;#039;&amp;#039;)&lt;br /&gt;
* Sample MAPP file of a relevant biological pathway for your species (&amp;#039;&amp;#039;.mapp&amp;#039;&amp;#039;)&lt;br /&gt;
* [[Gene Database Project Deliverables#Group Report | Group Report]] describing the creation of the Gene Database and the biological analysis of the data (&amp;#039;&amp;#039;.doc&amp;#039;&amp;#039;, &amp;#039;&amp;#039;.docx&amp;#039;&amp;#039;, or &amp;#039;&amp;#039;.pdf&amp;#039;&amp;#039;)&lt;br /&gt;
* PowerPoint presentation (&amp;#039;&amp;#039;.ppt&amp;#039;&amp;#039;, &amp;#039;&amp;#039;.pptx&amp;#039;&amp;#039;, or &amp;#039;&amp;#039;.pdf&amp;#039;&amp;#039;, given on Tuesday, December 15)&lt;br /&gt;
&lt;br /&gt;
== Individual Assessment and Reflection ==&lt;br /&gt;
&lt;br /&gt;
Each person on the team will complete an assessment and reflection &amp;#039;&amp;#039;individually&amp;#039;&amp;#039;.  If you are comfortable with making this assessment publicly available, you may write it up as a wiki page or as a Word document uploaded to your group deliverables page.  If you prefer to communicate your assessment privately, then email it to both Drs. Dahlquist and Dionisio.&lt;br /&gt;
&lt;br /&gt;
=== Statement of Work ===&lt;br /&gt;
&lt;br /&gt;
* Describe exactly what you did on the project.&lt;br /&gt;
** As the quality assurance member of the group, I was responsible for identifying the valid IDs that needed to be exported to the customized gene database for &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039;. I was tasked with finding IDs that exist in the UniProt XML file but were not exported into the .gdb file that our GenMAPP Users would use on their end. I advised the Coder on what to put in the customized species profile and tested each build that he made to ensure that we captured the IDs we needed and did not break our existing builds. I recorded the results of each build in our Gene Database Testing Report.&lt;br /&gt;
* Provide references or links to artifacts of your work, such as:&lt;br /&gt;
** Wiki pages&lt;br /&gt;
** Other files or documents&lt;br /&gt;
** Code or scripts&lt;br /&gt;
&lt;br /&gt;
=== Assessment of Project ===&lt;br /&gt;
&lt;br /&gt;
* Give an objective assessment of the success of your project workflow and teamwork.  &lt;br /&gt;
* What worked and what didn&amp;#039;t work?  &lt;br /&gt;
* What would you do differently if you could do it all over again?&lt;br /&gt;
* Evaluate the Gene Database Project and Group Report in the following areas:&lt;br /&gt;
*# Content: What is the quality of the work? &lt;br /&gt;
*# Organization: Comment on the organization of the project and of your group&amp;#039;s wiki pages.&lt;br /&gt;
*# Completeness:  Did your team achieve all of the project objectives?  Why or why not?&lt;br /&gt;
&lt;br /&gt;
=== Reflection on the Process ===&lt;br /&gt;
&lt;br /&gt;
* What did you learn?&lt;br /&gt;
** With your head (biological or computer science principles)?&lt;br /&gt;
** With your heart (personal qualities and teamwork qualities that make things work or not work)?&lt;br /&gt;
** With your hands (technical skills)?&lt;br /&gt;
* What lesson will you take away from this project that you will still use a year from now?&lt;br /&gt;
&lt;br /&gt;
{{Gene Database Project Links}}&lt;br /&gt;
&lt;br /&gt;
[[Category:Group Projects]]&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=OTS_Deliverables&amp;diff=8092</id>
		<title>OTS Deliverables</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=OTS_Deliverables&amp;diff=8092"/>
				<updated>2015-12-18T20:26:57Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: /* OTS Group Files and Datasets */ Linking gene database report&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Template:Oregon Trail Survivors}}&lt;br /&gt;
==OTS Group Files and Datasets==&lt;br /&gt;
&lt;br /&gt;
[[Media:GenMAPP Builder 12 14 2015 Number 2.zip | Gene Database .gdb]]&lt;br /&gt;
&lt;br /&gt;
[[Media:ReadMe Sf-Std External 20151214.pdf | ReadMe]]&lt;br /&gt;
&lt;br /&gt;
[[Media:ShigellaGeneDatabaseSchema.pdf | Gene Database Schema]]&lt;br /&gt;
&lt;br /&gt;
[[Media:Gene Database Testing Report for Shigella flexneri 2a str 301.pdf | Gene Database Testing Report (.pdf)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:FINAL CompiledRawData RXRP EYKZ20151210.xlsx | Compiled Raw Microarray Dataset (.xlsx)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:FINAL CompiledRawData RXRP EYKZ20151210.txt | Data Used for Import into GenMAPP (.txt)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:FINAL CompiledRawData RXRP EYKZ20151210 (1).gex | GenMAPP Expression Dataset File (.gex)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:FINAL CompiledRawData RXRP EYKZ20151210.EX.txt | Exceptions file (.EX.txt)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:Criterion.GOfiles.zip | Raw MAPPFinder results files (-GO.txt)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.gmf | .gmf file]]&lt;br /&gt;
&lt;br /&gt;
[[Media:Filtered MAPPFinder Results.xlsx | Filtered MAPPFinder Results .xlsx]]&lt;br /&gt;
&lt;br /&gt;
[[Media:MAPPFinderResults.zip | Filtered MAPPFinder Results (common GO terms highlighted) .png]]&lt;br /&gt;
&lt;br /&gt;
[[Media:RPRX MAPPs.zip | .zip of .mapp s of relevant genes]]&lt;br /&gt;
&lt;br /&gt;
Group Report (.doc or .pdf)&lt;br /&gt;
&lt;br /&gt;
[[Media:FinalOTSPresentation.pptx | Final PowerPoint Presentation]]&lt;br /&gt;
&lt;br /&gt;
==Individual Reflections==&lt;br /&gt;
&lt;br /&gt;
[[Kzebrows Individual Reflection | Kristin Zebrowski]]&lt;br /&gt;
&lt;br /&gt;
[[Eyanosch Individual Reflection | Erich Yanoschik]]&lt;br /&gt;
&lt;br /&gt;
[[Jwoodlee Individual Reflection | Jake Woodlee]]&lt;br /&gt;
&lt;br /&gt;
[[Troque Individual Reflection | Trixie Roque]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
==OTS Files==&lt;br /&gt;
&lt;br /&gt;
[[Media:Micro Array Shigella Flexneri 20151011.pdf | Shigella Flexneri Microarray Paper (PDF)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:Shigellamicroarray.pptx | Microarray Journal Club Power Point]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ20152211.xlsx | Microarray Compiled Raw Data RP/RX IDLR]]&lt;br /&gt;
&lt;br /&gt;
[[Media:SamplesFilesCorrespondanceTable SF301a EYKZ201522111.xls | Microarray Corresponding Files Table]]&lt;br /&gt;
&lt;br /&gt;
[[Media: GMBuilder Shigella flexneri.zip]]&lt;br /&gt;
&lt;br /&gt;
[[Media: QA Files.zip | Download  QA files]]&lt;br /&gt;
&lt;br /&gt;
[[Media:GMBuilder December7 2015 build 2.zip]]&lt;br /&gt;
&lt;br /&gt;
[[Media:GenMAPP Builder 12 14 2015.zip]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==GenMAPP User Files==&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015121.xlsx | ScalingCentering file 12/1 .xlsx]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Media:Flagellum Ribosomal Mapp 1 60min 20151012.jpg | RP vs RX 1 MIC @ 60 minutes MAPP 12/12 .jpg]]&lt;br /&gt;
&lt;br /&gt;
[[Media:Flagellum Ribosomal Mapp 0pt5 10min 20151012.jpg | RP vs RX 0.5 MIC @ 10 minutes MAPP 12/12 .jpg]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====RP (Erich)====&lt;br /&gt;
[[Media:CompiledRaw data GenMAPP Final RP IDLR EYKZ2015126.xlsx | RP Compiled Raw Data Final 12/10]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data GenMAPP ready RP IDLR EYKZ2015126.txt | RP .txt format GenMAPP ready 12/6]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data Errors RP EYKZ2015126.EX.xlsx | RP Exceptions file in Excel format (filtered)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data Errors RP EYKZ2015126.EX.txt | RP Exceptions (txt)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data GenMAPP ready RP IDLR EYKZ2015126.gex | RP .gex file]]&lt;br /&gt;
&lt;br /&gt;
====RX (Kristin)====&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.xlsx | RX Compiled Raw Data 12/6]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data statistics BonferroniPvalue RP IDLR EYKZ2015126.txt | RX .txt format GenMAPP ready 12/6]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.xlsx | RX Compiled Raw Data as of 12/6]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.txt | RX .txt format updated as of 12/6]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.EX.txt | RX Exceptions file]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.gex | RX .gex file]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RX IDLR KZ2015126.EX.xlsx | RX Exceptions file in Excel format (filtered)]]&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Gene_Database_Testing_Report_for_Shigella_flexneri_2a_str_301.pdf&amp;diff=8091</id>
		<title>File:Gene Database Testing Report for Shigella flexneri 2a str 301.pdf</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Gene_Database_Testing_Report_for_Shigella_flexneri_2a_str_301.pdf&amp;diff=8091"/>
				<updated>2015-12-18T20:26:20Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: Uploading gene database report&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Uploading gene database report&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Individual_Reflection&amp;diff=8080</id>
		<title>Troque Individual Reflection</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Individual_Reflection&amp;diff=8080"/>
				<updated>2015-12-18T19:40:51Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: Creating this page&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Statement of Work ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Assessment of Project == &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Reflection on the Project ==&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=OTS_Deliverables&amp;diff=8079</id>
		<title>OTS Deliverables</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=OTS_Deliverables&amp;diff=8079"/>
				<updated>2015-12-18T19:39:25Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: /* Individual Reflections */ Linked Trixie Roque individual reflection&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Template:Oregon Trail Survivors}}&lt;br /&gt;
==OTS Group Files and Datasets==&lt;br /&gt;
&lt;br /&gt;
[[Media:GenMAPP Builder 12 14 2015 Number 2.zip | Gene Database .gdb]]&lt;br /&gt;
&lt;br /&gt;
[[Media:ReadMe Sf-Std External 20151214.pdf | ReadMe]]&lt;br /&gt;
&lt;br /&gt;
[[Media:ShigellaGeneDatabaseSchema.pdf | Gene Database Schema]]&lt;br /&gt;
&lt;br /&gt;
Gene Database Testing Report (.pdf)&lt;br /&gt;
&lt;br /&gt;
[[Media:FINAL CompiledRawData RXRP EYKZ20151210.xlsx | Compiled Raw Microarray Dataset (.xlsx)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:FINAL CompiledRawData RXRP EYKZ20151210.txt | Data Used for Import into GenMAPP (.txt)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:FINAL CompiledRawData RXRP EYKZ20151210 (1).gex | GenMAPP Expression Dataset File (.gex)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:FINAL CompiledRawData RXRP EYKZ20151210.EX.txt | Exceptions file (.EX.txt)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:Criterion.GOfiles.zip | Raw MAPPFinder results files (-GO.txt)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.gmf | .gmf file]]&lt;br /&gt;
&lt;br /&gt;
[[Media:Filtered MAPPFinder Results.xlsx | Filtered MAPPFinder Results .xlsx]]&lt;br /&gt;
&lt;br /&gt;
[[Media:MAPPFinderResults.zip | Filtered MAPPFinder Results (common GO terms highlighted) .png]]&lt;br /&gt;
&lt;br /&gt;
[[Media:RPRX MAPPs.zip | .zip of .mapp s of relevant genes]]&lt;br /&gt;
&lt;br /&gt;
Group Report (.doc or .pdf)&lt;br /&gt;
&lt;br /&gt;
[[Media:FinalOTSPresentation.pptx | Final PowerPoint Presentation]]&lt;br /&gt;
&lt;br /&gt;
==Individual Reflections==&lt;br /&gt;
&lt;br /&gt;
[[Kzebrows Individual Reflection | Kristin Zebrowski]]&lt;br /&gt;
&lt;br /&gt;
[[Eyanosch Individual Reflection | Erich Yanoschik]]&lt;br /&gt;
&lt;br /&gt;
[[Jwoodlee Individual Reflection | Jake Woodlee]]&lt;br /&gt;
&lt;br /&gt;
[[Troque Individual Reflection | Trixie Roque]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
==OTS Files==&lt;br /&gt;
&lt;br /&gt;
[[Media:Micro Array Shigella Flexneri 20151011.pdf | Shigella Flexneri Microarray Paper (PDF)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:Shigellamicroarray.pptx | Microarray Journal Club Power Point]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ20152211.xlsx | Microarray Compiled Raw Data RP/RX IDLR]]&lt;br /&gt;
&lt;br /&gt;
[[Media:SamplesFilesCorrespondanceTable SF301a EYKZ201522111.xls | Microarray Corresponding Files Table]]&lt;br /&gt;
&lt;br /&gt;
[[Media: GMBuilder Shigella flexneri.zip]]&lt;br /&gt;
&lt;br /&gt;
[[Media: QA Files.zip | Download  QA files]]&lt;br /&gt;
&lt;br /&gt;
[[Media:GMBuilder December7 2015 build 2.zip]]&lt;br /&gt;
&lt;br /&gt;
[[Media:GenMAPP Builder 12 14 2015.zip]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==GenMAPP User Files==&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015121.xlsx | ScalingCentering file 12/1 .xlsx]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Media:Flagellum Ribosomal Mapp 1 60min 20151012.jpg | RP vs RX 1 MIC @ 60 minutes MAPP 12/12 .jpg]]&lt;br /&gt;
&lt;br /&gt;
[[Media:Flagellum Ribosomal Mapp 0pt5 10min 20151012.jpg | RP vs RX 0.5 MIC @ 10 minutes MAPP 12/12 .jpg]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====RP (Erich)====&lt;br /&gt;
[[Media:CompiledRaw data GenMAPP Final RP IDLR EYKZ2015126.xlsx | RP Compiled Raw Data Final 12/10]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data GenMAPP ready RP IDLR EYKZ2015126.txt | RP .txt format GenMAPP ready 12/6]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data Errors RP EYKZ2015126.EX.xlsx | RP Exceptions file in Excel format (filtered)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data Errors RP EYKZ2015126.EX.txt | RP Exceptions (txt)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data GenMAPP ready RP IDLR EYKZ2015126.gex | RP .gex file]]&lt;br /&gt;
&lt;br /&gt;
====RX (Kristin)====&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.xlsx | RX Compiled Raw Data 12/6]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data statistics BonferroniPvalue RP IDLR EYKZ2015126.txt | RX .txt format GenMAPP ready 12/6]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.xlsx | RX Compiled Raw Data as of 12/6]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.txt | RX .txt format updated as of 12/6]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.EX.txt | RX Exceptions file]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.gex | RX .gex file]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RX IDLR KZ2015126.EX.xlsx | RX Exceptions file in Excel format (filtered)]]&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:QA_files_OTS_20151216.zip&amp;diff=7987</id>
		<title>File:QA files OTS 20151216.zip</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:QA_files_OTS_20151216.zip&amp;diff=7987"/>
				<updated>2015-12-16T21:06:25Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: Uploading new QA files&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Uploading new QA files&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=OTS_Deliverables&amp;diff=7972</id>
		<title>OTS Deliverables</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=OTS_Deliverables&amp;diff=7972"/>
				<updated>2015-12-16T20:43:41Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: /* OTS Group Files and Datasets */ Linking ReadMe&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==OTS Group Files and Datasets==&lt;br /&gt;
&lt;br /&gt;
[[Media:GenMAPP Builder 12 14 2015 Number 2.zip | Gene Database .gdb]]&lt;br /&gt;
&lt;br /&gt;
[[Media:ReadMe Sf-Std External 20151214.pdf | ReadMe]]&lt;br /&gt;
&lt;br /&gt;
Gene Database Schema&lt;br /&gt;
&lt;br /&gt;
Gene Database Testing Report (.pdf)&lt;br /&gt;
&lt;br /&gt;
[[Media:FINAL CompiledRawData RXRP EYKZ20151210.xlsx | Final Compiled Raw Data .xlsx]]&lt;br /&gt;
&lt;br /&gt;
[[Media:FINAL CompiledRawData RXRP EYKZ20151210.txt | Final Compiled Raw Data .txt]]&lt;br /&gt;
&lt;br /&gt;
[[Media:FINAL CompiledRawData RXRP EYKZ20151210 (1).gex | Final Compiled Raw Data .gex]]&lt;br /&gt;
&lt;br /&gt;
==OTS Files==&lt;br /&gt;
&lt;br /&gt;
[[Media:Micro Array Shigella Flexneri 20151011.pdf | Shigella flexneri Microarray Paper (PDF)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:Shigellamicroarray.pptx | Microarray Journal Club PowerPoint]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ20152211.xlsx | Microarray Compiled Raw Data RP/RX IDLR]]&lt;br /&gt;
&lt;br /&gt;
[[Media:SamplesFilesCorrespondanceTable SF301a EYKZ201522111.xls | Microarray Corresponding Files Table]]&lt;br /&gt;
&lt;br /&gt;
[[Media: GMBuilder Shigella flexneri.zip]]&lt;br /&gt;
&lt;br /&gt;
[[Media: QA Files.zip | Download QA files]]&lt;br /&gt;
&lt;br /&gt;
[[Media:GMBuilder December7 2015 build 2.zip]]&lt;br /&gt;
&lt;br /&gt;
[[Media:GenMAPP Builder 12 14 2015.zip]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Media:FinalOTSPresentation.pptx | &amp;#039;&amp;#039;&amp;#039;Final PowerPoint Presentation&amp;#039;&amp;#039;&amp;#039;]]&lt;br /&gt;
&lt;br /&gt;
==GenMAPP User Files==&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015121.xlsx | ScalingCentering file 12/1 .xlsx]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.gmf | Compiled Raw Data 12/8 .gmf]]&lt;br /&gt;
&lt;br /&gt;
[[Media:Filtered MAPPFinder Results.xlsx | Filtered MAPPFinder Results .xlsx]]&lt;br /&gt;
&lt;br /&gt;
[[Media:MAPPFinderResults.zip | Filtered MAPPFinder Results (common GO terms highlighted) .png]]&lt;br /&gt;
&lt;br /&gt;
[[Media:Flagellum Ribosomal Mapp 1 60min 20151012.jpg | RP vs RX 1 MIC @ 60 minutes MAPP 12/12 .jpg]]&lt;br /&gt;
&lt;br /&gt;
[[Media:Flagellum Ribosomal Mapp 0pt5 10min 20151012.jpg | RP vs RX 0.5 MIC @ 10 minutes MAPP 12/12 .jpg]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
====RP (Erich)====&lt;br /&gt;
[[Media:CompiledRaw data GenMAPP Final RP IDLR EYKZ2015126.xlsx | RP Compiled Raw Data Final 12/10]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data GenMAPP ready RP IDLR EYKZ2015126.txt | RP .txt format GenMAPP ready 12/6]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data Errors RP EYKZ2015126.EX.xlsx | RP Exceptions file in Excel format (filtered)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data Errors RP EYKZ2015126.EX.txt | RP Exceptions (txt)]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data GenMAPP ready RP IDLR EYKZ2015126.gex | RP .gex file]]&lt;br /&gt;
&lt;br /&gt;
====RX (Kristin)====&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.xlsx | RX Compiled Raw Data 12/6]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data statistics BonferroniPvalue RP IDLR EYKZ2015126.txt | RX .txt format GenMAPP ready 12/6]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.xlsx | RX Compiled Raw Data as of 12/6]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.txt | RX .txt format updated as of 12/6]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.EX.txt | RX Exceptions file]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RPRX IDLR EYKZ2015126.gex | RX .gex file]]&lt;br /&gt;
&lt;br /&gt;
[[Media:CompiledRaw data RX IDLR KZ2015126.EX.xlsx | RX Exceptions file in Excel format (filtered)]]&lt;br /&gt;
&lt;br /&gt;
{{Template:Oregon Trail Survivors}}&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:ReadMe_Sf-Std_External_20151214.pdf&amp;diff=7970</id>
		<title>File:ReadMe Sf-Std External 20151214.pdf</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:ReadMe_Sf-Std_External_20151214.pdf&amp;diff=7970"/>
				<updated>2015-12-16T20:42:35Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: Uploading ReadMe&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Uploading ReadMe&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Gene_Database_Testing_Report_-_Oregon_Trail_Survivors&amp;diff=7938</id>
		<title>Gene Database Testing Report - Oregon Trail Survivors</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Gene_Database_Testing_Report_-_Oregon_Trail_Survivors&amp;diff=7938"/>
				<updated>2015-12-15T08:22:52Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: /* Export Information (final) */ linked .gdb file&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Template:Oregon Trail Survivors}}&lt;br /&gt;
&lt;br /&gt;
== Things to note ==&lt;br /&gt;
* Taxonomy ID: 623&lt;br /&gt;
* UP000001006&lt;br /&gt;
* File management system: Wiki&lt;br /&gt;
&lt;br /&gt;
== Initial (Vanilla) Export Information ==&lt;br /&gt;
Version of GenMAPP Builder:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; gmbuilder-3.0.0-build-5 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Computer on which export was run: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; Front of the room, 3rd computer from the right. &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Postgres Database name: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Shigella flexneri 20151911&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
UniProt XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): &amp;#039;&amp;#039;&amp;#039;UniProt release 2015_11&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* UniProt XML download link: &amp;#039;&amp;#039;&amp;#039;[http://www.uniprot.org/uniprot/?query=proteome:UP000001006 Click here]&amp;#039;&amp;#039;&amp;#039; &lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;4.48 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GO OBO-XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): &amp;#039;&amp;#039;&amp;#039;Version created on 11/19/2015 (at 2:24 AM)&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GO OBO-XML download link: &amp;#039;&amp;#039;&amp;#039;[http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;7.00 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to process: &amp;#039;&amp;#039;&amp;#039;4.99 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GOA filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): &amp;#039;&amp;#039;&amp;#039;Version released on .&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GOA download link: &amp;#039;&amp;#039;&amp;#039;[http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/103.S_flexneri_301.goa Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;0.06 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151119 OTS.gdb | Sf-Std_20151119_OTS.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039; 1 hour, 32 minutes, 33 seconds &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:06:13 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039;5:38:46 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
== Export Information for Build with Coder Changes # 1 ==&lt;br /&gt;
=== Build 1 ===&lt;br /&gt;
Name of .gdb file: &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151201.gdb | Sf-Std_20151201.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039; &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:19:22 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039; 8:30:08 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** I have confirmed that the necessary information in the .gdb file exists in the new build (e.g., the URL of the database we are using).&lt;br /&gt;
&lt;br /&gt;
=== TallyEngine ===&lt;br /&gt;
&lt;br /&gt;
* Run the TallyEngine in GenMAPP Builder and record the number of records for UniProt and GO in the XML data and in the Postgres databases.&lt;br /&gt;
** Choose the menu item Tallies &amp;gt; Run XML and Database Tallies for UniProt and GO...&lt;br /&gt;
** Choose the UniProt and GO OBO XML files that were uploaded in the previous sections of this assignment.&lt;br /&gt;
** Here is the screenshot of the tally result:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot; style=&amp;quot;width: auto; margin-left: auto; margin-right: auto;&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:TallyEngine results OTS 112115.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine ===&lt;br /&gt;
[[How_Do_I_Count_Thee%3F_Let_Me_Count_The_Ways | Follow the instructions found on this page to run XMLPipeDB match.]]&lt;br /&gt;
* In the Thawspace directory, I created a folder called &amp;quot;Shigella_flexneri_BioDB_2015&amp;quot; and created subfolders called &amp;quot;Source&amp;quot; and &amp;quot;Working&amp;quot; to store the source files (i.e., the compressed files) and the working files (i.e., the files I will actually be processing).&lt;br /&gt;
* As a result, I had to change into these directories before running Match.&lt;br /&gt;
** To change into the ThawSpace0\Shigella_flexneri_BioDB_2015\Working directory, use the following command in the command prompt window:&lt;br /&gt;
 T: &amp;amp;&amp;amp; cd &amp;quot;Shigella_flexneri_BioDB_2015\Working&amp;quot;&lt;br /&gt;
* Once inside that directory, the command I used is:&lt;br /&gt;
 java -jar xmlpipedb-match-1.1.1/xmlpipedb-match-1.1.1.jar &amp;quot;SF[0-9][0-9][0-9][0-9]&amp;quot; &amp;lt; uniprot-proteome%3AUP000001006.xml&lt;br /&gt;
* The results are as follows:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Match results OTS 112115.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
These results did not match what the TallyEngine gave (TallyEngine: 7567 vs. Match: 4610).&lt;br /&gt;
* As a result, the regular expression had to be modified so that the numbers match: &amp;lt;code&amp;gt;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?(/|&amp;lt;/name&amp;gt;)&amp;lt;/code&amp;gt;&lt;br /&gt;
* The overall command to write to a text file is as follows:&lt;br /&gt;
 java -jar xmlpipedb-match-1.1.1/xmlpipedb-match-1.1.1.jar &amp;quot;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?(/|&amp;lt;/name&amp;gt;)&amp;quot; &amp;lt; uniprot-proteome%3AUP000001006.xml &amp;gt; shigella_flexneri_results.txt&lt;br /&gt;
* Then our results became:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Match results OTS 20151203 more accurate.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* Observations: &lt;br /&gt;
** To reduce the number of matches, we had to add the end tag &amp;quot;&amp;lt;/name&amp;gt;&amp;quot; to our regular expression. This brought the number of matches down from over 8000 to 7517. Since TallyEngine&amp;#039;s result was 7567, this means that 50 IDs were not being caught. To account for them, we had to add the genes with IDs of the form CP#### (there were 50 instances of these), and those of the form SF####.# or S####.#. This led us to get 7566 gene IDs. &lt;br /&gt;
** When I looked at the IDs in Microsoft Access, they totaled 7569. To account for this last piece of gene formatting, we also had to account for the genes of the form SF?####/SF?####. The 2 extra genes that TallyEngine did not count are not actually supposed to be separated, since the formatting suggests that the two IDs are interchangeable. When the .gdb file was created, it seems that these names were split at the &amp;quot;/&amp;quot;.&lt;br /&gt;
** In other words, there are 3 ordered locus names with formatting that is different from the rest: SF2223/SF2224, S2352/S2353, and S3359/S3360. &lt;br /&gt;
** I wasn&amp;#039;t able to hit the exact number output by TallyEngine, since there are other genes with the same format that were already caught by the patterns SF#### or S####.&lt;br /&gt;
** Note: It turns out the ShiBASE database only uses the pattern SF#### instead of S#### or CP#### so the regular expression would really have to be just &amp;lt;code&amp;gt;SF?[0-9][0-9][0-9][0-9](\.[0-9])?(/|&amp;lt;/name&amp;gt;)&amp;lt;/code&amp;gt;&lt;br /&gt;
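The ID forms discussed in these observations can also be checked outside of Match. The following is a minimal Python sketch, assuming the ordered locus names have already been extracted from the XML as plain strings. The function names and most of the sample IDs are hypothetical; only SF2223/SF2224 comes from this report.&lt;br /&gt;

```python
import re
from collections import Counter

# Ordered locus name forms described in this report: CP####, S####, SF####,
# each optionally followed by ".#". This pattern is an illustration only.
ID_PATTERN = re.compile(r"(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?")


def split_locus_names(raw_name):
    """Split a slash-joined name such as SF2223/SF2224 into separate IDs."""
    return [part.strip() for part in raw_name.split("/") if part.strip()]


def classify(locus_id):
    """Return the prefix (CP, S, or SF) of a well-formed ID, else None."""
    match = ID_PATTERN.fullmatch(locus_id)
    return match.group(1) if match else None


def tally(raw_names):
    """Count IDs per prefix after splitting slash-joined names."""
    counts = Counter()
    for raw in raw_names:
        for locus_id in split_locus_names(raw):
            counts[classify(locus_id)] += 1
    return counts


# Hypothetical sample; "b0001" stands in for a name that fits none of the forms.
sample = ["SF0001", "S0002", "CP0010", "SF1234.1", "SF2223/SF2224", "b0001"]
print(tally(sample))
```

Splitting on the slash before counting mirrors what the export appears to have done with the dual names, which is one way the Microsoft Access total could end up higher than the TallyEngine total.&lt;br /&gt;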
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
== System IDs ==&lt;br /&gt;
* UniProt: A0A0H2[A-Z][A-Z][0-9][0-9], A0A0H2[A-Z][A-Z][A-Z][0-9], A0A0H2[A-Z][0-9][0-9][0-9], or [A-Z][0-9][A-Z][0-9][A-Z][0-9]&lt;br /&gt;
** Examples: A0A0H2USI9, A0A0H2USA4, A0A0H2V010, A5A6A8&lt;br /&gt;
* RefSeq: NP_######, WP_#########, YP_#########, YP_######&lt;br /&gt;
** Examples: NP_858405, WP_000002440, YP_001449236, YP_145811 (only one of this)&lt;br /&gt;
* GeneID (EntrezGene from NCBI):&lt;br /&gt;
* GO: #######&lt;br /&gt;
* OrderedLocusNames: CP####, S#### (or S####.#), and SF#### (or SF####.#)&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine ===&lt;br /&gt;
* The command used to count the number of IDs is:&lt;br /&gt;
 select count(*) from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Postgres results OTS 20151203.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* The result above is exactly twice the number of OrderedLocusNames from TallyEngine: 15134 / 2 = 7567 IDs.&lt;br /&gt;
* Running the command &amp;lt;code&amp;gt;select value from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&amp;lt;/code&amp;gt; and exporting the results to Excel reveals that this is because every entry appears twice: &lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Postgres results excel form OTS 20151203.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* Adding the keyword &amp;quot;distinct&amp;quot; would resolve the double counting:&lt;br /&gt;
 select distinct value from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
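The effect of the distinct keyword on duplicated rows can be reproduced on a small scale. The following is a minimal Python sketch that uses an in-memory SQLite table as a stand-in for the PostgreSQL genenametype table. The rows are hypothetical, and the regular expression filter is left out because SQLite has no built-in equivalent of the PostgreSQL ~ operator.&lt;br /&gt;

```python
import sqlite3

# In-memory stand-in for the genenametype table. SQLite is used only so the
# sketch runs anywhere; the report itself used PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("create table genenametype (type text, value text)")

# Hypothetical rows: each ordered locus name is entered twice, mimicking
# the duplicated entries seen in the exported results.
names = ["SF0001", "SF0002", "S0003", "CP0004"]
rows = [("ordered locus", name) for name in names for _ in range(2)]
conn.executemany("insert into genenametype values (?, ?)", rows)

total = conn.execute(
    "select count(*) from genenametype where type = 'ordered locus'"
).fetchone()[0]
unique = conn.execute(
    "select count(distinct value) from genenametype where type = 'ordered locus'"
).fetchone()[0]

print(total, unique)
```

With every name entered twice, the plain count is double the distinct count, which is the same relationship as 15134 and 7567 in the results shown here.&lt;br /&gt;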
&lt;br /&gt;
=== Analysis ===&lt;br /&gt;
* The total number of OrderedLocusNames in TallyEngine is &amp;#039;&amp;#039;&amp;#039;7567&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
* Using the best regular expression I could construct in Match, the result is &amp;#039;&amp;#039;&amp;#039;7573&amp;#039;&amp;#039;&amp;#039;. The additional 6 IDs appear because they are already captured by the regular expression &amp;lt;code&amp;gt;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;lt;/name&amp;gt;&amp;lt;/code&amp;gt;, so also capturing the IDs of the form &amp;lt;code&amp;gt;SF?####/SF?####&amp;lt;/code&amp;gt; counts them a second time.&lt;br /&gt;
* The total number of entries in PostgreSQL is &amp;#039;&amp;#039;&amp;#039;15134&amp;#039;&amp;#039;&amp;#039;, but this is only because each gene is entered twice. Dividing by 2 yields &amp;#039;&amp;#039;&amp;#039;7567&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
* Microsoft Access yielded &amp;#039;&amp;#039;&amp;#039;7569&amp;#039;&amp;#039;&amp;#039; in the OrderedLocusNames window. The extra 2 genes came from the IDs of the form &amp;lt;code&amp;gt;SF?####/SF?####&amp;lt;/code&amp;gt; since the export broke up the two IDs that represent the same ID.&lt;br /&gt;
** 49 are of the form &amp;lt;code&amp;gt;CP####&amp;lt;/code&amp;gt;&lt;br /&gt;
** 3413 are of the form &amp;lt;code&amp;gt;S####&amp;lt;/code&amp;gt;&lt;br /&gt;
*** 14 are of the form &amp;lt;code&amp;gt;S####.#&amp;lt;/code&amp;gt;&lt;br /&gt;
** 4107 are of the form &amp;lt;code&amp;gt;SF####&amp;lt;/code&amp;gt;&lt;br /&gt;
*** 35 are of the form &amp;lt;code&amp;gt;SF####.#&amp;lt;/code&amp;gt;&lt;br /&gt;
* Inspecting the UniProt XML file was necessary to identify the IDs. Looking through its contents, I discovered (with help from Dondi) that I had to add the end tag &amp;quot;&amp;lt;/name&amp;gt;&amp;quot; in order to narrow down the results in Match.&lt;br /&gt;
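The counts in this analysis can be cross-checked with a few lines of arithmetic. The following is a minimal Python sketch using only the numbers reported in this section. It assumes that the S####.# and SF####.# counts are subsets of the S#### and SF#### counts, as the nesting of the bullets suggests; the variable names are illustrative.&lt;br /&gt;

```python
# Counts reported in this section.
tally_engine = 7567
match_count = 7573
postgres_rows = 15134
access_count = 7569

# Breakdown of the Microsoft Access count by ID form. The ".#" variants
# (14 and 35) are assumed to be included in the S and SF totals already.
by_form = {"CP": 49, "S": 3413, "SF": 4107}

form_total = sum(by_form.values())
deduplicated = postgres_rows // 2

print(form_total == access_count)      # breakdown adds up to the Access total
print(deduplicated == tally_engine)    # halved Postgres count equals TallyEngine
print(match_count - tally_engine)      # extra IDs found by Match
print(access_count - tally_engine)     # extra IDs found in Access
```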
&lt;br /&gt;
== &amp;quot;Export&amp;quot; from Build 2 ==&lt;br /&gt;
Name of .gdb file: [[Media:Sf-Std 20151207.gdb | Sf-Std 20151207.gdb]]&lt;br /&gt;
* Date: &amp;#039;&amp;#039;&amp;#039; 12/7/15 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039; 4 hours, 24 minutes and 1 second &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;9:13:45 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039; 1:37:46 AM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: This export had to be redone since the PSQL database had twice as many entries as expected.&lt;br /&gt;
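Because this export ran past midnight, the elapsed time cannot be read off by simple subtraction of the clock readings. The following is a minimal Python sketch of one way to compute it, assuming the start and end times are at most one day apart; the function name and the format constant are illustrative.&lt;br /&gt;

```python
from datetime import datetime, timedelta

CLOCK_FORMAT = "%I:%M:%S %p"  # 12-hour clock reading, e.g. "9:13:45 PM"


def elapsed(start, end):
    """Elapsed time between two clock readings, allowing one midnight rollover."""
    start_dt = datetime.strptime(start, CLOCK_FORMAT)
    end_dt = datetime.strptime(end, CLOCK_FORMAT)
    # Taking the difference modulo one day turns a negative difference
    # (end time after midnight) into the correct positive duration.
    return (end_dt - start_dt) % timedelta(days=1)


print(elapsed("9:13:45 PM", "1:37:46 AM"))  # prints 4:24:01
```

The same function gives 1:38:42 for the re-imported Build 2 export (4:30:59 PM to 6:09:41 PM), which agrees with the time recorded for that export.&lt;br /&gt;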
&lt;br /&gt;
== Export Information (Re-imported) Build 2 ==&lt;br /&gt;
Version of GenMAPP Builder:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; gmbuilder-3.0.0-build-5 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Computer on which export was run: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; Front of the room, 3rd computer from the right. &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Postgres Database name: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Shigella_flexneri_20151208&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
UniProt XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): &amp;#039;&amp;#039;&amp;#039;UniProt release 2015_11&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* UniProt XML download link: &amp;#039;&amp;#039;&amp;#039;[http://www.uniprot.org/uniprot/?query=proteome:UP000001006 Click here]&amp;#039;&amp;#039;&amp;#039; &lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;4.43 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GO OBO-XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): &amp;#039;&amp;#039;&amp;#039;Version created on 11/19/2015 (at 2:24 AM)&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GO OBO-XML download link: &amp;#039;&amp;#039;&amp;#039;[http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;6.84 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to process: &amp;#039;&amp;#039;&amp;#039;5.49 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GOA filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): &amp;#039;&amp;#039;&amp;#039;Version released on .&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GOA download link: &amp;#039;&amp;#039;&amp;#039;[http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/103.S_flexneri_301.goa Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;0.06 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151208.gdb | Sf-Std 20151208.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039;1 hour, 38 minutes, 42 seconds  &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:30:59 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039;6:09:41 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: I had to re-import everything into a new database because the one I had been using had some files imported twice. As a result, all of the counts reported by Postgres were doubled.&lt;br /&gt;
&lt;br /&gt;
=== Using TallyEngine ===&lt;br /&gt;
* The database used is the same one described in the section above: &amp;#039;&amp;#039;&amp;#039;Shigella_flexneri_20151208&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Notice in the image below that there is an error in the cells. It turns out that we did not even need to add the Ordered Locus since that was the default. We will definitely need to do one last build in order to fix that issue.&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Shigella flexneri tallyEngine results build 2.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine ===&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:regex1_OTS.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:regex2_OTS.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* When added together, the result becomes 7566 + 3 = 7569.&lt;br /&gt;
&lt;br /&gt;
=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine ===&lt;br /&gt;
* The following command in PostgreSQL resulted in 7567 entries:&lt;br /&gt;
 select value from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
* The following command resulted in 214 entries:&lt;br /&gt;
 select value from genenametype where type = &amp;#039;ORF&amp;#039; and value ~ &amp;#039;(CP|SF?)(_p)?[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
&lt;br /&gt;
=== OriginalRowCounts Comparison ===&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Ms access originalrowcounts.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* The OrderedLocusNames row seems to report the same number of IDs as our previous builds.&lt;br /&gt;
&lt;br /&gt;
=== Visual Inspection ===&lt;br /&gt;
Perform visual inspection of individual tables to see if there are any problems.&lt;br /&gt;
&lt;br /&gt;
* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?&lt;br /&gt;
** Yes, there are dates present for GeneOntology, InterPro, GeneID, RefSeq, UniProt, EMBL, PDB, Pfam, OrderedLocusNames, and EnsemblBacteria.&lt;br /&gt;
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?&lt;br /&gt;
** Yes, all of them seem to follow the same format (there are, more or less, 3 variations on the IDs for each of the tables).&lt;br /&gt;
&lt;br /&gt;
=== Excel Inspection ===&lt;br /&gt;
* [[Media:In-search-of-the-missing-ids.xlsx| Excel file]]&lt;br /&gt;
&lt;br /&gt;
=== Observations ===&lt;br /&gt;
* Through the use of an XML-reader program called &amp;quot;firstObject XML Editor&amp;quot;, it was discovered that some ordered locus IDs that were exported by GenMAPP Builder were placed in the same tag: &lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image: Dual ordered locus names.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* These differed from the ones originally captured (7567) since these existed separately in each of the gene/name tags:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image: Simple ordered locus names.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* Additionally, from the IDs reported by the GenMAPP users as missing, it was revealed that these do not exist in the XML file at all, or at least in the format that we wanted. These sets of IDs were actually misnomers since, even though CTRL + F lets us find them, they are not the ordered locus names that we were looking for: &lt;br /&gt;
&lt;br /&gt;
* Example 1:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image: Match pic.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* Example 2:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image: Id misnomers2.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* However, because of these observations, we have actually discovered ~92 IDs that existed within the XML file, albeit in a different tag than what we were using:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image: Id misnomers.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* All these observations led us to make one final build to capture those ~92 gene IDs.&lt;br /&gt;
&lt;br /&gt;
== Export Information (final) ==&lt;br /&gt;
* Date: 12/14/15&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151208.gdb | Sf-Std 20151208.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039;2 hours, 0 minutes, 27 seconds&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;9:35:00 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039;11:35:27 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
=== Using TallyEngine ===&lt;br /&gt;
* The results are shown below:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Tally results build2.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Using Microsoft Access ===&lt;br /&gt;
* Even though the TallyEngine reports different numbers, the OrderedLocusNames that were exported in the .gdb file are the following: &lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Orderedlocusnames.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== .gdb File ===&lt;br /&gt;
* The resulting .gdb file can be downloaded [[Media: Sf-Std 20151214.gdb | here]].&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Sf-Std_20151214.gdb&amp;diff=7937</id>
		<title>File:Sf-Std 20151214.gdb</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Sf-Std_20151214.gdb&amp;diff=7937"/>
				<updated>2015-12-15T08:22:35Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: Uploading final build&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Uploading final build&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Orderedlocusnames.png&amp;diff=7936</id>
		<title>File:Orderedlocusnames.png</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Orderedlocusnames.png&amp;diff=7936"/>
				<updated>2015-12-15T08:19:55Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: uploading orderedlocus from ms access&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;uploading orderedlocus from ms access&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Gene_Database_Testing_Report_-_Oregon_Trail_Survivors&amp;diff=7935</id>
		<title>Gene Database Testing Report - Oregon Trail Survivors</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Gene_Database_Testing_Report_-_Oregon_Trail_Survivors&amp;diff=7935"/>
				<updated>2015-12-15T08:19:32Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: /* Export Information (final) */ added picture&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Template:Oregon Trail Survivors}}&lt;br /&gt;
&lt;br /&gt;
== Things to note ==&lt;br /&gt;
* Taxonomy ID: 623&lt;br /&gt;
* UP000001006&lt;br /&gt;
* File management system: Wiki&lt;br /&gt;
&lt;br /&gt;
== Initial (Vanilla) Export Information ==&lt;br /&gt;
Version of GenMAPP Builder:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; gmbuilder-3.0.0-build-5 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Computer on which export was run: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; Front of the room, 3rd computer from the right. &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Postgres Database name: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Shigella flexneri 20151911&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
UniProt XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): &amp;#039;&amp;#039;&amp;#039;UniProt release 2015_11&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* UniProt XML download link: &amp;#039;&amp;#039;&amp;#039;[http://www.uniprot.org/uniprot/?query=proteome:UP000001006 Click here]&amp;#039;&amp;#039;&amp;#039; &lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;4.48 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GO OBO-XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): &amp;#039;&amp;#039;&amp;#039;Version created on 11/19/2015 (at 2:24 AM)&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GO OBO-XML download link: &amp;#039;&amp;#039;&amp;#039;[http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;7.00 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to process: &amp;#039;&amp;#039;&amp;#039;4.99 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GOA filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): &amp;#039;&amp;#039;&amp;#039;Version released on .&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GOA download link: &amp;#039;&amp;#039;&amp;#039;[http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/103.S_flexneri_301.goa Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;0.06 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151119 OTS.gdb | Sf-Std_20151119_OTS.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039;1 hour, 32 minutes, 33 seconds&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:06:13 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039;5:38:46 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
== Export Information for Build with Coder Changes # 1 ==&lt;br /&gt;
=== Build 1 ===&lt;br /&gt;
Name of .gdb file: &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151201.gdb | Sf-Std_20151201.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039;4 hours, 10 minutes, 46 seconds&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:19:22 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039; 8:30:08 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** I have confirmed that the necessary information in the .gdb file exists in the new build (e.g., the URL of the database we are using).&lt;br /&gt;
&lt;br /&gt;
=== TallyEngine ===&lt;br /&gt;
&lt;br /&gt;
* Run the TallyEngine in GenMAPP Builder and record the number of records for UniProt and GO in the XML data and in the Postgres databases.&lt;br /&gt;
** Choose the menu item Tallies &amp;gt; Run XML and Database Tallies for UniProt and GO...&lt;br /&gt;
** Choose the UniProt and GO OBO-XML files that were uploaded in the previous sections of this assignment.&lt;br /&gt;
** Here is the screenshot of the tally result:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot; style=&amp;quot;width: auto; margin-left: auto; margin-right: auto;&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:TallyEngine results OTS 112115.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine ===&lt;br /&gt;
[[How_Do_I_Count_Thee%3F_Let_Me_Count_The_Ways | Follow the instructions found on this page to run XMLPipeDB match.]]&lt;br /&gt;
* In the Thawspace directory, I created a folder called &amp;quot;Shigella_flexneri_BioDB_2015&amp;quot; and created subfolders called &amp;quot;Source&amp;quot; and &amp;quot;Working&amp;quot; to store the source files (i.e., the compressed files) and the working files (i.e., the files I will actually be processing).&lt;br /&gt;
* As a result, I had to cd into the working directory before running Match.&lt;br /&gt;
** To change into the ThawSpace0\Shigella_flexneri_BioDB_2015\Working directory, use the following command in the command prompt window:&lt;br /&gt;
 T: &amp;amp;&amp;amp; cd &amp;quot;Shigella_flexneri_BioDB_2015\Working&amp;quot;&lt;br /&gt;
* Once inside that directory, the command I used is:&lt;br /&gt;
 java -jar xmlpipedb-match-1.1.1/xmlpipedb-match-1.1.1.jar &amp;quot;SF[0-9][0-9][0-9][0-9]&amp;quot; &amp;lt; uniprot-proteome%3AUP000001006.xml&lt;br /&gt;
* The results are as follows:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Match results OTS 112115.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
These results did not match the TallyEngine count (TallyEngine: 7567 vs. Match: 4610).&lt;br /&gt;
* As a result, the regular expression had to be modified so that the counts match: &amp;lt;code&amp;gt;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?(/|&amp;lt;/name&amp;gt;)&amp;lt;/code&amp;gt;&lt;br /&gt;
* The overall command to write to a text file is as follows:&lt;br /&gt;
 java -jar xmlpipedb-match-1.1.1/xmlpipedb-match-1.1.1.jar &amp;quot;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?(/|&amp;lt;/name&amp;gt;)&amp;quot; &amp;lt; uniprot-proteome%3AUP000001006.xml &amp;gt; shigella_flexneri_results.txt&lt;br /&gt;
* Then our results became:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Match results OTS 20151203 more accurate.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* Observations: &lt;br /&gt;
** To reduce the number of matches, we had to add the end tag &amp;quot;&amp;lt;/name&amp;gt;&amp;quot; to our regular expression. This brought the number of matches down from over 8000 to 7517. Since TallyEngine&amp;#039;s result was 7567, this means that 50 IDs were not being caught. To account for these, we had to add the IDs of the form CP#### (there were 50 instances of these) and those of the form SF####.# or S####.#. This led us to 7566 gene IDs. &lt;br /&gt;
** When I looked at the IDs in Microsoft Access, they totaled 7569. To account for this last piece of ID formatting, we also had to include the IDs of the form SF?####/SF?####. The 2 extra IDs that were not counted by TallyEngine are not supposed to be separated, since the formatting suggests that the two IDs are interchangeable. When the .gdb file was created, it seems that these IDs were split at the &amp;quot;/&amp;quot;.&lt;br /&gt;
** In other words, there are 3 ordered locus names with formatting that is different from the rest: SF2223/SF2224, S2352/S2353, and S3359/S3360. &lt;br /&gt;
** I wasn&amp;#039;t able to hit the exact number reported by TallyEngine, since there are other genes with the same format that were already caught by the patterns SF#### or S####.&lt;br /&gt;
** Note: It turns out the ShiBASE database only uses the pattern SF#### instead of S#### or CP#### so the regular expression would really have to be just &amp;lt;code&amp;gt;SF?[0-9][0-9][0-9][0-9](\.[0-9])?(/|&amp;lt;/name&amp;gt;)&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
== System IDs ==&lt;br /&gt;
* UniProt: A0A0H2[A-Z][A-Z][0-9][0-9], A0A0H2[A-Z][A-Z][A-Z][0-9], A0A0H2[A-Z][0-9][0-9][0-9], or [A-Z][0-9][A-Z][0-9][A-Z][0-9]&lt;br /&gt;
** Examples: A0A0H2USI9, A0A0H2USA4, A0A0H2V010, A5A6A8&lt;br /&gt;
* RefSeq: NP_######, WP_#########, YP_#########, YP_######&lt;br /&gt;
** Examples: NP_858405, WP_000002440, YP_001449236, YP_145811 (only one of this)&lt;br /&gt;
* GeneID (EntrezGene from NCBI):&lt;br /&gt;
* GO: #######&lt;br /&gt;
* OrderedLocusNames: CP####, S#### (or S####.#), and SF#### (or SF####.#)&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine ===&lt;br /&gt;
* The command used to count the number of IDs is:&lt;br /&gt;
 select count(*) from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Postgres results OTS 20151203.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* The result above is exactly twice the number of OrderedLocusNames from TallyEngine: 15134 / 2 = 7567 IDs&lt;br /&gt;
* Running &amp;lt;code&amp;gt;select value from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&amp;lt;/code&amp;gt; and exporting the results to Excel reveals that this is because every single entry is entered twice: &lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Postgres results excel form OTS 20151203.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* Adding the keyword &amp;quot;distinct&amp;quot; resolves the double counting:&lt;br /&gt;
 select distinct value from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
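The doubled count and the effect of distinct can be illustrated outside the database. The following is a minimal Python sketch, not the query that was actually run: the sample rows and the helper name tally are hypothetical, and re.search is used because the PostgreSQL ~ operator also matches anywhere in the value.&lt;br /&gt;

```python
import re

# Same pattern as the SQL queries in this section. re.search, like the
# PostgreSQL ~ operator, succeeds if the pattern matches anywhere in the value.
ORDERED_LOCUS = re.compile(r"(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?")


def tally(values):
    """Return (number of matching rows, number of distinct matching values)."""
    matching = [v for v in values if ORDERED_LOCUS.search(v)]
    return len(matching), len(set(matching))


# Hypothetical rows: every value appears twice, as in the doubled import.
sample_rows = [
    "SF2223", "SF2223",
    "S2352", "S2352",
    "CP0001", "CP0001",
    "yfdC", "yfdC",  # not an ordered locus name, so it is ignored
]

total, distinct = tally(sample_rows)
print(total, distinct)  # prints "6 3"
```

The first number corresponds to count(*) and the second to counting the distinct values, so a total that is exactly twice the distinct count points to a doubled import.&lt;br /&gt;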
&lt;br /&gt;
=== Analysis ===&lt;br /&gt;
* The total number of OrderedLocusNames in TallyEngine is &amp;#039;&amp;#039;&amp;#039;7567&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
* Using the best regular expression I could construct in Match, the result is &amp;#039;&amp;#039;&amp;#039;7573&amp;#039;&amp;#039;&amp;#039;. The additional 6 IDs appear because they are already captured by the regular expression &amp;lt;code&amp;gt;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;lt;/name&amp;gt;&amp;lt;/code&amp;gt;, so also capturing the IDs of the form &amp;lt;code&amp;gt;SF?####/SF?####&amp;lt;/code&amp;gt; counts them a second time.&lt;br /&gt;
* The total number of entries in PostgreSQL is &amp;#039;&amp;#039;&amp;#039;15134&amp;#039;&amp;#039;&amp;#039;, but this is only because each ID is entered twice. Dividing by 2 yields &amp;#039;&amp;#039;&amp;#039;7567&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
* Microsoft Access yielded &amp;#039;&amp;#039;&amp;#039;7569&amp;#039;&amp;#039;&amp;#039; in the OrderedLocusNames window. The extra 2 genes came from the IDs of the form &amp;lt;code&amp;gt;SF?####/SF?####&amp;lt;/code&amp;gt; since the export broke up the two IDs that represent the same ID.&lt;br /&gt;
** 49 are of the form &amp;lt;code&amp;gt;CP####&amp;lt;/code&amp;gt;&lt;br /&gt;
** 3413 are of the form &amp;lt;code&amp;gt;S####&amp;lt;/code&amp;gt;&lt;br /&gt;
*** 14 are of the form &amp;lt;code&amp;gt;S####.#&amp;lt;/code&amp;gt;&lt;br /&gt;
** 4107 are of the form &amp;lt;code&amp;gt;SF####&amp;lt;/code&amp;gt;&lt;br /&gt;
*** 35 are of the form &amp;lt;code&amp;gt;SF####.#&amp;lt;/code&amp;gt;&lt;br /&gt;
* Inspecting the UniProt XML file was necessary to identify the IDs. Looking through its contents, I discovered (with help from Dondi) that I had to add the end tag &amp;quot;&amp;lt;/name&amp;gt;&amp;quot; in order to narrow down the results in Match.&lt;br /&gt;
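The counts in this analysis can be cross-checked with a few lines of arithmetic. This is only a sketch in Python using the numbers listed in the bullets above; the variable names are made up for illustration. The S####.# and SF####.# counts are left out because they are sub-counts of the S#### and SF#### rows.&lt;br /&gt;

```python
# Counts copied from the Analysis bullets above.
tally_engine = 7567
postgres_rows = 15134
access_by_form = {"CP####": 49, "S####": 3413, "SF####": 4107}

# Each ID was entered twice in PostgreSQL, so halve the row count.
postgres_distinct = postgres_rows // 2

# The three top-level forms should add up to the Microsoft Access total.
access_total = sum(access_by_form.values())

print(postgres_distinct)            # 7567
print(access_total)                 # 7569
print(access_total - tally_engine)  # 2
```

The difference of 2 is the pair of extra IDs that the export produced when it split the slash-separated names.&lt;br /&gt;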
&lt;br /&gt;
== &amp;quot;Export&amp;quot; from Build 2 ==&lt;br /&gt;
Name of .gdb file: [[Media:Sf-Std 20151207.gdb | Sf-Std 20151207.gdb]]&lt;br /&gt;
* Date: &amp;#039;&amp;#039;&amp;#039; 12/7/15 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039; 4 hours, 24 minutes and 1 second &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;9:13:45 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039; 1:37:46 AM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: This export had to be redone since the PostgreSQL database had twice as many entries as expected.&lt;br /&gt;
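The export durations on this page can be checked from the recorded start and end times. The following is a minimal Python sketch, assuming the export finished within 24 hours of starting; the helper name elapsed is hypothetical. The modulo step handles an export that runs past midnight, as this one did.&lt;br /&gt;

```python
from datetime import datetime, timedelta

CLOCK_FORMAT = "%I:%M:%S %p"  # e.g. "9:13:45 PM"


def elapsed(start, end):
    """Return the time between two clock readings, wrapping past midnight."""
    start_time = datetime.strptime(start, CLOCK_FORMAT)
    end_time = datetime.strptime(end, CLOCK_FORMAT)
    # A negative difference means the end time is on the next day;
    # taking it modulo one day turns it into the real duration.
    return (end_time - start_time) % timedelta(days=1)


print(elapsed("9:13:45 PM", "1:37:46 AM"))  # 4:24:01
print(elapsed("4:06:13 PM", "5:38:46 PM"))  # 1:32:33
```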
&lt;br /&gt;
== Export Information (Re-imported) Build 2 ==&lt;br /&gt;
Version of GenMAPP Builder:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; gmbuilder-3.0.0-build-5 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Computer on which export was run: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; Front of the room, 3rd computer from the right. &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Postgres Database name: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Shigella_flexneri_20151208&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
UniProt XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): &amp;#039;&amp;#039;&amp;#039;UniProt release 2015_11&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* UniProt XML download link: &amp;#039;&amp;#039;&amp;#039;[http://www.uniprot.org/uniprot/?query=proteome:UP000001006 Click here]&amp;#039;&amp;#039;&amp;#039; &lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;4.43 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GO OBO-XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): &amp;#039;&amp;#039;&amp;#039;Version created on 11/19/2015 (at 2:24 AM)&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GO OBO-XML download link: &amp;#039;&amp;#039;&amp;#039;[http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;6.84 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to process: &amp;#039;&amp;#039;&amp;#039;5.49 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GOA filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): &amp;#039;&amp;#039;&amp;#039;Version released on .&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GOA download link: &amp;#039;&amp;#039;&amp;#039;[http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/103.S_flexneri_301.goa Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;0.06 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151208.gdb | Sf-Std 20151208.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039;1 hour, 38 minutes, 42 seconds  &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:30:59 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039;6:09:41 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: I had to re-import everything into a new database because the one I had been using had some files imported twice. As a result, the counts reported by Postgres were all doubled.&lt;br /&gt;
&lt;br /&gt;
=== Using TallyEngine ===&lt;br /&gt;
* The database used is the same one described in the section above: &amp;#039;&amp;#039;&amp;#039;Shigella_flexneri_20151208&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Notice in the image below that there is an error in the cells. It turns out that we did not even need to add the Ordered Locus since that was the default. We will definitely need to do one last build in order to fix that issue.&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Shigella flexneri tallyEngine results build 2.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine ===&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:regex1_OTS.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:regex2_OTS.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* When added together, the result becomes 7566 + 3 = 7569.&lt;br /&gt;
&lt;br /&gt;
=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine ===&lt;br /&gt;
* The following command in PostgreSQL returned 7567 entries:&lt;br /&gt;
 select value from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
* The following command resulted in 214 entries:&lt;br /&gt;
 select value from genenametype where type = &amp;#039;ORF&amp;#039; and value ~ &amp;#039;(CP|SF?)(_p)?[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
&lt;br /&gt;
=== OriginalRowCounts Comparison ===&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Ms access originalrowcounts.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* The OrderedLocusNames row appears to report the same number of IDs as in our previous builds.&lt;br /&gt;
&lt;br /&gt;
=== Visual Inspection ===&lt;br /&gt;
Perform visual inspection of individual tables to see if there are any problems.&lt;br /&gt;
&lt;br /&gt;
* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?&lt;br /&gt;
** Yes, there are dates present for GeneOntology, InterPro, GeneID, RefSeq, UniProt, EMBL, PDB, Pfam, OrderedLocusNames, and EnsemblBacteria.&lt;br /&gt;
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?&lt;br /&gt;
** Yes, all of them seem to follow the same format (there are roughly 3 variations on the ID format in each table).&lt;br /&gt;
&lt;br /&gt;
=== Excel Inspection ===&lt;br /&gt;
* [[Media:In-search-of-the-missing-ids.xlsx| Excel file]]&lt;br /&gt;
&lt;br /&gt;
=== Observations ===&lt;br /&gt;
* Through the use of an XML-reader program, called &amp;quot;firstObject XML Editor&amp;quot;, it was discovered that some ordered locus IDs that were exported by GenMAPPBuilder were placed in the same tag: &lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image: Dual ordered locus names.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* These differed from the ones originally captured (7567) since these existed separately in each of the gene/name tags:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image: Simple ordered locus names.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* Additionally, from the IDs reported by the GenMAPP users as missing, it was revealed that these do not exist in the XML file at all, or at least in the format that we wanted. These sets of IDs were actually misnomers since, even though CTRL + F lets us find them, they are not the ordered locus names that we were looking for: &lt;br /&gt;
&lt;br /&gt;
* Example 1:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image: Match pic.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* Example 2:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image: Id misnomers2.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* However, because of these observations, we have actually discovered ~92 IDs that existed within the XML file, albeit in a different tag than what we were using:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image: Id misnomers.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* All these observations led us to make one final build to capture those ~92 gene IDs.&lt;br /&gt;
&lt;br /&gt;
== Export Information (final) ==&lt;br /&gt;
* Date: 12/14/15&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151214.gdb | Sf-Std 20151214.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039;2 hours, 0 minutes, 27 seconds&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;9:35:00 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039;11:35:27 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
=== Using TallyEngine ===&lt;br /&gt;
* The results are shown below:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Tally results build2.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Tally_results_build2.png&amp;diff=7934</id>
		<title>File:Tally results build2.png</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Tally_results_build2.png&amp;diff=7934"/>
				<updated>2015-12-15T08:18:35Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: uploading a new one&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;uploading a new one&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Gene_Database_Testing_Report_-_Oregon_Trail_Survivors&amp;diff=7908</id>
		<title>Gene Database Testing Report - Oregon Trail Survivors</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Gene_Database_Testing_Report_-_Oregon_Trail_Survivors&amp;diff=7908"/>
				<updated>2015-12-15T06:40:47Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: /* Export from &amp;quot;Build 2&amp;quot; */ Edited header&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Template:Oregon Trail Survivors}}&lt;br /&gt;
&lt;br /&gt;
== Things to note ==&lt;br /&gt;
* Taxonomy ID: 623&lt;br /&gt;
* UP000001006&lt;br /&gt;
* File management system: Wiki&lt;br /&gt;
&lt;br /&gt;
== Initial (Vanilla) Export Information ==&lt;br /&gt;
Version of GenMAPP Builder:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; gmbuilder-3.0.0-build-5 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Computer on which export was run: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; Front of the room, 3rd computer from the right. &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Postgres Database name: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Shigella flexneri 20151911&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
UniProt XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): &amp;#039;&amp;#039;&amp;#039;UniProt release 2015_11&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* UniProt XML download link: &amp;#039;&amp;#039;&amp;#039;[http://www.uniprot.org/uniprot/?query=proteome:UP000001006 Click here]&amp;#039;&amp;#039;&amp;#039; &lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;4.48 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GO OBO-XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): &amp;#039;&amp;#039;&amp;#039;Version created on 11/19/2015 (at 2:24 AM)&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GO OBO-XML download link: &amp;#039;&amp;#039;&amp;#039;[http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;7.00 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to process: &amp;#039;&amp;#039;&amp;#039;4.99 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GOA filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): &amp;#039;&amp;#039;&amp;#039;Version released on .&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GOA download link: &amp;#039;&amp;#039;&amp;#039;[http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/103.S_flexneri_301.goa Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;0.06 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151119 OTS.gdb | Sf-Std_20151119_OTS.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039;1 hour, 32 minutes, 33 seconds&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:06:13 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039;5:38:46 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
== Export Information for Build with Coder Changes # 1 ==&lt;br /&gt;
=== Build 1 ===&lt;br /&gt;
Name of .gdb file: &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151201.gdb | Sf-Std_20151201.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039;4 hours, 10 minutes, 46 seconds&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:19:22 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039; 8:30:08 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** I have confirmed that the necessary information in the .gdb file exists in the new build (e.g., the URL of the database we are using).&lt;br /&gt;
&lt;br /&gt;
=== TallyEngine ===&lt;br /&gt;
&lt;br /&gt;
* Run the TallyEngine in GenMAPP Builder and record the number of records for UniProt and GO in the XML data and in the Postgres databases.&lt;br /&gt;
** Choose the menu item Tallies &amp;gt; Run XML and Database Tallies for UniProt and GO...&lt;br /&gt;
** Choose the UniProt and GO OBO-XML files that were uploaded in the previous sections of this assignment.&lt;br /&gt;
** Here is the screenshot of the tally result:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot; style=&amp;quot;width: auto; margin-left: auto; margin-right: auto;&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:TallyEngine results OTS 112115.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine ===&lt;br /&gt;
[[How_Do_I_Count_Thee%3F_Let_Me_Count_The_Ways | Follow the instructions found on this page to run XMLPipeDB match.]]&lt;br /&gt;
* In the Thawspace directory, I created a folder called &amp;quot;Shigella_flexneri_BioDB_2015&amp;quot; and created subfolders called &amp;quot;Source&amp;quot; and &amp;quot;Working&amp;quot; to store the source files (i.e., the compressed files) and the working files (i.e., the files I will actually be processing).&lt;br /&gt;
* As a result, I had to cd into the working directory before running Match.&lt;br /&gt;
** To change into the ThawSpace0\Shigella_flexneri_BioDB_2015\Working directory, use the following command in the command prompt window:&lt;br /&gt;
 T: &amp;amp;&amp;amp; cd &amp;quot;Shigella_flexneri_BioDB_2015\Working&amp;quot;&lt;br /&gt;
* Once inside that directory, the command I used is:&lt;br /&gt;
 java -jar xmlpipedb-match-1.1.1/xmlpipedb-match-1.1.1.jar &amp;quot;SF[0-9][0-9][0-9][0-9]&amp;quot; &amp;lt; uniprot-proteome%3AUP000001006.xml&lt;br /&gt;
* The results are as follows:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Match results OTS 112115.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
These results did not match the TallyEngine count (TallyEngine: 7567 vs. Match: 4610).&lt;br /&gt;
* As a result, the regular expression had to be modified so that the counts match: &amp;lt;code&amp;gt;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?(/|&amp;lt;/name&amp;gt;)&amp;lt;/code&amp;gt;&lt;br /&gt;
* The overall command to write to a text file is as follows:&lt;br /&gt;
 java -jar xmlpipedb-match-1.1.1/xmlpipedb-match-1.1.1.jar &amp;quot;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?(/|&amp;lt;/name&amp;gt;)&amp;quot; &amp;lt; uniprot-proteome%3AUP000001006.xml &amp;gt; shigella_flexneri_results.txt&lt;br /&gt;
* Then our results became:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Match results OTS 20151203 more accurate.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* Observations: &lt;br /&gt;
** To reduce the number of matches, we had to add the end tag &amp;quot;&amp;lt;/name&amp;gt;&amp;quot; to our regular expression. This brought the number of matches down from over 8000 to 7517. Since TallyEngine&amp;#039;s result was 7567, this means that 50 IDs were not being caught. To account for these, we had to add the IDs of the form CP#### (there were 50 instances of these) and those of the form SF####.# or S####.#. This led us to 7566 gene IDs. &lt;br /&gt;
** When I looked at the IDs in Microsoft Access, they totaled 7569. To account for this last piece of ID formatting, we also had to include the IDs of the form SF?####/SF?####. The 2 extra IDs that were not counted by TallyEngine are not supposed to be separated, since the formatting suggests that the two IDs are interchangeable. When the .gdb file was created, it seems that these IDs were split at the &amp;quot;/&amp;quot;.&lt;br /&gt;
** In other words, there are 3 ordered locus names with formatting that is different from the rest: SF2223/SF2224, S2352/S2353, and S3359/S3360. &lt;br /&gt;
** I wasn&amp;#039;t able to hit the exact number reported by TallyEngine, since there are other genes with the same format that were already caught by the patterns SF#### or S####.&lt;br /&gt;
** Note: It turns out the ShiBASE database only uses the pattern SF#### instead of S#### or CP#### so the regular expression would really have to be just &amp;lt;code&amp;gt;SF?[0-9][0-9][0-9][0-9](\.[0-9])?(/|&amp;lt;/name&amp;gt;)&amp;lt;/code&amp;gt;&lt;br /&gt;
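The splitting of slash-joined names described above can be sketched in Python. This is only an illustration, not the Match tool itself: the pattern drops the trailing XML context, the helper name split_locus_names is hypothetical, and the sample IDs other than the three slash-joined names from the report are made up.&lt;br /&gt;

```python
import re

# Same ID pattern as above, minus the trailing XML context (the "/" or the name end tag).
ID_PATTERN = re.compile(r"(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?")


def split_locus_names(text):
    """Return every ID found in `text`; slash-joined names yield one ID per part."""
    return [m.group(0) for m in ID_PATTERN.finditer(text)]


# Hypothetical sample values; only the three slash-joined names come from the report.
samples = ["SF0001", "S0002", "CP0003", "SF0004.1",
           "SF2223/SF2224", "S2352/S2353", "S3359/S3360"]

for name in samples:
    print(name, "->", split_locus_names(name))

# Count how many separate IDs the slash-joined names produce.
joined = [s for s in samples if "/" in s]
joined_id_count = sum(len(split_locus_names(s)) for s in joined)
print(len(joined), "joined names yield", joined_id_count, "IDs")
```

Each slash-joined name produces two separate IDs here, which is one way a tool that splits on the slash could report more IDs than one that treats the joined name as a single entry.&lt;br /&gt;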
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
== System IDs ==&lt;br /&gt;
* UniProt: A0A0H2[A-Z][A-Z][0-9][0-9], A0A0H2[A-Z][A-Z][A-Z][0-9], A0A0H2[A-Z][0-9][0-9][0-9], or [A-Z][0-9][A-Z][0-9][A-Z][0-9]&lt;br /&gt;
** Examples: A0A0H2USI9, A0A0H2USA4, A0A0H2V010, A5A6A8&lt;br /&gt;
* RefSeq: NP_######, WP_#########, YP_#########, YP_######&lt;br /&gt;
** Examples: NP_858405, WP_000002440, YP_001449236, YP_145811 (only one of this)&lt;br /&gt;
* GeneID (EntrezGene from NCBI):&lt;br /&gt;
* GO: #######&lt;br /&gt;
* OrderedLocusNames: CP####, S#### (or S####.#), and SF#### (or SF####.#)&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine ===&lt;br /&gt;
* The command used to count the number of IDs is:&lt;br /&gt;
 select count(*) from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Postgres results OTS 20151203.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* The result above is exactly twice the number of OrderedLocusNames from TallyEngine: 15134 / 2 = 7567 IDs&lt;br /&gt;
* Running the command &amp;lt;code&amp;gt;select value from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&amp;lt;/code&amp;gt; and exporting the results to Excel reveals that this is because every single entry is entered twice: &lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Postgres results excel form OTS 20151203.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* Adding the keyword &amp;quot;distinct&amp;quot; would resolve the double counting:&lt;br /&gt;
 select distinct value from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
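The effect of the double import on the counts can be reproduced with a small Python sketch. It uses an in-memory SQLite table as a stand-in for the PostgreSQL genenametype table, so the regular expression filter is left out (SQLite has no built-in ~ operator), and the three rows are hypothetical.&lt;br /&gt;

```python
import sqlite3

# In-memory stand-in for the genenametype table; only the two columns used above.
conn = sqlite3.connect(":memory:")
conn.execute("create table genenametype (type text, value text)")

# Hypothetical rows, each inserted twice to mimic the double import.
rows = [("ordered locus", "SF0001"),
        ("ordered locus", "S0002"),
        ("ordered locus", "CP0003")]
conn.executemany("insert into genenametype values (?, ?)", rows + rows)

# A plain count sees every duplicated row.
total = conn.execute(
    "select count(*) from genenametype where type = 'ordered locus'"
).fetchone()[0]

# Counting distinct values collapses the duplicates.
unique = conn.execute(
    "select count(distinct value) from genenametype where type = 'ordered locus'"
).fetchone()[0]

print(total, unique)  # 6 3
```

The plain count is exactly twice the distinct count, the same relationship as 15134 and 7567 in the results above.&lt;br /&gt;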
&lt;br /&gt;
=== Analysis ===&lt;br /&gt;
* The total number of OrderedLocusNames in TallyEngine is &amp;#039;&amp;#039;&amp;#039;7567&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
* Using the best regular expression I could come up with in Match, the result is &amp;#039;&amp;#039;&amp;#039;7573&amp;#039;&amp;#039;&amp;#039;. The additional 6 IDs emerged because they were already captured by the regular expression &amp;lt;code&amp;gt;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;lt;/name&amp;gt;&amp;lt;/code&amp;gt;, so also capturing the IDs of the form &amp;lt;code&amp;gt;SF?####/SF?####&amp;lt;/code&amp;gt; duplicates them.&lt;br /&gt;
* The total number of entries in PostgreSQL is &amp;#039;&amp;#039;&amp;#039;15134&amp;#039;&amp;#039;&amp;#039;, but this is only because each gene is entered twice. As a result, dividing by 2 yields &amp;#039;&amp;#039;&amp;#039;7567&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
* Microsoft Access yielded &amp;#039;&amp;#039;&amp;#039;7569&amp;#039;&amp;#039;&amp;#039; in the OrderedLocusNames window. The extra 2 genes came from the IDs of the form &amp;lt;code&amp;gt;SF?####/SF?####&amp;lt;/code&amp;gt; since the export broke up the two IDs that represent the same ID.&lt;br /&gt;
** 49 are of the form &amp;lt;code&amp;gt;CP####&amp;lt;/code&amp;gt;&lt;br /&gt;
** 3413 are of the form &amp;lt;code&amp;gt;S####&amp;lt;/code&amp;gt;&lt;br /&gt;
*** 14 are of the form &amp;lt;code&amp;gt;S####.#&amp;lt;/code&amp;gt;&lt;br /&gt;
** 4107 are of the form &amp;lt;code&amp;gt;SF####&amp;lt;/code&amp;gt;&lt;br /&gt;
*** 35 are of the form &amp;lt;code&amp;gt;SF####.#&amp;lt;/code&amp;gt;&lt;br /&gt;
* Inspecting the UniProt XML file was necessary for identifying the IDs. Looking through its contents, I discovered (with help from Dondi) that I had to add the end tag &amp;quot;&amp;lt;/name&amp;gt;&amp;quot; in order to narrow down the results in Match.&lt;br /&gt;
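The counts in this analysis can be cross-checked with a few lines of Python. This is a sketch that only restates the numbers reported above; it assumes the S####.# and SF####.# counts are nested inside their parent groups, as the list indentation suggests.&lt;br /&gt;

```python
# Counts reported in the Analysis list above.
tally_engine = 7567
postgres_rows = 15134
access_total = 7569
by_prefix = {"CP####": 49, "S####": 3413, "SF####": 4107}

# Every PostgreSQL row appears twice, so halving gives the TallyEngine count.
halved = postgres_rows // 2
assert halved == tally_engine

# The three prefix groups add up to the Access total. The S####.# and SF####.#
# counts are assumed to be subsets of their parent groups, so they are not added again.
prefix_sum = sum(by_prefix.values())
assert prefix_sum == access_total

# Difference between Access and TallyEngine: the 2 extra IDs from the split names.
extra = access_total - tally_engine
print(halved, prefix_sum, extra)  # 7567 7569 2
```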
&lt;br /&gt;
== &amp;quot;Export&amp;quot; from Build 2 ==&lt;br /&gt;
Name of .gdb file: [[Media:Sf-Std 20151207.gdb | Sf-Std 20151207.gdb]]&lt;br /&gt;
* Date: &amp;#039;&amp;#039;&amp;#039; 12/7/15 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039; 4 hours, 24 minutes and 1 second &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;9:13:45 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039; 1:37:46 AM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: This export had to be redone since the PSQL database had twice as many entries as it should have.&lt;br /&gt;
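Because this export ran past midnight, the elapsed time cannot be read off by simple subtraction. A minimal Python sketch of the calculation follows; the helper name elapsed is hypothetical, and it assumes the end time falls within 24 hours of the start time.&lt;br /&gt;

```python
from datetime import datetime, timedelta


def elapsed(start, end, fmt="%I:%M:%S %p"):
    """Elapsed time between two clock readings, assuming the end is within 24 hours of the start."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    # A negative difference means the clock passed midnight; wrap it into one day.
    return delta % timedelta(days=1)


print(elapsed("9:13:45 PM", "1:37:46 AM"))  # 4:24:01
```

The result agrees with the 4 hours, 24 minutes and 1 second recorded above.&lt;br /&gt;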
&lt;br /&gt;
== Export Information (Re-imported) Build 2 ==&lt;br /&gt;
Version of GenMAPP Builder:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; gmbuilder-3.0.0-build-5 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Computer on which export was run: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; Front of the room, 3rd computer from the right. &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Postgres Database name: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Shigella_flexneri_20151208&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
UniProt XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): &amp;#039;&amp;#039;&amp;#039;UniProt release 2015_11&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* UniProt XML download link: &amp;#039;&amp;#039;&amp;#039;[http://www.uniprot.org/uniprot/?query=proteome:UP000001006 Click here]&amp;#039;&amp;#039;&amp;#039; &lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;4.43 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GO OBO-XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): &amp;#039;&amp;#039;&amp;#039;Version created on 11/19/2015 (at 2:24 AM)&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GO OBO-XML download link: &amp;#039;&amp;#039;&amp;#039;[http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;6.84 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to process: &amp;#039;&amp;#039;&amp;#039;5.49 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GOA filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): &amp;#039;&amp;#039;&amp;#039;Version released on .&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GOA download link: &amp;#039;&amp;#039;&amp;#039;[http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/103.S_flexneri_301.goa Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;0.06 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151208.gdb | Sf-Std 20151208.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039;1 hour, 38 minutes, 42 seconds  &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:30:59 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039;6:09:41 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: I had to re-import everything into a new database because the one I had been using had some files imported twice. Thus, the counts reported by PostgreSQL were all doubled.&lt;br /&gt;
&lt;br /&gt;
=== Using TallyEngine ===&lt;br /&gt;
* The database used is the same one described in the section above: &amp;#039;&amp;#039;&amp;#039;Shigella_flexneri_20151208&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Notice in the image below that there is an error in the cells. It turns out that we did not even need to add the Ordered Locus since that was the default. We will definitely need to do one last build in order to fix that issue.&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Shigella flexneri tallyEngine results build 2.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine ===&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:regex1_OTS.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:regex2_OTS.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* When added together, the result becomes 7566 + 3 = 7569.&lt;br /&gt;
&lt;br /&gt;
=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine ===&lt;br /&gt;
* The following command in PostgreSQL resulted in 7567 entries:&lt;br /&gt;
 select value from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
* The following command resulted in 214 entries:&lt;br /&gt;
 select value from genenametype where type = &amp;#039;ORF&amp;#039; and value ~ &amp;#039;(CP|SF?)(_p)?[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
&lt;br /&gt;
=== OriginalRowCounts Comparison ===&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Ms access originalrowcounts.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* The OrderedLocusNames row seems to report the same number of IDs as our previous builds.&lt;br /&gt;
&lt;br /&gt;
=== Visual Inspection ===&lt;br /&gt;
Perform visual inspection of individual tables to see if there are any problems.&lt;br /&gt;
&lt;br /&gt;
* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?&lt;br /&gt;
** Yes, there are dates present for GeneOntology, InterPro, GeneID, RefSeq, UniProt, EMBL, PDB, Pfam, OrderedLocusNames, and EnsemblBacteria.&lt;br /&gt;
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?&lt;br /&gt;
** Yes, all of them seem to follow the same format (there are, more or less, 3 variations on the IDs for each of the tables).&lt;br /&gt;
&lt;br /&gt;
=== Excel Inspection ===&lt;br /&gt;
* [[Media:In-search-of-the-missing-ids.xlsx| Excel file]]&lt;br /&gt;
&lt;br /&gt;
=== Observations ===&lt;br /&gt;
* Through the use of an XML-reader program, called &amp;quot;firstObject XML Editor&amp;quot;, it was discovered that some ordered locus IDs that were exported by GenMAPPBuilder were placed in the same tag: &lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image: Dual ordered locus names.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* These differed from the ones originally captured (7567) since these existed separately in each of the gene/name tags:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image: Simple ordered locus names.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* Additionally, from the IDs reported by the GenMAPP users as missing, it was revealed that these do not exist in the XML file at all, or at least in the format that we wanted. These sets of IDs were actually misnomers since, even though CTRL + F lets us find them, they are not the ordered locus names that we were looking for: &lt;br /&gt;
&lt;br /&gt;
* Example 1:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image: Match pic.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* Example 2:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image: Id misnomers2.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* However, because of these observations, we have actually discovered ~92 IDs that existed within the XML file, albeit in a different tag than what we were using:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image: Id misnomers.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* All these observations led us to make one final build to capture those ~92 gene IDs.&lt;br /&gt;
&lt;br /&gt;
== Export Information (final) ==&lt;br /&gt;
* Date: 12/14/15&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151208.gdb | Sf-Std 20151208.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039;1 hour, 38 minutes, 42 seconds  &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;9:35 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039; PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Gene_Database_Testing_Report_-_Oregon_Trail_Survivors&amp;diff=7907</id>
		<title>Gene Database Testing Report - Oregon Trail Survivors</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Gene_Database_Testing_Report_-_Oregon_Trail_Survivors&amp;diff=7907"/>
				<updated>2015-12-15T06:39:44Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: /* Export Information (Re-imported) Build 2 */ Added more pictures&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Template:Oregon Trail Survivors}}&lt;br /&gt;
&lt;br /&gt;
== Things to note ==&lt;br /&gt;
* Taxonomy ID: 623&lt;br /&gt;
* UP000001006&lt;br /&gt;
* File management system: Wiki&lt;br /&gt;
&lt;br /&gt;
== Initial (Vanilla) Export Information ==&lt;br /&gt;
Version of GenMAPP Builder:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; gmbuilder-3.0.0-build-5 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Computer on which export was run: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; Front of the room, 3rd computer from the right. &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Postgres Database name: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Shigella flexneri 20151911&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
UniProt XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): &amp;#039;&amp;#039;&amp;#039;UniProt release 2015_11&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* UniProt XML download link: &amp;#039;&amp;#039;&amp;#039;[http://www.uniprot.org/uniprot/?query=proteome:UP000001006 Click here]&amp;#039;&amp;#039;&amp;#039; &lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;4.48 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GO OBO-XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): &amp;#039;&amp;#039;&amp;#039;Version created on 11/19/2015 (at 2:24 AM)&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GO OBO-XML download link: &amp;#039;&amp;#039;&amp;#039;[http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;7.00 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to process: &amp;#039;&amp;#039;&amp;#039;4.99 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GOA filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): &amp;#039;&amp;#039;&amp;#039;Version released on .&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GOA download link: &amp;#039;&amp;#039;&amp;#039;[http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/103.S_flexneri_301.goa Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;0.06 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151119 OTS.gdb | Sf-Std_20151119_OTS.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039; 1 Hour, 32 Minutes, 33 Seconds &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:06:13 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039;5:38:46 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
== Export Information for Build with Coder Changes # 1 ==&lt;br /&gt;
=== Build 1 ===&lt;br /&gt;
Name of .gdb file: &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151201.gdb | Sf-Std_20151201.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039; &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:19:22 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039; 8:30:08 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** I have confirmed that the necessary information in the .gdb file exists in the new build (e.g. the URL of the database we are using).&lt;br /&gt;
&lt;br /&gt;
=== TallyEngine ===&lt;br /&gt;
&lt;br /&gt;
* Run the TallyEngine in GenMAPP Builder and record the number of records for UniProt and GO in the XML data and in the Postgres databases.&lt;br /&gt;
** Choose the menu item Tallies &amp;gt; Run XML and Database Tallies for UniProt and GO...&lt;br /&gt;
** Choose the UniProt and GO OBO XML files that were uploaded in the previous sections of this assignment.&lt;br /&gt;
** Here is the screenshot of the tally result:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot; style=&amp;quot;width: auto; margin-left: auto; margin-right: auto;&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:TallyEngine results OTS 112115.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine ===&lt;br /&gt;
[[How_Do_I_Count_Thee%3F_Let_Me_Count_The_Ways | Follow the instructions found on this page to run XMLPipeDB match.]]&lt;br /&gt;
* In the Thawspace directory, I created a folder called &amp;quot;Shigella_flexneri_BioDB_2015&amp;quot; and created subfolders called &amp;quot;Source&amp;quot; and &amp;quot;Working&amp;quot; to store the source files (i.e., the compressed files) and the working files (i.e., the files I will actually be processing).&lt;br /&gt;
* As a result, I had to cd into the working directory before running the Match command.&lt;br /&gt;
** In order to change into the ThawSpace0\Shigella_flexneri_BioDB_2015\Working directory, use the following command in the command prompt window:&lt;br /&gt;
 T: &amp;amp;&amp;amp; cd &amp;quot;Shigella_flexneri_BioDB_2015\Working&amp;quot;&lt;br /&gt;
* The command I used once inside the directory I want is:&lt;br /&gt;
 java -jar xmlpipedb-match-1.1.1/xmlpipedb-match-1.1.1.jar &amp;quot;SF[0-9][0-9][0-9][0-9]&amp;quot; &amp;lt; uniprot-proteome%3AUP000001006.xml&lt;br /&gt;
* The results are as follows:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Match results OTS 112115.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
These results did not match up with what the TallyEngine gave (TallyEngine: 7567 vs. Match: 4610)&lt;br /&gt;
* As a result, the commands would have to be modified somehow so that the numbers match: &amp;lt;code&amp;gt;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?(/|&amp;lt;/name&amp;gt;)&amp;lt;/code&amp;gt;&lt;br /&gt;
* The overall command to write to a text file is as follows:&lt;br /&gt;
 java -jar xmlpipedb-match-1.1.1/xmlpipedb-match-1.1.1.jar &amp;quot;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?(/|&amp;lt;/name&amp;gt;)&amp;quot; &amp;lt; uniprot-proteome%3AUP000001006.xml &amp;gt; shigella_flexneri_results.txt&lt;br /&gt;
* Then our results became:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Match results OTS 20151203 more accurate.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* Observations: &lt;br /&gt;
** In order to lessen the number of matches, we had to add the end tag &amp;quot;&amp;lt;/name&amp;gt;&amp;quot; to our regular expression. This brought the number of matches down from over 8000 to 7517. Since TallyEngine&amp;#039;s result was 7567, this means that 50 IDs were not being caught. To account for these, we had to add the genes with IDs of the form CP#### (there were 50 instances of these), and those with the form SF####.# or S####.#. This led us to get 7566 gene IDs. &lt;br /&gt;
** When I looked at the IDs in Microsoft Access, the IDs totaled 7569. To explain this last difference, we also had to account for the genes with the form SF?####/SF?####. These 2 extra genes that were not accounted for by TallyEngine are actually not supposed to be separated, since the formatting suggests that the two IDs are interchangeable. When the gdb file was created, it would seem that these genes were split at the &amp;quot;/&amp;quot;.&lt;br /&gt;
** In other words, there are 3 ordered locus names with formatting that is different from the rest: SF2223/SF2224, S2352/S2353, and S3359/S3360. &lt;br /&gt;
** I wasn&amp;#039;t able to exactly hit the number output by TallyEngine since there are other genes with the same format that were already caught with the patterns SF#### or S####.&lt;br /&gt;
** Note: It turns out the ShiBASE database only uses the pattern SF#### instead of S#### or CP#### so the regular expression would really have to be just &amp;lt;code&amp;gt;SF?[0-9][0-9][0-9][0-9](\.[0-9])?(/|&amp;lt;/name&amp;gt;)&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
== System IDs ==&lt;br /&gt;
* UniProt: A0A0H2[A-Z][A-Z][0-9][0-9], A0A0H2[A-Z][A-Z][A-Z][0-9], A0A0H2[A-Z][0-9][0-9][0-9], or [A-Z][0-9][A-Z][0-9][A-Z][0-9]&lt;br /&gt;
** Examples: A0A0H2USI9, A0A0H2USA4, A0A0H2V010, A5A6A8&lt;br /&gt;
* RefSeq: NP_######, WP_#########, YP_#########, YP_######&lt;br /&gt;
** Examples: NP_858405, WP_000002440, YP_001449236, YP_145811 (only one of this)&lt;br /&gt;
* GeneID (EntrezGene from NCBI):&lt;br /&gt;
* GO: #######&lt;br /&gt;
* OrderedLocusNames: CP####, S#### (or S####.#), and SF#### (or SF####.#)&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine ===&lt;br /&gt;
* The command used to count the number of IDs is:&lt;br /&gt;
 select count(*) from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Postgres results OTS 20151203.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* The result above is exactly twice the number of OrderedLocusNames from TallyEngine: 15134 / 2 = 7567 IDs&lt;br /&gt;
* Running the command &amp;lt;code&amp;gt;select value from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&amp;lt;/code&amp;gt; and exporting the results to Excel reveals that this is because every single entry is entered twice: &lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Postgres results excel form OTS 20151203.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* Adding the keyword &amp;quot;distinct&amp;quot; would resolve the double counting:&lt;br /&gt;
 select distinct value from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
&lt;br /&gt;
=== Analysis ===&lt;br /&gt;
* The total number of OrderedLocusNames in TallyEngine is &amp;#039;&amp;#039;&amp;#039;7567&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
* Using the best regular expression I could come up with in Match, the result is &amp;#039;&amp;#039;&amp;#039;7573&amp;#039;&amp;#039;&amp;#039;. The additional 6 IDs emerged because they were already captured by the regular expression &amp;lt;code&amp;gt;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;lt;/name&amp;gt;&amp;lt;/code&amp;gt;, so also capturing the IDs of the form &amp;lt;code&amp;gt;SF?####/SF?####&amp;lt;/code&amp;gt; duplicates them.&lt;br /&gt;
* The total number of entries in PostgreSQL is &amp;#039;&amp;#039;&amp;#039;15134&amp;#039;&amp;#039;&amp;#039;, but this is only because each gene is entered twice. As a result, dividing by 2 yields &amp;#039;&amp;#039;&amp;#039;7567&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
* Microsoft Access yielded &amp;#039;&amp;#039;&amp;#039;7569&amp;#039;&amp;#039;&amp;#039; in the OrderedLocusNames window. The extra 2 genes came from the IDs of the form &amp;lt;code&amp;gt;SF?####/SF?####&amp;lt;/code&amp;gt; since the export broke up the two IDs that represent the same ID.&lt;br /&gt;
** 49 are of the form &amp;lt;code&amp;gt;CP####&amp;lt;/code&amp;gt;&lt;br /&gt;
** 3413 are of the form &amp;lt;code&amp;gt;S####&amp;lt;/code&amp;gt;&lt;br /&gt;
*** 14 are of the form &amp;lt;code&amp;gt;S####.#&amp;lt;/code&amp;gt;&lt;br /&gt;
** 4107 are of the form &amp;lt;code&amp;gt;SF####&amp;lt;/code&amp;gt;&lt;br /&gt;
*** 35 are of the form &amp;lt;code&amp;gt;SF####.#&amp;lt;/code&amp;gt;&lt;br /&gt;
* Inspecting the UniProt XML file was necessary for identifying the IDs. Looking through its contents, I discovered (with help from Dondi) that I had to add the end tag &amp;quot;&amp;lt;/name&amp;gt;&amp;quot; in order to narrow down the results in Match.&lt;br /&gt;
&lt;br /&gt;
== Export from &amp;quot;Build 2&amp;quot; ==&lt;br /&gt;
Name of .gdb file: [[Media:Sf-Std 20151207.gdb | Sf-Std 20151207.gdb]]&lt;br /&gt;
* Date: &amp;#039;&amp;#039;&amp;#039; 12/7/15 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039; 4 hours, 24 minutes and 1 second &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;9:13:45 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039; 1:37:46 AM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: This export had to be redone since the PSQL database had twice as many entries as it should have.&lt;br /&gt;
&lt;br /&gt;
== Export Information (Re-imported) Build 2 ==&lt;br /&gt;
Version of GenMAPP Builder:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; gmbuilder-3.0.0-build-5 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Computer on which export was run: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; Front of the room, 3rd computer from the right. &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Postgres Database name: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Shigella_flexneri_20151208&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
UniProt XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): &amp;#039;&amp;#039;&amp;#039;UniProt release 2015_11&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* UniProt XML download link: &amp;#039;&amp;#039;&amp;#039;[http://www.uniprot.org/uniprot/?query=proteome:UP000001006 Click here]&amp;#039;&amp;#039;&amp;#039; &lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;4.43 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GO OBO-XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): &amp;#039;&amp;#039;&amp;#039;Version created on 11/19/2015 (at 2:24 AM)&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GO OBO-XML download link: &amp;#039;&amp;#039;&amp;#039;[http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;6.84 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to process: &amp;#039;&amp;#039;&amp;#039;5.49 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GOA filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): &amp;#039;&amp;#039;&amp;#039;Version released on .&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GOA download link: &amp;#039;&amp;#039;&amp;#039;[http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/103.S_flexneri_301.goa Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;0.06 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151208.gdb | Sf-Std 20151208.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039;1 hour, 38 minutes, 42 seconds  &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:30:59 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039;6:09:41 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: I had to re-import everything into a new database because the one I had been using had some files imported twice. Thus, the counts reported by PostgreSQL were all doubled.&lt;br /&gt;
&lt;br /&gt;
=== Using TallyEngine ===&lt;br /&gt;
* The database used is the same one described in the section above: &amp;#039;&amp;#039;&amp;#039;Shigella_flexneri_20151208&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Notice in the image below that there is an error in the cells. It turns out that we did not even need to add the Ordered Locus since that was the default. We will definitely need to do one last build in order to fix that issue.&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Shigella flexneri tallyEngine results build 2.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine ===&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:regex1_OTS.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:regex2_OTS.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* When added together, the results become 7566 + 3 = 7569.&lt;br /&gt;
&lt;br /&gt;
=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine ===&lt;br /&gt;
* The following command in PostgreSQL returned 7567 entries:&lt;br /&gt;
 select value from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
* The following command resulted in 214 entries:&lt;br /&gt;
 select value from genenametype where type = &amp;#039;ORF&amp;#039; and value ~ &amp;#039;(CP|SF?)(_p)?[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
&lt;br /&gt;
=== OriginalRowCounts Comparison ===&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Ms access originalrowcounts.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* The OrderedLocusNames row seems to report on the same number of IDs as our previous builds&lt;br /&gt;
&lt;br /&gt;
=== Visual Inspection ===&lt;br /&gt;
Perform visual inspection of individual tables to see if there are any problems.&lt;br /&gt;
&lt;br /&gt;
* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?&lt;br /&gt;
** Yes, there are dates present for GeneOntology, InterPro, GeneID, RefSeq, UniProt, EMBL, PDB, Pfam, OrderedLocusNames, and EnsemblBacteria.&lt;br /&gt;
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?&lt;br /&gt;
** Yes, all of them seem to follow the same format (there are, more or less, 3 variations on the IDs for each of the tables).&lt;br /&gt;
&lt;br /&gt;
=== Excel Inspection ===&lt;br /&gt;
* [[Media:In-search-of-the-missing-ids.xlsx| Excel file]]&lt;br /&gt;
&lt;br /&gt;
=== Observations ===&lt;br /&gt;
* Through the use of an XML-reader program, called &amp;quot;firstObject XML Editor&amp;quot;, it was discovered that some ordered locus IDs that were exported by GenMAPPBuilder were placed in the same tag: &lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image: Dual ordered locus names.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* These differed from the ones originally captured (7567) since these existed separately in each of the gene/name tags:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image: Simple ordered locus names.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* Additionally, from the IDs reported by the GenMAPP users as missing, it was revealed that these do not exist in the XML file at all, or at least in the format that we wanted. These sets of IDs were actually misnomers since, even though CTRL + F lets us find them, they are not the ordered locus names that we were looking for: &lt;br /&gt;
&lt;br /&gt;
* Example 1:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image: Match pic.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* Example 2:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image: Id misnomers2.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* However, because of these observations, we have actually discovered ~92 IDs that existed within the XML file, albeit in a different tag than what we were using:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image: Id misnomers.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* All these observations led us to make one final build to capture those ~92 gene IDs.&lt;br /&gt;
&lt;br /&gt;
== Export Information (final) ==&lt;br /&gt;
* Date: 12/14/15&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151208.gdb | Sf-Std 20151208.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039;1 hour, 38 minutes, 42 seconds  &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;9:35 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039; PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Gene_Database_Testing_Report_-_Oregon_Trail_Survivors&amp;diff=7900</id>
		<title>Gene Database Testing Report - Oregon Trail Survivors</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Gene_Database_Testing_Report_-_Oregon_Trail_Survivors&amp;diff=7900"/>
				<updated>2015-12-15T06:24:42Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: Added other builds&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Template:Oregon Trail Survivors}}&lt;br /&gt;
&lt;br /&gt;
== Things to note ==&lt;br /&gt;
* Taxonomy ID: 623&lt;br /&gt;
* UP000001006&lt;br /&gt;
* File management system: Wiki&lt;br /&gt;
&lt;br /&gt;
== Initial (Vanilla) Export Information ==&lt;br /&gt;
Version of GenMAPP Builder:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; gmbuilder-3.0.0-build-5 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Computer on which export was run: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; Front of the room, 3rd computer from the right. &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Postgres Database name: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Shigella flexneri 20151911&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
UniProt XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): &amp;#039;&amp;#039;&amp;#039;UniProt release 2015_11&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* UniProt XML download link: &amp;#039;&amp;#039;&amp;#039;[http://www.uniprot.org/uniprot/?query=proteome:UP000001006 Click here]&amp;#039;&amp;#039;&amp;#039; &lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;4.48 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GO OBO-XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): &amp;#039;&amp;#039;&amp;#039;Version created on 11/19/2015 (at 2:24 AM)&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GO OBO-XML download link: &amp;#039;&amp;#039;&amp;#039;[http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;7.00 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to process: &amp;#039;&amp;#039;&amp;#039;4.99 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GOA filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): &amp;#039;&amp;#039;&amp;#039;Version released on .&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GOA download link: &amp;#039;&amp;#039;&amp;#039;[http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/103.S_flexneri_301.goa Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;0.06 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151119 OTS.gdb | Sf-Std_20151119_OTS.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039; 1 hour, 32 minutes, 33 seconds &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:06:13 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039;5:38:46 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
== Export Information for Build with Coder Changes # 1 ==&lt;br /&gt;
=== Build 1 ===&lt;br /&gt;
Name of .gdb file: &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151201.gdb | Sf-Std_20151201.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039; &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:19:22 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039; 8:30:08 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** I have confirmed that the necessary information in the .gdb file exists in the new build (e.g., the URL of the database we are using).&lt;br /&gt;
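The elapsed export time can be worked out from the recorded start and end times. The following is a minimal sketch of that arithmetic, not something GenMAPP Builder provides; the function name &amp;quot;elapsed&amp;quot; and the clock format string are my own assumptions, and it assumes every export ran for less than 24 hours.&lt;br /&gt;

```python
from datetime import datetime, timedelta

# Assumed format of the times recorded on this page, e.g. "4:19:22 PM".
CLOCK_FORMAT = "%I:%M:%S %p"


def elapsed(start, end):
    """Return the time between two clock readings as a timedelta.

    Assumes the run lasted under 24 hours. The modulo wraps runs that
    cross midnight (end time earlier on the clock than start time).
    """
    t0 = datetime.strptime(start, CLOCK_FORMAT)
    t1 = datetime.strptime(end, CLOCK_FORMAT)
    return (t1 - t0) % timedelta(days=1)


# Build 1 start and end times as recorded above.
print(elapsed("4:19:22 PM", "8:30:08 PM"))  # 4:10:46
# Build 2 times recorded on this page; this run crossed midnight.
print(elapsed("9:13:45 PM", "1:37:46 AM"))  # 4:24:01
```

The second call reproduces the 4 hours, 24 minutes and 1 second recorded for Build 2, which suggests the same arithmetic applies to Build 1.&lt;br /&gt;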
&lt;br /&gt;
=== TallyEngine ===&lt;br /&gt;
&lt;br /&gt;
* Run the TallyEngine in GenMAPP Builder and record the number of records for UniProt and GO in the XML data and in the Postgres databases.&lt;br /&gt;
** Choose the menu item Tallies &amp;gt; Run XML and Database Tallies for UniProt and GO...&lt;br /&gt;
** Choose the UniProt and GO OBO XML files that were uploaded in the previous sections of this assignment.&lt;br /&gt;
** Here is the screenshot of the tally result:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot; style=&amp;quot;width: auto; margin-left: auto; margin-right: auto;&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:TallyEngine results OTS 112115.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine ===&lt;br /&gt;
[[How_Do_I_Count_Thee%3F_Let_Me_Count_The_Ways | Follow the instructions found on this page to run XMLPipeDB match.]]&lt;br /&gt;
* In the Thawspace directory, I created a folder called &amp;quot;Shigella_flexneri_BioDB_2015&amp;quot; and created subfolders called &amp;quot;Source&amp;quot; and &amp;quot;Working&amp;quot; to store the source files (i.e., the compressed files) and the working files (i.e., the files I will actually be processing).&lt;br /&gt;
* As a result, I had to cd into the Working directory before running Match.&lt;br /&gt;
** To change into the ThawSpace0\Shigella_flexneri_BioDB_2015\Working directory, use the following command in the command prompt window:&lt;br /&gt;
 T: &amp;amp;&amp;amp; cd &amp;quot;Shigella_flexneri_BioDB_2015\Working&amp;quot;&lt;br /&gt;
* Once inside that directory, the command I used is:&lt;br /&gt;
 java -jar xmlpipedb-match-1.1.1/xmlpipedb-match-1.1.1.jar &amp;quot;SF[0-9][0-9][0-9][0-9]&amp;quot; &amp;lt; uniprot-proteome%3AUP000001006.xml&lt;br /&gt;
* The results are as follows:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Match results OTS 112115.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
These results did not match up with what the TallyEngine gave (TallyEngine: 7567 vs. Match: 4610)&lt;br /&gt;
* As a result, the regular expression had to be modified so that the numbers match: &amp;lt;code&amp;gt;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?(/|&amp;lt;/name&amp;gt;)&amp;lt;/code&amp;gt;&lt;br /&gt;
* The overall command to write to a text file is as follows:&lt;br /&gt;
 java -jar xmlpipedb-match-1.1.1/xmlpipedb-match-1.1.1.jar &amp;quot;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?(/|&amp;lt;/name&amp;gt;)&amp;quot; &amp;lt; uniprot-proteome%3AUP000001006.xml &amp;gt; shigella_flexneri_results.txt&lt;br /&gt;
* Then our results became:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Match results OTS 20151203 more accurate.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* Observations: &lt;br /&gt;
** In order to lessen the number of matches, we had to add the end tag &amp;quot;&amp;lt;/name&amp;gt;&amp;quot; to our regular expression. This brought the number of matches down from over 8000 to just 7517. Since TallyEngine&amp;#039;s result was 7567, this means that 50 IDs were not being caught. To account for them, we had to add the genes with IDs of the form CP#### (there were 50 instances of these), and those with the form SF####.# or S####.#. This led us to get 7566 gene IDs. &lt;br /&gt;
** When I looked at the IDs in Microsoft Access, they totaled 7569. To account for this last piece of gene formatting, we also had to account for the genes with the form SF?####/SF?####. The 2 extra IDs that were not accounted for by TallyEngine are actually not supposed to be separated, since the formatting suggests that the two IDs on either side of the slash are interchangeable. When the gdb file was created, it would seem that these IDs were split at the &amp;quot;/&amp;quot;.&lt;br /&gt;
** In other words, there are 3 ordered locus names with formatting that is different from the rest: SF2223/SF2224, S2352/S2353, and S3359/S3360. &lt;br /&gt;
** I wasn&amp;#039;t able to hit the exact number reported by TallyEngine since there are other genes with the same format that were already caught with the patterns SF#### or S####.&lt;br /&gt;
** Note: It turns out the ShiBASE database only uses the pattern SF#### instead of S#### or CP#### so the regular expression would really have to be just &amp;lt;code&amp;gt;SF?[0-9][0-9][0-9][0-9](\.[0-9])?(/|&amp;lt;/name&amp;gt;)&amp;lt;/code&amp;gt;&lt;br /&gt;
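The effect of splitting slash-separated names can be sketched in a few lines. This is only an illustration of the counting described above, assuming the export splits each name on &amp;quot;/&amp;quot;; the helper &amp;quot;split_names&amp;quot; is hypothetical and is not part of GenMAPP Builder or XMLPipeDB match.&lt;br /&gt;

```python
import re

# One ordered locus ID: CP####, S#### or SF####, optionally followed by .#
LOCUS_ID = re.compile(r"(CP|SF?)[0-9]{4}(\.[0-9])?")


def split_names(name_text):
    """Split the text of one name on "/" and keep the parts that are valid IDs."""
    return [part for part in name_text.split("/") if LOCUS_ID.fullmatch(part)]


# The three slash-separated names listed in the observations above.
names = ["SF2223/SF2224", "S2352/S2353", "S3359/S3360"]

as_written = len(names)
after_split = sum(len(split_names(name)) for name in names)
print(as_written, after_split)  # 3 6
```

Each of the three names contributes one additional ID once it is split, which is consistent with the difference between 7566 and 7569.&lt;br /&gt;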
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
== System IDs ==&lt;br /&gt;
* UniProt: A0A0H2[A-Z][A-Z][0-9][0-9], A0A0H2[A-Z][A-Z][A-Z][0-9], A0A0H2[A-Z][0-9][0-9][0-9], or [A-Z][0-9][A-Z][0-9][A-Z][0-9]&lt;br /&gt;
** Examples: A0A0H2USI9, A0A0H2USA4, A0A0H2V010, A5A6A8&lt;br /&gt;
* RefSeq: NP_######, WP_#########, YP_#########, YP_######&lt;br /&gt;
** Examples: NP_858405, WP_000002440, YP_001449236, YP_145811 (only one of this)&lt;br /&gt;
* GeneID (EntrezGene from NCBI):&lt;br /&gt;
* GO: #######&lt;br /&gt;
* OrderedLocusNames: CP####, S#### (or S####.#), and SF#### (or SF####.#)&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine ===&lt;br /&gt;
* The command used to count the number of IDs is:&lt;br /&gt;
 select count(*) from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Postgres results OTS 20151203.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* The result above is exactly twice the number of OrderedLocusNames from TallyEngine: 15134 / 2 = 7567 IDs&lt;br /&gt;
* Running the command &amp;lt;code&amp;gt;select value from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&amp;lt;/code&amp;gt; and exporting the results to Excel reveals that this is because every single entry is entered twice: &lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Postgres results excel form OTS 20151203.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* Adding the keyword &amp;quot;distinct&amp;quot; resolves the double counting:&lt;br /&gt;
 select distinct value from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
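The doubling and the effect of &amp;quot;distinct&amp;quot; can be reproduced on a toy table. This sketch uses an in-memory SQLite database as a stand-in for PostgreSQL, which is an assumption made purely for illustration; the regular expression filter is left out because SQLite has no built-in ~ operator, and the two rows are just sample IDs taken from this page.&lt;br /&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table genenametype (type text, value text)")

rows = [("ordered locus", "SF2223"), ("ordered locus", "S2352")]
# Insert every row twice to mimic source files that were imported twice.
conn.executemany("insert into genenametype values (?, ?)", rows + rows)

total = conn.execute(
    "select count(*) from genenametype where type = 'ordered locus'"
).fetchone()[0]
unique = conn.execute(
    "select count(distinct value) from genenametype where type = 'ordered locus'"
).fetchone()[0]

print(total, unique)  # 4 2
```

The plain count is exactly twice the distinct count, the same relationship as 15134 and 7567 above.&lt;br /&gt;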
&lt;br /&gt;
=== Analysis ===&lt;br /&gt;
* The total number of OrderedLocusNames in TallyEngine is &amp;#039;&amp;#039;&amp;#039;7567&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
* Using the best regular expression I could construct in Match, the result is &amp;#039;&amp;#039;&amp;#039;7573&amp;#039;&amp;#039;&amp;#039;. The additional 6 IDs emerged because they are already captured by the regular expression &amp;lt;code&amp;gt;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;lt;/name&amp;gt;&amp;lt;/code&amp;gt;, so trying to also capture the IDs of the form &amp;lt;code&amp;gt;SF?####/SF?####&amp;lt;/code&amp;gt; counts them a second time.&lt;br /&gt;
* The total number of entries in PostgreSQL is &amp;#039;&amp;#039;&amp;#039;15134&amp;#039;&amp;#039;&amp;#039;, but this is only because each gene is entered twice. As a result, dividing by 2 yields &amp;#039;&amp;#039;&amp;#039;7567&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
* Microsoft Access yielded &amp;#039;&amp;#039;&amp;#039;7569&amp;#039;&amp;#039;&amp;#039; in the OrderedLocusNames window. The extra 2 genes came from the IDs of the form &amp;lt;code&amp;gt;SF?####/SF?####&amp;lt;/code&amp;gt; since the export broke up the two IDs that represent the same ID.&lt;br /&gt;
** 49 are of the form &amp;lt;code&amp;gt;CP####&amp;lt;/code&amp;gt;&lt;br /&gt;
** 3413 are of the form &amp;lt;code&amp;gt;S####&amp;lt;/code&amp;gt;&lt;br /&gt;
*** 14 are of the form &amp;lt;code&amp;gt;S####.#&amp;lt;/code&amp;gt;&lt;br /&gt;
** 4107 are of the form &amp;lt;code&amp;gt;SF####&amp;lt;/code&amp;gt;&lt;br /&gt;
*** 35 are of the form &amp;lt;code&amp;gt;SF####.#&amp;lt;/code&amp;gt;&lt;br /&gt;
* Inspecting the UniProt XML file was necessary in identifying the IDs. Looking through what was inside, I discovered (with help from Dondi) that I had to add the end tag &amp;quot;&amp;lt;/name&amp;gt;&amp;quot; in order to narrow down the results in Match&lt;br /&gt;
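A breakdown by ID form like the one above could be tallied with a short script. This is a sketch under my own assumptions: the names &amp;quot;form_of&amp;quot; and &amp;quot;tally_forms&amp;quot; are hypothetical, the sample IDs are made up to show each form, and the dotted forms are counted separately here rather than as a subset of the undotted ones.&lt;br /&gt;

```python
import re
from collections import Counter

# Label and pattern for each ordered locus form listed in the analysis.
FORMS = [
    ("CP####", re.compile(r"CP[0-9]{4}")),
    ("S####", re.compile(r"S[0-9]{4}")),
    ("S####.#", re.compile(r"S[0-9]{4}\.[0-9]")),
    ("SF####", re.compile(r"SF[0-9]{4}")),
    ("SF####.#", re.compile(r"SF[0-9]{4}\.[0-9]")),
]


def form_of(locus_id):
    """Return the label of the first form that matches the whole ID."""
    for label, pattern in FORMS:
        if pattern.fullmatch(locus_id):
            return label
    return "other"


def tally_forms(ids):
    """Count how many IDs fall under each form."""
    return Counter(form_of(locus_id) for locus_id in ids)


# Made-up sample IDs, one per form, plus one that matches nothing.
sample = ["CP0001", "S1234", "S1234.1", "SF2223", "SF2223.2", "b0001"]
print(tally_forms(sample))
```

Running tally_forms over the values exported from the OrderedLocusNames table would give one count per form, and anything reported as &amp;quot;other&amp;quot; would point at an ID format that the regular expression does not cover.&lt;br /&gt;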
&lt;br /&gt;
== Export from &amp;quot;Build 2&amp;quot; ==&lt;br /&gt;
Name of .gdb file: [[Media:Sf-Std 20151207.gdb | Sf-Std 20151207.gdb]]&lt;br /&gt;
* Date: &amp;#039;&amp;#039;&amp;#039; 12/7/15 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039; 4 hours, 24 minutes and 1 second &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;9:13:45 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039; 1:37:46 AM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: This export had to be redone since the PSQL database had twice as many entries as it should have.&lt;br /&gt;
&lt;br /&gt;
== Export Information (Re-imported) Build 2 ==&lt;br /&gt;
Version of GenMAPP Builder:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; gmbuilder-3.0.0-build-5 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Computer on which export was run: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; Front of the room, 3rd computer from the right. &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Postgres Database name: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Shigella_flexneri_20151208&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
UniProt XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): &amp;#039;&amp;#039;&amp;#039;UniProt release 2015_11&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* UniProt XML download link: &amp;#039;&amp;#039;&amp;#039;[http://www.uniprot.org/uniprot/?query=proteome:UP000001006 Click here]&amp;#039;&amp;#039;&amp;#039; &lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;4.43 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GO OBO-XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): &amp;#039;&amp;#039;&amp;#039;Version created on 11/19/2015 (at 2:24 AM)&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GO OBO-XML download link: &amp;#039;&amp;#039;&amp;#039;[http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;6.84 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to process: &amp;#039;&amp;#039;&amp;#039;5.49 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GOA filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): &amp;#039;&amp;#039;&amp;#039;Version released on .&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GOA download link: &amp;#039;&amp;#039;&amp;#039;[http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/103.S_flexneri_301.goa Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;0.06 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151208.gdb | Sf-Std 20151208.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039;1 hour, 38 minutes, 42 seconds  &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:30:59 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039;6:09:41 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: I had to re-import everything into a new database because the one I had been using had some files imported twice, so every count reported by PostgreSQL was doubled.&lt;br /&gt;
&lt;br /&gt;
=== Using TallyEngine ===&lt;br /&gt;
* The database used is the same one described in the section above: &amp;#039;&amp;#039;&amp;#039;Shigella_flexneri_20151208&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Notice in the image below that there is an error in the cells. It turns out that we did not even need to add the Ordered Locus since that was the default. We will definitely need to do one last build in order to fix that issue.&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Shigella flexneri tallyEngine results build 2.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Using XMLPipeDB match to Validate the XML Results from the TallyEngine ===&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:regex1_OTS.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:regex2_OTS.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* When added together, the results become 7566 + 3 = 7569.&lt;br /&gt;
&lt;br /&gt;
=== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine ===&lt;br /&gt;
* The following command in PostgreSQL returned 7567 entries:&lt;br /&gt;
 select value from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
* The following command resulted in 214 entries:&lt;br /&gt;
 select value from genenametype where type = &amp;#039;ORF&amp;#039; and value ~ &amp;#039;(CP|SF?)(_p)?[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
&lt;br /&gt;
=== OriginalRowCounts Comparison ===&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Ms access originalrowcounts.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* The OrderedLocusNames row seems to report on the same number of IDs as our previous builds&lt;br /&gt;
&lt;br /&gt;
=== Visual Inspection ===&lt;br /&gt;
Perform visual inspection of individual tables to see if there are any problems.&lt;br /&gt;
&lt;br /&gt;
* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?&lt;br /&gt;
** Yes, there are dates present for GeneOntology, InterPro, GeneID, RefSeq, UniProt, EMBL, PDB, Pfam, OrderedLocusNames, and EnsemblBacteria.&lt;br /&gt;
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?&lt;br /&gt;
** Yes, all of them seem to follow the same format (there are, more or less, 3 variations on the IDs for each of the tables).&lt;br /&gt;
&lt;br /&gt;
=== Excel Inspection ===&lt;br /&gt;
* [[Media:In-search-of-the-missing-ids.xlsx| Excel file]]&lt;br /&gt;
&lt;br /&gt;
== Export Information (final) ==&lt;br /&gt;
* Date: 12/14/15&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151208.gdb | Sf-Std 20151208.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039;1 hour, 38 minutes, 42 seconds  &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;9:35 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039; PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Gene_Database_Testing_Report_-_Oregon_Trail_Survivors&amp;diff=7895</id>
		<title>Gene Database Testing Report - Oregon Trail Survivors</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Gene_Database_Testing_Report_-_Oregon_Trail_Survivors&amp;diff=7895"/>
				<updated>2015-12-15T06:15:17Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: /* Export Information for Build with Coder Changes */ Edited the header&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Template:Oregon Trail Survivors}}&lt;br /&gt;
&lt;br /&gt;
== Things to note ==&lt;br /&gt;
* Taxonomy ID: 623&lt;br /&gt;
* UP000001006&lt;br /&gt;
* File management system: Wiki&lt;br /&gt;
&lt;br /&gt;
== Initial (Vanilla) Export Information ==&lt;br /&gt;
Version of GenMAPP Builder:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; gmbuilder-3.0.0-build-5 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Computer on which export was run: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; Front of the room, 3rd computer from the right. &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Postgres Database name: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Shigella flexneri 20151911&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
UniProt XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): &amp;#039;&amp;#039;&amp;#039;UniProt release 2015_11&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* UniProt XML download link: &amp;#039;&amp;#039;&amp;#039;[http://www.uniprot.org/uniprot/?query=proteome:UP000001006 Click here]&amp;#039;&amp;#039;&amp;#039; &lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;4.48 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GO OBO-XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): &amp;#039;&amp;#039;&amp;#039;Version created on 11/19/2015 (at 2:24 AM)&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GO OBO-XML download link: &amp;#039;&amp;#039;&amp;#039;[http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;7.00 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to process: &amp;#039;&amp;#039;&amp;#039;4.99 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GOA filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): &amp;#039;&amp;#039;&amp;#039;Version released on .&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GOA download link: &amp;#039;&amp;#039;&amp;#039;[http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/103.S_flexneri_301.goa Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;0.06 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151119 OTS.gdb | Sf-Std_20151119_OTS.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039; 1 hour, 32 minutes, 33 seconds &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:06:13 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039;5:38:46 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
== Export Information for Build with Coder Changes # 1 ==&lt;br /&gt;
=== Build 1 ===&lt;br /&gt;
Name of .gdb file: &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151201.gdb | Sf-Std_20151201.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039; &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:19:22 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039; 8:30:08 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** I have confirmed that the necessary information in the .gdb file exists in the new build (e.g., the URL of the database we are using).&lt;br /&gt;
&lt;br /&gt;
=== Build 2 ===&lt;br /&gt;
Name of .gdb file: [[Media:Sf-Std 20151207.gdb | Sf-Std 20151207.gdb]]&lt;br /&gt;
* Date: &amp;#039;&amp;#039;&amp;#039; 12/7/15 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039; 4 hours, 24 minutes and 1 second &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;9:13:45 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039; 1:37:46 AM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
==TallyEngine==&lt;br /&gt;
&lt;br /&gt;
* Run the TallyEngine in GenMAPP Builder and record the number of records for UniProt and GO in the XML data and in the Postgres databases.&lt;br /&gt;
** Choose the menu item Tallies &amp;gt; Run XML and Database Tallies for UniProt and GO...&lt;br /&gt;
** Choose the UniProt and GO OBO XML files that were uploaded in the previous sections of this assignment.&lt;br /&gt;
** Here is the screenshot of the tally result:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot; style=&amp;quot;width: auto; margin-left: auto; margin-right: auto;&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:TallyEngine results OTS 112115.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Using XMLPipeDB match to Validate the XML Results from the TallyEngine==&lt;br /&gt;
[[How_Do_I_Count_Thee%3F_Let_Me_Count_The_Ways | Follow the instructions found on this page to run XMLPipeDB match.]]&lt;br /&gt;
* In the Thawspace directory, I created a folder called &amp;quot;Shigella_flexneri_BioDB_2015&amp;quot; and created subfolders called &amp;quot;Source&amp;quot; and &amp;quot;Working&amp;quot; to store the source files (i.e., the compressed files) and the working files (i.e., the files I will actually be processing).&lt;br /&gt;
* As a result, I had to cd to these directories before running the Match command.&lt;br /&gt;
** To change into the ThawSpace0\Shigella_flexneri_BioDB_2015\Working directory, use the following command in the command prompt window:&lt;br /&gt;
 T: &amp;amp;&amp;amp; cd &amp;quot;Shigella_flexneri_BioDB_2015\Working&amp;quot;&lt;br /&gt;
* Once inside the desired directory, the command I used is:&lt;br /&gt;
 java -jar xmlpipedb-match-1.1.1/xmlpipedb-match-1.1.1.jar &amp;quot;SF[0-9][0-9][0-9][0-9]&amp;quot; &amp;lt; uniprot-proteome%3AUP000001006.xml&lt;br /&gt;
* The results are as follows:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Match results OTS 112115.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
These results did not match what the TallyEngine gave (TallyEngine: 7567 vs. Match: 4610).&lt;br /&gt;
* As a result, the regular expression had to be modified so that the numbers match: &amp;lt;code&amp;gt;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?(/|&amp;lt;/name&amp;gt;)&amp;lt;/code&amp;gt;&lt;br /&gt;
* The overall command to write to a text file is as follows:&lt;br /&gt;
 java -jar xmlpipedb-match-1.1.1/xmlpipedb-match-1.1.1.jar &amp;quot;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?(/|&amp;lt;/name&amp;gt;)&amp;quot; &amp;lt; uniprot-proteome%3AUP000001006.xml &amp;gt; shigella_flexneri_results.txt&lt;br /&gt;
* Then our results became:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Match results OTS 20151203 more accurate.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* Observations: &lt;br /&gt;
** In order to lessen the number of matches, we had to add the end tag &amp;quot;&amp;lt;/name&amp;gt;&amp;quot; to our regular expression. This brought the number of matches down from over 8000 to just 7517. Since TallyEngine&amp;#039;s result was 7567, this means that 150 IDs were not being caught. To account for this, we had to add the genes with IDs of the form CP#### (there were 50 instances of these) and those of the form SF####.# or S####.#. This led us to get 7566 gene IDs.&lt;br /&gt;
** When I looked at the IDs in Microsoft Access, the IDs totaled 7569. To account for this last piece of gene formatting, we also had to account for the genes of the form SF?####/SF?####. The 2 extra genes that were not accounted for by TallyEngine are actually not supposed to be separated, since the formatting indicates that the two IDs are interchangeable. When the .gdb file was created, it seems that these IDs were split at the &amp;quot;/&amp;quot;.&lt;br /&gt;
** In other words, there are 3 ordered locus names with formatting that is different from the rest: SF2223/SF2224, S2352/S2353, and S3359/S3360. &lt;br /&gt;
** I wasn&amp;#039;t able to hit exactly the number reported by TallyEngine, since there are other genes with the same format that were already caught by the patterns SF#### or S####.&lt;br /&gt;
** Note: It turns out the ShiBASE database only uses the pattern SF#### instead of S#### or CP#### so the regular expression would really have to be just &amp;lt;code&amp;gt;SF?[0-9][0-9][0-9][0-9](\.[0-9])?(/|&amp;lt;/name&amp;gt;)&amp;lt;/code&amp;gt;&lt;br /&gt;
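The pattern matching described in the observations above can be sketched in Python. This is only an illustration and not the XMLPipeDB Match tool itself: it uses the ordered locus pattern without the trailing slash or closing name tag, the helper name tally_by_prefix is hypothetical, and only SF2223, S2352 and S3359 come from these notes (the other sample IDs are made up).&lt;br /&gt;

```python
import re
from collections import Counter

# Ordered locus pattern from the notes, without the trailing tag or slash part.
LOCUS = re.compile(r"(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?")

# SF2223, S2352 and S3359 appear in the notes; the other IDs are made up.
sample_ids = ["SF2223", "S2352", "S3359", "CP0001", "SF0001.1", "b0001"]


def tally_by_prefix(ids):
    """Count IDs that fully match the pattern, grouped by prefix (CP, S, SF)."""
    counts = Counter()
    for gene_id in ids:
        match = LOCUS.fullmatch(gene_id)
        if match:
            counts[match.group(1)] += 1
    return counts


counts = tally_by_prefix(sample_ids)
print(dict(counts))

# A combined name such as SF2223/SF2224 yields two IDs once split at the slash.
split_ids = "SF2223/SF2224".split("/")
print(split_ids)
```

Grouping by prefix mirrors the per-form breakdown (CP####, S####, SF####) used when comparing the counts against TallyEngine and Microsoft Access.&lt;br /&gt;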
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
== System IDs ==&lt;br /&gt;
* UniProt: A0A0H2[A-Z][A-Z][0-9][0-9], A0A0H2[A-Z][A-Z][A-Z][0-9], A0A0H2[A-Z][0-9][0-9][0-9], or [A-Z][0-9][A-Z][0-9][A-Z][0-9]&lt;br /&gt;
** Examples: A0A0H2USI9, A0A0H2USA4, A0A0H2V010, A5A6A8&lt;br /&gt;
* RefSeq: NP_######, WP_#########, YP_#########, YP_######&lt;br /&gt;
** Examples: NP_858405, WP_000002440, YP_001449236, YP_145811 (only one of this)&lt;br /&gt;
* GeneID (EntrezGene from NCBI):&lt;br /&gt;
* GO: #######&lt;br /&gt;
* OrderedLocusNames: CP####, S#### (or S####.#), and SF#### (or SF####.#)&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine ==&lt;br /&gt;
* The command used to count the number of IDs is:&lt;br /&gt;
 select count(*) from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Postgres results OTS 20151203.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* The result above is exactly twice the number of OrderedLocusNames from TallyEngine: 15134 / 2 = 7567 IDs&lt;br /&gt;
* Running the command &amp;lt;code&amp;gt;select value from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&amp;lt;/code&amp;gt; and exporting the results to Excel reveals that this is because every single entry is entered twice: &lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Postgres results excel form OTS 20151203.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* Adding the keyword &amp;quot;distinct&amp;quot; would resolve the double counting:&lt;br /&gt;
 select distinct value from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
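The effect of the duplicated rows can be reproduced on a toy table. This is a sketch under stated assumptions: it uses an in-memory SQLite database as a stand-in for PostgreSQL, the two-column table layout is assumed, and the regular expression filter is left out because SQLite does not provide the ~ operator.&lt;br /&gt;

```python
import sqlite3

# In-memory SQLite stand-in for the PostgreSQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table genenametype (type text, value text)")

# Two ordered locus names taken from the notes, each inserted twice to
# mimic the double import.
rows = [("ordered locus", "SF2223"), ("ordered locus", "S2352")]
conn.executemany("insert into genenametype values (?, ?)", rows + rows)

# Plain count includes the duplicates.
total = conn.execute(
    "select count(*) from genenametype where type = 'ordered locus'"
).fetchone()[0]

# count(distinct ...) collapses the duplicated entries.
distinct = conn.execute(
    "select count(distinct value) from genenametype where type = 'ordered locus'"
).fetchone()[0]

print(total, distinct)
conn.close()
```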
&lt;br /&gt;
== Analysis == &lt;br /&gt;
* The total number of OrderedLocusNames in TallyEngine is &amp;#039;&amp;#039;&amp;#039;7567&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
* Using the best regular expression I could construct in Match, the result is &amp;#039;&amp;#039;&amp;#039;7573&amp;#039;&amp;#039;&amp;#039;. The additional 6 IDs appear because they are already captured by the regular expression &amp;lt;code&amp;gt;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;lt;/name&amp;gt;&amp;lt;/code&amp;gt;, and trying to capture the IDs of the form &amp;lt;code&amp;gt;SF?####/SF?####&amp;lt;/code&amp;gt; duplicates those captured IDs.&lt;br /&gt;
* The total number of entries in PostgreSQL is &amp;#039;&amp;#039;&amp;#039;15134&amp;#039;&amp;#039;&amp;#039;, but this is only because each gene is entered twice. As a result, dividing by 2 yields &amp;#039;&amp;#039;&amp;#039;7567&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
* Microsoft Access yielded &amp;#039;&amp;#039;&amp;#039;7569&amp;#039;&amp;#039;&amp;#039; in the OrderedLocusNames window. The extra 2 genes came from the IDs of the form &amp;lt;code&amp;gt;SF?####/SF?####&amp;lt;/code&amp;gt; since the export broke up the two IDs that represent the same ID.&lt;br /&gt;
** 49 are of the form &amp;lt;code&amp;gt;CP####&amp;lt;/code&amp;gt;&lt;br /&gt;
** 3413 are of the form &amp;lt;code&amp;gt;S####&amp;lt;/code&amp;gt;&lt;br /&gt;
*** 14 are of the form &amp;lt;code&amp;gt;S####.#&amp;lt;/code&amp;gt;&lt;br /&gt;
** 4107 are of the form &amp;lt;code&amp;gt;SF####&amp;lt;/code&amp;gt;&lt;br /&gt;
*** 35 are of the form &amp;lt;code&amp;gt;SF####.#&amp;lt;/code&amp;gt;&lt;br /&gt;
* Inspecting the UniProt XML file was necessary to identify the IDs. Looking through its contents, I discovered (with help from Dondi) that I had to add the end tag &amp;quot;&amp;lt;/name&amp;gt;&amp;quot; in order to narrow down the results in Match.&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Oregon_Trail_Survivors&amp;diff=7894</id>
		<title>Oregon Trail Survivors</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Oregon_Trail_Survivors&amp;diff=7894"/>
				<updated>2015-12-15T06:13:57Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: /* Week 15 */ Added Trixie&amp;#039;s entry&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;div style=&amp;quot;text-align: center; font-size: 250%; line-height: 1.25em&amp;quot;&amp;gt;&amp;#039;&amp;#039;&amp;#039;Oregon Trail Survivors&amp;#039;&amp;#039;&amp;#039;&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Oregon-trail-dysentery 5 biodb.jpg | thumb | right | 350px | The third leading cause of death in the Oregon Trail.]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Group Members ==&lt;br /&gt;
*Coder: [[User:Jwoodlee | Jake Woodlee]]&lt;br /&gt;
*Quality Assurance: [[User:Troque | Trixie Roque]]&lt;br /&gt;
*GenMAPP Users: [[User:Eyanosch | Erich Yanoschik]] &amp;amp; [[User:Kzebrows | Kristin Zebrowski]]&lt;br /&gt;
* Project Manager: [[User:Kzebrows | Kristin Zebrowski]]&lt;br /&gt;
&lt;br /&gt;
{{Template:Oregon Trail Survivors}}&lt;br /&gt;
&lt;br /&gt;
=== Presentation (QA/Coder) ===&lt;br /&gt;
* PDF can be seen [[Media: Genome Paper Presentation BioDB.pdf | here]]&lt;br /&gt;
&lt;br /&gt;
===Group Meeting Times===&lt;br /&gt;
*Thursday, November 5th at 8:00 pm&lt;br /&gt;
*Met most Sundays and Monday evenings in the Biol DB lab to check in with one another.&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
Over the upcoming weeks our group will be investigating &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039;. &lt;br /&gt;
&lt;br /&gt;
===Week 10===&lt;br /&gt;
&lt;br /&gt;
# Find genome sequence paper&lt;br /&gt;
# Find 4-8 microarray datasets and papers that go with the genome paper&lt;br /&gt;
# Compile team page and create a ranked annotated bibliography&lt;br /&gt;
&lt;br /&gt;
===Week 11===&lt;br /&gt;
&lt;br /&gt;
#Prepare for journal club presentations in Weeks 12 and 13&lt;br /&gt;
#Begin initial tasks on research project&lt;br /&gt;
&lt;br /&gt;
Click on username links for more information regarding each team member&amp;#039;s contributions for Week 11.&lt;br /&gt;
&lt;br /&gt;
[[Jwoodlee Week 11 | Jake]]: Read through the genome paper and tried to get through the accessible things I had the ability to understand.  Made an outline for the genome paper. Worked on the presentation with Trixie and found a database.  And of course I answered the assigned questions.&lt;br /&gt;
&lt;br /&gt;
[[Troque Week 11 | Trixie]]: Mainly focused on the Genome paper presentation with Jake. This includes searching for a viable database that we will be using for the rest of the group assignment and actually creating the presentation we will be doing for October 17th, 2015. I&amp;#039;ve also updated our group page to reflect what Dr. Dahlquist suggested would improve our team page.&lt;br /&gt;
&lt;br /&gt;
[[Eyanosch Week 11 | Erich]]: Analyzed the microarray paper in order to describe the experimental design of the microarray data, treatments, number of replicates, and dye swaps. Worked with Kristin to produce the PowerPoint for the GenMAPP Users presentation at Journal Club. Worked on the individual journal entry and created an outline of the microarray paper.&lt;br /&gt;
&lt;br /&gt;
[[Kzebrows Week 11 | Kristin]]: Using the team&amp;#039;s selected microarray paper I developed an outline including background information, experimental outline/methods and how samples corresponded to the data, a brief description of the results, and a discussion including the implications of the research and its results in comparison to previous studies. Using this outline, I created a flow chart corresponding to the research. I also worked with Erich in order to create a PowerPoint for the Journal Club presentation on Nov. 24.&lt;br /&gt;
&lt;br /&gt;
=== Week 12 ===&lt;br /&gt;
#QA will be doing an initial database export. &lt;br /&gt;
#Coder will be setting up version control.&lt;br /&gt;
#GenMAPP users will compile the raw data from the microarray file to prepare for normalization and statistical analysis (will begin if time permits after consultation with Dr. Dahlquist). Additionally, the GenMAPP users will be determining the number of biological or technical replicates and how samples were labeled.&lt;br /&gt;
#Coder and QA will present on genome paper in class Tuesday, Nov. 24. &lt;br /&gt;
&lt;br /&gt;
Click on username links for more information regarding each team member&amp;#039;s contributions for Week 12.&lt;br /&gt;
* [[Jwoodlee Week 12 | Jake]]: Set up my environment in Eclipse, created the s-flexneri branch, created my own copy of GenMAPP that I can modify for later use, and cloned the repository with the Git commands.&lt;br /&gt;
* [[Troque Week 12 | Trixie]]: Finished the preliminary export of the XML and GOA files and the corresponding Gene Testing Report. Also started identifying the gene IDs for the species. Decided on a file management system with Jake.&lt;br /&gt;
* [[Eyanosch Week 12 | Erich]]: Worked with Kristin in determining the total number of biological and technical replicates. Compiled the raw data for RP samples, specifically the ID and Log ratio columns. Incorporated the RP and RX data into one spreadsheet with Kristin&amp;#039;s data. We created a table showing which file each sample corresponds to, and also determined that there were no dye swaps in the experiment (the control was the Cy3 dye and the treatment the Cy5 dye).&lt;br /&gt;
* [[Kzebrows Week 12 | Kristin]]: Determined that there were 3 biological replicates per treatment for 6 treatments total. Compiled raw data for RX samples by re-naming columns for ID and Log Ratio and putting into same worksheet, which was later combined with Erich&amp;#039;s worksheet for RP samples. Erich and I met and worked together to create a table of which samples correspond to which file.&lt;br /&gt;
&lt;br /&gt;
===Week 14===&lt;br /&gt;
#QA will be documenting the IDs using MATCH, Postgres, Microsoft Access, and Excel and get a head start on Milestone 3, which is customizing the TallyEngine.&lt;br /&gt;
#Coder will determine and document any modified export behavior that the GenMAPP Builder will have and resolve bugs. Coder will also work with QA by uploading GM Builder for additional export. &lt;br /&gt;
#GenMAPP Users will perform statistical analysis in Excel (normalization, tests) and format for import into GenMAPP. Users will also import data into GenMAPP and run MAPPFinder, and then document these test runs. &lt;br /&gt;
&lt;br /&gt;
Click on username links for more information regarding each team member&amp;#039;s contributions for Week 14.&lt;br /&gt;
* [[Jwoodlee Week 14 | Jake]]: Finished custom GenMAPP builder, committed to GitHub, and ran the export with the custom software.  This created a custom .gdb which was opened in Microsoft Access and GenMAPP to check for accuracy.&lt;br /&gt;
* [[Troque Week 14 | Trixie]]: Trixie has finished identifying the gene IDs using MATCH, Postgres, Microsoft Access, and Excel. It was discovered that some IDs are in &amp;quot;dbReference/property&amp;amp;type&amp;amp;gene ID&amp;quot;, and so another export was done on 12/7/15 to add the newly discovered gene IDs.&lt;br /&gt;
* [[Eyanosch Week 14 | Erich]]: Kristin and I completed the corrections provided by Dr. Dahlquist on Kristin&amp;#039;s talk page. We split the work into two halves and I worked on the RP data. We completed the statistics, Bonferroni p value correction, and the sanity check. I downloaded the database and formatted/exported the file for GenMAPP, and tried to create a GO tree for one of the trail points with RX.&lt;br /&gt;
* [[Kzebrows Week 14 | Kristin]]: This week Erich and I made corrections from the talk page and normalized log ratios for the slides in the experiment. I completed the statistical analysis for RX samples and calculated the Bonferroni p value correction. I also performed a sanity check for the RX samples and, going off of that, I calculated the Benjamini &amp;amp; Hochberg p value correction for RX-1-30, which had the most statistically significant changes in gene expression. I also formatted and exported the file for GenMAPP, downloaded the database, and attempted to create color sets to run the data set through MAPPFinder. &lt;br /&gt;
&lt;br /&gt;
==== Reflection ====&lt;br /&gt;
&lt;br /&gt;
Each team member should reflect on the team&amp;#039;s progress:&lt;br /&gt;
# What worked?&lt;br /&gt;
# What didn&amp;#039;t work?&lt;br /&gt;
# What will I do next to fix what didn&amp;#039;t work?&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;Kristin&amp;#039;&amp;#039;: &lt;br /&gt;
#What worked?&lt;br /&gt;
#*In terms of communication, having a group text worked. We also meet at least once a week outside of class in order to work together on the assignments and make sure we are all on the same page. So far, this has allowed us to troubleshoot and address bugs together as a team quickly. It also worked for Erich and me to divide up the samples so that I did all RX and Erich did all RP. Then, we could work at the same time and double-check procedures with each other while still getting the work done twice as quickly. &lt;br /&gt;
#What didn&amp;#039;t work?&lt;br /&gt;
#*After creating the initial compiled raw data file, I had to make several corrections before the file could be run through GenMAPP. First of all, I had to get rid of the &amp;quot;.&amp;quot;, and I also had to replace all #DIV/0! values with a space character for the file to be read at all. Also, we were unable to find all of the b#### and CP#### gene IDs in UniProt or ShiBASE. Finally, after creating my color set and trying to run MAPPFinder, I tried three computers and all of them crashed with the &amp;quot;not responding&amp;quot; message.&lt;br /&gt;
#What will I do next to fix what didn&amp;#039;t work?&lt;br /&gt;
#*I will communicate with the QA and Coder in order to create a database with a minimal number of &amp;quot;Gene ID not found&amp;#039;s&amp;quot; and then communicate with Erich when we try to run our dataset through MAPPFinder. Once the gene database is re-customized and the export is complete I can try and re-run my dataset to see if that makes a difference.&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039; Trixie &amp;#039;&amp;#039;:&lt;br /&gt;
# What worked?&lt;br /&gt;
#* What worked in identifying the gene IDs was to export the .gdb file into Excel and compare it with what the OrderedLocusNames table had (from Microsoft Access). From doing this, it was easier to find which genes were not found in the .gdb file and to look for them in the UniProt XML file. With the Excel file comparing the lists of gene IDs and using the CTRL+F shortcut, I was also able to discern which tags to include in the new builds for the databases. Because of this, I was able to confirm that some genes indeed do not exist in the XML file, while only a couple exist within the &amp;quot;dbReference&amp;quot; tag. In terms of group work, what worked is posting all our files on a single page as we progress through the assignment. Night meetings were also helpful in order to better communicate with the rest of my group.&lt;br /&gt;
# What didn&amp;#039;t work?&lt;br /&gt;
#* What didn&amp;#039;t work is using Match multiple times without thinking. Even when I was trying to match the number of gene IDs with what Tally Engine gives me, Match didn&amp;#039;t really help me in identifying where to find the genes in the XML file. Waiting for the database to finish didn&amp;#039;t help much at all since our builds would take more than 4 hours to finish.&lt;br /&gt;
# What will I do next to fix what didn&amp;#039;t work?&lt;br /&gt;
#* What I would do next to fix what didn&amp;#039;t work is to actually use Match in conjunction with the XML file, or just use the Excel method completely since that was actually more helpful in finding the necessary tags than the Match method. I would probably have to set a timer to check the lab after about 4.5 hours since one of our builds lasted that long.&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;Jake&amp;#039;&amp;#039;:&lt;br /&gt;
#What worked?&lt;br /&gt;
#*Almost every procedural action I took from Dondi worked. The only hiccup I had was in regard to Eclipse and navigating the directories.&lt;br /&gt;
#What didn&amp;#039;t work?&lt;br /&gt;
#*In Eclipse, my edits to the GenMAPP builder source code were causing red error marks, but after selecting &amp;quot;Organize Imports&amp;quot; from the source menu the errors were fixed easily and the proper classes were imported. Also I had difficulty navigating to the dist file in my Temp drive, however I traced this back within Eclipse and was able to make a zip that I could hand off to Trixie for export.&lt;br /&gt;
#What will I do next week to fix what didn&amp;#039;t work?&lt;br /&gt;
#*It seems to me that there wasn&amp;#039;t a whole lot that went wrong with my procedure. What wasn&amp;#039;t working I already fixed. Currently Trixie and I are running an export that will take 4 hours with the new additions in the property files, so there may be some new hiccups when that export is finished but we will have to wait and see.&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;Erich&amp;#039;&amp;#039;:&lt;br /&gt;
# What worked?&lt;br /&gt;
#*Having a GenMAPP user meeting with Dr. Dahlquist helped us focus on what goals we wanted to achieve by the time of our next meeting. A group text helped organize meeting times for both the coders and GenMAPP users and helped keep us on schedule. &lt;br /&gt;
# What didn&amp;#039;t work?&lt;br /&gt;
#*The GenMAPP Gene Ontology Tree was unable to pull files for each GO selection. We need to work on this and make sure the GO files can be found. We also had to edit our compiled raw data files so that they can be read by GenMAPP.&lt;br /&gt;
# What will I do next to fix what didn&amp;#039;t work?&lt;br /&gt;
#*A new .gex was created, so this might help with the problems experienced in the MappBuilder. I will also communicate with the QA and Coder to make sure we finish up the GO tree smoothly in order to assess the results of the publication we chose for &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
===Week 15===&lt;br /&gt;
#Coder: Work with QA to fix bugs.&lt;br /&gt;
#QA: Work with coder to fix bugs in the .gdb.&lt;br /&gt;
#GenMAPP Users: Finish Milestone 3. Run tests with GenMAPP. Do a journal club outline of the paper to use in the Discussion section of group report and presentation. Create a .mapp file showing one changed pathway from the data.&lt;br /&gt;
#All team members will be working together to put together deliverables including the final report and presentation for next Tuesday. &lt;br /&gt;
&lt;br /&gt;
Click on username links for more information regarding each team member&amp;#039;s contributions for Week 15.&lt;br /&gt;
* [[Jwoodlee Week 15 | Jake]]: Pulled Dondi&amp;#039;s changes, and then created a new clean distribution.  I then uploaded that distribution to our OTS Files page.  Edited properties file for TallyEngine.&lt;br /&gt;
* [[Troque Week 15 | Trixie]]: Had to re-import to PostgreSQL due to having imported twice -- this resulted in the number of counts being twice as much as what was in the XML file. Also worked with Dr. Dionisio in order to find ~92 new IDs from the XML file that were not caught before and collaborated with Jake in order to make 2 more builds that should, ideally, produce the intended 92 genes.&lt;br /&gt;
* [[Eyanosch Week 15 | Erich]]: Used Kristin&amp;#039;s color sets criterion GO files to fill out my gene expression MAPP. Made MAPPs for pathways that were significantly affected such as metabolic processes (glycolysis, TCA cycle), Flagellar Assembly, and Ribosome. Incorporated the data into slides for the PowerPoint and compared the data obtained with that reported in the microarray paper.&lt;br /&gt;
* [[Kzebrows Week 15 | Kristin]]: I created color sets with Increased/Decreased criteria for all of the 12 treatment/time point combos. Then, based on the criterion.go files, I created tables by filtering the results comparing the most commonly induced or repressed genes for the 1 x MIC at 60 minutes and 0.5 x MIC at 10 minutes between RX and RP. Strikingly, we found that between RX and RP the effects were very similar. I then compared them with the .mapp files that Erich created and put my portion of the project (compiled sanity check, color set, comparison tables) in the PowerPoint.&lt;br /&gt;
&lt;br /&gt;
==Overview of Genome Paper==&lt;br /&gt;
*Used the genome sequencing article to perform a prospective search in the [https://apps.webofknowledge.com/UA_GeneralSearch_input.do?product=UA&amp;amp;search_mode=GeneralSearch&amp;amp;SID=1FRKcNxUgxiGX6spITI&amp;amp;preferencesSaved= Web of Science] database.&lt;br /&gt;
*Overview of the search:&lt;br /&gt;
**How many articles does this article cite? 37&lt;br /&gt;
**How many articles cite this article? 303&lt;br /&gt;
**Based on the titles and abstracts of the papers, what type of research directions have been taken now that the genome for that organism has been sequenced? &lt;br /&gt;
***Now that the genome has been sequenced, a majority of research has been done on discovering which genes are responsible for virulence and pathogenesis as well as potential antibiotics. Genomic research is also focused on how &amp;#039;&amp;#039;S. flexneri&amp;#039;&amp;#039; has been able to develop resistance to multiple drugs. Furthermore, &amp;#039;&amp;#039;Shigella&amp;#039;&amp;#039; is suspected to have evolved from &amp;#039;&amp;#039;Escherichia coli&amp;#039;&amp;#039; so a lot of research has been done in how and when pathogenic &amp;#039;&amp;#039;Shigella&amp;#039;&amp;#039; split from &amp;#039;&amp;#039;E. coli&amp;#039;&amp;#039; on the evolutionary tree.&lt;br /&gt;
&lt;br /&gt;
==Annotated Bibliography==&lt;br /&gt;
=== Genome Paper ===&lt;br /&gt;
Jin, Q., Yuan, Z., Xu, J., Wang, Y., Shen, Y., Lu, W., … Yu, J. (2002). Genome sequence of &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039; 2a: insights into pathogenicity through comparison with genomes of Escherichia coli K12 and O157. Nucleic Acids Research, 30(20), 4432–4441.&lt;br /&gt;
* PubMed Abstract: http://www.ncbi.nlm.nih.gov/pubmed/?term=Genome+sequence+of+Shigella+flexneri+2a%3A+insights+into+pathogenicity+through+comparison+with+genomes+of+Escherichia+coli+K12+and+O157&lt;br /&gt;
* PubMed Central:  http://www.ncbi.nlm.nih.gov/pmc/articles/PMC137130/&lt;br /&gt;
* Publisher Full Text (HTML):  http://nar.oxfordjournals.org/content/30/20/4432.full&lt;br /&gt;
* Publisher Full Text (PDF):  http://nar.oxfordjournals.org/content/30/20/4432.full.pdf+html&lt;br /&gt;
* Copyright:  2002 Oxford University Press&lt;br /&gt;
* Publisher:   Oxford University Press&lt;br /&gt;
* Availability:  in print and online&lt;br /&gt;
* Did LMU pay a fee for this article: no&lt;br /&gt;
&lt;br /&gt;
===Microarray Paper===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--====Paper Rankings====&lt;br /&gt;
&lt;br /&gt;
It would have been helpful for you to actually list the papers in this ranked order.  &amp;#039;&amp;#039;&amp;amp;mdash; [[User:Kdahlquist|Kdahlquist]] ([[User talk:Kdahlquist|talk]]) 10:32, 10 November 2015 (PST)&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
#Fu H, Liu L, Zhang X, Zhu Y, Zhao L, Peng J, et al. (2012) Common Changes in Global Gene Expression Induced by RNA Polymerase Inhibitors in &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039;. PLoS ONE 7(3): e33240. doi:10.1371/journal.pone.0033240&lt;br /&gt;
#* This paper is suitable for your project.  &amp;#039;&amp;#039;&amp;amp;mdash; [[User:Kdahlquist|Kdahlquist]] ([[User talk:Kdahlquist|talk]]) 10:38, 10 November 2015 (PST)&amp;#039;&amp;#039;&lt;br /&gt;
#Morris, Carolyn R, et al. ‘Characterization of Intracellular Growth Regulator IcgR by Utilizing Transcriptomics to Identify Mediators of Pathogenesis in Shigella Flexneri’. Infection and Immunity 81.9 (Sep. 2013): 3068–3076. 6 Nov. 2015.&lt;br /&gt;
#Global analysis of a plasmid-cured Shigella flexneri strain: new insights into the interaction between the chromosome and a virulence plasmid. Li Zhu, Xiankai Liu, Xuexue Zheng, Xin Bu, Ge Zhao, Chaohua Xie, Jingfei Zhang, Na Li, Erling Feng, Jie Wang, Yongqiang Jiang, Peitang Huang, Hengliang Wang J Proteome Res. 2010 February 5; 9(2): 843–854. doi: 10.1021/pr9007514&lt;br /&gt;
#Peng J, Yang J, Jin Q (2011) An Integrated Approach for Finding Overlooked Genes in Shigella. PLoS ONE 6(4): e18509. doi: 10.1371/journal.pone.0018509&lt;br /&gt;
#Waddell, C. D., Walter, T. J., Pacheco, S. A., Purdy, G. E., &amp;amp; Runyen-Janecky, L. J. (2014). NtrBC and Nac Contribute to Efficient Shigella flexneri Intracellular Replication. Journal of Bacteriology, 196(14), 2578–2586. http://doi.org/10.1128/JB.01613-14&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- ==== Kristin ====&lt;br /&gt;
Peng J, Yang J, Jin Q (2011) An Integrated Approach for Finding Overlooked Genes in Shigella. PLoS ONE 6(4): e18509. doi: 10.1371/journal.pone.0018509&lt;br /&gt;
*PubMed Abstract: [http://www.ncbi.nlm.nih.gov/pubmed/21483688 Abstract]&lt;br /&gt;
*PubMedCentral: [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3071730/ PMC]&lt;br /&gt;
*Publisher Full Text (HTML format): [http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0018509 HTML]&lt;br /&gt;
*Publisher Full Text (PDF): [http://www.plosone.org/article/fetchObject.action?uri=info:doi/10.1371/journal.pone.0018509&amp;amp;representation=PDF PDF]&lt;br /&gt;
*Copyright: 2011 Peng et al. Article is Open Access and the authors own the copyright, not the journal, under a Creative Commons license.&lt;br /&gt;
*Publisher: PLOS One&lt;br /&gt;
**Is the article available under &amp;quot;Open Access&amp;quot;? Yes&lt;br /&gt;
*Availability: online only&lt;br /&gt;
*Did LMU pay a fee for this article: no&lt;br /&gt;
*Database used to find the data and article: ArrayExpress&lt;br /&gt;
*Terms searched: &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039;&lt;br /&gt;
**Filtered by organism: &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039;&lt;br /&gt;
**Filtered by experiment type: RNA assay, array assay&lt;br /&gt;
*Search overview&lt;br /&gt;
**Results: 7&lt;br /&gt;
**Assessment: All of the articles were relevant but not all had enough assays to be able to be used for this assignment. All involved transcription profiling by array but obviously the experiments differed. Expression analysis was used to examine an RNA polymerase inhibitor, comparing wild type to mutant &amp;#039;&amp;#039;Shigella&amp;#039;&amp;#039;, and virulence plasmid-cured strains amongst others.&lt;br /&gt;
*Search in Web of Knowledge&lt;br /&gt;
**Number of articles this article cites: 71&lt;br /&gt;
**Number of times this article has been cited: 1&lt;br /&gt;
**What research directions have been taken since this article has been published? The only article that cited this paper involved detecting infectious diarrheal diseases by chemiluminescence imaging. &lt;br /&gt;
**[https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-22800/samples/?keywords=shigella+flexneri&amp;amp;organism=Shigella+flexneri&amp;amp;exptype%5B%5D=%22rna+assay%22&amp;amp;exptype%5B%5D=%22array+assay%22&amp;amp;array= Microarray data]&lt;br /&gt;
**What experiment was performed? What was the &amp;quot;treatment&amp;quot; and the &amp;quot;control&amp;quot;? &lt;br /&gt;
***The experiment performed was to identify overlooked small RNAs (sRNAs) and small open reading frames (sORFs) in &amp;#039;&amp;#039;Shigella&amp;#039;&amp;#039; that were overlooked in the initial genome sequences. Microarrays were performed to search for sRNAs as well as RT-PCR and northern blots were used to identify sRNAs and regions for possible sRNAs. 64 sRNAs that were previously confirmed were used as controls. As a treatment, cells were harvested in the lag, log, and stationary phases at 37C in LB medium and then in the log and stationary phases at 37C in LB medium with 0.01% Congo red, a salt. &lt;br /&gt;
**Were replicate experiments of the &amp;quot;treatment&amp;quot; and &amp;quot;control&amp;quot; conditions conducted? Were these biological or technical replicates? How many of each? Competitive hybridization was conducted three times for each condition. These were technical replicates because the conditions were different samples (treated differently) measured in different conditions. &lt;br /&gt;
&lt;br /&gt;
Waddell, C. D., Walter, T. J., Pacheco, S. A., Purdy, G. E., &amp;amp; Runyen-Janecky, L. J. (2014). NtrBC and Nac Contribute to Efficient Shigella flexneri Intracellular Replication. Journal of Bacteriology, 196(14), 2578–2586. http://doi.org/10.1128/JB.01613-14&lt;br /&gt;
*PubMed Abstract: [http://www.ncbi.nlm.nih.gov/pubmed/?term=Shigella+flexneri+ntrBC+and+nac+mutant+expression+analysis Abstract]&lt;br /&gt;
*PubMedCentral: [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4097594/ PMC]&lt;br /&gt;
*Publisher Full Text (HTML format): [http://jb.asm.org/content/196/14/2578.long HTML]&lt;br /&gt;
*Publisher Full Text (PDF): [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4097594/pdf/zjb2578.pdf PDF]&lt;br /&gt;
*Copyright: 2014 American Society for Microbiology. The ASM is a non-profit organization with numerous publications, some of which are open access and some of which are not. &lt;br /&gt;
*Publisher: American Society for Microbiology&lt;br /&gt;
**Is the article available under &amp;quot;Open Access&amp;quot;? It is available open access after 6 months.&lt;br /&gt;
*Availability: online and in print&lt;br /&gt;
*Did LMU pay a fee for this article: no&lt;br /&gt;
*Database used to find the data and article: ArrayExpress&lt;br /&gt;
*Terms searched: &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039;&lt;br /&gt;
**Filtered by organism: &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039;&lt;br /&gt;
**Filtered by experiment type: RNA assay, array assay&lt;br /&gt;
*Search overview&lt;br /&gt;
**Results: 7&lt;br /&gt;
**Assessment: All of the articles were relevant but not all had enough assays to be able to be used for this assignment. All involved transcription profiling by array but obviously the experiments differed. Expression analysis was used to examine an RNA polymerase inhibitor, comparing wild type to mutant &amp;#039;&amp;#039;Shigella&amp;#039;&amp;#039;, and virulence plasmid-cured strains amongst others.&lt;br /&gt;
*Search in Web of Knowledge&lt;br /&gt;
**Number of articles this article cites: 70&lt;br /&gt;
**Number of times this article has been cited: 0&lt;br /&gt;
**What research directions have been taken since this article has been published? This article has not been cited at all. It was published in July 2014 (pretty recently), which may contribute to this.&lt;br /&gt;
**link to [https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-49939/samples/?keywords=shigella+flexneri&amp;amp;organism=Shigella+flexneri&amp;amp;exptype%5B%5D=%22rna+assay%22&amp;amp;exptype%5B%5D=%22array+assay%22&amp;amp;array= microarray data]&lt;br /&gt;
**What experiment was performed? What was the &amp;quot;treatment&amp;quot; and the &amp;quot;control&amp;quot;? &lt;br /&gt;
***The experimenters examined 12 two-component regulatory systems in &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039; for their abilities to sense changes in environmental conditions and regulate gene expression in response. Virulence was tested by infecting Henle cells with wild type and mutant TCRS. They found four systems required for plaque formation in the wild type, and microarray analysis was performed to identify which genes were regulated differently by the NtrBC system or by Nac.&lt;br /&gt;
***The treatment for this experiment was to create &amp;#039;&amp;#039;Shigella&amp;#039;&amp;#039; TCRS mutants using phages or transduction and to test their effectiveness in invading Henle cells. Assays were then done to compare gene expression in these mutants with wild type &amp;#039;&amp;#039;Shigella&amp;#039;&amp;#039; TCRS. The control for this experiment was DNase-treated RNA and assays performed with avirulent strains of &amp;#039;&amp;#039;Shigella&amp;#039;&amp;#039;. &lt;br /&gt;
**Were replicate experiments of the &amp;quot;treatment&amp;quot; and &amp;quot;control&amp;quot; conditions conducted? Were these biological or technical replicates? How many of each? Assays were conducted three times. These were technical replicates because the conditions were different samples.&lt;br /&gt;
&lt;br /&gt;
==== Erich Yanoschik ==== &lt;br /&gt;
&lt;br /&gt;
Global analysis of a plasmid-cured Shigella flexneri strain: new insights into the interaction between the chromosome and a virulence plasmid.&lt;br /&gt;
Li Zhu, Xiankai Liu, Xuexue Zheng, Xin Bu, Ge Zhao, Chaohua Xie, Jingfei Zhang, Na Li, Erling Feng, Jie Wang, Yongqiang Jiang, Peitang Huang, Hengliang Wang&lt;br /&gt;
J Proteome Res. 2010 February 5; 9(2): 843–854. doi: 10.1021/pr9007514&lt;br /&gt;
* PubMed Abstract: http://www.ncbi.nlm.nih.gov/pubmed?LinkName=gds_pubmed&amp;amp;from_uid=200012535&lt;br /&gt;
* PubMed Central:  N/A&lt;br /&gt;
* Publisher Full Text (HTML): http://pubs.acs.org/doi/full/10.1021/pr9007514&lt;br /&gt;
* Publisher Full Text (PDF):  http://pubs.acs.org/doi/pdf/10.1021/pr9007514&lt;br /&gt;
* Copyright:  2009 American Chemical Society&lt;br /&gt;
* Publisher:   Journal of Proteome Research&lt;br /&gt;
* Availability:  in print and online&lt;br /&gt;
* Did LMU pay a fee for this article: no&lt;br /&gt;
*The publisher is a scientific society. The Journal of Proteome Research publishes content encompassing all aspects of global protein analysis and function, including the dynamic aspects of genomics, spatio-temporal proteomics, metabonomics and metabolomics, clinical and agricultural proteomics, as well as advances in methodology including bioinformatics. The theme and emphasis is on a multidisciplinary approach to the life sciences through the synergy between the different types of &amp;quot;omics&amp;quot;. -quote from the about section http://pubs.acs.org/page/jprobs/about.html&lt;br /&gt;
*Used the ISI Web of Science/Knowledge database to search this article&lt;br /&gt;
** The article has 28 cited references&lt;br /&gt;
** The article is cited 4 times &lt;br /&gt;
** Research has since focused on profiling which parts of the Shigella flexneri genome are responsible for virulence and pathogenicity factors, along with chromosomal inactivation.&lt;br /&gt;
# Global patterns of &amp;#039;&amp;#039;&amp;#039;&amp;#039;&amp;#039;gene expression&amp;#039;&amp;#039;&amp;#039;&amp;#039;&amp;#039; of a virulence cured plasmid strain compared with the wild-type strain were analyzed using 2-DE combined with MALDI-TOF MS.&lt;br /&gt;
#* There are 6 biological replicates total. &lt;br /&gt;
#* The control sample is derived from mRNA&lt;br /&gt;
# Overview of Search Results&lt;br /&gt;
#* The results of the search mainly consisted of E. coli and Shigella flexneri transcriptional profiling.&lt;br /&gt;
#* There are 178 results in the GEO DataSets Database and 22283 in GEO profiles database.&lt;br /&gt;
#* The results were mostly relevant, and the first results were datasets. Anything related to the bacteria came up, and the order seemed to follow relevance.&lt;br /&gt;
#** The microarray data can be found at http://pubs.acs.org/doi/abs/10.1021/pr9007514&lt;br /&gt;
# The experiment contrasted the pathogenicity of a virulence plasmid-cured strain with that of wild-type Shigella flexneri. The virulence plasmid-cured strain was constructed through plasmid incompatibility. The control was the wild-type Shigella flexneri strain in each experimental construct.&lt;br /&gt;
#* There were at least 3 biological replicates of each experiment conducted and 2 technical replicates.&lt;br /&gt;
&lt;br /&gt;
==== Trixie ====&lt;br /&gt;
Morris, Carolyn R, et al. ‘Characterization of Intracellular Growth Regulator IcgR by Utilizing Transcriptomics to Identify Mediators of Pathogenesis in Shigella Flexneri’. Infection and Immunity 81.9 (Sep. 2013): 3068–3076. 6 Nov. 2015.&lt;br /&gt;
&lt;br /&gt;
* PubMed Abstract: http://www.ncbi.nlm.nih.gov/pubmed/?term=Characterization+of+Intracellular+Growth+Regulator+icgR+by+Utilizing+Transcriptomics+To+Identify+Mediators+of+Pathogenesis+in+Shigella+flexneri&lt;br /&gt;
* PubMed Central: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3754207/&lt;br /&gt;
* Publisher Full Text (HTML): http://iai.asm.org/content/81/9/3068.full&lt;br /&gt;
* Publisher Full Text (PDF): http://iai.asm.org/content/81/9/3068.full.pdf+html&lt;br /&gt;
* Copyright: 2013, American Society for Microbiology. All Rights Reserved.&lt;br /&gt;
* Publisher: American Society for Microbiology&lt;br /&gt;
* Availability: only online&lt;br /&gt;
* Did LMU pay a fee for this article: no&lt;br /&gt;
* doi: 10.1128/IAI.00537-13&lt;br /&gt;
&lt;br /&gt;
Database used to find the data and article: ArrayExpress&lt;br /&gt;
* Terms searched: &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039;&lt;br /&gt;
** Filtered by organism: &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039;&lt;br /&gt;
** Filtered by experiment type: RNA assay, array assay&lt;br /&gt;
* Search overview&lt;br /&gt;
** Results: 7&lt;br /&gt;
** Assessment: Some of the results only used 2-4 assays so we immediately felt suspicious as to the accuracy of the results they would provide. Out of the 7 results, 5 had 9 or more assays so we decided to look at those data.&lt;br /&gt;
&lt;br /&gt;
Web of Science:&lt;br /&gt;
* Link to microarray data: [http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-40851/samples/?keywords=%22Shigella+flexneri%22+&amp;amp;organism=Shigella+flexneri&amp;amp;exptype%5B%5D=%22rna+assay%22&amp;amp;exptype%5B%5D=%22array+assay%22&amp;amp;array= Microarray data]&lt;br /&gt;
* How many articles does this article cite? 2&lt;br /&gt;
* How many articles cite this article? 52&lt;br /&gt;
* Based on the titles and abstracts of the papers, what type of research directions have been taken now that the genome for that organism has been sequenced? &lt;br /&gt;
** Since the organism&amp;#039;s genome has been sequenced, new research about this species now tends to focus more on its pathogenesis using bioinformatic methods with in vitro and in vivo microarray data. For example, the article &amp;quot;Analysis of the Proteome of Intracellular Shigella flexneri Reveals Pathways Important for Intracellular Growth&amp;quot; that cites this article analyzes the metabolic pathways that allow the organism to grow.&lt;br /&gt;
* What experiment was performed? What was the &amp;quot;treatment&amp;quot; and what was the &amp;quot;control&amp;quot; in the experiment? &lt;br /&gt;
** This experiment involved combining high-throughput bioinformatic methods with in vitro and in vivo assays to provide new insights into pathogenesis. The intracellular growth regulator was deleted in order to observe its effects and compare to the wild type, or the control in the experiment. The &amp;quot;treatment&amp;quot; involved culturing the strains in Luria broth or tryptic soy agar with Congo red (TSA/CR) medium supplemented with the appropriate antibiotics (15 μg/ml chloramphenicol, 50 μg/ml kanamycin, and 100 μg/ml ampicillin) and allowing them to invade colonic epithelial cells for a set period of time.&lt;br /&gt;
* Were replicate experiments of the &amp;quot;treatment&amp;quot; and &amp;quot;control&amp;quot; conditions conducted? Were these biological or technical replicates? How many of each?&lt;br /&gt;
** The experiment had both biological and technical replicates. Since the experiment involved analyzing the pathogenesis of the organism, the researchers deleted the gene they believed to be involved in intracellular growth, which they called icgR. In their documentation, they wrote that they compared the results of subjecting the ΔicgR strain (and its complement, ΔicgR(pSECicgR), or ΔicgR mutant transformed with pSECicgR) to certain conditions to the control, the wild type 2457T. In other words, the experiment involved 3 biological strains (namely the wild type, ΔicgR, and ΔicgR complement). 5 technical replicates were then conducted for each different strain, resulting in a grand total of 15 microarrays.&lt;br /&gt;
&lt;br /&gt;
====Jake====&lt;br /&gt;
&lt;br /&gt;
The complete bibliographic reference in the APA style (see the Writing LibGuide). You will be using one of three formats: journal article from database (with DOI), journal article from database (no DOI), or journal article in print (no DOI).&lt;br /&gt;
&lt;br /&gt;
Fu H, Liu L, Zhang X, Zhu Y, Zhao L, Peng J, et al. (2012) Common Changes in Global Gene Expression Induced by RNA Polymerase Inhibitors in &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039;. PLoS ONE 7(3): e33240. doi:10.1371/journal.pone.0033240&lt;br /&gt;
&lt;br /&gt;
*The link to the [http://www.ncbi.nlm.nih.gov/pubmed/22428000 abstract]&lt;br /&gt;
*The link to the [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3299763/ full text of the article] in PubMed Central&lt;br /&gt;
*The link to the [http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0033240 full text of the article] (HTML format) from the publisher web site.&lt;br /&gt;
*The link to the [http://www.plosone.org/article/fetchObject.action?uri=info:doi/10.1371/journal.pone.0033240&amp;amp;representation=PDF full PDF version] of the article from the publisher web site.&lt;br /&gt;
*Copyright: © 2012 Fu et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.&lt;br /&gt;
*Does the journal own the copyright? NO&lt;br /&gt;
*Do the authors own the copyright? Yes&lt;br /&gt;
*Do the authors own the rights under a Creative Commons license? Yes&lt;br /&gt;
*Is the article available “Open Access”? Yes&lt;br /&gt;
*What organization is the publisher of the article? What type of organization is it? PLoS One is the publisher/Journal.  It hosts open access research articles. (Public Library of Science)&lt;br /&gt;
*Is this article available in print or online only? Online only&lt;br /&gt;
*Has LMU paid a subscription or other fee for your access to this article? No, LMU has not paid a subscription or other fee because it is open access on the Public Library of Science.&lt;br /&gt;
*Use the genome sequencing article you found to perform a prospective search in the ISI Web of Science/Knowledge database.&lt;br /&gt;
**How many articles does this article cite? 25 cited references&lt;br /&gt;
**How many articles cite this article? 0 articles cite this article&lt;br /&gt;
**Based on the titles and abstracts of the papers, what type of research directions have been taken now that the genome for that organism has been sequenced?&lt;br /&gt;
*Because no papers cite this paper, nothing has yet been built on this specific topic. With regard to the genome, I think this paper has built on the work of the people who sequenced the first Shigella flexneri genome as well as the other microarray papers.&lt;br /&gt;
*State which database you used to find the data and article: ArrayExpress&lt;br /&gt;
*State what you used as search terms and what type of search terms they were: &amp;quot;shigella flexneri&amp;quot; filtered by organism, experiment type: &amp;quot;rna assay&amp;quot;, experiment type: &amp;quot;array assay&amp;quot;&lt;br /&gt;
*Give an overview of the results of the search.&lt;br /&gt;
**How many results did you get? 7 results returned with 6 viable options due to the number of assays.&lt;br /&gt;
**Give an assessment of how relevant the results were: Very relevant, 6/7 results were viable.&lt;br /&gt;
*Link to [http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-32978/?keywords=shigella+flexneri&amp;amp;organism=Shigella+flexneri&amp;amp;exptype%5B%5D=%22rna+assay%22&amp;amp;exptype%5B%5D=%22array+assay%22&amp;amp;array= microarray data]&lt;br /&gt;
*What experiment was performed? What was the &amp;quot;treatment&amp;quot; and what was the &amp;quot;control&amp;quot; in the experiment?&lt;br /&gt;
**Antibiotics (RNA Polymerase Inhibitors) were added to &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039; in order to see if bacteria became less active.  The control was a group of bacteria with no drugs added to them, and the treatment was a group of bacteria with drugs added to them.&lt;br /&gt;
*Were replicate experiments of the &amp;quot;treatment&amp;quot; and &amp;quot;control&amp;quot; conditions conducted? Were these biological or technical replicates? How many of each?&lt;br /&gt;
**There are two drugs RX and RP with 6 samples per drug. The experiment was run 3 times which yielded 36 assays. I believe that means 3 biological replicates and 12 technical replicates within each experiment, but I am not 100 percent sure.&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Week_15&amp;diff=7893</id>
		<title>Troque Week 15</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Week_15&amp;diff=7893"/>
				<updated>2015-12-15T06:10:53Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: Edited this page&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Template:Troque}}&lt;br /&gt;
&lt;br /&gt;
== Export Information (Re-imported) Build 2 ==&lt;br /&gt;
Version of GenMAPP Builder:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; gmbuilder-3.0.0-build-5 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Computer on which export was run: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; Front of the room, 3rd computer from the right. &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Postgres Database name: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Shigella_flexneri_20151208&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
UniProt XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): &amp;#039;&amp;#039;&amp;#039;UniProt release 2015_11&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* UniProt XML download link: &amp;#039;&amp;#039;&amp;#039;[http://www.uniprot.org/uniprot/?query=proteome:UP000001006 Click here]&amp;#039;&amp;#039;&amp;#039; &lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;4.43 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GO OBO-XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): &amp;#039;&amp;#039;&amp;#039;Version created on 11/19/2015 (at 2:24 AM)&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GO OBO-XML download link: &amp;#039;&amp;#039;&amp;#039;[http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;6.84 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to process: &amp;#039;&amp;#039;&amp;#039;5.49 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GOA filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): &amp;#039;&amp;#039;&amp;#039;Version released on .&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GOA download link: &amp;#039;&amp;#039;&amp;#039;[http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/103.S_flexneri_301.goa Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;0.06 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151208.gdb | Sf-Std 20151208.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039;1 hour, 38 minutes, 42 seconds  &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:30:59 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039;6:09:41 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
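The reported export time can be cross-checked from the start and end times listed above. This is a minimal sketch, assuming both times fall on the same day; the variable names are illustrative only.&lt;br /&gt;

```python
from datetime import datetime

# Start and end times as recorded in the export notes (same day assumed).
TIME_FORMAT = "%I:%M:%S %p"
start = datetime.strptime("4:30:59 PM", TIME_FORMAT)
end = datetime.strptime("6:09:41 PM", TIME_FORMAT)

# Subtracting two datetimes gives a timedelta holding the elapsed time.
elapsed = end - start
print(elapsed)  # 1:38:42
```

The difference agrees with the recorded export time of 1 hour, 38 minutes, 42 seconds.&lt;br /&gt;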
&lt;br /&gt;
I had to re-import everything into a new database because the one I had been using had some files imported twice. As a result, the counts reported by PostgreSQL were all doubled.&lt;br /&gt;
&lt;br /&gt;
== Using TallyEngine ==&lt;br /&gt;
* The database used is the same one described in the section above: &amp;#039;&amp;#039;&amp;#039;Shigella_flexneri_20151208&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Notice in the image below that there is an error in the cells. It turns out that we did not even need to add the Ordered Locus since that was the default. We will definitely need to do one last build in order to fix that issue.&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Shigella flexneri tallyEngine results build 2.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Using XMLPipeDB match to Validate the XML Results from the TallyEngine ==&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:regex1_OTS.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:regex2_OTS.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* When added together, the result becomes 7566 + 3 = 7569.&lt;br /&gt;
&lt;br /&gt;
== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine ==&lt;br /&gt;
* The following command in PostgreSQL resulted in 7567 entries:&lt;br /&gt;
 select value from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
* The following command resulted in 214 entries:&lt;br /&gt;
 select value from genenametype where type = &amp;#039;ORF&amp;#039; and value ~ &amp;#039;(CP|SF?)(_p)?[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
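To see which ID shapes each of the two patterns accepts, the same regular expressions can be tried outside the database. This is a rough sketch in Python, not part of the original workflow: the sample IDs are made up for illustration, and &amp;#039;&amp;#039;re.search&amp;#039;&amp;#039; is used because the PostgreSQL ~ operator matches anywhere in the value.&lt;br /&gt;

```python
import re

# Same patterns as the two SQL queries above. PostgreSQL's ~ operator is an
# unanchored match, which corresponds to re.search in Python.
ORDERED_LOCUS = re.compile(r"(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?")
ORF = re.compile(r"(CP|SF?)(_p)?[0-9][0-9][0-9][0-9](\.[0-9])?")


def count_matches(pattern, values):
    """Count the values that contain a match for the pattern."""
    return sum(1 for value in values if pattern.search(value))


# Hypothetical IDs for illustration only; they are not taken from the database.
sample_ids = ["SF0001", "S0002", "CP0003", "SF_p0004", "SF1234.1", "b0005"]

ordered_locus_count = count_matches(ORDERED_LOCUS, sample_ids)
orf_count = count_matches(ORF, sample_ids)
print(ordered_locus_count, orf_count)  # 4 5
```

The only difference between the two patterns is the optional &amp;#039;&amp;#039;_p&amp;#039;&amp;#039; group, so the second pattern also accepts plasmid-style IDs such as the made-up SF_p0004.&lt;br /&gt;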
&lt;br /&gt;
== OriginalRowCounts Comparison == &lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Ms access originalrowcounts.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* The OrderedLocusNames row seems to report on the same number of IDs as our previous builds&lt;br /&gt;
&lt;br /&gt;
== Visual Inspection ==&lt;br /&gt;
Perform visual inspection of individual tables to see if there are any problems.&lt;br /&gt;
&lt;br /&gt;
* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?&lt;br /&gt;
** Yes, there are dates present for GeneOntology, InterPro, GeneID, RefSeq, UniProt, EMBL, PDB, Pfam, OrderedLocusNames, and EnsemblBacteria.&lt;br /&gt;
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?&lt;br /&gt;
** Yes, all of them seem to follow the same format (there are, more or less, 3 variations on the IDs for each of the tables).&lt;br /&gt;
&lt;br /&gt;
== Excel Inspection ==&lt;br /&gt;
* [[Media:In-search-of-the-missing-ids.xlsx| Excel file]]&lt;br /&gt;
&lt;br /&gt;
== Export Information (final) ==&lt;br /&gt;
* Date: 12/14/15&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151208.gdb | Sf-Std 20151208.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039;1 hour, 38 minutes, 42 seconds  &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;9:35 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039; PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
== Powerpoint Presentation Meetings ==&lt;br /&gt;
* Our group met on 12/14/15 in order to complete the slides by midnight.&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Week_15&amp;diff=7628</id>
		<title>Troque Week 15</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Week_15&amp;diff=7628"/>
				<updated>2015-12-11T01:00:13Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: linked file&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Template:Troque}}&lt;br /&gt;
&lt;br /&gt;
== Export Information (Re-imported) Build 2 ==&lt;br /&gt;
Version of GenMAPP Builder:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; gmbuilder-3.0.0-build-5 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Computer on which export was run: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; Front of the room, 3rd computer from the right. &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Postgres Database name: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Shigella_flexneri_20151208&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
UniProt XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): &amp;#039;&amp;#039;&amp;#039;UniProt release 2015_11&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* UniProt XML download link: &amp;#039;&amp;#039;&amp;#039;[http://www.uniprot.org/uniprot/?query=proteome:UP000001006 Click here]&amp;#039;&amp;#039;&amp;#039; &lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;4.43 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GO OBO-XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): &amp;#039;&amp;#039;&amp;#039;Version created on 11/19/2015 (at 2:24 AM)&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GO OBO-XML download link: &amp;#039;&amp;#039;&amp;#039;[http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;6.84 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to process: &amp;#039;&amp;#039;&amp;#039;5.49 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GOA filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): &amp;#039;&amp;#039;&amp;#039;Version released on .&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GOA download link: &amp;#039;&amp;#039;&amp;#039;[http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/103.S_flexneri_301.goa Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;0.06 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151208.gdb | Sf-Std 20151208.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039;1 hour, 38 minutes, 42 seconds  &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:30:59 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039;6:09:41 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
I had to re-import everything into a new database because the one I had been using had some files imported twice. As a result, the counts reported by PostgreSQL were all doubled.&lt;br /&gt;
&lt;br /&gt;
== Using TallyEngine ==&lt;br /&gt;
* The database used is the same one described in the section above: &amp;#039;&amp;#039;&amp;#039;Shigella_flexneri_20151208&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Notice in the image below that there is an error in the cells. It turns out that we did not even need to add the Ordered Locus since that was the default. We will definitely need to do one last build in order to fix that issue.&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Shigella flexneri tallyEngine results build 2.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Using XMLPipeDB match to Validate the XML Results from the TallyEngine ==&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:regex1_OTS.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:regex2_OTS.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* When added together, the result becomes 7566 + 3 = 7569.&lt;br /&gt;
&lt;br /&gt;
== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine ==&lt;br /&gt;
* The following command in PostgreSQL resulted in 7567 entries:&lt;br /&gt;
 select value from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
* The following command resulted in 214 entries:&lt;br /&gt;
 select value from genenametype where type = &amp;#039;ORF&amp;#039; and value ~ &amp;#039;(CP|SF?)(_p)?[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
&lt;br /&gt;
== OriginalRowCounts Comparison == &lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Ms access originalrowcounts.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Visual Inspection ==&lt;br /&gt;
Perform visual inspection of individual tables to see if there are any problems.&lt;br /&gt;
&lt;br /&gt;
* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?&lt;br /&gt;
**Yes, there are dates present for GeneOntology, InterPro, GeneID, RefSeq, UniProt, EMBL, PDB, Pfam, OrderedLocusNames, and EnsemblBacteria.&lt;br /&gt;
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?&lt;br /&gt;
**&lt;br /&gt;
&lt;br /&gt;
== Excel Inspection ==&lt;br /&gt;
* [[Media:In-search-of-the-missing-ids.xlsx| Excel file]]&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:In-search-of-the-missing-ids.xlsx&amp;diff=7627</id>
		<title>File:In-search-of-the-missing-ids.xlsx</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:In-search-of-the-missing-ids.xlsx&amp;diff=7627"/>
				<updated>2015-12-11T00:58:10Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: uploading&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;uploading&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Simple_ordered_locus_names.png&amp;diff=7576</id>
		<title>File:Simple ordered locus names.png</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Simple_ordered_locus_names.png&amp;diff=7576"/>
				<updated>2015-12-09T11:59:44Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: uploading&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;uploading&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Id_misnomers2.png&amp;diff=7575</id>
		<title>File:Id misnomers2.png</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Id_misnomers2.png&amp;diff=7575"/>
				<updated>2015-12-09T11:58:56Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: uploading&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;uploading&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Id_misnomers.png&amp;diff=7574</id>
		<title>File:Id misnomers.png</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Id_misnomers.png&amp;diff=7574"/>
				<updated>2015-12-09T11:58:44Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: uploading&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;uploading&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Dual_ordered_locus_names.png&amp;diff=7573</id>
		<title>File:Dual ordered locus names.png</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Dual_ordered_locus_names.png&amp;diff=7573"/>
				<updated>2015-12-09T11:58:30Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: uploading&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;uploading&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Match_pic.png&amp;diff=7572</id>
		<title>File:Match pic.png</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Match_pic.png&amp;diff=7572"/>
				<updated>2015-12-09T11:52:36Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: uploading match pic&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;uploading match pic&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Week_15&amp;diff=7571</id>
		<title>Troque Week 15</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Week_15&amp;diff=7571"/>
				<updated>2015-12-09T11:51:20Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: Added some sections that will need to be filled out later&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Template:Troque}}&lt;br /&gt;
&lt;br /&gt;
== Export Information (Re-imported) Build 2 ==&lt;br /&gt;
Version of GenMAPP Builder:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; gmbuilder-3.0.0-build-5 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Computer on which export was run: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; Front of the room, 3rd computer from the right. &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Postgres Database name: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Shigella_flexneri_20151208&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
UniProt XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): &amp;#039;&amp;#039;&amp;#039;UniProt release 2015_11&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* UniProt XML download link: &amp;#039;&amp;#039;&amp;#039;[http://www.uniprot.org/uniprot/?query=proteome:UP000001006 Click here]&amp;#039;&amp;#039;&amp;#039; &lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;4.43 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GO OBO-XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): &amp;#039;&amp;#039;&amp;#039;Version created on 11/19/2015 (at 2:24 AM)&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GO OBO-XML download link: &amp;#039;&amp;#039;&amp;#039;[http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;6.84 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to process: &amp;#039;&amp;#039;&amp;#039;5.49 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GOA filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): &amp;#039;&amp;#039;&amp;#039;Version released on .&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GOA download link: &amp;#039;&amp;#039;&amp;#039;[http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/103.S_flexneri_301.goa Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;0.06 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151208.gdb | Sf-Std 20151208.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039;1 hour, 38 minutes, 42 seconds  &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:30:59 PM PST&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039;6:09:41 PM PST&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
I had to re-import everything into a new database because the one I had been using had some files imported twice. As a result, the counts reported by PostgreSQL were all doubled.&lt;br /&gt;
&lt;br /&gt;
== Using TallyEngine ==&lt;br /&gt;
* The database used is the same one described in the section above: &amp;#039;&amp;#039;&amp;#039;Shigella_flexneri_20151208&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Notice in the image below that there is an error in the cells. It turns out that we did not need to add Ordered Locus, since it is included by default. We will need to do one last build to fix this issue.&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Shigella flexneri tallyEngine results build 2.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Using XMLPipeDB match to Validate the XML Results from the TallyEngine ==&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:regex1_OTS.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:regex2_OTS.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Added together, the two results come to 7566 + 3 = 7569.&lt;br /&gt;
&lt;br /&gt;
== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine ==&lt;br /&gt;
* The following command in PostgreSQL returned 7567 entries:&lt;br /&gt;
 select value from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
* The following command resulted in 214 entries:&lt;br /&gt;
 select value from genenametype where type = &amp;#039;ORF&amp;#039; and value ~ &amp;#039;(CP|SF?)(_p)?[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
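&lt;br /&gt;
The two patterns above can also be sanity-checked outside the database. The following is a minimal Python sketch, not part of the original workflow: the sample IDs are hypothetical and are not taken from the Shigella database, and the helper name count_matches is made up for illustration. It uses re.search because the PostgreSQL ~ operator matches anywhere in the value.&lt;br /&gt;

```python
import re

# Patterns copied from the two SQL queries above.
ORDERED_LOCUS = re.compile(r"(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?")
ORF = re.compile(r"(CP|SF?)(_p)?[0-9][0-9][0-9][0-9](\.[0-9])?")


def count_matches(pattern, values):
    """Count values containing a match, like PostgreSQL's unanchored ~ operator."""
    return sum(1 for value in values if pattern.search(value))


# Hypothetical sample values (assumption), not taken from the real database.
samples = ["SF0001", "S0002", "CP0003", "SF1234.1", "SF_p0005", "yaaA", "SF12"]

ordered_locus_hits = count_matches(ORDERED_LOCUS, samples)
orf_hits = count_matches(ORF, samples)
print(ordered_locus_hits, orf_hits)
```

Because neither pattern is anchored, a value only has to contain a matching substring to be counted. Anchoring the patterns with ^ and $ would make the check stricter.&lt;br /&gt;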
&lt;br /&gt;
== OriginalRowCounts Comparison == &lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Ms access originalrowcounts.png]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Visual Inspection ==&lt;br /&gt;
Perform visual inspection of individual tables to see if there are any problems.&lt;br /&gt;
&lt;br /&gt;
* Look at the Systems table. Is there a date in the Date field for all gene ID systems present in the database?&lt;br /&gt;
**Yes, there are dates present for GeneOntology, InterPro, GeneID, RefSeq, UniProt, EMBL, PDB, Pfam, OrderedLocusNames, and EnsemblBacteria.&lt;br /&gt;
* Open the UniProt, RefSeq, and OrderedLocusNames tables. Scroll down through the table. Do all of the IDs look like they take the correct form for that type of ID?&lt;br /&gt;
**&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Ms_access_originalrowcounts.png&amp;diff=7570</id>
		<title>File:Ms access originalrowcounts.png</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Ms_access_originalrowcounts.png&amp;diff=7570"/>
				<updated>2015-12-09T11:45:50Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: uploading&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;uploading&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Regex2_OTS.png&amp;diff=7569</id>
		<title>File:Regex2 OTS.png</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Regex2_OTS.png&amp;diff=7569"/>
				<updated>2015-12-09T11:33:30Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: Uploading regex2&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Uploading regex2&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Regex1_OTS.png&amp;diff=7568</id>
		<title>File:Regex1 OTS.png</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Regex1_OTS.png&amp;diff=7568"/>
				<updated>2015-12-09T11:33:14Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: Uploading regex1&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Uploading regex1&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Shigella_flexneri_tallyEngine_results_build_2.png&amp;diff=7567</id>
		<title>File:Shigella flexneri tallyEngine results build 2.png</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Shigella_flexneri_tallyEngine_results_build_2.png&amp;diff=7567"/>
				<updated>2015-12-09T11:28:58Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: uploading tally engine results&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;uploading tally engine results&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Week_15&amp;diff=7566</id>
		<title>Troque Week 15</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Week_15&amp;diff=7566"/>
				<updated>2015-12-09T11:25:13Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: Slightly modified the header&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Template:Troque}}&lt;br /&gt;
&lt;br /&gt;
== Export Information (Re-imported) Build 2 ==&lt;br /&gt;
Version of GenMAPP Builder:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; gmbuilder-3.0.0-build-5 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Computer on which export was run: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; Front of the room, 3rd computer from the right. &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Postgres Database name: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Shigella_flexneri_20151208&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
UniProt XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): &amp;#039;&amp;#039;&amp;#039;UniProt release 2015_11&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* UniProt XML download link: &amp;#039;&amp;#039;&amp;#039;[http://www.uniprot.org/uniprot/?query=proteome:UP000001006 Click here]&amp;#039;&amp;#039;&amp;#039; &lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;4.43 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GO OBO-XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): &amp;#039;&amp;#039;&amp;#039;Version created on 11/19/2015 (at 2:24 AM)&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GO OBO-XML download link: &amp;#039;&amp;#039;&amp;#039;[http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;6.84 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to process: &amp;#039;&amp;#039;&amp;#039;5.49 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GOA filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): &amp;#039;&amp;#039;&amp;#039;Version released on .&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GOA download link: &amp;#039;&amp;#039;&amp;#039;[http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/103.S_flexneri_301.goa Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;0.06 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151208.gdb | Sf-Std 20151208.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039;1 hour, 38 minutes, 42 seconds  &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:30:59 PM PST&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039;6:09:41 PM PST&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
I had to re-import everything into a new database because the one I had been using had some files imported twice. As a result, the counts reported by PostgreSQL were all doubled.&lt;br /&gt;
&lt;br /&gt;
{{Template:Troque_Journal}}&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Week_15&amp;diff=7565</id>
		<title>Troque Week 15</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Week_15&amp;diff=7565"/>
				<updated>2015-12-09T10:15:18Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: /* Export Information */ Added note at the end&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Template:Troque}}&lt;br /&gt;
&lt;br /&gt;
== Export Information (Re-imported) ==&lt;br /&gt;
Version of GenMAPP Builder:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; gmbuilder-3.0.0-build-5 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Computer on which export was run: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; Front of the room, 3rd computer from the right. &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Postgres Database name: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Shigella_flexneri_20151208&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
UniProt XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): &amp;#039;&amp;#039;&amp;#039;UniProt release 2015_11&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* UniProt XML download link: &amp;#039;&amp;#039;&amp;#039;[http://www.uniprot.org/uniprot/?query=proteome:UP000001006 Click here]&amp;#039;&amp;#039;&amp;#039; &lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;4.43 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GO OBO-XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): &amp;#039;&amp;#039;&amp;#039;Version created on 11/19/2015 (at 2:24 AM)&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GO OBO-XML download link: &amp;#039;&amp;#039;&amp;#039;[http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;6.84 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to process: &amp;#039;&amp;#039;&amp;#039;5.49 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GOA filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): &amp;#039;&amp;#039;&amp;#039;Version released on .&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GOA download link: &amp;#039;&amp;#039;&amp;#039;[http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/103.S_flexneri_301.goa Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;0.06 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151208.gdb | Sf-Std 20151208.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039;1 hour, 38 minutes, 42 seconds  &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:30:59 PM PST&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039;6:09:41 PM PST&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
I had to re-import everything into a new database because the one I had been using had some files imported twice. As a result, the counts reported by PostgreSQL were all doubled.&lt;br /&gt;
&lt;br /&gt;
{{Template:Troque_Journal}}&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Week_15&amp;diff=7561</id>
		<title>Troque Week 15</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Week_15&amp;diff=7561"/>
				<updated>2015-12-09T03:16:43Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: /* Export Information */ Updated with all info&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Template:Troque}}&lt;br /&gt;
&lt;br /&gt;
== Export Information ==&lt;br /&gt;
Version of GenMAPP Builder:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; gmbuilder-3.0.0-build-5 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Computer on which export was run: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; Front of the room, 3rd computer from the right. &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Postgres Database name: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Shigella_flexneri_20151208&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
UniProt XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): &amp;#039;&amp;#039;&amp;#039;UniProt release 2015_11&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* UniProt XML download link: &amp;#039;&amp;#039;&amp;#039;[http://www.uniprot.org/uniprot/?query=proteome:UP000001006 Click here]&amp;#039;&amp;#039;&amp;#039; &lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;4.43 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GO OBO-XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): &amp;#039;&amp;#039;&amp;#039;Version created on 11/19/2015 (at 2:24 AM)&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GO OBO-XML download link: &amp;#039;&amp;#039;&amp;#039;[http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;6.84 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to process: &amp;#039;&amp;#039;&amp;#039;5.49 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GOA filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): &amp;#039;&amp;#039;&amp;#039;Version released on .&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GOA download link: &amp;#039;&amp;#039;&amp;#039;[http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/103.S_flexneri_301.goa Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;0.06 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151208.gdb | Sf-Std 20151208.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039;1 hour, 38 minutes, 42 seconds  &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:30:59 PM PST&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039;6:09:41 PM PST&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Template:Troque_Journal}}&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Week_15&amp;diff=7560</id>
		<title>Troque Week 15</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Week_15&amp;diff=7560"/>
				<updated>2015-12-09T03:15:23Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: /* Export Information */ Linked export&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Template:Troque}}&lt;br /&gt;
&lt;br /&gt;
== Export Information ==&lt;br /&gt;
Version of GenMAPP Builder:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; gmbuilder-3.0.0-build-5 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Computer on which export was run: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; Front of the room, 3rd computer from the right. &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Postgres Database name: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Shigella_flexneri_20151208&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
UniProt XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): &amp;#039;&amp;#039;&amp;#039;UniProt release 2015_11&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* UniProt XML download link: &amp;#039;&amp;#039;&amp;#039;[http://www.uniprot.org/uniprot/?query=proteome:UP000001006 Click here]&amp;#039;&amp;#039;&amp;#039; &lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;4.43 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GO OBO-XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): &amp;#039;&amp;#039;&amp;#039;Version created on 11/19/2015 (at 2:24 AM)&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GO OBO-XML download link: &amp;#039;&amp;#039;&amp;#039;[http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;6.84 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to process: &amp;#039;&amp;#039;&amp;#039;5.49 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GOA filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): &amp;#039;&amp;#039;&amp;#039;Version released on .&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GOA download link: &amp;#039;&amp;#039;&amp;#039;[http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/103.S_flexneri_301.goa Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;0.06 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151208.gdb | Sf-Std 20151208.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039;  &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:30:59 PM PST&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039;6:09:41 PM PST&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Template:Troque_Journal}}&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Sf-Std_20151208.gdb&amp;diff=7559</id>
		<title>File:Sf-Std 20151208.gdb</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Sf-Std_20151208.gdb&amp;diff=7559"/>
				<updated>2015-12-09T03:14:55Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: Uploading new gdb file&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Uploading new gdb file&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Week_15&amp;diff=7558</id>
		<title>Troque Week 15</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Week_15&amp;diff=7558"/>
				<updated>2015-12-09T03:08:58Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: /* Export Information */ Saved end time&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Template:Troque}}&lt;br /&gt;
&lt;br /&gt;
== Export Information ==&lt;br /&gt;
Version of GenMAPP Builder:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; gmbuilder-3.0.0-build-5 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Computer on which export was run: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; Front of the room, 3rd computer from the right. &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Postgres Database name: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Shigella_flexneri_20151208&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
UniProt XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): &amp;#039;&amp;#039;&amp;#039;UniProt release 2015_11&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* UniProt XML download link: &amp;#039;&amp;#039;&amp;#039;[http://www.uniprot.org/uniprot/?query=proteome:UP000001006 Click here]&amp;#039;&amp;#039;&amp;#039; &lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;4.43 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GO OBO-XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): &amp;#039;&amp;#039;&amp;#039;Version created on 11/19/2015 (at 2:24 AM)&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GO OBO-XML download link: &amp;#039;&amp;#039;&amp;#039;[http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;6.84 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to process: &amp;#039;&amp;#039;&amp;#039;5.49 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GOA filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): &amp;#039;&amp;#039;&amp;#039;Version released on .&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GOA download link: &amp;#039;&amp;#039;&amp;#039;[http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/103.S_flexneri_301.goa Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;0.06 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[ | ]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039;  &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:30:59 PM PST&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039;6:09:41 PM PST&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Template:Troque_Journal}}&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Week_15&amp;diff=7546</id>
		<title>Troque Week 15</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Week_15&amp;diff=7546"/>
				<updated>2015-12-09T00:27:48Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: /* Export Information */ Filled up info&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Template:Troque}}&lt;br /&gt;
&lt;br /&gt;
== Export Information ==&lt;br /&gt;
Version of GenMAPP Builder:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; gmbuilder-3.0.0-build-5 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Computer on which export was run: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; Front of the room, 3rd computer from the right. &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Postgres Database name: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Shigella_flexneri_20151208&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
UniProt XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): &amp;#039;&amp;#039;&amp;#039;UniProt release 2015_11&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* UniProt XML download link: &amp;#039;&amp;#039;&amp;#039;[http://www.uniprot.org/uniprot/?query=proteome:UP000001006 Click here]&amp;#039;&amp;#039;&amp;#039; &lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;4.43 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GO OBO-XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): &amp;#039;&amp;#039;&amp;#039;Version created on 11/19/2015 (at 2:24 AM)&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GO OBO-XML download link: &amp;#039;&amp;#039;&amp;#039;[http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;6.84 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to process: &amp;#039;&amp;#039;&amp;#039;5.49 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GOA filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): &amp;#039;&amp;#039;&amp;#039;Version released on .&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GOA download link: &amp;#039;&amp;#039;&amp;#039;[http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/103.S_flexneri_301.goa Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;0.06 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[ | ]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039;  &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:30:59 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039; PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Template:Troque_Journal}}&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Week_15&amp;diff=7539</id>
		<title>Troque Week 15</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Week_15&amp;diff=7539"/>
				<updated>2015-12-09T00:11:55Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: Minor edit on template link&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Template:Troque}}&lt;br /&gt;
&lt;br /&gt;
== Export Information ==&lt;br /&gt;
Version of GenMAPP Builder:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; gmbuilder-3.0.0-build-5 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Computer on which export was run: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; Front of the room, 3rd computer from the right. &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Postgres Database name: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Shigella_flexneri_20151208&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
UniProt XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): &amp;#039;&amp;#039;&amp;#039;UniProt release 2015_11&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* UniProt XML download link: &amp;#039;&amp;#039;&amp;#039;[http://www.uniprot.org/uniprot/?query=proteome:UP000001006 Click here]&amp;#039;&amp;#039;&amp;#039; &lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;4.43 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GO OBO-XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): &amp;#039;&amp;#039;&amp;#039;Version created on 11/19/2015 (at 2:24 AM)&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GO OBO-XML download link: &amp;#039;&amp;#039;&amp;#039;[http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039; minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to process: &amp;#039;&amp;#039;&amp;#039; minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GOA filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): &amp;#039;&amp;#039;&amp;#039;Version released on .&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GOA download link: &amp;#039;&amp;#039;&amp;#039;[http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/103.S_flexneri_301.goa Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039; minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[ | ]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039;  &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039; PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039; PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Template:Troque_Journal}}&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Week_15&amp;diff=7537</id>
		<title>Troque Week 15</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Week_15&amp;diff=7537"/>
				<updated>2015-12-09T00:11:28Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: Creating this page&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Template:Troque}}&lt;br /&gt;
&lt;br /&gt;
== Export Information ==&lt;br /&gt;
Version of GenMAPP Builder:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; gmbuilder-3.0.0-build-5 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Computer on which export was run: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; Front of the room, 3rd computer from the right. &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Postgres Database name: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Shigella_flexneri_20151208&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
UniProt XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): &amp;#039;&amp;#039;&amp;#039;UniProt release 2015_11&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* UniProt XML download link: &amp;#039;&amp;#039;&amp;#039;[http://www.uniprot.org/uniprot/?query=proteome:UP000001006 Click here]&amp;#039;&amp;#039;&amp;#039; &lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;4.43 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GO OBO-XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): &amp;#039;&amp;#039;&amp;#039;Version created on 11/19/2015 (at 2:24 AM)&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GO OBO-XML download link: &amp;#039;&amp;#039;&amp;#039;[http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039; minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to process: &amp;#039;&amp;#039;&amp;#039; minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GOA filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): &amp;#039;&amp;#039;&amp;#039;Version released on .&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GOA download link: &amp;#039;&amp;#039;&amp;#039;[http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/103.S_flexneri_301.goa Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039; minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[ | ]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039;  &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039; PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039; PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{Template:Troque:Journal}}&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Gene_Database_Testing_Report_-_Oregon_Trail_Survivors&amp;diff=7514</id>
		<title>Gene Database Testing Report - Oregon Trail Survivors</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Gene_Database_Testing_Report_-_Oregon_Trail_Survivors&amp;diff=7514"/>
				<updated>2015-12-08T22:41:31Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: /* Export Information for Build with Coder Changes */ Added Build 2&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Template:Oregon Trail Survivors}}&lt;br /&gt;
&lt;br /&gt;
== Things to note ==&lt;br /&gt;
* Taxonomy ID: 623&lt;br /&gt;
* UP000001006&lt;br /&gt;
* File management system: Wiki&lt;br /&gt;
&lt;br /&gt;
== Initial (Vanilla) Export Information ==&lt;br /&gt;
Version of GenMAPP Builder:&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; gmbuilder-3.0.0-build-5 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Computer on which export was run: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039; Front of the room, 3rd computer from the right. &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Postgres Database name: &lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Shigella flexneri 20151911&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
UniProt XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): &amp;#039;&amp;#039;&amp;#039;UniProt release 2015_11&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* UniProt XML download link: &amp;#039;&amp;#039;&amp;#039;[http://www.uniprot.org/uniprot/?query=proteome:UP000001006 Click here]&amp;#039;&amp;#039;&amp;#039; &lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;4.48 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GO OBO-XML filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): &amp;#039;&amp;#039;&amp;#039;Version created on 11/19/2015 (at 2:24 AM)&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GO OBO-XML download link: &amp;#039;&amp;#039;&amp;#039;[http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;7.00 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to process: &amp;#039;&amp;#039;&amp;#039;4.99 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note: &lt;br /&gt;
&lt;br /&gt;
GOA filename (give filename and upload and link to compressed file): &lt;br /&gt;
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): &amp;#039;&amp;#039;&amp;#039;Version released on .&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* GOA download link: &amp;#039;&amp;#039;&amp;#039;[http://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/103.S_flexneri_301.goa Click here to download].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to import: &amp;#039;&amp;#039;&amp;#039;0.06 minutes&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151119 OTS.gdb | Sf-Std_20151119_OTS.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039; 1 Hours, 32 Minutes, 33 Seconds &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:06:13 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039;5:38:46 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
== Export Information for Build with Coder Changes ==&lt;br /&gt;
=== Build 1 ===&lt;br /&gt;
Name of .gdb file: &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151201.gdb | Sf-Std_20151201.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039; 4 hours, 10 minutes, 46 seconds &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:19:22 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039; 8:30:08 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** I have confirmed that the necessary information in the .gdb file exists in the new build (e.g. the URL of the database we are using).&lt;br /&gt;
&lt;br /&gt;
=== Build 2 ===&lt;br /&gt;
Name of .gdb file: [[Media:Sf-Std 20151207.gdb | Sf-Std 20151207.gdb]]&lt;br /&gt;
* Date: &amp;#039;&amp;#039;&amp;#039; 12/7/15 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039; 4 hours, 24 minutes and 1 second &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;9:13:45 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039; 1:37:46 AM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
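The export durations above can be recomputed from the recorded start and end times. The following is a minimal Python sketch, not part of GenMAPP Builder; the function name elapsed is hypothetical, and it assumes a run lasts less than 24 hours so that an end time past midnight can be wrapped.&lt;br /&gt;

```python
from datetime import datetime, timedelta

CLOCK_FORMAT = "%I:%M:%S %p"  # e.g. "9:13:45 PM"; the time zone label is left out


def elapsed(start, end):
    """Return end - start as a timedelta, wrapping past midnight if needed."""
    delta = datetime.strptime(end, CLOCK_FORMAT) - datetime.strptime(start, CLOCK_FORMAT)
    # A run that ends after midnight yields a negative difference;
    # taking it modulo one day moves it back into the 0-24 hour range.
    return delta % timedelta(days=1)


print(elapsed("9:13:45 PM", "1:37:46 AM"))
```

Applied to the Build 2 times (9:13:45 PM to 1:37:46 AM), this gives 4:24:01, matching the duration recorded above.&lt;br /&gt;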
&lt;br /&gt;
==TallyEngine==&lt;br /&gt;
&lt;br /&gt;
* Run the TallyEngine in GenMAPP Builder and record the number of records for UniProt and GO in the XML data and in the Postgres databases.&lt;br /&gt;
** Choose the menu item Tallies &amp;gt; Run XML and Database Tallies for UniProt and GO...&lt;br /&gt;
** Choose the UniProt and GO OBO XML files that were uploaded in the previous sections of this assignment.&lt;br /&gt;
** Here is the screenshot of the tally result:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot; style=&amp;quot;width: auto; margin-left: auto; margin-right: auto;&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:TallyEngine results OTS 112115.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Using XMLPipeDB match to Validate the XML Results from the TallyEngine==&lt;br /&gt;
[[How_Do_I_Count_Thee%3F_Let_Me_Count_The_Ways | Follow the instructions found on this page to run XMLPipeDB match.]]&lt;br /&gt;
* In the Thawspace directory, I created a folder called &amp;quot;Shigella_flexneri_BioDB_2015&amp;quot; and created subfolders called &amp;quot;Source&amp;quot; and &amp;quot;Working&amp;quot; to store the source files (i.e., the compressed files) and the working files (i.e., the files I will actually be processing).&lt;br /&gt;
* As a result, I had to cd into the Working directory before running Match.&lt;br /&gt;
** To change into the ThawSpace0\Shigella_flexneri_BioDB_2015\Working directory, use the following command in the Command Prompt window:&lt;br /&gt;
 T: &amp;amp;&amp;amp; cd &amp;quot;Shigella_flexneri_BioDB_2015\Working&amp;quot;&lt;br /&gt;
* Once inside that directory, the command I used is:&lt;br /&gt;
 java -jar xmlpipedb-match-1.1.1/xmlpipedb-match-1.1.1.jar &amp;quot;SF[0-9][0-9][0-9][0-9]&amp;quot; &amp;lt; uniprot-proteome%3AUP000001006.xml&lt;br /&gt;
* The results are as follows:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Match results OTS 112115.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
These results did not match what TallyEngine reported (TallyEngine: 7567 vs. Match: 4610).&lt;br /&gt;
* As a result, the regular expression had to be modified so that the numbers match: &amp;lt;code&amp;gt;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?(/|&amp;lt;/name&amp;gt;)&amp;lt;/code&amp;gt;&lt;br /&gt;
* The overall command to write to a text file is as follows:&lt;br /&gt;
 java -jar xmlpipedb-match-1.1.1/xmlpipedb-match-1.1.1.jar &amp;quot;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?(/|&amp;lt;/name&amp;gt;)&amp;quot; &amp;lt; uniprot-proteome%3AUP000001006.xml &amp;gt; shigella_flexneri_results.txt&lt;br /&gt;
* Then our results became:&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Match results OTS 20151203 more accurate.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* Observations: &lt;br /&gt;
** In order to lessen the number of matches, we had to add the end tag &amp;quot;&amp;lt;/name&amp;gt;&amp;quot; to our regular expression. This brought down the number of matches from over 8000 to 7517. Since TallyEngine&amp;#039;s result was 7567, this means that 50 IDs were not being caught. To account for this, we had to add the genes with IDs of the form CP#### (there were 50 instances of these), and those with the form SF####.# or S####.#. This led us to get 7566 gene IDs. &lt;br /&gt;
** When I looked at the IDs in Microsoft Access, the IDs totaled 7569. To cover this last piece of gene formatting, we also had to account for the genes with the form SF?####/SF?####. The 2 extra genes that were not accounted for by TallyEngine are actually not supposed to be separated, since these names are formatted such that the two IDs can be interpreted as interchangeable. When the gdb file was created, it would seem that these names were split at the &amp;quot;/&amp;quot;.&lt;br /&gt;
** In other words, there are 3 ordered locus names with formatting that is different from the rest: SF2223/SF2224, S2352/S2353, and S3359/S3360. &lt;br /&gt;
** I wasn&amp;#039;t able to hit exactly the number output by TallyEngine, since there are other genes with the same format that were already caught by the patterns SF#### or S####.&lt;br /&gt;
** Note: It turns out the ShiBASE database only uses the pattern SF#### instead of S#### or CP#### so the regular expression would really have to be just &amp;lt;code&amp;gt;SF?[0-9][0-9][0-9][0-9](\.[0-9])?(/|&amp;lt;/name&amp;gt;)&amp;lt;/code&amp;gt;&lt;br /&gt;
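The ID forms described in the observations can also be checked with a short script outside of Match. The following Python sketch is illustrative only: the slash-separated names are the three mentioned above, the other sample IDs are made up, the function names are hypothetical, and the pattern omits the closing name tag because it is applied to bare IDs rather than to the XML file.&lt;br /&gt;

```python
import re
from collections import Counter

# Same ID shape as the expression used with Match, minus the closing name tag.
ID_PATTERN = re.compile(r"(CP|SF?)[0-9]{4}(\.[0-9])?")


def split_locus_name(name):
    """Split a slash-joined ordered locus name into its individual IDs."""
    return name.split("/")


def classify(locus_id):
    """Return the prefix form of an ID (CP, S, or SF), or None if it does not fit."""
    match = ID_PATTERN.fullmatch(locus_id)
    return match.group(1) if match else None


# The three slash-separated names come from the report; CP0001 and SF0001.1 are made up.
sample = ["SF2223/SF2224", "S2352/S2353", "S3359/S3360", "CP0001", "SF0001.1"]
ids = [part for name in sample for part in split_locus_name(name)]
print(Counter(classify(locus_id) for locus_id in ids))
```

Splitting on the slash before classifying mirrors what the export appears to do with these names, which is why a count taken after splitting can exceed a count of whole names.&lt;br /&gt;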
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
== System IDs ==&lt;br /&gt;
* UniProt: A0A0H2[A-Z][A-Z][0-9][0-9], A0A0H2[A-Z][A-Z][A-Z][0-9], A0A0H2[A-Z][0-9][0-9][0-9], or [A-Z][0-9][A-Z][0-9][A-Z][0-9]&lt;br /&gt;
** Examples: A0A0H2USI9, A0A0H2USA4, A0A0H2V010, A5A6A8&lt;br /&gt;
* RefSeq: NP_######, WP_#########, YP_#########, YP_######&lt;br /&gt;
** Examples: NP_858405, WP_000002440, YP_001449236, YP_145811 (only one of this)&lt;br /&gt;
* GeneID (EntrezGene from NCBI):&lt;br /&gt;
* GO: #######&lt;br /&gt;
* OrderedLocusNames: CP####, S#### (or S####.#), and SF#### (or SF####.#)&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Using SQL Queries to Validate the PostgreSQL Database Results from the TallyEngine ==&lt;br /&gt;
* The command used to count the number of IDs is:&lt;br /&gt;
 select count(*) from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Postgres results OTS 20151203.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* The result above is exactly twice the number of OrderedLocusNames from TallyEngine: 15134 / 2 = 7567 IDs.&lt;br /&gt;
* Running the command &amp;lt;code&amp;gt;select value from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&amp;lt;/code&amp;gt; and exporting the results to Excel reveals that this is because every single entry is entered twice: &lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Postgres results excel form OTS 20151203.jpg]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
* Adding the keyword &amp;quot;distinct&amp;quot; would resolve the double counting:&lt;br /&gt;
 select distinct value from genenametype where type = &amp;#039;ordered locus&amp;#039; and value ~ &amp;#039;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;#039;;&lt;br /&gt;
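The effect of the distinct keyword on a doubled table can be illustrated with a small, self-contained Python sketch. It uses an in-memory SQLite database as a stand-in for PostgreSQL, reuses the table and column names from the query above, and fills the table with made-up rows; the helper name count is hypothetical, and the regular expression filter is left out because SQLite does not provide the ~ operator.&lt;br /&gt;

```python
import sqlite3

# In-memory SQLite stand-in for the PostgreSQL table; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("create table genenametype (type text, value text)")
rows = [("ordered locus", value) for value in ("SF0001", "S0002", "CP0003")]
# Insert every row twice to mimic the doubled entries seen in the export.
conn.executemany("insert into genenametype values (?, ?)", rows + rows)


def count(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


total = count("select count(*) from genenametype where type = 'ordered locus'")
unique = count("select count(distinct value) from genenametype where type = 'ordered locus'")
print(total, unique)
```

With every row entered twice, the plain count is double the distinct count, which is the same relationship as 15134 and 7567 above.&lt;br /&gt;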
&lt;br /&gt;
== Analysis == &lt;br /&gt;
* The total number of OrderedLocusNames in TallyEngine is &amp;#039;&amp;#039;&amp;#039;7567&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
* Using the best regular expression I could construct in Match, the result is &amp;#039;&amp;#039;&amp;#039;7573&amp;#039;&amp;#039;&amp;#039;. The additional 6 IDs appear because they are already captured by the regular expression &amp;lt;code&amp;gt;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?&amp;lt;/name&amp;gt;&amp;lt;/code&amp;gt;, so also capturing the IDs of the form &amp;lt;code&amp;gt;SF?####/SF?####&amp;lt;/code&amp;gt; duplicates them.&lt;br /&gt;
* The total number of entries in PostgreSQL is &amp;#039;&amp;#039;&amp;#039;15134&amp;#039;&amp;#039;&amp;#039;, but this is only because each gene appears twice. Dividing by 2 yields &amp;#039;&amp;#039;&amp;#039;7567&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
* Microsoft Access yielded &amp;#039;&amp;#039;&amp;#039;7569&amp;#039;&amp;#039;&amp;#039; in the OrderedLocusNames window. The extra 2 genes came from the IDs of the form &amp;lt;code&amp;gt;SF?####/SF?####&amp;lt;/code&amp;gt; since the export broke up the two IDs that represent the same ID.&lt;br /&gt;
** 49 are of the form &amp;lt;code&amp;gt;CP####&amp;lt;/code&amp;gt;&lt;br /&gt;
** 3413 are of the form &amp;lt;code&amp;gt;S####&amp;lt;/code&amp;gt;&lt;br /&gt;
*** 14 are of the form &amp;lt;code&amp;gt;S####.#&amp;lt;/code&amp;gt;&lt;br /&gt;
** 4107 are of the form &amp;lt;code&amp;gt;SF####&amp;lt;/code&amp;gt;&lt;br /&gt;
*** 35 are of the form &amp;lt;code&amp;gt;SF####.#&amp;lt;/code&amp;gt;&lt;br /&gt;
* Inspecting the UniProt XML file was necessary for identifying the IDs. Looking through its contents, I discovered (with help from Dondi) that I had to add the end tag &amp;quot;&amp;lt;/name&amp;gt;&amp;quot; in order to narrow down the results in Match.&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Week_14&amp;diff=7513</id>
		<title>Troque Week 14</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Week_14&amp;diff=7513"/>
				<updated>2015-12-08T22:40:51Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: /* Build 2 */ Updated Build 2&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Template:Troque}}&lt;br /&gt;
== Running New Builds ==&lt;br /&gt;
=== Build 1 ===&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151201.gdb | Sf-Std_20151201.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039; 4 hours, 10 minutes, 46 seconds &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:19:22 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039; 8:30:08 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
=== Build 2 ===&lt;br /&gt;
Name of .gdb file: [[Media:Sf-Std 20151207.gdb | Sf-Std 20151207.gdb]]&lt;br /&gt;
* Date: &amp;#039;&amp;#039;&amp;#039; 12/7/15 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039; 4 hours, 24 minutes and 1 second &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;9:13:45 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039; 1:37:46 AM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
== Important Files ==&lt;br /&gt;
* [[Media: Shigella flexneri results.txt | Text file written from Match]]&lt;br /&gt;
* [[Media:Shigella flexneri OrderedLocusNames OTS 20151201.xlsx | Ordered Locus Names from Microsoft Access]]&lt;br /&gt;
&lt;br /&gt;
== Identifying the Gene IDs ==&lt;br /&gt;
* Regular expression: &amp;lt;code&amp;gt;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?(/|&amp;lt;/name&amp;gt;)&amp;lt;/code&amp;gt;&lt;br /&gt;
* Observations: &lt;br /&gt;
** In order to lessen the number of matches, we had to add the end tag &amp;quot;&amp;lt;/name&amp;gt;&amp;quot; to our regular expression. This brought down the number of matches from over 8000 to 7517. Since TallyEngine&amp;#039;s result was 7567, this means that 50 IDs were not being caught. To account for this, we had to add the genes with IDs of the form CP#### (there were 50 instances of these), and those with the form SF####.# or S####.#. This led us to get 7566 gene IDs. &lt;br /&gt;
** When I looked at the IDs in Microsoft Access, the IDs totaled 7569. To cover this last piece of gene formatting, we also had to account for the genes with the form SF?####/SF?####. The 2 extra genes that were not accounted for by TallyEngine are actually not supposed to be separated, since these names are formatted such that the two IDs can be interpreted as interchangeable. When the gdb file was created, it would seem that these names were split at the &amp;quot;/&amp;quot;.&lt;br /&gt;
** In other words, there are 3 ordered locus names with formatting that is different from the rest: SF2223/SF2224, S2352/S2353, and S3359/S3360. &lt;br /&gt;
** I wasn&amp;#039;t able to hit exactly the number output by TallyEngine, since there are other genes with the same format that were already caught by the patterns SF#### or S####.&lt;br /&gt;
** Note: It turns out the ShiBASE database only uses the pattern SF#### and CP#### instead of S#### so the regular expression would really have to be just &amp;lt;code&amp;gt;SF?[0-9][0-9][0-9][0-9](\.[0-9])?(/|&amp;lt;/name&amp;gt;)&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;FOR THE FULL REPORT ON IDENTIFYING THE ID, VISIT THE [[Gene_Database_Testing_Report_-_Oregon_Trail_Survivors | GENE DATABASE TESTING REPORT PAGE]].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
== Reflection ==&lt;br /&gt;
# What worked?&lt;br /&gt;
#* What worked in identifying the gene IDs was exporting the .gdb file into Excel and comparing it with what the OrderedLocusNames table had (from Microsoft Access). Doing this made it easier to find which genes were not in the .gdb file and to look them up in the UniProt XML file. With the Excel file comparing the lists of gene IDs and the CTRL+F shortcut, I was also able to discern which tags to include in the new builds of the database. Because of this, I was able to confirm that some genes indeed do not exist in the XML file, while only a couple exist within the &amp;quot;dbReference&amp;quot; tag.&lt;br /&gt;
# What didn&amp;#039;t work?&lt;br /&gt;
#* What didn&amp;#039;t work was running Match multiple times without thinking. Even when I was trying to match the number of gene IDs with what TallyEngine gave me, Match didn&amp;#039;t really help me identify where to find the genes in the XML file.&lt;br /&gt;
# What will I do next to fix what didn&amp;#039;t work?&lt;br /&gt;
#* What I would do next is use Match in conjunction with the XML file, or just use the Excel method exclusively, since it was more helpful in finding the necessary tags than the Match method.&lt;br /&gt;
&lt;br /&gt;
{{Template:Troque_Journal}}&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Sf-Std_20151207.gdb&amp;diff=7511</id>
		<title>File:Sf-Std 20151207.gdb</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=File:Sf-Std_20151207.gdb&amp;diff=7511"/>
				<updated>2015-12-08T22:40:20Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Week_14&amp;diff=7509</id>
		<title>Troque Week 14</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Troque_Week_14&amp;diff=7509"/>
				<updated>2015-12-08T22:38:09Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: /* Running New Builds */ Updated End time&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Template:Troque}}&lt;br /&gt;
== Running New Builds ==&lt;br /&gt;
=== Build 1 ===&lt;br /&gt;
Name of .gdb file (give filename and upload and link to compressed file): &amp;#039;&amp;#039;&amp;#039;[[Media:Sf-Std 20151201.gdb | Sf-Std_20151201.gdb]]&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039; 4 hours, 10 minutes, 46 seconds &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;4:19:22 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039; 8:30:08 PM PDT&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
&lt;br /&gt;
=== Build 2 ===&lt;br /&gt;
Name of .gdb file:&lt;br /&gt;
* Date: &amp;#039;&amp;#039;&amp;#039; 12/7/15 &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Time taken to export: &amp;#039;&amp;#039;&amp;#039; &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Start time: &amp;#039;&amp;#039;&amp;#039;9:13:45 PM PST&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** End time: &amp;#039;&amp;#039;&amp;#039; 1:37:46 AM PST&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** Note:&lt;br /&gt;
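&lt;br /&gt;
A minimal sketch, in Python, of how the elapsed export time can be derived from the recorded start and end times, including Build 2, which ran past midnight. The function name export_duration is hypothetical and is not part of GenMAPP Builder.&lt;br /&gt;
&lt;br /&gt;
```python
from datetime import datetime, timedelta


def export_duration(start, end):
    """Return the elapsed time between two 12-hour clock readings."""
    fmt = "%I:%M:%S %p"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    # Wrap around midnight when the build ends on the following day.
    return delta % timedelta(days=1)


# Start and end times as recorded for Build 1 and Build 2.
build1 = export_duration("4:19:22 PM", "8:30:08 PM")
build2 = export_duration("9:13:45 PM", "1:37:46 AM")
print(build1)  # 4:10:46
print(build2)  # 4:24:01
```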
&lt;br /&gt;
== Important Files ==&lt;br /&gt;
* [[Media: Shigella flexneri results.txt | Text file written from Match]]&lt;br /&gt;
* [[Media:Shigella flexneri OrderedLocusNames OTS 20151201.xlsx | Ordered Locus Names from Microsoft Access]]&lt;br /&gt;
&lt;br /&gt;
== Identifying the Gene IDs ==&lt;br /&gt;
* Regular expression: &amp;lt;code&amp;gt;(CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?(/|&amp;lt;/name&amp;gt;)&amp;lt;/code&amp;gt;&lt;br /&gt;
* Observations: &lt;br /&gt;
** To reduce the number of matches, we added the end tag &amp;quot;&amp;lt;/name&amp;gt;&amp;quot; to our regular expression. This brought the number of matches down from over 8000 to 7517. Since TallyEngine&amp;#039;s results were 7567, this means that 150 IDs were not being caught. To account for these, we added IDs of the form CP#### (there were 50 instances of these) and those of the form SF####.# or S####.#. This gave us 7566 gene IDs. &lt;br /&gt;
** In Microsoft Access, the IDs total 7569. To account for this last formatting variant, we also had to match names of the form SF?####/SF?####. The 2 extra IDs that TallyEngine did not count are not supposed to be separated, since the format suggests that the IDs in each name are interchangeable. When the .gdb file was created, these names appear to have been split at the &amp;quot;/&amp;quot;.&lt;br /&gt;
** In other words, there are 3 ordered locus names with formatting that is different from the rest: SF2223/SF2224, S2352/S2353, and S3359/S3360. &lt;br /&gt;
** I wasn&amp;#039;t able to match the number output by TallyEngine exactly, since there are other genes with the same format that were already caught by the patterns SF#### or S####.&lt;br /&gt;
** Note: It turns out the ShiBASE database only uses the pattern SF#### and CP#### instead of S#### so the regular expression would really have to be just &amp;lt;code&amp;gt;SF?[0-9][0-9][0-9][0-9](\.[0-9])?(/|&amp;lt;/name&amp;gt;)&amp;lt;/code&amp;gt;&lt;br /&gt;
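&lt;br /&gt;
As a minimal sketch (in Python, which is not the tool used above), the ID forms described in these observations can be checked against names that have already been extracted from the XML. The pattern drops the trailing delimiter group of the regular expression above, the helper name split_locus_name is hypothetical, and every sample name other than the three slash-separated ones is made up for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
import re

# Same ID shape as the regular expression above, without the trailing
# delimiter group, so it can be applied to already-extracted names.
GENE_ID = re.compile(r"(CP|SF?)[0-9]{4}(\.[0-9])?")


def split_locus_name(name):
    """Split a combined name such as SF2223/SF2224 into valid gene IDs."""
    parts = name.split("/")
    return [part for part in parts if GENE_ID.fullmatch(part)]


# The three slash-separated names come from the observations above;
# the remaining entries are hypothetical and only illustrate each form.
names = ["SF2223/SF2224", "S2352/S2353", "S3359/S3360",
         "CP0001", "SF0002.1", "b0003"]

ids = [gene_id for name in names for gene_id in split_locus_name(name)]
print(len(ids))
```

Counting the two halves of each slash-separated name separately is what would make a total differ from a count that treats each name as a single ID.&lt;br /&gt;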
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;FOR THE FULL REPORT ON IDENTIFYING THE ID, VISIT THE [[Gene_Database_Testing_Report_-_Oregon_Trail_Survivors | GENE DATABASE TESTING REPORT PAGE]].&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
== Reflection ==&lt;br /&gt;
# What worked?&lt;br /&gt;
#* What worked in identifying the gene IDs was exporting the .gdb file into Excel and comparing it with the OrderedLocusNames table (from Microsoft Access). This made it easier to find which genes were missing from the .gdb file and to look them up in the UniProt XML file. By comparing the lists of gene IDs in the Excel file and using the CTRL+F shortcut, I was also able to discern which tags to include in the new builds for the databases. Because of this, I was able to confirm that some genes indeed do not exist in the XML file, while only a couple exist within the &amp;quot;dbReference&amp;quot; tag.&lt;br /&gt;
# What didn&amp;#039;t work?&lt;br /&gt;
#* What didn&amp;#039;t work is using Match multiple times without thinking. Even when I was trying to match the number of gene IDs with what Tally Engine gives me, Match didn&amp;#039;t really help me in identifying where to find the genes in the XML file.&lt;br /&gt;
# What will I do next to fix what didn&amp;#039;t work?&lt;br /&gt;
#* To fix what didn&amp;#039;t work, I would use Match in conjunction with the XML file, or rely on the Excel method entirely, since it was more helpful than Match in finding the necessary tags.&lt;br /&gt;
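&lt;br /&gt;
The list comparison described in this reflection was done in Excel; the same idea can be sketched in Python as a set difference. The function name missing_ids and both ID lists below are hypothetical and only illustrate the comparison.&lt;br /&gt;
&lt;br /&gt;
```python
def missing_ids(reference_ids, exported_ids):
    """Return IDs present in the reference list but absent from the export."""
    return sorted(set(reference_ids) - set(exported_ids))


# Hypothetical ID lists; real lists would be copied out of the
# OrderedLocusNames table and the exported .gdb file.
reference_ids = ["SF0001", "SF0002", "SF0003", "CP0004"]
gdb_ids = ["SF0001", "SF0003"]

print(missing_ids(reference_ids, gdb_ids))
```

Each ID reported as missing can then be searched for in the UniProt XML file to see which tag, if any, contains it.&lt;br /&gt;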
&lt;br /&gt;
{{Template:Troque_Journal}}&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	<entry>
		<id>https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Oregon_Trail_Survivors&amp;diff=7438</id>
		<title>Oregon Trail Survivors</title>
		<link rel="alternate" type="text/html" href="https://xmlpipedb.lmucs.io/biodb/fall2015/index.php?title=Oregon_Trail_Survivors&amp;diff=7438"/>
				<updated>2015-12-08T05:58:32Z</updated>
		
		<summary type="html">&lt;p&gt;Troque: /* Reflection */ Edited Trixie&amp;#039;s reflection&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;div style=&amp;quot;text-align: center; font-size: 250%; line-height: 1.25em&amp;quot;&amp;gt;&amp;#039;&amp;#039;&amp;#039;Oregon Trail Survivors&amp;#039;&amp;#039;&amp;#039;&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div class=&amp;quot;center&amp;quot;&amp;gt;&lt;br /&gt;
[[Image:Oregon-trail-dysentery 5 biodb.jpg | thumb | right | 350px | The third leading cause of death on the Oregon Trail.]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Group Members ==&lt;br /&gt;
*Coder: [[User:Jwoodlee | Jake Woodlee]]&lt;br /&gt;
*Quality Assurance: [[User:Troque | Trixie Roque]]&lt;br /&gt;
*GenMAPP Users: [[User:Eyanosch | Erich Yanoschik]] &amp;amp; [[User:Kzebrows | Kristin Zebrowski]]&lt;br /&gt;
* Project Manager: [[User:Kzebrows | Kristin Zebrowski]]&lt;br /&gt;
&lt;br /&gt;
{{Template:Oregon Trail Survivors}}&lt;br /&gt;
&lt;br /&gt;
=== Presentation (QA/Coder) ===&lt;br /&gt;
* PDF can be seen [[Media: Genome Paper Presentation BioDB.pdf | here]]&lt;br /&gt;
&lt;br /&gt;
===Group Meeting Times===&lt;br /&gt;
Thursday, November 5th at 8:00 pm&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
Over the upcoming weeks our group will be investigating &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039;. &lt;br /&gt;
&lt;br /&gt;
====Week 10====&lt;br /&gt;
&lt;br /&gt;
# Find genome sequence paper&lt;br /&gt;
# Find 4-8 microarray datasets and papers that go with the genome paper&lt;br /&gt;
# Compile team page and create a ranked annotated bibliography&lt;br /&gt;
&lt;br /&gt;
====Week 11====&lt;br /&gt;
&lt;br /&gt;
#Prepare for journal club presentations in Weeks 12 and 13&lt;br /&gt;
#Begin initial tasks on research project&lt;br /&gt;
&lt;br /&gt;
Click on username links for more information regarding each team member&amp;#039;s contributions for Week 11.&lt;br /&gt;
&lt;br /&gt;
[[Jwoodlee Week 11 | Jake]]: Read through the genome paper, focusing on the parts I was able to understand. Made an outline for the genome paper. Worked on the presentation with Trixie and found a database. Also answered the assigned questions.&lt;br /&gt;
&lt;br /&gt;
[[Troque Week 11 | Trixie]]: Mainly focused on the Genome paper presentation with Jake. This includes searching for a viable database that we will be using for the rest of the group assignment and actually creating the presentation we will be doing for October 17th, 2015. I&amp;#039;ve also updated our group page to reflect what Dr. Dahlquist suggested would improve our team page.&lt;br /&gt;
&lt;br /&gt;
[[Eyanosch Week 11 | Erich]]: Analyzed the microarray paper in order to describe the experimental design of the microarray data, treatments, number of replicates, and dye swaps. Worked with Kristin to produce the PowerPoint for the GenMAPP Users presentation at Journal Club. Worked on the individual journal entry and created an outline of the microarray paper.&lt;br /&gt;
&lt;br /&gt;
[[Kzebrows Week 11 | Kristin]]: Using the team&amp;#039;s selected microarray paper I developed an outline including background information, experimental outline/methods and how samples corresponded to the data, a brief description of the results, and a discussion including the implications of the research and its results in comparison to previous studies. Using this outline, I created a flow chart corresponding to the research. I also worked with Erich in order to create a PowerPoint for the Journal Club presentation on Nov. 24.&lt;br /&gt;
&lt;br /&gt;
==== Week 12 ====&lt;br /&gt;
#QA will be doing an initial database export. &lt;br /&gt;
#Coder will be setting up version control.&lt;br /&gt;
#GenMAPP users will compile the raw data from the microarray file to prepare for normalization and statistical analysis (will begin if time permits after consultation with Dr. Dahlquist). Additionally, the GenMAPP users will be determining the number of biological or technical replicates and how samples were labeled.&lt;br /&gt;
#Coder and QA will present on genome paper in class Tuesday, Nov. 24. &lt;br /&gt;
&lt;br /&gt;
Click on username links for more information regarding each team member&amp;#039;s contributions for Week 12.&lt;br /&gt;
* [[Jwoodlee Week 12 | Jake]]: Set up my environment in Eclipse, created the s-flexneri branch, created my own copy of GenMAPP that I can modify for later use, and cloned the repository with Git commands.&lt;br /&gt;
* [[Troque Week 12 | Trixie]]: Finished the preliminary export of the XML and GOA files and the corresponding Gene Testing Report. Also started identifying the gene IDs for the species. Decided on a file management system with Jake.&lt;br /&gt;
* [[Eyanosch Week 12 | Erich]]: Worked with Kristin to determine the total number of biological and technical replicates. Compiled the raw data for RP samples, specifically the ID and Log ratio columns. Incorporated the RP and RX data into one spreadsheet with Kristin&amp;#039;s data. We created a table showing which file each sample corresponds to, and determined that there were no dye swaps in the experiment (the control was the Cy3 dye and the treatment the Cy5 dye).&lt;br /&gt;
* [[Kzebrows Week 12 | Kristin]]: Determined that there were 3 biological replicates per treatment for 6 treatments total. Compiled raw data for RX samples by re-naming columns for ID and Log Ratio and putting into same worksheet, which was later combined with Erich&amp;#039;s worksheet for RP samples. Erich and I met and worked together to create a table of which samples correspond to which file.&lt;br /&gt;
&lt;br /&gt;
====Week 14====&lt;br /&gt;
#QA will be documenting the IDs using MATCH, Postgres, Microsoft Access, and Excel and get a head start on Milestone 3, which is customizing the TallyEngine.&lt;br /&gt;
#Coder will determine and document any modified export behavior that the GenMAPP Builder will have and resolve bugs. Coder will also work with QA by uploading GM Builder for additional export. &lt;br /&gt;
#GenMAPP Users will perform statistical analysis in Excel (normalization, tests) and format for import into GenMAPP. Users will also import data into GenMAPP and run MAPPFinder, and then document these test runs. &lt;br /&gt;
&lt;br /&gt;
Click on username links for more information regarding each team member&amp;#039;s contributions for Week 14.&lt;br /&gt;
* [[Jwoodlee Week 14 | Jake]]: Finished custom GenMAPP builder, committed to GitHub, and ran the export with the custom software.  This created a custom .gdb which was opened in Microsoft Access and GenMAPP to check for accuracy.&lt;br /&gt;
* [[Troque Week 14 | Trixie]]: Trixie has finished identifying the gene IDs using MATCH, Postgres, Microsoft Access, and Excel. It was discovered that some IDs are in &amp;quot;dbReference/property&amp;amp;type&amp;amp;gene ID&amp;quot;, and so another export was done on 12/7/15 to add the newly discovered gene IDs.&lt;br /&gt;
* [[Eyanosch Week 14 | Erich]]: &lt;br /&gt;
* [[Kzebrows Week 14 | Kristin]]: This week Erich and I made corrections from the talk page and normalized log ratios for the slides in the experiment. I completed the statistical analysis for RX samples and calculated the Bonferroni p value correction. I also performed a sanity check for the RX samples and, going off of that, I calculated the Benjamini &amp;amp; Hochberg p value correction for RX-1-30, which had the most statistically significant changes in gene expression. I also formatted and exported the file for GenMAPP, downloaded the database, and attempted to create color sets to run the data set through MAPPFinder. &lt;br /&gt;
&lt;br /&gt;
==== Reflection ====&lt;br /&gt;
&lt;br /&gt;
Each team member should reflect on the team&amp;#039;s progress:&lt;br /&gt;
# What worked?&lt;br /&gt;
# What didn&amp;#039;t work?&lt;br /&gt;
# What will I do next to fix what didn&amp;#039;t work?&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;Kristin&amp;#039;&amp;#039;: &lt;br /&gt;
#What worked in terms of communication is having a group text. We also meet at least once a week outside of class in order to work together on the assignments and make sure we are all on the same page. So far, this has allowed us to troubleshoot and address bugs together as a team quickly. &lt;br /&gt;
#After creating the initial compiled raw data file, I had to make several corrections before the file could be run through GenMAPP. First, I had to get rid of the &amp;quot;.&amp;quot;, and I also had to replace every #DIV/0! with a space character for the file to be read at all. We were also unable to find all of the b#### and CP#### gene IDs in UniProt or ShiBASE. Finally, after creating my color set and trying to run MAPPFinder, I tried three computers and all of them crashed with the &amp;quot;not responding&amp;quot; message.&lt;br /&gt;
#I will communicate with the QA and Coder in order to create a database with a minimal number of &amp;quot;Gene ID not found&amp;#039;s&amp;quot; and then communicate with Erich when we try to run our dataset through MAPPFinder. Once the gene database is re-customized and the export is complete I can try to re-run my dataset to see if that makes a difference.&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039; Trixie &amp;#039;&amp;#039;:&lt;br /&gt;
# What worked?&lt;br /&gt;
#* What worked in identifying the gene IDs was exporting the .gdb file into Excel and comparing it with the OrderedLocusNames table (from Microsoft Access). This made it easier to find which genes were missing from the .gdb file and to look them up in the UniProt XML file. By comparing the lists of gene IDs in the Excel file and using the CTRL+F shortcut, I was also able to discern which tags to include in the new builds for the databases. Because of this, I was able to confirm that some genes indeed do not exist in the XML file, while only a couple exist within the &amp;quot;dbReference&amp;quot; tag. In terms of group work, what worked was posting all our files on a single page as we progressed through the assignment. Night meetings were also helpful for communicating better with the rest of my group.&lt;br /&gt;
# What didn&amp;#039;t work?&lt;br /&gt;
#* What didn&amp;#039;t work is using Match multiple times without thinking. Even when I was trying to match the number of gene IDs with what Tally Engine gives me, Match didn&amp;#039;t really help me in identifying where to find the genes in the XML file. Waiting for the database to finish didn&amp;#039;t help much at all since our builds would take more than 4 hours to finish.&lt;br /&gt;
# What will I do next to fix what didn&amp;#039;t work?&lt;br /&gt;
#* To fix what didn&amp;#039;t work, I would use Match in conjunction with the XML file, or rely on the Excel method entirely, since it was more helpful than Match in finding the necessary tags. I would also set a timer to check the lab after about 4.5 hours, since one of our builds lasted that long.&lt;br /&gt;
&lt;br /&gt;
==Overview of Genome Paper==&lt;br /&gt;
*Used the genome sequencing article to perform a prospective search in the [https://apps.webofknowledge.com/UA_GeneralSearch_input.do?product=UA&amp;amp;search_mode=GeneralSearch&amp;amp;SID=1FRKcNxUgxiGX6spITI&amp;amp;preferencesSaved= Web of Science] database.&lt;br /&gt;
*Overview of the search:&lt;br /&gt;
**How many articles does this article cite? 37&lt;br /&gt;
**How many articles cite this article? 303&lt;br /&gt;
**Based on the titles and abstracts of the papers, what type of research directions have been taken now that the genome for that organism has been sequenced? &lt;br /&gt;
***Now that the genome has been sequenced, a majority of research has been done on discovering which genes are responsible for virulence and pathogenesis as well as potential antibiotics. Genomic research is also focused on how &amp;#039;&amp;#039;S. flexneri&amp;#039;&amp;#039; has been able to develop resistance to multiple drugs. Furthermore, &amp;#039;&amp;#039;Shigella&amp;#039;&amp;#039; is suspected to have evolved from &amp;#039;&amp;#039;Escherichia coli&amp;#039;&amp;#039; so a lot of research has been done in how and when pathogenic &amp;#039;&amp;#039;Shigella&amp;#039;&amp;#039; split from &amp;#039;&amp;#039;E. coli&amp;#039;&amp;#039; on the evolutionary tree.&lt;br /&gt;
&lt;br /&gt;
==Annotated Bibliography==&lt;br /&gt;
=== Genome Paper ===&lt;br /&gt;
Jin, Q., Yuan, Z., Xu, J., Wang, Y., Shen, Y., Lu, W., … Yu, J. (2002). Genome sequence of &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039; 2a: insights into pathogenicity through comparison with genomes of Escherichia coli K12 and O157. Nucleic Acids Research, 30(20), 4432–4441.&lt;br /&gt;
* PubMed Abstract: http://www.ncbi.nlm.nih.gov/pubmed/?term=Genome+sequence+of+Shigella+flexneri+2a%3A+insights+into+pathogenicity+through+comparison+with+genomes+of+Escherichia+coli+K12+and+O157&lt;br /&gt;
* PubMed Central:  http://www.ncbi.nlm.nih.gov/pmc/articles/PMC137130/&lt;br /&gt;
* Publisher Full Text (HTML):  http://nar.oxfordjournals.org/content/30/20/4432.full&lt;br /&gt;
* Publisher Full Text (PDF):  http://nar.oxfordjournals.org/content/30/20/4432.full.pdf+html&lt;br /&gt;
* Copyright:  2002 Oxford University Press&lt;br /&gt;
* Publisher:   Oxford University Press&lt;br /&gt;
* Availability:  in print and online&lt;br /&gt;
* Did LMU pay a fee for this article: no&lt;br /&gt;
&lt;br /&gt;
===Microarray Paper===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--====Paper Rankings====&lt;br /&gt;
&lt;br /&gt;
It would have been helpful for you to actually lit the papers in this ranked order.  &amp;#039;&amp;#039;&amp;amp;mdash; [[User:Kdahlquist|Kdahlquist]] ([[User talk:Kdahlquist|talk]]) 10:32, 10 November 2015 (PST)&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
#Fu H, Liu L, Zhang X, Zhu Y, Zhao L, Peng J, et al. (2012) Common Changes in Global Gene Expression Induced by RNA Polymerase Inhibitors in &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039;. PLoS ONE 7(3): e33240. doi:10.1371/journal.pone.0033240&lt;br /&gt;
#* This paper is suitable for your project.  &amp;#039;&amp;#039;&amp;amp;mdash; [[User:Kdahlquist|Kdahlquist]] ([[User talk:Kdahlquist|talk]]) 10:38, 10 November 2015 (PST)&amp;#039;&amp;#039;&lt;br /&gt;
#Morris, Carolyn R, et al. ‘Characterization of Intracellular Growth Regulator IcgR by Utilizing Transcriptomics to Identify Mediators of Pathogenesis in Shigella Flexneri’. Infection and Immunity 81.9 (Sep. 2013): 3068–3076. 6 Nov. 2015.&lt;br /&gt;
#Global analysis of a plasmid-cured Shigella flexneri strain: new insights into the interaction between the chromosome and a virulence plasmid. Li Zhu, Xiankai Liu, Xuexue Zheng, Xin Bu, Ge Zhao, Chaohua Xie, Jingfei Zhang, Na Li, Erling Feng, Jie Wang, Yongqiang Jiang, Peitang Huang, Hengliang Wang J Proteome Res. 2010 February 5; 9(2): 843–854. doi: 10.1021/pr9007514&lt;br /&gt;
#Peng J, Yang J, Jin Q (2011) An Integrated Approach for Finding Overlooked Genes in Shigella. PLoS ONE 6(4): e18509. doi: 10.1371/journal.pone.0018509&lt;br /&gt;
#Waddell, C. D., Walter, T. J., Pacheco, S. A., Purdy, G. E., &amp;amp; Runyen-Janecky, L. J. (2014). NtrBC and Nac Contribute to Efficient Shigella flexneri Intracellular Replication. Journal of Bacteriology, 196(14), 2578–2586. http://doi.org/10.1128/JB.01613-14&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- ==== Kristin ====&lt;br /&gt;
Peng J, Yang J, Jin Q (2011) An Integrated Approach for Finding Overlooked Genes in Shigella. PLoS ONE 6(4): e18509. doi: 10.1371/journal.pone.0018509&lt;br /&gt;
*PubMed Abstract: [http://www.ncbi.nlm.nih.gov/pubmed/21483688 Abstract]&lt;br /&gt;
*PubMedCentral: [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3071730/ PMC]&lt;br /&gt;
*Publisher Full Text (HTML format): [http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0018509 HTML]&lt;br /&gt;
*Publisher Full Text (PDF): [http://www.plosone.org/article/fetchObject.action?uri=info:doi/10.1371/journal.pone.0018509&amp;amp;representation=PDF PDF]&lt;br /&gt;
*Copyright: 2011 Peng et al. Article is Open Access and the authors own the copyright, not the journal, under a Creative Commons license.&lt;br /&gt;
*Publisher: PLOS One&lt;br /&gt;
**Is the article available under &amp;quot;Open Access&amp;quot;? Yes&lt;br /&gt;
*Availability: online only&lt;br /&gt;
*Did LMU pay a fee for this article: no&lt;br /&gt;
*Database used to find the data and article: ArrayExpress&lt;br /&gt;
*Terms searched: &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039;&lt;br /&gt;
**Filtered by organism: &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039;&lt;br /&gt;
**Filtered by experiment type: RNA assay, array assay&lt;br /&gt;
*Search overview&lt;br /&gt;
**Results: 7&lt;br /&gt;
**Assessment: All of the articles were relevant but not all had enough assays to be able to be used for this assignment. All involved transcription profiling by array but obviously the experiments differed. Expression analysis was used to examine an RNA polymerase inhibitor, comparing wild type to mutant &amp;#039;&amp;#039;Shigella&amp;#039;&amp;#039;, and virulence plasmid-cured strains amongst others.&lt;br /&gt;
*Search in Web of Knowledge&lt;br /&gt;
**Number of articles this article cites: 71&lt;br /&gt;
**Number of times this article has been cited: 1&lt;br /&gt;
**What research directions have been taken since this article has been published? The only article that cited this paper involved detecting infectious diarrheal diseases by chemiluminescence imaging. &lt;br /&gt;
**[https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-22800/samples/?keywords=shigella+flexneri&amp;amp;organism=Shigella+flexneri&amp;amp;exptype%5B%5D=%22rna+assay%22&amp;amp;exptype%5B%5D=%22array+assay%22&amp;amp;array= Microarray data]&lt;br /&gt;
**What experiment was performed? What was the &amp;quot;treatment&amp;quot; and the &amp;quot;control&amp;quot;? &lt;br /&gt;
***The experiment performed was to identify overlooked small RNAs (sRNAs) and small open reading frames (sORFs) in &amp;#039;&amp;#039;Shigella&amp;#039;&amp;#039; that were overlooked in the initial genome sequences. Microarrays were performed to search for sRNAs as well as RT-PCR and northern blots were used to identify sRNAs and regions for possible sRNAs. 64 sRNAs that were previously confirmed were used as controls. As a treatment, cells were harvested in the lag, log, and stationary phases at 37C in LB medium and then in the log and stationary phases at 37C in LB medium with 0.01% Congo red, a salt. &lt;br /&gt;
**Were replicate experiments of the &amp;quot;treatment&amp;quot; and &amp;quot;control&amp;quot; conditions conducted? Were these biological or technical replicates? How many of each? Competitive hybridization was conducted three times for each condition. These were technical replicates because the conditions were different samples (treated differently) measured in different conditions. &lt;br /&gt;
&lt;br /&gt;
Waddell, C. D., Walter, T. J., Pacheco, S. A., Purdy, G. E., &amp;amp; Runyen-Janecky, L. J. (2014). NtrBC and Nac Contribute to Efficient Shigella flexneri Intracellular Replication. Journal of Bacteriology, 196(14), 2578–2586. http://doi.org/10.1128/JB.01613-14&lt;br /&gt;
*PubMed Abstract: [http://www.ncbi.nlm.nih.gov/pubmed/?term=Shigella+flexneri+ntrBC+and+nac+mutant+expression+analysis Abstract]&lt;br /&gt;
*PubMedCentral: [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4097594/ PMC]&lt;br /&gt;
*Publisher Full Text (HTML format): [http://jb.asm.org/content/196/14/2578.long HTML]&lt;br /&gt;
*Publisher Full Text (PDF): [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4097594/pdf/zjb2578.pdf PDF]&lt;br /&gt;
*Copyright: 2014 American Society for Microbiology. The ASM is a non-profit organization with numerous publications, some of which are open access and some of which are not. &lt;br /&gt;
*Publisher: American Society for Microbiology&lt;br /&gt;
**Is the article available under &amp;quot;Open Access&amp;quot;? It is available open access after 6 months.&lt;br /&gt;
*Availability: online and in print&lt;br /&gt;
*Did LMU pay a fee for this article: no&lt;br /&gt;
*Database used to find the data and article: ArrayExpress&lt;br /&gt;
*Terms searched: &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039;&lt;br /&gt;
**Filtered by organism: &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039;&lt;br /&gt;
**Filtered by experiment type: RNA assay, array assay&lt;br /&gt;
*Search overview&lt;br /&gt;
**Results: 7&lt;br /&gt;
**Assessment: All of the articles were relevant but not all had enough assays to be able to be used for this assignment. All involved transcription profiling by array but obviously the experiments differed. Expression analysis was used to examine an RNA polymerase inhibitor, comparing wild type to mutant &amp;#039;&amp;#039;Shigella&amp;#039;&amp;#039;, and virulence plasmid-cured strains amongst others.&lt;br /&gt;
*Search in Web of Knowledge&lt;br /&gt;
**Number of articles this article cites: 70&lt;br /&gt;
**Number of times this article has been cited: 0&lt;br /&gt;
**What research directions have been taken since this article has been published? This article has not been cited at all. It was published in July 2014 (pretty recently), which may contribute to this.&lt;br /&gt;
**link to [https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-49939/samples/?keywords=shigella+flexneri&amp;amp;organism=Shigella+flexneri&amp;amp;exptype%5B%5D=%22rna+assay%22&amp;amp;exptype%5B%5D=%22array+assay%22&amp;amp;array= microarray data]&lt;br /&gt;
**What experiment was performed? What was the &amp;quot;treatment&amp;quot; and the &amp;quot;control&amp;quot;? &lt;br /&gt;
***The experimenters examined 12 two-component regulatory systems in &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039; for their abilities to sense changes in environmental conditions and regulate gene expression in response. Virulence was tested by infecting Henle cells with wild type and mutant TCRS. They found four systems required for the formation of plaque in wild-type and microarray analysis was performed to identify which genes were regulated differently by the NtrBC system or by Nac.&lt;br /&gt;
***The treatment for this experiment was to create &amp;#039;&amp;#039;Shigella&amp;#039;&amp;#039; TCRS mutants using phages or transduction and to test their effectiveness in invading Henle cells. Assays were then done to compare gene expression in these mutants with wild type &amp;#039;&amp;#039;Shigella&amp;#039;&amp;#039; TCRS. The control for this experiment was DNA-ase treated RNA and assays performed with avirulent strains of &amp;#039;&amp;#039;Shigella&amp;#039;&amp;#039;. &lt;br /&gt;
**Were replicate experiments of the &amp;quot;treatment&amp;quot; and &amp;quot;control&amp;quot; conditions conducted? Were these biological or technical replicates? How many of each? Assays were conducted three times. These were technical replicates because the conditions were different samples.&lt;br /&gt;
&lt;br /&gt;
==== Erich Yanoschik ==== &lt;br /&gt;
&lt;br /&gt;
Global analysis of a plasmid-cured Shigella flexneri strain: new insights into the interaction between the chromosome and a virulence plasmid.&lt;br /&gt;
Li Zhu, Xiankai Liu, Xuexue Zheng, Xin Bu, Ge Zhao, Chaohua Xie, Jingfei Zhang, Na Li, Erling Feng, Jie Wang, Yongqiang Jiang, Peitang Huang, Hengliang Wang&lt;br /&gt;
J Proteome Res. 2010 February 5; 9(2): 843–854. doi: 10.1021/pr9007514&lt;br /&gt;
* PubMed Abstract: http://www.ncbi.nlm.nih.gov/pubmed?LinkName=gds_pubmed&amp;amp;from_uid=200012535&lt;br /&gt;
* PubMed Central:  N/A&lt;br /&gt;
* Publisher Full Text (HTML): http://pubs.acs.org/doi/full/10.1021/pr9007514&lt;br /&gt;
* Publisher Full Text (PDF):  http://pubs.acs.org/doi/pdf/10.1021/pr9007514&lt;br /&gt;
* Copyright:  2009 American Chemical Society&lt;br /&gt;
* Publisher:   Journal of Proteome Research&lt;br /&gt;
* Availability:  in print and online&lt;br /&gt;
* Did LMU pay a fee for this article: no&lt;br /&gt;
*The publisher is a scientific society. The Journal of Proteome Research publishes content encompassing all aspects of global protein analysis and function, including the dynamic aspects of genomics, spatio-temporal proteomics, metabonomics and metabolomics, clinical and agricultural proteomics, as well as advances in methodology including bioinformatics. The theme and emphasis is on a multidisciplinary approach to the life sciences through the synergy between the different types of &amp;quot;omics&amp;quot;. -quote from the about section http://pubs.acs.org/page/jprobs/about.html&lt;br /&gt;
*Used the ISI Web of Science/Knowledge database to search this article&lt;br /&gt;
** The article has 28 cited references&lt;br /&gt;
** The article is cited 4 times &lt;br /&gt;
** Research has focused on profiling which parts of the &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039; genome are responsible for virulence and pathogenicity factors, along with chromosomal inactivation.&lt;br /&gt;
# Global patterns of &amp;#039;&amp;#039;&amp;#039;&amp;#039;&amp;#039;gene expression&amp;#039;&amp;#039;&amp;#039;&amp;#039;&amp;#039; of a virulence cured plasmid strain compared with the wild-type strain were analyzed using 2-DE combined with MALDI-TOF MS.&lt;br /&gt;
#* There are 6 biological replicates total. &lt;br /&gt;
#* The control sample is derived from mRNA&lt;br /&gt;
# Overview of Search Results&lt;br /&gt;
#* The results of the search mainly consisted of E.coli and Shigella flexneri transcriptional profiling.&lt;br /&gt;
#* There are 178 results in the GEO DataSets Database and 22283 in GEO profiles database.&lt;br /&gt;
#* The results were mostly relevant, the first results were datasets. Anything related to the bacteria came up, the order was seemingly relevant.&lt;br /&gt;
#** The micro array data can be found http://pubs.acs.org/doi/abs/10.1021/pr9007514&lt;br /&gt;
# The experiment contrasted the pathogenicity of a virulence plasmid-cured strain with that of wild-type &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039;. The virulence plasmid-cured strain was constructed through plasmid incompatibility. The control was the wild-type &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039; strain in each experimental construct.&lt;br /&gt;
#* There were at least 3 biological replicates of each experiment conducted and 2 technical replicates.&lt;br /&gt;
&lt;br /&gt;
==== Trixie ====&lt;br /&gt;
Morris, Carolyn R, et al. ‘Characterization of Intracellular Growth Regulator IcgR by Utilizing Transcriptomics to Identify Mediators of Pathogenesis in Shigella Flexneri’. Infection and Immunity 81.9 (Sep. 2013): 3068–3076. 6 Nov. 2015.&lt;br /&gt;
&lt;br /&gt;
* PubMed Abstract: http://www.ncbi.nlm.nih.gov/pubmed/?term=Characterization+of+Intracellular+Growth+Regulator+icgR+by+Utilizing+Transcriptomics+To+Identify+Mediators+of+Pathogenesis+in+Shigella+flexneri&lt;br /&gt;
* PubMed Central: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3754207/&lt;br /&gt;
* Publisher Full Text (HTML): http://iai.asm.org/content/81/9/3068.full&lt;br /&gt;
* Publisher Full Text (PDF): http://iai.asm.org/content/81/9/3068.full.pdf+html&lt;br /&gt;
* Copyright: 2013, American Society for Microbiology. All Rights Reserved.&lt;br /&gt;
* Publisher: American Society for Microbiology&lt;br /&gt;
* Availability: only online&lt;br /&gt;
* Did LMU pay a fee for this article: no&lt;br /&gt;
* doi: 10.1128/IAI.00537-13&lt;br /&gt;
&lt;br /&gt;
Database used to find the data and article: ArrayExpress&lt;br /&gt;
* Terms searched: &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039;&lt;br /&gt;
** Filtered by organism: &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039;&lt;br /&gt;
** Filtered by experiment type: RNA assay, array assay&lt;br /&gt;
* Search overview&lt;br /&gt;
** Results: 7&lt;br /&gt;
** Assessment: Some of the results used only 2-4 assays, so we immediately doubted the accuracy of the results they would provide. Of the 7 results, 5 had 9 or more assays, so we decided to look at those data.&lt;br /&gt;
&lt;br /&gt;
Web of Science:&lt;br /&gt;
* Link to microarray data: [http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-40851/samples/?keywords=%22Shigella+flexneri%22+&amp;amp;organism=Shigella+flexneri&amp;amp;exptype%5B%5D=%22rna+assay%22&amp;amp;exptype%5B%5D=%22array+assay%22&amp;amp;array= Microarray data]&lt;br /&gt;
* How many articles does this article cite? 2&lt;br /&gt;
* How many articles cite this article? 52&lt;br /&gt;
* Based on the titles and abstracts of the papers, what type of research directions have been taken now that the genome for that organism has been sequenced? &lt;br /&gt;
** Since the organism&amp;#039;s genome has been sequenced, new research about this species tends to focus more on its pathogenesis, using bioinformatic methods with in vitro and in vivo microarray data. For example, the article &amp;quot;Analysis of the Proteome of Intracellular Shigella flexneri Reveals Pathways Important for Intracellular Growth&amp;quot;, which cites this article, analyzes the metabolic pathways that allow the organism to grow.&lt;br /&gt;
* What experiment was performed? What was the &amp;quot;treatment&amp;quot; and what was the &amp;quot;control&amp;quot; in the experiment? &lt;br /&gt;
** This experiment involved combining high-throughput bioinformatic methods with in vitro and in vivo assays to provide new insights into pathogenesis. The intracellular growth regulator was deleted in order to observe its effects and compare to the wild type, or the control in the experiment. The &amp;quot;treatment&amp;quot; involved culturing the strains in Luria broth or tryptic soy agar with Congo red (TSA/CR) medium supplemented with the appropriate antibiotics (15 μg/ml chloramphenicol, 50 μg/ml kanamycin, and 100 μg/ml ampicillin) and allowing them to invade colonic epithelial cells for a set period of time.&lt;br /&gt;
* Were replicate experiments of the &amp;quot;treatment&amp;quot; and &amp;quot;control&amp;quot; conditions conducted? Were these biological or technical replicates? How many of each?&lt;br /&gt;
** The experiment had both biological and technical replicates. Since the experiment involved analyzing the pathogenesis of the organism, the researchers deleted the gene they believed to be involved in intracellular growth, which they called icgR. In their documentation, they wrote that they subjected the ΔicgR strain (and its complement, ΔicgR(pSECicgR), i.e., the ΔicgR mutant transformed with pSECicgR) to certain conditions and compared the results to the control, the wild type 2457T. In other words, the experiment involved 3 biological strains (namely the wild type, ΔicgR, and the ΔicgR complement). 5 technical replicates were then conducted for each strain, resulting in a grand total of 15 microarrays.&lt;br /&gt;
&lt;br /&gt;
====Jake====&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Fu H, Liu L, Zhang X, Zhu Y, Zhao L, Peng J, et al. (2012) Common Changes in Global Gene Expression Induced by RNA Polymerase Inhibitors in &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039;. PLoS ONE 7(3): e33240. doi:10.1371/journal.pone.0033240&lt;br /&gt;
&lt;br /&gt;
*The link to the [http://www.ncbi.nlm.nih.gov/pubmed/22428000 abstract]&lt;br /&gt;
*The link to the [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3299763/ full text of the article] in PubMed Central&lt;br /&gt;
*The link to the [http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0033240 full text of the article] (HTML format) from the publisher web site.&lt;br /&gt;
*The link to the [http://www.plosone.org/article/fetchObject.action?uri=info:doi/10.1371/journal.pone.0033240&amp;amp;representation=PDF full PDF version] of the article from the publisher web site.&lt;br /&gt;
*Copyright: © 2012 Fu et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.&lt;br /&gt;
*Does the journal own the copyright? No&lt;br /&gt;
*Do the authors own the copyright? Yes&lt;br /&gt;
*Do the authors own the rights under a Creative Commons license? Yes&lt;br /&gt;
*Is the article available “Open Access”? Yes&lt;br /&gt;
*What organization is the publisher of the article? What type of organization is it? PLoS ONE is the journal, and its publisher is the Public Library of Science (PLoS), which hosts open-access research articles.&lt;br /&gt;
*Is this article available in print or online only? Online only&lt;br /&gt;
*Has LMU paid a subscription or other fee for your access to this article? No, LMU has not paid a subscription or other fee, because the article is open access through the Public Library of Science.&lt;br /&gt;
*Use the genome sequencing article you found to perform a prospective search in the ISI Web of Science/Knowledge database.&lt;br /&gt;
**How many articles does this article cite? 25 cited references&lt;br /&gt;
**How many articles cite this article? 0 articles cite this article&lt;br /&gt;
**Based on the titles and abstracts of the papers, what type of research directions have been taken now that the genome for that organism has been sequenced?&lt;br /&gt;
***Given that no papers cite this paper, nothing has yet been done to build on this specific topic. Regarding the genome, I think this paper has built on the work of the people who sequenced the first genome of Shigella flexneri, as well as on the other microarray papers.&lt;br /&gt;
*State which database you used to find the data and article: ArrayExpress&lt;br /&gt;
*State what you used as search terms and what type of search terms they were: &amp;quot;shigella flexneri&amp;quot; filtered by organism, experiment type: &amp;quot;rna assay&amp;quot;, experiment type: &amp;quot;array assay&amp;quot;&lt;br /&gt;
*Give an overview of the results of the search.&lt;br /&gt;
**How many results did you get? 7 results were returned, with 6 viable options based on the number of assays.&lt;br /&gt;
**Give an assessment of how relevant the results were: Very relevant, 6/7 results were viable.&lt;br /&gt;
*Link to [http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-32978/?keywords=shigella+flexneri&amp;amp;organism=Shigella+flexneri&amp;amp;exptype%5B%5D=%22rna+assay%22&amp;amp;exptype%5B%5D=%22array+assay%22&amp;amp;array= microarray data]&lt;br /&gt;
*What experiment was performed? What was the &amp;quot;treatment&amp;quot; and what was the &amp;quot;control&amp;quot; in the experiment?&lt;br /&gt;
**Antibiotics (RNA polymerase inhibitors) were added to &amp;#039;&amp;#039;Shigella flexneri&amp;#039;&amp;#039; in order to see whether the bacteria became less active. The control was a group of bacteria with no drugs added, and the treatment was a group of bacteria with drugs added.&lt;br /&gt;
*Were replicate experiments of the &amp;quot;treatment&amp;quot; and &amp;quot;control&amp;quot; conditions conducted? Were these biological or technical replicates? How many of each?&lt;br /&gt;
**There are two drugs, RX and RP, with 6 samples per drug. The experiment was run 3 times, which yielded 36 assays. I believe that means 3 biological replicates and 12 technical replicates within each experiment, but I am not 100 percent sure.&lt;/div&gt;</summary>
		<author><name>Troque</name></author>	</entry>

	</feed>