LMU BioDB 2013 - User contributions [en]

File:Report for Final Project HDDWKS 20131213.pdf

2013-12-14T00:37:17Z

Ksherbina: Ksherbina uploaded a new version of "File:Report for Final Project HDDWKS 20131213.pdf"

Team H(oo)KD Final Project Deliverables

2013-12-14T00:34:29Z

Ksherbina: /* Deliverables */ Added report

{{Team H(oo)KD}}

==Deliverables==

*Gene database for ''Chlamydia trachomatis'' A/HAR-13 (.gdb): [[Media:Ct-Std External 20131121.gdb|Ct-Std_External_20131121.gdb]]
*ReadMe file including the gene database schema:[[Media:ReadMe Ct-Std External 20131122.pdf|ReadMe_Ct-Std_External_20131122.pdf]]
*Gene database testing report: [[Media:Ct External 20131121 Gene Database Testing Report.pdf|Ct_External_20131121_Gene_Database_Testing_Report.pdf]]
*Processed and analyzed DNA microarray dataset (.xls):
*GenMAPP Expression Dataset file (.gex):
:*For data collected in absence of rifampicin: [[Media: For GenMAPP Chlamydia V4 20131205 KS.gex|For_GenMAPP_Absence_of_Rifampicin.gex]]
:*For data collected in presence of rifampicin: [[Media:For GenMAPP Chlamydia V4 20131212 KS Presence of Rifampicin.gex|For_GenMAPP Chlamydia_V4_20131212_KS_Presence_of_Rifampicin.gex]]
*Filtered MAPPFinder Results (.xls):
:*[[Media: EB to RB No Rif 20131207 KS DW-Criterion0-GO.xls|EB to RB No Rif increased]]
:*[[Media: EB to RB No Rif 20131207 KS DW-Criterion1-GO.xls|EB to RB No Rif decreased]]
:*[[Media: MAPPFinder Results EB to RB Rifampicin 20131212-Criterion0-GO (1).xls|EB to RB Rif increased]]
:*[[Media:EB to RB No Rif 20131207 KS DW-Criterion1-GO.xls|EB to RB No Rif decreased]]
*Sample MAPP file of a relevant biological pathway for your species (.mapp):[[Media:EBtoRB_Rifampicin_Cellular_Carbohydrate_Metabolic_Process.mapp]]
:*Picture of the MAPP (.jpeg):[[Media:EBtoRB Rifampicin Cellular Carbohydrate Metabolic Process.jpg]]
*Group Report (.pdf): [[Media:Report for Final Project HDDWKS 20131213.pdf|Report_for_Final_Project_HDDWKS_20131213.pdf]]
*PowerPoint presentation (given on Thursday, December 12): [[Media:Transcriptional_Analysis_of_the_Developmental_Stages_of_Chlamydia_trachomatis_A_HAR-13_HDKSDW_20131212.pdf|Transcriptional_Analysis_of_the_Developmental_Stages_of_Chlamydia_trachomatis_A_HAR-13_HDKSDW_20131212.pdf]]

Team H(oo)KD Final Project Deliverables

2013-12-14T00:33:53Z

Ksherbina: /* Deliverables */ Added powerpoint

{{Team H(oo)KD}}

==Deliverables==

*Gene database for ''Chlamydia trachomatis'' A/HAR-13 (.gdb): [[Media:Ct-Std External 20131121.gdb|Ct-Std_External_20131121.gdb]]
*ReadMe file including the gene database schema:[[Media:ReadMe Ct-Std External 20131122.pdf|ReadMe_Ct-Std_External_20131122.pdf]]
*Gene database testing report: [[Media:Ct External 20131121 Gene Database Testing Report.pdf|Ct_External_20131121_Gene_Database_Testing_Report.pdf]]
*Processed and analyzed DNA microarray dataset (.xls):
*GenMAPP Expression Dataset file (.gex):
:*For data collected in absence of rifampicin: [[Media: For GenMAPP Chlamydia V4 20131205 KS.gex|For_GenMAPP_Absence_of_Rifampicin.gex]]
:*For data collected in presence of rifampicin: [[Media:For GenMAPP Chlamydia V4 20131212 KS Presence of Rifampicin.gex|For_GenMAPP Chlamydia_V4_20131212_KS_Presence_of_Rifampicin.gex]]
*Filtered MAPPFinder Results (.xls):
:*[[Media: EB to RB No Rif 20131207 KS DW-Criterion0-GO.xls|EB to RB No Rif increased]]
:*[[Media: EB to RB No Rif 20131207 KS DW-Criterion1-GO.xls|EB to RB No Rif decreased]]
:*[[Media: MAPPFinder Results EB to RB Rifampicin 20131212-Criterion0-GO (1).xls|EB to RB Rif increased]]
:*[[Media:EB to RB No Rif 20131207 KS DW-Criterion1-GO.xls|EB to RB No Rif decreased]]
*Sample MAPP file of a relevant biological pathway for your species (.mapp):[[Media:EBtoRB_Rifampicin_Cellular_Carbohydrate_Metabolic_Process.mapp]]
:*Picture of the MAPP (.jpeg):[[Media:EBtoRB Rifampicin Cellular Carbohydrate Metabolic Process.jpg]]
*Group Report (.pdf):
*PowerPoint presentation (given on Thursday, December 12): [[Media:Transcriptional_Analysis_of_the_Developmental_Stages_of_Chlamydia_trachomatis_A_HAR-13_HDKSDW_20131212.pdf|Transcriptional_Analysis_of_the_Developmental_Stages_of_Chlamydia_trachomatis_A_HAR-13_HDKSDW_20131212.pdf]]

File:Report for Final Project HDDWKS 20131213.pdf

2013-12-14T00:31:31Z

Ksherbina:

Team H(oo)KD Final Project Deliverables

2013-12-14T00:17:38Z

Ksherbina: /* Deliverables */

Team H(oo)KD Final Project Deliverables

2013-12-14T00:17:11Z

Ksherbina: /* Deliverables */ added file

File:ReadMe Ct-Std External 20131122.pdf

2013-12-14T00:16:30Z

Ksherbina:

File:Transcriptional Analysis of the Developmental Stages of Chlamydia trachomatis A HAR-13 HDKSDW 20131212.pdf

2013-12-14T00:01:52Z

Ksherbina:

Team H(oo)KD Final Project Deliverables

2013-12-13T23:59:50Z

Ksherbina: /* Deliverables */

{{Team H(oo)KD}}

==Deliverables==

*Gene database for ''Chlamydia trachomatis'' A/HAR-13 (.gdb): [[Media:Ct-Std External 20131121.gdb|Ct-Std_External_20131121.gdb]]
*ReadMe file including the gene database schema:
*Gene database testing report: [[Media:Ct External 20131121 Gene Database Testing Report.pdf|Ct_External_20131121_Gene_Database_Testing_Report.pdf]]
*Processed and analyzed DNA microarray dataset (.xls):
*GenMAPP Expression Dataset file (.gex):
:*For data collected in absence of rifampicin:
:*For data collected in presence of rifampicin: [[Media:For GenMAPP Chlamydia V4 20131212 KS Presence of Rifampicin.gex|For_GenMAPP Chlamydia_V4_20131212_KS_Presence_of_Rifampicin.gex]]
*Filtered MAPPFinder Results (.xls):
*Sample MAPP file of a relevant biological pathway for your species (.mapp):
:*Picture of the MAPP (.jpeg):
*Group Report (.pdf):
*PowerPoint presentation (given on Thursday, December 12):

File:For GenMAPP Chlamydia V4 20131212 KS Presence of Rifampicin.gex

2013-12-13T23:59:15Z

Ksherbina:

Team H(oo)KD Final Project Deliverables

2013-12-13T23:57:58Z

Ksherbina: /* Deliverables */

{{Team H(oo)KD}}

==Deliverables==

*Gene database for ''Chlamydia trachomatis'' A/HAR-13 (.gdb): [[Media:Ct-Std External 20131121.gdb|Ct-Std_External_20131121.gdb]]
*ReadMe file including the gene database schema:
*Gene database testing report: [[Media:Ct External 20131121 Gene Database Testing Report.pdf|Ct_External_20131121_Gene_Database_Testing_Report.pdf]]
*Processed and analyzed DNA microarray dataset (.xls):
*GenMAPP Expression Dataset file (.gex):
:*For data collected in absence of rifampicin:
:*For data collected in presence of rifampicin:
*Filtered MAPPFinder Results (.xls):
*Sample MAPP file of a relevant biological pathway for your species (.mapp):
:*Picture of the MAPP (.jpeg):
*Group Report (.pdf):
*PowerPoint presentation (given on Thursday, December 12):

Team H(oo)KD Final Project Deliverables

2013-12-13T23:57:17Z

Ksherbina: /* Deliverables */ Changed testing report

{{Team H(oo)KD}}

==Deliverables==

*Gene database for ''Chlamydia trachomatis'' A/HAR-13 (.gdb): [[Media:Ct-Std External 20131121.gdb|Ct-Std_External_20131121.gdb]]
*ReadMe file including the gene database schema:
*Gene database testing report: [[Media:Ct External 20131121 Gene Database Testing Report.pdf|Ct_External_20131121_Gene_Database_Testing_Report.pdf]]
*Processed and analyzed DNA microarray dataset (.xls):
*GenMAPP Expression Dataset file (.gex):
*Filtered MAPPFinder Results (.xls):
*Sample MAPP file of a relevant biological pathway for your species (.mapp):
:*Picture of the MAPP (.jpeg):
*Group Report (.pdf):
*PowerPoint presentation (given on Thursday, December 12):

File:Ct External 20131121 Gene Database Testing Report.pdf

2013-12-13T23:55:51Z

Ksherbina:

Team H(oo)KD Final Project Deliverables

2013-12-13T12:03:47Z

Ksherbina: /* Deliverables */ Added link to final testing report

{{Team H(oo)KD}}

==Deliverables==

*Gene database for ''Chlamydia trachomatis'' A/HAR-13 (.gdb): [[Media:Ct-Std External 20131121.gdb|Ct-Std_External_20131121.gdb]]
*ReadMe file including the gene database schema:
*Gene database testing report: [[Media:C.trachomatis Gene Database Final Testing Report 20131212.pdf|C.trachomatis_Gene_Database_Final_Testing_Report_20131212.pdf]]
*Processed and analyzed DNA microarray dataset (.xls):
*GenMAPP Expression Dataset file (.gex):
*Filtered MAPPFinder Results (.xls):
*Sample MAPP file of a relevant biological pathway for your species (.mapp):
:*Picture of the MAPP (.jpeg):
*Group Report (.pdf):
*PowerPoint presentation (given on Thursday, December 12):

File:C.trachomatis Gene Database Final Testing Report 20131212.pdf

2013-12-13T12:02:12Z

Ksherbina: Gene database testing report for C. trachomatis


Team H(oo)KD Final Project Deliverables

2013-12-13T11:52:44Z

Ksherbina: /* Deliverables */ Added bullets for additional files

{{Team H(oo)KD}}

==Deliverables==

*Gene database for ''Chlamydia trachomatis'' A/HAR-13 (.gdb): [[Media:Ct-Std External 20131121.gdb|Ct-Std_External_20131121.gdb]]
*ReadMe file including the gene database schema:
*Gene database testing report:
*Processed and analyzed DNA microarray dataset (.xls):
*GenMAPP Expression Dataset file (.gex):
*Filtered MAPPFinder Results (.xls):
*Sample MAPP file of a relevant biological pathway for your species (.mapp):
:*Picture of the MAPP (.jpeg):
*Group Report (.pdf):
*PowerPoint presentation (given on Thursday, December 12):

Team H(oo)KD Final Project Deliverables

2013-12-13T11:48:25Z

Ksherbina: Added link to gene database file

{{Team H(oo)KD}}

==Deliverables==

*Gene database for ''Chlamydia trachomatis'' A/HAR-13: [[Media:Ct-Std External 20131121.gdb|Ct-Std_External_20131121.gdb]]

Team H(oo)KD Final Project Deliverables

2013-12-13T11:44:43Z

Ksherbina: Added template

Template:Team H(oo)KD

2013-12-13T11:44:16Z

Ksherbina: Added link to project deliverables page

<div style="background: #000000; padding: 10px; font-weight:bold; font-size:200%;color:white"><center>Team H(oo)KD</center></div>

<div style="background: #F5F5F5; padding: 10px">
{| style = "width:100%; font-size:120%"
| '''[[Project Manager|<span style="color:black;">'''Project Manager'''</span>]]/[[Coder|<span style="color:black;">'''Coder'''</span>]]: [[User:Ksherbina|Katrina Sherbina]]'''
| '''[[Quality Assurance|<span style="color:black;">'''Quality Assurance'''</span>]]: [[User:HDelgadi|Hilda Delgadillo]]'''
| '''[[GenMAPP User|<span style="color:black;">'''GenMAPP User'''</span>]]: [[User:Dwilliams|Dillon Williams]]'''
|}

<div style="border-width: 2px; border-bottom-width:2px; border-bottom-color:#FFE135; border-bottom-style: solid; width: 100%"></div>

{| style = "width:90%; font-size:100%"
| '''Project Guidelines:'''
| '''[[Gene Database Project]]'''
| '''[[Gene Database Project Report Guidelines|Report Guidelines]]'''
|-
| '''Team Journal Assignments:'''
| '''[[Team H(oo)KD Week 12 Status Report| Week 12 Status Report]]'''
| '''[[Team H(oo)KD Week 13 Status Report| Week 13 Status Report]]'''
| '''[[Team H(oo)KD Week 15 Status Report| Week 15 Status Report]]'''
|-
| '''Individual Status Reports:'''
| '''[[HDelgadi Project Notebook]]'''
| '''[[dwilliams Project Notebook]]'''
| '''[[Ksherbina Project Notebook]]'''
|-
| '''Useful Links:'''
| '''[[Main Page|Class Page]]'''
| '''[[Chlamydia trachomatis|Team Home Page]]'''
|-
|}
{| style = "width:55%; font-size:130%"
| '''Final Product'''
| [[Team H(oo)KD Final Project Deliverables|<span style="color:#006400;">'''Project Deliverables'''</span>]]
|}

</div>

Ksherbina Project Notebook

2013-12-13T07:42:03Z

Ksherbina: /* Week 15 */ Finalized the testing report

{{Ksherbina}}

==Week 12==

===Starting the Set Up of the Testing Environment===

#Downloaded the following software onto my personal computer:
#*Latest version of GenMAPP Builder (gmbuilder2.0-b71) from [http://sourceforge.net/projects/xmlpipedb/files/?source=navbar SourceForge]
#*[http://www.oracle.com/technetwork/java/javase/downloads/index.html Java SE 7u45]
#*[http://www.eclipse.org/downloads/ Eclipse IDE for Java EE Developers]
#Created an account on [http://sourceforge.net/ SourceForge].

===Import/Export of Gene Database===

#Downloaded the [http://www.uniprot.org/uniprot/?query=organism:315277+keyword:181 UniProt XML] for ''C. trachomatis'' serovar A strain HAR-13: [[Media:Uniprot XML C.trachomatis serovar A KS 20131114.xml|Uniprot XML C.trachomatis serovar A KS 20131114.xml]]
#Downloaded the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ GOA file] for ''C. trachomatis'' serovar A: [[Media:22183.C trachomatis A KS 20131114.goa|22183.C trachomatis A KS 20131114.goa]]
#Created a new database in pgAdminIII: CT_KS_20131114_gmb2b71
#*Created tables in the database by executing the script in the file ''gmbuilder.sql''.
#Launched ''gmbuilder-32bit.bat'':
#Configured the database in gmbuilder:
#*Host or address: localhost
#*Port number: 5432
#*Database name: CT_KS_20131114_gmb2b71
#*Username: postgres
#*Password: <password of the PostgreSQL database created above>
#Imported the UniProt XML file into the PostgreSQL database through gmbuilder.
#Imported the OBO-XML file into the PostgreSQL database through gmbuilder.
#*gmbuilder was not able to completely process the data. See the testing report below for details.
#Imported the GOA file into the PostgreSQL database through gmbuilder.
#*I did not begin the actual export of the database due to the problem with processing the OBO-XML file.

====Testing Report====

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131114_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_11
* Time taken to import: 1.70 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
* Time taken to import: 9.29 min
* Time taken to process: '''Could not complete the process. The following error came up when gmbuilder was processing the 81000th term:'''
ERROR edu.lmu.xmlpipedb.gmbuilder.GenMAPPBuilder - java.util.concurrent.ExecutionException:java.lang.OutOfMemoryError:
Java heap space

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
* Time taken to import: 0.02 min

Name of .gdb file:
* Time taken to export .gdb:
* Upload your file and link to it here.

Note:

===Creating Custom Species Profile===

#On one of the class computers in the first row, I launched Eclipse with Subclipse.
#Went to Window > Open Perspective > Other > SVN Repository Exploring.
#Defined a new subversion by clicking on ''Add Repository'' and adding the URL https://svn.code.sf.net/p/xmlpipedb/code. Clicked Finish.
#Double-clicked on trunk, right-clicked on gmbuilder, and chose Checkout....
#Created a new project (chose Check out as a project configured using the New Project Wizard).
#Went to Window > Preferences > Java > Installed JREs. Added a new environment with JDK.
#Double-clicked on the project folder.
#Double-clicked on the lib folder.
#Shift-clicked and selected all files in the lib folder and then control-clicked on every file inside the lib folder whose name did not end in .jar. Right-clicked on one of the selected files and chose Build Path > Add to Build Path from the popup menu that came up.
#Created a species profile called ChlamydiaTrachomatisSerovarA.
#*Dondi modified the name to ChlamydiaTrachomatisSerovarAUniProtSpeciesProfile.
#Customized the species profile.
#*Opened the new profile and added the lines of code as specified in [[Coder|"Customize the Species Profile"]] directions.
#**Still need to insert the species-specific URL that returns a web page describing a gene for that species.
#Committed the profile following the directions in [[Coder|"Updating and Committing Code"]].

===Reflection===

#What were the week’s key accomplishments?
#*One of this week's key accomplishments was downloading the files necessary to run an import/export gene database cycle and the microarray files that will be analyzed. Also this week, we were able to start an import/export cycle.
#What are next week’s target accomplishments?
#*Next week, I would like to complete an import/export cycle and vet it with the QA person. In addition, I hope that we can download the software necessary to open the microarray files, format the files, and perform statistical analysis.
#What team strengths were seen this week?
#*I believe our strength this week as a team was meeting outside of class to create milestones. This meeting resulted in a detailed calendar of intermediate milestones that we plan to accomplish in order to meet the final deadline.
#What team weaknesses were seen this week?
#*It was difficult to set up the aforementioned meeting due to our busy schedules. Additionally, we did not have a clear plan for working on the project until the middle/end of the week. In the coming weeks, our goal will be to improve our work plan for each class period so as to minimize the difficulty of scheduling work meetings outside of class.

==Week 13==

===November 15, 2013===

====Import/Export Gene Database====

After the error that came up last week regarding the Java heap space, I performed the following steps to increase the heap space:
#Opened ''gmbuilder-32bit.bat'' in Notepad.
#Found the line
"C:\Program Files (x86)\Java\jre7\bin\java" -D32 -Xmx1024m -jar gmbuilder.jar
#Changed "-Xmx1024m" to "-Xmx2048m" thereby increasing the maximum heap space from 1024 MB to 2048 MB.
#Saved the file and closed it.
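
The manual edit above can also be scripted. The following is a minimal sketch, assuming the launch line has exactly the form quoted in the steps; the helper name <code>set_max_heap</code> is hypothetical and is not part of GenMAPP Builder.

```python
import re


def set_max_heap(command_line: str, megabytes: int) -> str:
    """Return the launch command with its -Xmx value replaced (hypothetical helper)."""
    # -Xmx<N>m is the JVM option that sets the maximum heap size in megabytes.
    new_line, count = re.subn(r"-Xmx\d+m", f"-Xmx{megabytes}m", command_line)
    if count != 1:
        raise ValueError("expected exactly one -Xmx<N>m flag in the command line")
    return new_line


# The launch line as quoted from gmbuilder-32bit.bat in the steps above.
original = r'"C:\Program Files (x86)\Java\jre7\bin\java" -D32 -Xmx1024m -jar gmbuilder.jar'
print(set_max_heap(original, 2048))
```

The function only rewrites the text of the command; whether the JVM can actually reserve the requested heap depends on the machine and on whether the 32-bit or 64-bit Java is used.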

*Then, I tried the import/export again. I could not open ''gmbuilder-32bit.bat'' after I changed the maximum heap space. I repeated the above steps for ''gmbuilder.bat'' and found that I could still open the program.
*Opening ''gmbuilder.bat'', I imported both the UniProt XML file and the OBO-XML file. However, I received a Java heap space error again when processing the OBO-XML file.
*I increased the heap space in ''gmbuilder.bat'' from 2048 MB to 4096 MB and tried the import/export cycle again. When this did not work, I increased the heap space in ''gmbuilder.bat'' to 8192 MB.
*Then, I was able to import the UniProt XML, OBO-XML, and GOA files as well as process the OBO-XML.
*However, I was not able to begin the export because an error popped up stating that an export to a GenMAPP database is only possible in ''gmbuilder-32bit.bat''.

=====Testing Report=====

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131114_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]):
* Time taken to import: 0.73 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
* Time taken to import: 7.39 min
* Time taken to process: 47.15 min

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
* Time taken to import: 0.03 min

Name of .gdb file:
* Time taken to export .gdb:
* Upload your file and link to it here.

Note: '''Could not export the database. Received an error stating that an export to a GenMAPP database is only possible in ''gmbuilder-32bit.bat''.'''

====Troubleshooting====

#Ran cmd from the Start menu.
#cd to where GenMAPP Builder is located.
#Typed the same command that is in the .bat file, but directly at the prompt:
"C:\Program Files (x86)\Java\jre7\bin\java" -D32 -Xmx1536m -jar gmbuilder.jar

Received the following error:
Error occurred during initialization of VM.
Could not reserve enough space for object heap.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit

===November 19, 2013===

====Rerunning the Gene Database Import/Export====

*On 11/15/2013, the UniProt XML, OBO-XML, and GOA files were imported and the OBO-XML file was processed in 64-bit ''gmbuilder.bat'' with the maximum heap space set to 8192 MB.
*Performed the export in 32-bit ''gmbuilder.bat''.

=====Testing Report=====

Version of GenMAPP Builder: Same as in previous testing report

Computer on which export was run: Same as in previous testing report

Postgres Database name: Same as in previous testing report

UniProt XML filename: Same as in previous testing report
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]):
* Time taken to import: 1.10 min

GO OBO-XML filename: Same as in previous testing report
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
* Time taken to import: 8.96 min
* Time taken to process: Did not reprocess because received a message stating that the GO data had already been processed.

GOA filename: Same as in previous testing report
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
* Time taken to import: 0.03 min

Name of .gdb file: Ct-Std v1 KS 20131119.gdb
* Time taken to export .gdb:
:*Start Time: 4:01:10 AM
:*End Time: 6:51:58 AM
* Upload your file and link to it here: [[Media:Ct-Std v1 KS 20131119.gdb|Ct-Std v1 KS 20131119.gdb]]
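
The export duration follows from the start and end times recorded above. A minimal sketch of the arithmetic, assuming both times fall on the same day; the function name <code>elapsed_minutes</code> is illustrative only.

```python
from datetime import datetime


def elapsed_minutes(start: str, end: str) -> float:
    """Minutes between two same-day clock times written like '4:01:10 AM'."""
    fmt = "%I:%M:%S %p"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


# Start and end times taken from the testing report above.
print(round(elapsed_minutes("4:01:10 AM", "6:51:58 AM"), 2))
```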

Note:

====Checking the Quality of the Exported Database====

#In ''gmbuilder-32bit.bat'', chose Run XML > Database Tallies for UniProt and GO....
:*The XML count for the UniProt file was 917, while the database count for the UniProt file was 10087. This discrepancy is likely due to importing the UniProt, OBO-XML, and GOA files multiple times while experimenting with the heap space threshold.
:*A new database must now be created in pgAdminIII in order to repeat the export/import gene database cycle.
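
One quick check of the repeated-import explanation is whether the database count is a whole multiple of the XML count. A minimal sketch using only the two tallies reported above; it tests the arithmetic and does not inspect the database itself.

```python
xml_count = 917         # XML count for the UniProt file, from the tally above
database_count = 10087  # database count for the UniProt file, from the tally above

# divmod returns (whole multiples, remainder).
copies, remainder = divmod(database_count, xml_count)
print(copies, remainder)
```

A remainder of zero is consistent with the same 917 records having been stored a whole number of times, although it does not by itself prove that this is what happened.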

===November 20-21, 2013===

====Finishing Installing Software to Set Up the Development Environment====

#Navigated to the "Download and Install" tab on the [http://subclipse.tigris.org/ Subclipse home page]
#Copied the Eclipse update site URL for the 1.10.x Release.
#Opened up Eclipse.
#Help > Install New Software...
#Pasted the URL into the "Work with" field.
#Checked the box next to "Subclipse" and "SVNKit".
#Clicked on Next and went through the process of installing the software.

====Repeating the Gene Database Import/Export Cycle====

#Created a new database in pgAdminIII.
#*Created tables in the database by executing the script in the file gmbuilder.sql.
#Since the aforementioned error occurred because duplicates were introduced into the database through repeated imports, I decided to try this new import-export cycle using 32-bit ''gmbuilder''.
#Opened ''gmbuilder-32bit.bat'' with the heap space set to 1024 MB and configured the new database.
#*Host or address: localhost
#*Port number: 5432
#*Database name: CT_KS_20131119_32bit_gmb2b71
#*Username: postgres
#*Password: <password of the PostgreSQL database created above>
#Imported the UniProt XML file into PostgreSQL through ''gmbuilder-32bit.bat''.
#Imported the OBO-XML file into PostgreSQL through ''gmbuilder-32bit.bat''.
#Processed the OBO-XML file through ''gmbuilder-32bit.bat''.
#Imported the GOA file into PostgreSQL through ''gmbuilder-32bit.bat''.

=====Testing Report=====
Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131119_32bit_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_11
::*Original file name from [http://www.uniprot.org/uniprot/?query=organism:315277+keyword:181 UniProt site]: uniprot-organism%3A315277+keyword%3A181.xml
* Time taken to import: 1.20 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): 11/06/2013
::*Original file name: go_daily-termdb.obo-xml.gz
* Time taken to import: 13.05 min
* Time taken to process: 10.80 min

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): 11/12/13
::*Original file name: 22183.C_trachomatis_A.goa
* Time taken to import: 0.04 min

Name of .gdb file: Ct-Std KS 20131121.gdb
* Time taken to export .gdb:
:*Start Time: 2:37:42 AM
:*End Time: 2:55:45 AM
* Upload your file and link to it here: [[Media:Ct-Std KS 20131121.gdb|Ct-Std KS 20131121.gdb]]

Note: '''There were no heap space errors this time, which is good. The errors probably occurred in the first place because of repeatedly importing the files.'''

===November 21, 2013===

====Revised the Custom Species Profile====

#Ran Eclipse.
#Navigated to Window > Open Perspective > Other... and chose SVN Repository Exploring
#Defined a new Subversion repository by clicking on the Add Repository button.
#Set the URL to https://svn.code.sf.net/p/xmlpipedb/code then clicked Finish.
#Double-clicked on https://svn.code.sf.net/p/xmlpipedb/code in the list.
#Double-clicked on ''trunk''.
#Right-clicked on ''gmbuilder'' and chose ''Checkout....''
#Chose ''Check out as a project configured using the New Project Wizard'' then clicked ''Finish''.
#In the New Project dialog that opened, chose Java Project from the list and then clicked ''Next >''.
#Entered the new project name ''xmlpipedb-gmbuilder'' and clicked ''Finish''.
#Set Eclipse to use JDK.
#*Navigated to Window > Preferences > Java > Installed JREs and clicked on the ''Search'' button.
#*Expanded Computer > Local Disk (C:) > Program Files > Java > jdk1.7.0_45 and clicked OK.
#*Highlighted the row ''jdk'' and clicked on ''Edit''.
#*In "JRE home:" field, removed everything in the name after jdk1.7.0_45 and then clicked ''Finish''.
#Made sure that I was in the Java perspective by navigating to Window > Open Perspective > Other... and then choosing Java.
#Double-clicked on the ''xmlpipedb-gmbuilder'' folder.
#Double-clicked on the ''lib'' folder.
#Shift-clicked and selected all files in the lib folder and then control-clicked on every file inside the lib folder whose name did not end in .jar.
#Right-clicked on one of the files and chose Build Path > Add to Build Path.
#Checked that the ''src'' folder was set as the source folder by right-clicking on it and then clicking on ''Build Path''.
#Right-clicked on the ''test'' folder and chose Build Path > Use as Source Folder. Then, made sure that there were no red x's in the list of folders.
#Right-clicked on ''build.xml'' toward the bottom of the list. Then, navigated to Team > Synchronize with Repository.
#Double-clicked on src > edu.lmu.xmlpipedb.gmbuilder.databasetoolkit.profiles > ChlamydiaTrachomatisSerovarAUniProtSpeciesProfile.java.
#Within the code, added the gene database link pattern: "http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=~" and then saved the code.
#Right-clicked on ''build.xml'' and chose Run As > Ant Build... (the one with the ellipses).
#In the Edit Configuration dialog that appears, checked on the clean and dist items in the Targets tab.
#Clicked on the ''Order...'' button and rearranged the items so that clean is first and dist is second and then clicked OK.
#Clicked the Run button.
#Right-clicked on the ''xmlpipedb-gmbuilder'' project folder and chose Refresh (F5 is its keyboard shortcut).

====Reran the Database Export With the New Build for gmbuilder====

#Made sure that pgAdminIII and the database "CT_KS_20131119_32bit_gmb2b71" were open.
#Opened the newly built ''gmbuilder-32bit.bat'' within the folder [User Folder] > workspace > xmlpipedb-gmbuilder > dist.
#Made sure that the database was configured to "CT_KS_20131119_32bit_gmb2b71".
#Performed a database export.

=====Testing Report=====

*Name of .gdb file: Ct-Std_v2_KS_20131121.gdb
*Time taken to export .gdb:
:*Start Time: 10:56 AM (approximately)
:*End Time: 11:14:26 AM
*Upload your file and link to it here: [[Media:Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]

Note:

====Testing the New Database with GenMAPP====
#Opened GenMAPP.
#Clicked on Data > New Gene Database and selected the gene database that was created today (Ct-Std v2 KS 20131121.gdb).
#On the toolbar, clicked on the button that has the word gene in a box.
#Clicked and dragged the cursor on the blank canvas to create a new gene box.
#Right-clicked on the gene box.
#In the pop-up window, typed in "CTA_0588" in the Gene ID search field.
#Under Gene ID System, selected Ordered Locus Names.
#Then, hit the "Search" button. Interestingly, GenMAPP did not find this gene in the database.
#*To troubleshoot, I opened up the gene database in Access.
#*In the list of tables, I double-clicked on OrderedLocusNames.
#*I sorted the list of gene IDs from A-Z and then scrolled down to try to find "CTA_0588". I was able to find "CTA_0587" and "CTA_0589" but not "CTA_0588".
#Back in GenMAPP, I used the same steps as above to find "CTA_0587".
#"CTA_0587" was found in the database. In the window, I clicked on the link "CTA_0587" under the OrderedLocusNames section.
#*This link took me to the entry for this gene on the Model Organism Database (http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=CTA_0587).
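
The link that GenMAPP opened matches the gene database link pattern added to the species profile, with the gene ID in place of the trailing <code>~</code>. A minimal sketch of that substitution, assuming <code>~</code> acts as a simple placeholder for the gene ID; the function name <code>gene_url</code> is hypothetical and this is not GenMAPP Builder's own code.

```python
# Link pattern as entered in the species profile; "~" is assumed to be the ID placeholder.
LINK_PATTERN = "http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=~"


def gene_url(gene_id: str, pattern: str = LINK_PATTERN) -> str:
    """Fill the placeholder in the link pattern with a gene ID (hypothetical helper)."""
    return pattern.replace("~", gene_id)


print(gene_url("CTA_0587"))
```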

====Performing Quality Assurance on the New Database====

*Performed counts with Tally Engine.
[[Image:Tally Engine Results for Ct-Std v2 KS 20131121.jpg|thumb|none|upright=4]]
*Counted the number of unique gene IDs using xmlpipedb ''match'':
java -jar xmlpipedb-match-1.1.1.jar "CTA_[0-9][0-9][0-9][0-9]" < Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
:*'''Total unique matches = 911'''
*Counted the number of unique gene IDs using an SQL query:
select count(*) from genenametype where type = 'ordered locus' and value ~ 'CTA_[0-9][0-9][0-9][0-9]';
:*'''Count - 917'''
*Opened the database in Access. Then, double-clicked on the table "OriginalRowCounts".
{| style = "width:20%; font-size:100%"
| '''Table'''
| '''Rows'''
|-
| OrderedLocusNames
| 919
|-
| UniProt
| 917
|-
| UniProt-OrderedLocusNames
| 919
|}
*Double-clicked on the table "OrderedLocusNames" and scrolled all the way to the bottom. We found that some gene IDs had the format pCTA_####.
*Went back to the command prompt and ran the following command:
java -jar xmlpipedb-match-1.1.1.jar "[p|]CTA_[0-9][0-9][0-9][0-9]" < Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
:*'''Total unique matches = 8'''
:*Adding the results of the two runs with xmlpipedb match, we got 919 unique gene IDs, which is the same number as that in Access.
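
The two-pattern count can be illustrated in a few lines. This is a minimal sketch, assuming that "unique matches" means the number of distinct matched strings; the sample text is made up for illustration and merely stands in for the UniProt XML file.

```python
import re


def unique_matches(pattern: str, text: str) -> set:
    """Return the set of distinct strings in text that match pattern."""
    return set(re.findall(pattern, text))


# Made-up sample standing in for the UniProt XML (not real data).
sample = """
<name type="ordered locus">CTA_0001</name>
<name type="ordered locus">CTA_0001</name>
<name type="ordered locus">CTA_0002</name>
<name type="ordered locus">pCTA_0001</name>
"""

# The unprefixed pattern also matches the tail of pCTA_0001, but that match
# is the string CTA_0001, which is already in the set.
unprefixed = unique_matches(r"CTA_[0-9]{4}", sample)
prefixed = unique_matches(r"pCTA_[0-9]{4}", sample)
print(len(unprefixed), len(prefixed), len(unprefixed) + len(prefixed))
```

In this sample the sum of the two counts equals the number of distinct IDs only because the p-prefixed ID reuses a number that also occurs without the prefix; that property would need to be checked before relying on the sum for other data.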

==Week 14==

===November 24, 2013===

====Ran GenMAPP Using ''C. trachomatis'' Gene Database====

#Opened GenMAPP.
#Made sure that the database was set to Ct-Std_v2_KS_20131121.gdb.
#Imported [[Media:C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt|sample microarray data]] into GenMAPP.
#*Loaded the microarray data by navigating to Expression Dataset Manager > New Dataset.
#*There were 23015 errors.
#Opened the exceptions file that was created to inspect the errors and scrolled down to where the IDs CTA_#### begin.
#*The gene IDs are actually in the format CTA_####_RRMH#####_at.
#Tried to search for a sample ID in the format found in the exceptions file in the UniProt database or any NCBI databases.
#*Could not find CTA_0806_RRMH00303, RRMH00303, CTA_0806_RRMH00303_at, or RRMH00303_at in the UniProt database.
#*Could not find CTA_0806_RRMH00303, RRMH00303, CTA_0806_RRMH00303_at, or RRMH00303_at in any database through NCBI.
#*My suspicion is that the RRMH#####_at portion of the ID is an identifier that is appended to each probe on the Affymetrix microarray when it is read by the company's software.
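One way to recover the gene ID from an ID of the form CTA_####_RRMH#####_at is sketched below in Python. It assumes the probe suffix always follows the pattern seen in the exceptions file (an underscore, "RRMH", digits, then "_at"); the function name <code>gene_id_from_probe</code> is hypothetical, and this is not necessarily how the IDs were separated in the project spreadsheets.

```python
import re

# Assumed layout, taken from the exceptions file: CTA_####_RRMH#####_at.
# The optional leading "p" allows for plasmid IDs of the form pCTA_####.
PROBE_ID = re.compile(r"^(p?CTA_[0-9]{4})_RRMH[0-9]+_at$")


def gene_id_from_probe(probe_id):
    """Return the CTA_#### part of a probe ID, or None if the format differs."""
    match = PROBE_ID.match(probe_id.strip())
    return match.group(1) if match else None


print(gene_id_from_probe("CTA_0806_RRMH00303_at"))
print(gene_id_from_probe("RRMH00303_at"))
```

Returning None for IDs that do not fit the assumed layout makes it easy to list the rows that still need manual inspection.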

===November 27-28, 2013===

====Running SQL Queries to Determine Discrepancy in the Count Between the UniProt XML and the Gene Database====

*Ran an SQL query to determine if there were any ordered locus names in the database that do not match the ID "CTA_####".
select * from genenametype where type = 'ordered locus' and not value ~ 'CTA_[0-9][0-9][0-9][0-9]';
:*There were no hits.
*Ran an SQL query to show all the ordered locus names in the database that have the ID "CTA_####".
select * from genenametype where type = 'ordered locus' and value ~ 'CTA_[0-9][0-9][0-9][0-9]';
:*Scrolling through the results, I saw the ID "CTA_0406/CTA_0407/CTA_0408".
:*The [http://www.uniprot.org/manual/gene_name UniProt manual] stated that ordered locus names that are connected by a slash indicate two predicted genes that are actually one gene.
:*The combination of these IDs explains why two fewer IDs show up in the count by the Tally Engine in comparison to the SQL query.
*Opened the gene database within Access and filtered the ordered locus names to show only those that match the ID "CTA_0407".
:*The ID appeared on its own without being joined to other IDs with slashes.
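The effect of a slash-joined ordered locus name on the counts can be illustrated with a short Python sketch. The row values other than "CTA_0406/CTA_0407/CTA_0408" are made up for the example, and the helper name <code>split_locus_names</code> is hypothetical; the point is only that one joined row contributes one to a row count but three to a count of individual IDs.

```python
def split_locus_names(values):
    """Expand slash-joined ordered locus names into individual gene IDs."""
    ids = []
    for value in values:
        ids.extend(part.strip() for part in value.split("/"))
    return ids


# Hypothetical rows; only the joined entry comes from the query results above.
rows = ["CTA_0001", "CTA_0406/CTA_0407/CTA_0408", "CTA_0500"]

row_count = len(rows)                    # counts the joined entry once
id_count = len(split_locus_names(rows))  # counts each joined ID separately
print(row_count, id_count, id_count - row_count)
```

The difference of two between the counts is the same size as the gap between 917 and 919 observed for the full database.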

====Testing Report for Finalized Gene Database====

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131119_32bit_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_11
::*Original file name from [http://www.uniprot.org/uniprot/?query=organism:315277+keyword:181 UniProt site]: uniprot-organism%3A315277+keyword%3A181.xml
* Time taken to import: 1.20 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): 11/06/2013
::*Original file name: go_daily-termdb.obo-xml.gz
* Time taken to import: 13.05 min
* Time taken to process: 10.80 min

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): 11/12/13
::*Original file name: 22183.C_trachomatis_A.goa
* Time taken to import: 0.04 min

*Name of .gdb file: Ct-Std_v2_KS_20131121.gdb
*Time taken to export .gdb:
:*Start Time: 10:56 AM (approximately)
:*End Time: 11:14:26 AM
*Upload your file and link to it here: [[Media:Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]

Note: There is one ID "CTA_0406/CTA_0407/CTA_0408" that is a combination of three predicted genes that are actually one gene. The Tally Engine counts this as one gene that contributes to the total count of 917. However, through Access and xmlpipedb ''match'', the ID is separated into three separate genes bringing the total count to 919.

==Week 15==

===December 3, 2013===

*Worked with Hilda to separate the gene IDs from the Affymetrix IDs appended to them in the microarray data.
*Worked with Dillon to run GenMAPP and MAPPFinder with the microarray data and the gene database created for ''C. trachomatis''.

===December 5, 2013===

====Customize Tally Engine for ''C. trachomatis''====

*Opened Eclipse.
*Double-clicked xmlpipedb-gmbuilder > src > edu.lmu.xmlpipedb.gmbuilder.resource.properties > gmbuilder.properties.
*Found the following part of the code:
#
# wizard.properties
#
*Before this part of the code, added the following lines of code to specify which gene IDs to find in the UniProt XML file:
# Chlamydia trachomatis
chlamydiatrachomatis_level_amount=1

chlamydiatrachomatis_element_level0=uniprot/entry/gene/name&type&ordered locus

chlamydiatrachomatis_query_level0=select count(*) from genenametype where type = 'ordered locus';

chlamydiatrachomatis_table_name_level0=Ordered Locus
*Saved the changes and built a new version of gmbuilder.
*Opened gmbuilder and set the database to CT_KS_20131119_32bit_gmb2b71.
*Ran TallyEngine.
*Got the same counts as those from November 21, 2013, even though I received an error that the species specified in the added code could not be found. The error also specified that the correct species ID is chlamydiatrachomatisserovara.
*Accordingly, I went back to Eclipse and modified the code added to gmbuilder.properties:
# Chlamydia trachomatis
chlamydiatrachomatisserovara_level_amount=1

chlamydiatrachomatisserovara_element_level0=uniprot/entry/gene/name&type&ordered locus

chlamydiatrachomatisserovara_query_level0=select count(*) from genenametype where type = 'ordered locus';

chlamydiatrachomatisserovara_table_name_level0=Ordered Locus
*Built a distribution version of gmbuilder and then ran Tally Engine again. I got the same counts but no error message about the species ID as before.
*Went back to Eclipse. Synchronized the code on my computer with the code on SourceForge.
*Built a new distribution of gmbuilder.
*Synchronized again.
*Committed the changes I made to the gmbuilder.properties code.
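The property names added above all share the species ID as a prefix. The Python sketch below generates the same four lines from a species name. The normalization rule (lowercase the name and keep only letters and digits) is an assumption inferred from the ID reported in the error message, not something confirmed from the gmbuilder source, and both function names are hypothetical.

```python
def species_key(species_name):
    """Assumed rule: lowercase the species name and drop non-alphanumerics."""
    return "".join(ch for ch in species_name.lower() if ch.isalnum())


def tally_properties(species_name, element, query, table_name):
    """Build the four Tally Engine property lines for a one-level profile."""
    key = species_key(species_name)
    return [
        f"{key}_level_amount=1",
        f"{key}_element_level0={element}",
        f"{key}_query_level0={query}",
        f"{key}_table_name_level0={table_name}",
    ]


lines = tally_properties(
    "Chlamydia trachomatis serovar A",
    "uniprot/entry/gene/name&type&ordered locus",
    "select count(*) from genenametype where type = 'ordered locus';",
    "Ordered Locus",
)
print("\n".join(lines))
```

Generating the lines from one name avoids the mismatch between the property prefix and the species ID that caused the first error.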

==Week 16==

===December 12, 2013===

The gene database Ct-Std_v2_KS_20131121.gdb was renamed to Ct-Std_External_20131121.gdb in preparing the final deliverables for the project. The testing report was modified accordingly:

====Testing Report====

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131119_32bit_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_11
::*Original file name from [http://www.uniprot.org/uniprot/?query=organism:315277+keyword:181 UniProt site]: uniprot-organism%3A315277+keyword%3A181.xml
* Time taken to import: 1.20 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): 11/06/2013
::*Original file name as listed in beta.geneontology.org: go_daily-termdb.obo-xml.gz
* Time taken to import: 13.05 min
* Time taken to process: 10.80 min

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): 11/12/13
::*Original file name as listed in the FTP site: 22183.C_trachomatis_A.goa
* Time taken to import: 0.04 min

*Name of .gdb file: Ct-Std_v2_KS_20131121.gdb
*Time taken to export .gdb:
:*Start Time: 10:56 AM (approximately)
:*End Time: 11:14:26 AM
*Upload your file and link to it here: [[Media:Ct-Std External 20131121.gdb|Ct-Std_External_20131121.gdb]]

Note: There is one ID "CTA_0406/CTA_0407/CTA_0408" that is a combination of three predicted genes that are actually one gene. The Tally Engine and PostgreSQL count this as one gene resulting in a total gene count of 917. However, through Access and xmlpipedb ''match'', the ID is separated into three separate genes bringing the total count to 919.

[[Category:Journal Entry]]
[[Category:Individual Homework]]

File:Ct-Std External 20131121.gdb

2013-12-13T06:43:42Z

Ksherbina: Same database as Ct-Std v2 KS 20131121.gdb. The name was changed during the preparation of the final deliverables for the project.

Same database as Ct-Std v2 KS 20131121.gdb. The name was changed during the preparation of the final deliverables for the project.

File:Ct-Std v2 KS 20131121.gdb

2013-12-12T23:25:17Z

Ksherbina: Ksherbina uploaded a new version of "File:Ct-Std v2 KS 20131121.gdb": Same database but renamed

Database that was exported in the new build of gmbuilder created after adding the species specific database link to the C. trachomatis species profile.

Chlamydia trachomatis

2013-12-10T04:09:34Z

Ksherbina: /* Deliverable and Final Paper Assignments */ Removed a section

{{Team_H(oo)KD}}

==Deadlines and Intermediate Milestones==

{| style="width: 100%; margin: 1em 0" cellpadding="5ex"
! style="width: 8em; background: #ddd" | Monday
! style="width: 8em; background: #ddd" | Tuesday
! style="width: 8em; background: #ddd" | Wednesday
! style="width: 8em; background: #ddd" | Thursday
! style="width: 8em; background: #ddd" | Friday
|-
| style="vertical-align: top" | 11/11
| style="vertical-align: top" | 11/12
| style="vertical-align: top" | 11/13
| style="vertical-align: top" | 11/14
*Run an export/import of gene database cycle '''KS/HD'''
*Create species profile '''KS'''
*Obtain and format raw microarray data '''DW'''
| style="vertical-align: top" | 11/15
*Rerun export/import of gene database cycle '''KS'''
|-
| style="vertical-align: top; background: #eee" | 11/18
| style="vertical-align: top; background: #eee" | 11/19
*Vet the gene database exported last week '''KS/HD/DW'''
*Format the raw microarray data '''DW'''
| style="vertical-align: top; background: #eee" | 11/20
| style="vertical-align: top; background: #eee" | 11/21
*Perform microarray statistical analysis '''DW'''
*Work on custom species profile '''KS/HD'''
| style="vertical-align: top; background: #eee" | 11/22
|-
| style="vertical-align: top" | 11/25
| style="vertical-align: top" | 11/26
*Build and commit species profile '''KS'''
*Customize Tally Engine '''KS/HD'''
*Perform GenMAPP and MAPPFinder Analysis '''DW'''
| style="vertical-align: top" | 11/27
'''THANKSGIVING BREAK'''
| style="vertical-align: top" | 11/28
'''THANKSGIVING BREAK'''
| style="vertical-align: top" | 11/29
'''THANKSGIVING BREAK'''
|-
| style="vertical-align: top; background: #eee" | 12/02
| style="vertical-align: top; background: #eee" | 12/03
*Run GenMAPP and MAPPFinder '''DW'''
*Customize the Tally Engine for ''C. trachomatis'' '''DW'''
| style="vertical-align: top; background: #eee" | 12/04
*Run GenMAPP and MAPPFinder again if necessary '''DW'''
| style="vertical-align: top; background: #eee" | 12/05
*Begin working on the relational database schema '''HD'''
*Commit the changes to gmbuilder '''KS'''
| style="vertical-align: top; background: #eee" | 12/06
|-
| style="vertical-align: top; background: #eee" | 12/09
*Write the paper '''HD/KS/DW'''
*Work on the presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/10
*Write the paper '''HD/KS/DW'''
*Work on the presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/11
*Finish the presentation '''HD/KS/DW'''
*Practice the final presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/12
*'''Final Presentation'''
*Proofread the final paper '''HD/KS/DW'''
*Make sure all the deliverables are ready '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/13
|}

==Deliverable and Final Paper Assignments==

===Deliverables===

*GenMAPP Gene Database for assigned species (.gdb) '''Katrina'''
*ReadMe file to accompany the Gene Database (.pdf)
:*Include Gene Database Schema diagram in ReadMe '''Hilda'''
*Gene Database Testing Report for final submitted Gene Database (print from wiki to .pdf file) '''Katrina'''
*Processed and analyzed DNA microarray dataset (.xls) '''Dillon'''
*GenMAPP Expression Dataset file (.gex) '''Dillon'''
*Filtered MAPPFinder Results (.xls) '''Dillon'''
*Sample MAPP file of a relevant biological pathway for your species (.mapp) '''Dillon'''

===Final Paper===

#Introduction '''Hilda'''
#Methods
#*Download the UniProt XML proteome set and GOA (GO association) files for your species. '''Katrina'''
#*Download GO terms in the OBO-XML format. '''Katrina'''
#*Create the GenMAPP Builder tables in PostgreSQL. '''Katrina'''
#*Load files into PostgreSQL database via GenMAPP Builder. '''Katrina'''
#*Export into a GenMAPP Gene Database. '''Katrina'''
#*Inspect/vet/validate Gene Database. '''Katrina'''
#*Prepare microarray data (organize, normalize, perform statistical analysis) '''Dillon'''
#*Run GenMAPP and MAPPFinder using the Gene Database. '''Dillon'''
#Results
#*Gene Database Schema '''Hilda'''
#*Gene Database Testing Report on final version of Gene Database '''Katrina'''
#*Report on quantity and identity of gene IDs that did not make it into the database '''Hilda'''
#*Report on what changes need to be made to the GenMAPP Builder code in order to accommodate the second and third type of missing gene IDs '''Katrina'''
#*Report results of the DNA microarray statistical analysis '''Dillon'''
#*Table of MAPPFinder results (from .xls) and MAPP of a pathway '''Dillon'''
#Discussion
#*How well did the GenMAPP Builder process work for your species? '''Katrina'''
#*Discuss the statistical analysis and MAPPFinder results for your microarray dataset. '''Dillon'''

==Annotated Bibliography of Genomics Papers for ''C. trachomatis''==

===Whole genome sequencing of ''C. trachomatis''===

#Journal article describing results of whole genome sequencing:
#*Database: PubMed
#*Search Terms: Chlamydia trachomatis [MeSH Term] AND genome [Title]
#*There were 29 results. Many of the articles that appear using the above search terms have to do with performing whole-genome analysis of ''C. trachomatis'' to look at genetic polymorphisms in the bacterium and molecular and genetic characteristics that are observed when the bacterium infects a host.
#*Stephens, R.S., Kalman, S., Lammel, C., Fan, J., Marathe, R., Aravind, L., Mitchell, W., Olinger, L., Tatusov, R., Zhao, Q., Koonin, E. V., Davis, R.W. (1998) Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis. ''Science'' '''282''': 754-759. 10.1126/science.282.5389.754.
#*The article can be viewed as an [http://www.sciencemag.org/content/282/5389/754.long HTML].
#*The article can be viewed as a [[Media: Science-1998-Stephens-754-9.pdf | PDF]].
#*This article includes [http://www.sciencemag.org/site/feature/data/982604.xhtml Supplementary Material].
#Use the genome sequencing article you found to perform a prospective search in the ISI Web of Science/Knowledge database.
#*How many results did you get?: 872
#*Based on the titles and abstracts of the papers, what type of research directions have been taken now that the genome for that organism has been sequenced?: Based on the titles and abstracts found in the results, it seems that research has been primarily in the direction of understanding how the disease functions and assessing the severity of the disease (and how to cure it).

===Microarray articles===
The EBI ArrayExpress database was used to locate each of the four following microarray articles. The search terms used were Chlamydia trachomatis under 'By organism' while the rest of the drop down menus were not altered.

The following articles appear in order of preference:

#Omsland, A., Sager, J., Nair, V., Sturdevant, D.E., Hackstadt, T. (2012) [http://www.pnas.org/content/109/48/19781.full Developmental stage-specific metabolic and transcriptional activity of Chlamydia trachomatis in an axenic medium]. ''PNAS'' '''109''': 19781-19785. doi: 10.1073/pnas.1212831109.
#*[[Media: PNAS-2012-Omsland-19781-5.pdf|PDF Version of Paper]]
#*[http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-39530/?keywords=&organism=Chlamydia+trachomatis&array=&exptype%5B%5D=&exptype%5B%5D= Microarray Data]
#*Chlamydiae undergo a biphasic developmental cycle characterized by an infectious cell type known as an elementary body (EB) and an intracellular replicative form called a reticulate body (RB). Chlamydia was incubated under microaerobic conditions to test the differences in preferred energy source between EBs and RBs.
#*There was not strictly a "treatment" group and a "control" group. Based on Figure 4, the EBs can be considered the "treatment" group because the ratio was set up as EB to RB. The RBs would then be the "control" group, since the EB population is examined relative to the RB population.
#*Four biological replicates were performed for both the control and the treatment; the article states that "Density gradient-purified EBs and RBs were incubated in quadruplicate in four-well plates...".


==Journal Club Presentation on ''C. trachomatis'' Genome Sequencing Paper==

[[Media:Presentation on C.trachomatis Genome Paper KS HD DW 20131112.pdf|Here]] is the PDF version of the presentation given on November 12, 2013 on the Stephens ''et al.'' (1998) paper.


==Important Files==

*[[Media: Go_daily-termdb_v2_HD_20131107.obo-xml.gz|Go_daily-termdb_v2_HD_20131107.obo-xml.gz]]
:GO OBO-XML file for ''C. trachomatis'' serovar A.
*[[Media: Uniprot XML C.trachomatis serovar A KS 20131114.xml|Uniprot XML C.trachomatis serovar A KS 20131114.xml]]
:UniProt XML file for ''C. trachomatis'' serovar A.
*[[Media: 22183.C trachomatis A KS 20131114.goa|22183.C trachomatis A KS 20131114.goa]]
:GOA file for ''C. trachomatis'' serovar A.
*[[Media: Ct-Std v1 KS 20131119.gdb|Ct-Std v1 KS 20131119.gdb]]
:First iteration of the export/import gene database cycle.
*[[Media: Ct-Std KS 20131121.gdb|Ct-Std KS 20131121.gdb]]
:Updated gene database.
*[[Media: Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]
:New gene database created with the new version of gmbuilder built on 11/21/2013 using the custom species profile for ''C. trachomatis.''

'''Array Express Article Raw Files'''

*[[Media:Raw_Data_DW_V1_12112013_A-GEOD-4692.adf.txt|Raw_Data_DW_V1_12112013_A-GEOD-4692.adf.txt]]
:Microarray raw ADF file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.idf.txt|Raw_Data_DW_V1_12112013_E-GEOD-39530.idf.txt]]
:Microarray raw IDF file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.processed.1.zip|Raw_Data_DW_V1_12112013_E-GEOD-39530.processed.1.zip]]
:Microarray raw processed zip file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.raw.1.zip|Raw_Data_DW_V1_12112013_E-GEOD-39530.raw.1.zip]]
:Microarray raw zip file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.sdrf.txt|Raw_Data_DW_V1_12112013_E-GEOD-39530.sdrf.txt]]
:Microarray SDRF text file for ''C. trachomatis''.
*[[Media:Sdrf information excel.xls|Sdrf information excel.xls]]
:SDRF information in Excel spreadsheet format.

'''Excel Master Spreadsheet File Versions (Containing the Formatted Raw Data)'''

*[[Media:Master Spreadsheet.xls|master spreadsheet work in progress]]
:Initial Microarray raw data formatted and opened in Microsoft Excel.
*[[Media: C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.xls|C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.xls]]
:Sample C. trachomatis microarray data that was used to run GenMAPP with the latest ''C. trachomatis'' database.
*[[Media: C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt|C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt]]
:Tab-delimited version of the file above.
*[[Media: Final Master Spreadsheet.xls|Final Master Spreadsheet]]
:Microarray data with identified Gene IDs and Log base data.
*[[Media: Master_Spreadsheet_Chlamydia_20131125_KD.xls | Master_Spreadsheet_Chlamydia_20131125_KD.xls]]
:version edited during office hours by Dr. Dahlquist to demonstrate statistics
*[[Media: Master Spreadsheet Chlamydia V1 20131125 DW.xls]]
:Finished version of the spreadsheet edited during office hours by Dr. Dahlquist to demonstrate statistics; finished by Dillon Williams.
*[[Media: Copy of Master Spreadsheet Chlamydia V2 20131203 HD.xls]]
:Microarray data with separated IDs CTA_#### from _RRMH#####_...
*[[Media: For GenMAPP Chlamydia V4 20131203 DW.xls]]
:Excel Spreadsheet with data formatted for GenMAPP (Version run in GenMAPP).

'''For GenMAPP Files'''
*[[Media: Ctrachomatis EbtoRb ATP metabolic process.mapp]]
: EB to RB ATP Metabolic Process Mapp file.
*[[Media: Ctrachomatis EbtoRB glucose catabolic process 20131204 DW.mapp]]
: EB to RB glucose catabolic process map.
*[[Media: CTrachomatis EBtoRB Rifampicin 20131204 DW-Criterion1-GO.txt]]
: EB to RB Rifampicin go.txt file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.gex]]
: For_GenMAPP .gex file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.gmf]]
: For GenMAPP .gmf file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.zip]]
: Access file created with .gex file.
*[[Media: For GenMAPP Chlamydia V4 20131205 KS.gex]]
:Color set created in GenMAPP for EB to RB no rifampicin and increased/decreased logFC criteria

'''MAPPFinder Files'''
*[[Media:EB to RB No Rif V2-Criterion0-GO.txt]]
:Increased
*[[Media: EB to RB No Rif V2-Criterion1-GO.txt]]
:Decreased
*[[Media: For GenMAPP Chlamydia V4 20131205 KS.gmf]]
: File created by MAPPFinder after creating the latest color set.

==Wiki Formatting==

Template: [[Template:Team_H(oo)KD]]

[[Category: Group Projects]]
[[Category: Team H(oo)KD]]

Chlamydia trachomatis

2013-12-08T06:43:26Z

Ksherbina: /* Final Paper */ Added some more assignments

{{Team_H(oo)KD}}

==Deadlines and Intermediate Milestones==

{| style="width: 100%; margin: 1em 0" cellpadding="5ex"
! style="width: 8em; background: #ddd" | Monday
! style="width: 8em; background: #ddd" | Tuesday
! style="width: 8em; background: #ddd" | Wednesday
! style="width: 8em; background: #ddd" | Thursday
! style="width: 8em; background: #ddd" | Friday
|-
| style="vertical-align: top" | 11/11
| style="vertical-align: top" | 11/12
| style="vertical-align: top" | 11/13
| style="vertical-align: top" | 11/14
*Run an export/import of gene database cycle '''KS/HD'''
*Create species profile '''KS'''
*Obtain and format raw microarray data '''DW'''
| style="vertical-align: top" | 11/15
*Rerun export/import of gene database cycle '''KS'''
|-
| style="vertical-align: top; background: #eee" | 11/18
| style="vertical-align: top; background: #eee" | 11/19
*Vet the gene database exported last week '''KS/HD/DW'''
*Format the raw microarray data '''DW'''
| style="vertical-align: top; background: #eee" | 11/20
| style="vertical-align: top; background: #eee" | 11/21
*Perform microarray statistical analysis '''DW'''
*Work on custom species profile '''KS/HD'''
| style="vertical-align: top; background: #eee" | 11/22
|-
| style="vertical-align: top" | 11/25
| style="vertical-align: top" | 11/26
*Build and commit species profile '''KS'''
*Customize Tally Engine '''KS/HD'''
*Perform GenMAPP and MAPPFinder Analysis '''DW'''
| style="vertical-align: top" | 11/27
'''THANKSGIVING BREAK'''
| style="vertical-align: top" | 11/28
'''THANKSGIVING BREAK'''
| style="vertical-align: top" | 11/29
'''THANKSGIVING BREAK'''
|-
| style="vertical-align: top; background: #eee" | 12/02
| style="vertical-align: top; background: #eee" | 12/03
*Run GenMAPP and MAPPFinder '''DW'''
*Customize the Tally Engine for ''C. trachomatis'' '''DW'''
| style="vertical-align: top; background: #eee" | 12/04
*Run GenMAPP and MAPPFinder again if necessary '''DW'''
| style="vertical-align: top; background: #eee" | 12/05
*Begin working on the relational database schema '''HD'''
*Commit the changes to gmbuilder '''KS'''
| style="vertical-align: top; background: #eee" | 12/06
|-
| style="vertical-align: top; background: #eee" | 12/09
*Write the paper '''HD/KS/DW'''
*Work on the presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/10
*Write the paper '''HD/KS/DW'''
*Work on the presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/11
*Finish the presentation '''HD/KS/DW'''
*Practice the final presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/12
*'''Final Presentation'''
*Proofread the final paper '''HD/KS/DW'''
*Make sure all the deliverables are ready '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/13
|}

==Deliverable and Final Paper Assignments==

===Deliverables===

*GenMAPP Gene Database for assigned species (.gdb) '''Katrina'''
*ReadMe file to accompany the Gene Database (.pdf)
:*Include Gene Database Schema diagram in ReadMe '''Hilda'''
*Gene Database Testing Report for final submitted Gene Database (print from wiki to .pdf file) '''Katrina'''
*Processed and analyzed DNA microarray dataset (.xls) '''Dillon'''
*GenMAPP Expression Dataset file (.gex) '''Dillon'''
*Filtered MAPPFinder Results (.xls) '''Dillon'''
*Sample MAPP file of a relevant biological pathway for your species (.mapp) '''Dillon'''

===Final Paper===

#Introduction '''Hilda'''
#Methods
#*Download the UniProt XML proteome set and GOA (GO association) files for your species. '''Katrina'''
#*Download GO terms in the OBO-XML format. '''Katrina'''
#*Create the GenMAPP Builder tables in PostgreSQL. '''Katrina'''
#*Load files into PostgreSQL database via GenMAPP Builder. '''Katrina'''
#*Export into a GenMAPP Gene Database. '''Katrina'''
#*Inspect/vet/validate Gene Database. '''Katrina'''
#*Prepare microarray data (organize, normalize, perform statistical analysis) '''Dillon'''
#*Run GenMAPP and MAPPFinder using the Gene Database. '''Dillon'''
#Results
#*Gene Database Schema '''Hilda'''
#*Gene Database Testing Report on final version of Gene Database '''Katrina'''
#*Report on quantity and identity of gene IDs that did not make it into the database '''Hilda'''
#*Report on what changes need to be made to the GenMAPP Builder code in order to accommodate the second and third type of missing gene IDs '''Katrina'''
#*Report results of the DNA microarray statistical analysis '''Dillon'''
#*Table of MAPPFinder results (from .xls) and MAPP of a pathway '''Dillon'''
#Discussion
#*How well did the GenMAPP Builder process work for your species? '''Katrina'''
#*Discuss the statistical analysis and MAPPFinder results for your microarray dataset. '''Dillon'''

=== References ===

* Please use the [[Guidelines for Literature Citations in a Scientific Paper]]. The [http://en.wikipedia.org/wiki/APA_style#Citation APA format] is also very similar to these guidelines and will be acceptable. Be consistent with your format for the in text citations and for your references list at the end.

==Annotated Bibliography of Genomics Papers for ''C. trachomatis''==

===Whole genome sequencing of ''C. trachomatis''===

#Journal article describing results of whole genome sequencing:
#*Database: PubMed
#*Search Terms: Chlamydia trachomatis [MeSH Term] AND genome [Title]
#*There were 29 results. Many of the articles that appear using the above search terms have to do with performing whole-genome analysis of ''C. trachomatis'' to look at genetic polymorphisms in the bacterium and molecular and genetic characteristics that are observed when the bacterium infects a host.
#*Stephens, R.S., Kalman, S., Lammel, C., Fan, J., Marathe, R., Aravind, L., Mitchell, W., Olinger, L., Tatusov, R., Zhao, Q., Koonin, E. V., Davis, R.W. (1998) Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis. ''Science'' '''282''': 754-759. 10.1126/science.282.5389.754.
#*The article can be viewed as an [http://www.sciencemag.org/content/282/5389/754.long HTML].
#*The article can be viewed as a [[Media: Science-1998-Stephens-754-9.pdf | PDF]].
#*This article includes [http://www.sciencemag.org/site/feature/data/982604.xhtml Supplementary Material].
#Use the genome sequencing article you found to perform a prospective search in the ISI Web of Science/Knowledge database.
#*How many results did you get?: 872
#*Based on the titles and abstracts of the papers, what type of research directions have been taken now that the genome for that organism has been sequenced?: Based on the titles and abstracts found in the results, it seems that research has been primarily in the direction of understanding how the disease functions and assessing the severity of the disease (and how to cure it).

===Microarray articles===
The EBI ArrayExpress database was used to locate each of the four following microarray articles. The search terms used were Chlamydia trachomatis under 'By organism' while the rest of the drop down menus were not altered.

The following articles appear in order of preference:

#Omsland, A., Sager, J., Nair, V., Sturdevant, D.E., Hackstadt, T. (2012) [http://www.pnas.org/content/109/48/19781.full Developmental stage-specific metabolic and transcriptional activity of Chlamydia trachomatis in an axenic medium]. ''PNAS'' '''109''': 19781-19785. doi: 10.1073/pnas.1212831109.
#*[[Media: PNAS-2012-Omsland-19781-5.pdf|PDF Version of Paper]]
#*[http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-39530/?keywords=&organism=Chlamydia+trachomatis&array=&exptype%5B%5D=&exptype%5B%5D= Microarray Data]
#*Chlamydiae undergo a biphasic developmental cycle characterized by an infectious cell type known as an elementary body (EB) and an intracellular replicative form called a reticulate body (RB). Chlamydia was incubated under microaerobic conditions to test the differences in preferred energy source between EBs and RBs.
#*There was not strictly a "treatment" group and a "control" group. Based on Figure 4, the EBs can be considered the "treatment" group because the ratio was set up as EB to RB. The RBs would then be the "control" group, since the EB population is examined relative to the RB population.
#*Four biological replicates were performed for both the control and the treatment; the article states that "Density gradient-purified EBs and RBs were incubated in quadruplicate in four-well plates...".


==Journal Club Presentation on ''C. trachomatis'' Genome Sequencing Paper==

[[Media:Presentation on C.trachomatis Genome Paper KS HD DW 20131112.pdf|Here]] is the PDF version of the presentation given on November 12, 2013 on the Stephens ''et al.'' (1998) paper.


==Important Files==

*[[Media: Go_daily-termdb_v2_HD_20131107.obo-xml.gz|Go_daily-termdb_v2_HD_20131107.obo-xml.gz]]
:GO OBO-XML file for ''C. trachomatis'' serovar A.
*[[Media: Uniprot XML C.trachomatis serovar A KS 20131114.xml|Uniprot XML C.trachomatis serovar A KS 20131114.xml]]
:UniProt XML file for ''C. trachomatis'' serovar A.
*[[Media: 22183.C trachomatis A KS 20131114.goa|22183.C trachomatis A KS 20131114.goa]]
:GOA file for ''C. trachomatis'' serovar A.
*[[Media: Ct-Std v1 KS 20131119.gdb|Ct-Std v1 KS 20131119.gdb]]
:First iteration of the export/import gene database cycle.
*[[Media: Ct-Std KS 20131121.gdb|Ct-Std KS 20131121.gdb]]
:Updated gene database.
*[[Media: Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]
:New gene database created with the new version of gmbuilder built on 11/21/2013 using the custom species profile for ''C. trachomatis.''

'''Array Express Article Raw Files'''

*[[Media:Raw_Data_DW_V1_12112013_A-GEOD-4692.adf.txt|Raw_Data_DW_V1_12112013_A-GEOD-4692.adf.txt]]
:Microarray raw ADF file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.idf.txt|Raw_Data_DW_V1_12112013_E-GEOD-39530.idf.txt]]
:Microarray raw IDF file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.processed.1.zip|Raw_Data_DW_V1_12112013_E-GEOD-39530.processed.1.zip]]
:Microarray raw processed zip file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.raw.1.zip|Raw_Data_DW_V1_12112013_E-GEOD-39530.raw.1.zip]]
:Microarray raw zip file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.sdrf.txt|Raw_Data_DW_V1_12112013_E-GEOD-39530.sdrf.txt]]
:Microarray SDRF text file for ''C. trachomatis''.
*[[Media:Sdrf information excel.xls|Sdrf information excel.xls]]
:SDRF information in Excel spreadsheet format.

'''Excel Master Spreadsheet File Versions (Containing the Formatted Raw Data)'''

*[[Media:Master Spreadsheet.xls|master spreadsheet work in progress]]
:Initial Microarray raw data formatted and opened in Microsoft Excel.
*[[Media: C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.xls|C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.xls]]
:Sample C. trachomatis microarray data that was used to run GenMAPP with the latest ''C. trachomatis'' database.
*[[Media: C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt|C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt]]
:Tab-delimited version of the file above.
*[[Media: Final Master Spreadsheet.xls|Final Master Spreadsheet]]
:Microarray data with identified gene IDs and log base data.
*[[Media: Master_Spreadsheet_Chlamydia_20131125_KD.xls | Master_Spreadsheet_Chlamydia_20131125_KD.xls]]
:Version edited during office hours by Dr. Dahlquist to demonstrate statistics.
*[[Media: Master Spreadsheet Chlamydia V1 20131125 DW.xls]]
:Completed version of the spreadsheet edited during office hours by Dr. Dahlquist to demonstrate statistics; completed by Dillon Williams.
*[[Media: Copy of Master Spreadsheet Chlamydia V2 20131203 HD.xls]]
:Microarray data with separated IDs CTA_#### from _RRMH#####_...
*[[Media: For GenMAPP Chlamydia V4 20131203 DW.xls]]
:Excel spreadsheet with data formatted for GenMAPP (version run in GenMAPP).

'''For GenMAPP Files'''
*[[Media: Ctrachomatis EbtoRb ATP metabolic process.mapp]]
: EB to RB ATP Metabolic Process Mapp file.
*[[Media: Ctrachomatis EbtoRB glucose catabolic process 20131204 DW.mapp]]
: EB to RB glucose catabolic process map.
*[[Media: CTrachomatis EBtoRB Rifampicin 20131204 DW-Criterion1-GO.txt]]
: EB to RB Rifampicin go.txt file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.gex]]
: For_GenMAPP .gex file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.gmf]]
: For GenMAPP .gmf file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.zip]]
: Access file created with .gex file.
*[[Media: For GenMAPP Chlamydia V4 20131205 KS.gex]]
:Color set created in GenMAPP for EB to RB no rifampicin and increased/decreased logFC criteria

'''MAPPFinder Files'''
*[[Media:EB to RB No Rif V2-Criterion0-GO.txt]]
:Increased
*[[Media: EB to RB No Rif V2-Criterion1-GO.txt]]
:Decreased
*[[Media: For GenMAPP Chlamydia V4 20131205 KS.gmf]]
: File created by MAPPFinder after creating the latest color set.

==Wiki Formatting==

Template: [[Template:Team_H(oo)KD]]

[[Category: Group Projects]]
[[Category: Team H(oo)KD]]

Chlamydia trachomatis

2013-12-08T06:42:55Z

Ksherbina: /* Final Paper */ Added assignments

{{Team_H(oo)KD}}

==Deadlines and Intermediate Milestones==

{| style="width: 100%; margin: 1em 0" cellpadding="5ex"
! style="width: 8em; background: #ddd" | Monday
! style="width: 8em; background: #ddd" | Tuesday
! style="width: 8em; background: #ddd" | Wednesday
! style="width: 8em; background: #ddd" | Thursday
! style="width: 8em; background: #ddd" | Friday
|-
| style="vertical-align: top" | 11/11
| style="vertical-align: top" | 11/12
| style="vertical-align: top" | 11/13
| style="vertical-align: top" | 11/14
*Run an export/import of gene database cycle '''KS/HD'''
*Create species profile '''KS'''
*Obtain and format raw microarray data '''DW'''
| style="vertical-align: top" | 11/15
*Rerun export/import of gene database cycle '''KS'''
|-
| style="vertical-align: top; background: #eee" | 11/18
| style="vertical-align: top; background: #eee" | 11/19
*Vet the gene database exported last week '''KS/HD/DW'''
*Format the raw microarray data '''DW'''
| style="vertical-align: top; background: #eee" | 11/20
| style="vertical-align: top; background: #eee" | 11/21
*Perform microarray statistical analysis '''DW'''
*Work on custom species profile '''KS/HD'''
| style="vertical-align: top; background: #eee" | 11/22
|-
| style="vertical-align: top" | 11/25
| style="vertical-align: top" | 11/26
*Build and commit species profile '''KS'''
*Customize Tally Engine '''KS/HD'''
*Perform GenMAPP and MAPPFinder Analysis '''DW'''
| style="vertical-align: top" | 11/27
'''THANKSGIVING BREAK'''
| style="vertical-align: top" | 11/28
'''THANKSGIVING BREAK'''
| style="vertical-align: top" | 11/29
'''THANKSGIVING BREAK'''
|-
| style="vertical-align: top; background: #eee" | 12/02
| style="vertical-align: top; background: #eee" | 12/03
*Run GenMAPP and MAPPFinder '''DW'''
*Customize the Tally Engine for ''C. trachomatis'' '''DW'''
| style="vertical-align: top; background: #eee" | 12/04
*Run GenMAPP and MAPPFinder again if necessary '''DW'''
| style="vertical-align: top; background: #eee" | 12/05
*Begin working on the relational database schema '''HD'''
*Commit the changes to gmbuilder '''KS'''
| style="vertical-align: top; background: #eee" | 12/06
|-
| style="vertical-align: top; background: #eee" | 12/09
*Write the paper '''HD/KS/DW'''
*Work on the presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/10
*Write the paper '''HD/KS/DW'''
*Work on the presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/11
*Finish the presentation '''HD/KS/DW'''
*Practice the final presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/12
*'''Final Presentation'''
*Proofread the final paper '''HD/KS/DW'''
*Make sure all the deliverables are ready '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/13
|}

==Deliverable and Final Paper Assignments==

===Deliverables===

*GenMAPP Gene Database for assigned species (.gdb) '''Katrina'''
*ReadMe file to accompany the Gene Database (.pdf)
:*Include Gene Database Schema diagram in ReadMe '''Hilda'''
*Gene Database Testing Report for final submitted Gene Database (print from wiki to .pdf file) '''Katrina'''
*Processed and analyzed DNA microarray dataset (.xls) '''Dillon'''
*GenMAPP Expression Dataset file (.gex) '''Dillon'''
*Filtered MAPPFinder Results (.xls) '''Dillon'''
*Sample MAPP file of a relevant biological pathway for your species (.mapp) '''Dillon'''

===Final Paper===

#Introduction '''Hilda'''
#Methods
#*Download the UniProt XML proteome set and GOA (GO association) files for your species. '''Katrina'''
#*Download GO terms in the OBO-XML format. '''Katrina'''
#*Create the GenMAPP Builder tables in PostgreSQL. '''Katrina'''
#*Load files into PostgreSQL database via GenMAPP Builder. '''Katrina'''
#*Export into a GenMAPP Gene Database. '''Katrina'''
#*Inspect/vet/validate Gene Database. '''Katrina'''
#*Prepare microarray data (organize, normalize, perform statistical analysis) '''Dillon'''
#*Run GenMAPP and MAPPFinder using the Gene Database. '''Dillon'''
#Results
#*Gene Database Schema '''Hilda'''
#*Gene Database Testing Report on final version of Gene Database '''Dillon'''
#*Report on quantity and identity of gene IDs that did not make it into the database '''Hilda'''
#*Report on what changes need to be made to the GenMAPP Builder code in order to accommodate the second and third types of missing gene IDs '''Katrina'''
#*Report results of the DNA microarray statistical analysis
#*Table of MAPPFinder results (from .xls) and MAPP of a pathway
#Discussion
#*How well did the GenMAPP Builder process work for your species? '''Katrina'''
#*Discuss the statistical analysis and MAPPFinder results for your microarray dataset. '''Dillon'''

=== References ===

* Please use the [[Guidelines for Literature Citations in a Scientific Paper]]. The [http://en.wikipedia.org/wiki/APA_style#Citation APA format] is also very similar to these guidelines and will be acceptable. Be consistent with your format for the in-text citations and for your references list at the end.

==Annotated Bibliography of Genomics Papers for ''C. trachomatis''==

===Whole genome sequencing of ''C. trachomatis''===

#Journal article describing results of whole genome sequencing:
#*Database: PubMed
#*Search Terms: Chlamydia trachomatis [MeSH Term] AND genome [Title]
#*There were 29 results. Many of the articles that appear using the above search terms have to do with performing whole-genome analysis of ''C. trachomatis'' to look at genetic polymorphisms in the bacterium and molecular and genetic characteristics that are observed when the bacterium infects a host.
#*Stephens, R.S., Kalman, S., Lammel, C., Fan, J., Marathe, R., Aravind, L., Mitchell, W., Olinger, L., Tatusov, R., Zhao, Q., Koonin, E. V., Davis, R.W. (1998) Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis. ''Science'' '''282''': 754-759. 10.1126/science.282.5389.754.
#*The article can be viewed as an [http://www.sciencemag.org/content/282/5389/754.long HTML].
#*The article can be viewed as a [[Media: Science-1998-Stephens-754-9.pdf | PDF]].
#*This article includes [http://www.sciencemag.org/site/feature/data/982604.xhtml Supplementary Material].
#Use the genome sequencing article you found to perform a prospective search in the ISI Web of Science/Knowledge database.
#*How many results did you get?: 872
#*Based on the titles and abstracts of the papers, what type of research directions have been taken now that the genome for that organism has been sequenced?: Based on the titles and abstracts found in the results, it seems that research has been primarily in the direction of understanding how the disease functions and assessing the severity of the disease (and how to cure it).

===Microarray articles===
The EBI ArrayExpress database was used to locate each of the four following microarray articles. The search terms used were Chlamydia trachomatis under 'By organism' while the rest of the drop down menus were not altered.

The following articles appear in order of preference:

#Omsland, A., Sager, J., Nair, V., Sturdevant, D.E., Hackstadt, T. (2012) [http://www.pnas.org/content/109/48/19781.full Developmental stage-specific metabolic and transcriptional activity of Chlamydia trachomatis in an axenic medium]. ''PNAS'' '''109''': 19781-19785. doi: 10.1073/pnas.1212831109.
#*[[Media: PNAS-2012-Omsland-19781-5.pdf|PDF Version of Paper]]
#*[http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-39530/?keywords=&organism=Chlamydia+trachomatis&array=&exptype%5B%5D=&exptype%5B%5D= Microarray Data]
#*Chlamydiae undergo a biphasic developmental cycle characterized by an infectious cell type known as an elementary body (EB) and an intracellular replicative form called a reticulate body (RB). ''Chlamydia'' was incubated under microaerobic conditions to test the differences in preferred energy source between EBs and RBs.
#*The study did not explicitly designate a "treatment" group and a "control" group. From figure 4, it can be inferred that the EBs serve as the "treatment" group because the ratio is expressed as EB to RB. The RBs would then be the "control" group, since the EB population is measured relative to the RB population.
#*Four biological replicates were performed for both the control and the treatment; the article states that "Density gradient-purified EBs and RBs were incubated in quadruplicate in four-well plates...".
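
As a rough illustration of the EB-to-RB ratio described above, the following sketch computes an average log2 fold change for a single gene across four replicates. The intensity values, the pairing of replicates, and the function name <code>log2_fold_change</code> are all assumptions made for illustration; they are not taken from the Omsland ''et al.'' dataset or from the team's spreadsheet.

```python
import math
from statistics import mean

# Hypothetical normalized intensities for one gene (four replicates each).
# These numbers are invented for illustration only.
eb_replicates = [820.0, 790.0, 845.0, 805.0]  # EB, treated as "treatment"
rb_replicates = [410.0, 395.0, 430.0, 405.0]  # RB, treated as "control"


def log2_fold_change(treatment, control):
    """Return the mean of the per-replicate log2(treatment / control) ratios.

    Assumes the replicates are paired in order and all values are positive.
    """
    if len(treatment) != len(control):
        raise ValueError("treatment and control need the same number of replicates")
    return mean(math.log2(t / c) for t, c in zip(treatment, control))


log_fc = log2_fold_change(eb_replicates, rb_replicates)
print(f"Average log2(EB/RB) = {log_fc:.3f}")
```

A positive value means the gene's signal is higher in EBs than in RBs, and a negative value means it is lower, which matches the "increased" and "decreased" criteria used for the MAPPFinder files.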


==Journal Club Presentation on ''C. trachomatis'' Genome Sequencing Paper==

[[Media:Presentation on C.trachomatis Genome Paper KS HD DW 20131112.pdf|Here]] is the PDF version of the presentation given on November 12, 2013 on the Stephens ''et al.'' (1998) paper.


==Important Files==

*[[Media: Go_daily-termdb_v2_HD_20131107.obo-xml.gz|Go_daily-termdb_v2_HD_20131107.obo-xml.gz]]
:GO OBO-XML file for ''C. trachomatis'' serovar A.
*[[Media: Uniprot XML C.trachomatis serovar A KS 20131114.xml|Uniprot XML C.trachomatis serovar A KS 20131114.xml]]
:UniProt XML file for ''C. trachomatis'' serovar A.
*[[Media: 22183.C trachomatis A KS 20131114.goa|22183.C trachomatis A KS 20131114.goa]]
:GOA file for ''C. trachomatis'' serovar A.
*[[Media: Ct-Std v1 KS 20131119.gdb|Ct-Std v1 KS 20131119.gdb]]
:First iteration of the export/import gene database cycle.
*[[Media: Ct-Std KS 20131121.gdb|Ct-Std KS 20131121.gdb]]
:Updated gene database.
*[[Media: Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]
:New gene database created with the new version of gmbuilder built on 11/21/2013 using the custom species profile for ''C. trachomatis.''

'''Array Express Article Raw Files'''

*[[Media:Raw_Data_DW_V1_12112013_A-GEOD-4692.adf.txt|Raw_Data_DW_V1_12112013_A-GEOD-4692.adf.txt]]
:Microarray raw ADF file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.idf.txt|Raw_Data_DW_V1_12112013_E-GEOD-39530.idf.txt]]
:Microarray raw IDF file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.processed.1.zip|Raw_Data_DW_V1_12112013_E-GEOD-39530.processed.1.zip]]
:Microarray raw processed zip file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.raw.1.zip|Raw_Data_DW_V1_12112013_E-GEOD-39530.raw.1.zip]]
:Microarray raw zip file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.sdrf.txt|Raw_Data_DW_V1_12112013_E-GEOD-39530.sdrf.txt]]
:Microarray SDRF text file for ''C. trachomatis''.
*[[Media:Sdrf information excel.xls|Sdrf information excel.xls]]
:SDRF information in Excel spreadsheet format.

'''Excel Master Spreadsheet File Versions (Containing the Formatted Raw Data)'''

*[[Media:Master Spreadsheet.xls|master spreadsheet work in progress]]
:Initial Microarray raw data formatted and opened in Microsoft Excel.
*[[Media: C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.xls|C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.xls]]
:Sample C. trachomatis microarray data that was used to run GenMAPP with the latest ''C. trachomatis'' database.
*[[Media: C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt|C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt]]
:Tab-delimited version of the file above.
*[[Media: Final Master Spreadsheet.xls|Final Master Spreadsheet]]
:Microarray data with identified gene IDs and log base data.
*[[Media: Master_Spreadsheet_Chlamydia_20131125_KD.xls | Master_Spreadsheet_Chlamydia_20131125_KD.xls]]
:Version edited during office hours by Dr. Dahlquist to demonstrate statistics.
*[[Media: Master Spreadsheet Chlamydia V1 20131125 DW.xls]]
:Completed version of the spreadsheet edited during office hours by Dr. Dahlquist to demonstrate statistics; completed by Dillon Williams.
*[[Media: Copy of Master Spreadsheet Chlamydia V2 20131203 HD.xls]]
:Microarray data with separated IDs CTA_#### from _RRMH#####_...
*[[Media: For GenMAPP Chlamydia V4 20131203 DW.xls]]
:Excel spreadsheet with data formatted for GenMAPP (version run in GenMAPP).

'''For GenMAPP Files'''
*[[Media: Ctrachomatis EbtoRb ATP metabolic process.mapp]]
: EB to RB ATP Metabolic Process Mapp file.
*[[Media: Ctrachomatis EbtoRB glucose catabolic process 20131204 DW.mapp]]
: EB to RB glucose catabolic process map.
*[[Media: CTrachomatis EBtoRB Rifampicin 20131204 DW-Criterion1-GO.txt]]
: EB to RB Rifampicin go.txt file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.gex]]
: For_GenMAPP .gex file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.gmf]]
: For GenMAPP .gmf file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.zip]]
: Access file created with .gex file.
*[[Media: For GenMAPP Chlamydia V4 20131205 KS.gex]]
:Color set created in GenMAPP for EB to RB no rifampicin and increased/decreased logFC criteria

'''MAPPFinder Files'''
*[[Media:EB to RB No Rif V2-Criterion0-GO.txt]]
:Increased
*[[Media: EB to RB No Rif V2-Criterion1-GO.txt]]
:Decreased
*[[Media: For GenMAPP Chlamydia V4 20131205 KS.gmf]]
: File created by MAPPFinder after creating the latest color set.

==Wiki Formatting==

Template: [[Template:Team_H(oo)KD]]

[[Category: Group Projects]]
[[Category: Team H(oo)KD]]

Chlamydia trachomatis

2013-12-08T06:40:23Z

Ksherbina: /* Final Paper */ Added more assignments

{{Team_H(oo)KD}}

==Deadlines and Intermediate Milestones==

{| style="width: 100%; margin: 1em 0" cellpadding="5ex"
! style="width: 8em; background: #ddd" | Monday
! style="width: 8em; background: #ddd" | Tuesday
! style="width: 8em; background: #ddd" | Wednesday
! style="width: 8em; background: #ddd" | Thursday
! style="width: 8em; background: #ddd" | Friday
|-
| style="vertical-align: top" | 11/11
| style="vertical-align: top" | 11/12
| style="vertical-align: top" | 11/13
| style="vertical-align: top" | 11/14
*Run an export/import of gene database cycle '''KS/HD'''
*Create species profile '''KS'''
*Obtain and format raw microarray data '''DW'''
| style="vertical-align: top" | 11/15
*Rerun export/import of gene database cycle '''KS'''
|-
| style="vertical-align: top; background: #eee" | 11/18
| style="vertical-align: top; background: #eee" | 11/19
*Vet the gene database exported last week '''KS/HD/DW'''
*Format the raw microarray data '''DW'''
| style="vertical-align: top; background: #eee" | 11/20
| style="vertical-align: top; background: #eee" | 11/21
*Perform microarray statistical analysis '''DW'''
*Work on custom species profile '''KS/HD'''
| style="vertical-align: top; background: #eee" | 11/22
|-
| style="vertical-align: top" | 11/25
| style="vertical-align: top" | 11/26
*Build and commit species profile '''KS'''
*Customize Tally Engine '''KS/HD'''
*Perform GenMAPP and MAPPFinder Analysis '''DW'''
| style="vertical-align: top" | 11/27
'''THANKSGIVING BREAK'''
| style="vertical-align: top" | 11/28
'''THANKSGIVING BREAK'''
| style="vertical-align: top" | 11/29
'''THANKSGIVING BREAK'''
|-
| style="vertical-align: top; background: #eee" | 12/02
| style="vertical-align: top; background: #eee" | 12/03
*Run GenMAPP and MAPPFinder '''DW'''
*Customize the Tally Engine for ''C. trachomatis'' '''DW'''
| style="vertical-align: top; background: #eee" | 12/04
*Run GenMAPP and MAPPFinder again if necessary '''DW'''
| style="vertical-align: top; background: #eee" | 12/05
*Begin working on the relational database schema '''HD'''
*Commit the changes to gmbuilder '''KS'''
| style="vertical-align: top; background: #eee" | 12/06
|-
| style="vertical-align: top; background: #eee" | 12/09
*Write the paper '''HD/KS/DW'''
*Work on the presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/10
*Write the paper '''HD/KS/DW'''
*Work on the presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/11
*Finish the presentation '''HD/KS/DW'''
*Practice the final presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/12
*'''Final Presentation'''
*Proofread the final paper '''HD/KS/DW'''
*Make sure all the deliverables are ready '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/13
|}

==Deliverable and Final Paper Assignments==

===Deliverables===

*GenMAPP Gene Database for assigned species (.gdb) '''Katrina'''
*ReadMe file to accompany the Gene Database (.pdf)
:*Include Gene Database Schema diagram in ReadMe '''Hilda'''
*Gene Database Testing Report for final submitted Gene Database (print from wiki to .pdf file) '''Katrina'''
*Processed and analyzed DNA microarray dataset (.xls) '''Dillon'''
*GenMAPP Expression Dataset file (.gex) '''Dillon'''
*Filtered MAPPFinder Results (.xls) '''Dillon'''
*Sample MAPP file of a relevant biological pathway for your species (.mapp) '''Dillon'''

===Final Paper===

#Introduction
#Methods
#*Download the UniProt XML proteome set and GOA (GO association) files for your species.
#*Download GO terms in the OBO-XML format.
#*Create the GenMAPP Builder tables in PostgreSQL.
#*Load files into PostgreSQL database via GenMAPP Builder.
#*Export into a GenMAPP Gene Database.
#*Inspect/vet/validate Gene Database.
#*Prepare microarray data (organize, normalize, perform statistical analysis) '''Dillon'''
#*Run GenMAPP and MAPPFinder using the Gene Database. '''Dillon'''
#Results
#*Gene Database Schema '''Hilda'''
#*Gene Database Testing Report on final version of Gene Database '''Dillon'''
#*Report on quantity and identity of gene IDs that did not make it into the database '''Katrina'''
#*Report on what changes need to be made to the GenMAPP Builder code in order to accommodate the second and third types of missing gene IDs '''Katrina'''
#*Report results of the DNA microarray statistical analysis
#*Table of MAPPFinder results (from .xls) and MAPP of a pathway
#Discussion
#*How well did the GenMAPP Builder process work for your species? '''Katrina'''
#*Discuss the statistical analysis and MAPPFinder results for your microarray dataset. '''Dillon'''

=== References ===

* Please use the [[Guidelines for Literature Citations in a Scientific Paper]]. The [http://en.wikipedia.org/wiki/APA_style#Citation APA format] is also very similar to these guidelines and will be acceptable. Be consistent with your format for the in-text citations and for your references list at the end.

==Annotated Bibliography of Genomics Papers for ''C. trachomatis''==

===Whole genome sequencing of ''C. trachomatis''===

#Journal article describing results of whole genome sequencing:
#*Database: PubMed
#*Search Terms: Chlamydia trachomatis [MeSH Term] AND genome [Title]
#*There were 29 results. Many of the articles that appear using the above search terms have to do with performing whole-genome analysis of ''C. trachomatis'' to look at genetic polymorphisms in the bacterium and molecular and genetic characteristics that are observed when the bacterium infects a host.
#*Stephens, R.S., Kalman, S., Lammel, C., Fan, J., Marathe, R., Aravind, L., Mitchell, W., Olinger, L., Tatusov, R., Zhao, Q., Koonin, E. V., Davis, R.W. (1998) Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis. ''Science'' '''282''': 754-759. 10.1126/science.282.5389.754.
#*The article can be viewed as an [http://www.sciencemag.org/content/282/5389/754.long HTML].
#*The article can be viewed as a [[Media: Science-1998-Stephens-754-9.pdf | PDF]].
#*This article includes [http://www.sciencemag.org/site/feature/data/982604.xhtml Supplementary Material].
#Use the genome sequencing article you found to perform a prospective search in the ISI Web of Science/Knowledge database.
#*How many results did you get?: 872
#*Based on the titles and abstracts of the papers, what type of research directions have been taken now that the genome for that organism has been sequenced?: Based on the titles and abstracts found in the results, it seems that research has been primarily in the direction of understanding how the disease functions and assessing the severity of the disease (and how to cure it).

===Microarray articles===
The EBI ArrayExpress database was used to locate each of the four following microarray articles. The search terms used were Chlamydia trachomatis under 'By organism' while the rest of the drop down menus were not altered.

The following articles appear in order of preference:

#Omsland, A., Sager, J., Nair, V., Sturdevant, D.E., Hackstadt, T. (2012) [http://www.pnas.org/content/109/48/19781.full Developmental stage-specific metabolic and transcriptional activity of Chlamydia trachomatis in an axenic medium]. ''PNAS'' '''109''': 19781-19785. doi: 10.1073/pnas.1212831109.
#*[[Media: PNAS-2012-Omsland-19781-5.pdf|PDF Version of Paper]]
#*[http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-39530/?keywords=&organism=Chlamydia+trachomatis&array=&exptype%5B%5D=&exptype%5B%5D= Microarray Data]
#*Chlamydiae undergo a biphasic developmental cycle characterized by an infectious cell type known as an elementary body (EB) and an intracellular replicative form called a reticulate body (RB). ''Chlamydia'' was incubated under microaerobic conditions to test the differences in preferred energy source between EBs and RBs.
#*The study did not explicitly designate a "treatment" group and a "control" group. From figure 4, it can be inferred that the EBs serve as the "treatment" group because the ratio is expressed as EB to RB. The RBs would then be the "control" group, since the EB population is measured relative to the RB population.
#*Four biological replicates were performed for both the control and the treatment; the article states that "Density gradient-purified EBs and RBs were incubated in quadruplicate in four-well plates...".


==Journal Club Presentation on ''C. trachomatis'' Genome Sequencing Paper==

[[Media:Presentation on C.trachomatis Genome Paper KS HD DW 20131112.pdf|Here]] is the PDF version of the presentation given on November 12, 2013 on the Stephens ''et al.'' (1998) paper.


==Important Files==

*[[Media: Go_daily-termdb_v2_HD_20131107.obo-xml.gz|Go_daily-termdb_v2_HD_20131107.obo-xml.gz]]
:GO OBO-XML file for ''C. trachomatis'' serovar A.
*[[Media: Uniprot XML C.trachomatis serovar A KS 20131114.xml|Uniprot XML C.trachomatis serovar A KS 20131114.xml]]
:UniProt XML file for ''C. trachomatis'' serovar A.
*[[Media: 22183.C trachomatis A KS 20131114.goa|22183.C trachomatis A KS 20131114.goa]]
:GOA file for ''C. trachomatis'' serovar A.
*[[Media: Ct-Std v1 KS 20131119.gdb|Ct-Std v1 KS 20131119.gdb]]
:First iteration of the export/import gene database cycle.
*[[Media: Ct-Std KS 20131121.gdb|Ct-Std KS 20131121.gdb]]
:Updated gene database.
*[[Media: Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]
:New gene database created with the new version of gmbuilder built on 11/21/2013 using the custom species profile for ''C. trachomatis.''

'''Array Express Article Raw Files'''

*[[Media:Raw_Data_DW_V1_12112013_A-GEOD-4692.adf.txt|Raw_Data_DW_V1_12112013_A-GEOD-4692.adf.txt]]
:Microarray raw ADF file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.idf.txt|Raw_Data_DW_V1_12112013_E-GEOD-39530.idf.txt]]
:Microarray raw IDF file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.processed.1.zip|Raw_Data_DW_V1_12112013_E-GEOD-39530.processed.1.zip]]
:Microarray raw processed zip file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.raw.1.zip|Raw_Data_DW_V1_12112013_E-GEOD-39530.raw.1.zip]]
:Microarray raw zip file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.sdrf.txt|Raw_Data_DW_V1_12112013_E-GEOD-39530.sdrf.txt]]
:Microarray SDRF text file for ''C. trachomatis''.
*[[Media:Sdrf information excel.xls|Sdrf information excel.xls]]
:SDRF information in Excel spreadsheet format.

'''Excel Master Spreadsheet File Versions (Containing the Formatted Raw Data)'''

*[[Media:Master Spreadsheet.xls|master spreadsheet work in progress]]
:Initial Microarray raw data formatted and opened in Microsoft Excel.
*[[Media: C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.xls|C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.xls]]
:Sample C. trachomatis microarray data that was used to run GenMAPP with the latest ''C. trachomatis'' database.
*[[Media: C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt|C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt]]
:Tab-delimited version of the file above.
*[[Media: Final Master Spreadsheet.xls|Final Master Spreadsheet]]
:Microarray data with identified gene IDs and log base data.
*[[Media: Master_Spreadsheet_Chlamydia_20131125_KD.xls | Master_Spreadsheet_Chlamydia_20131125_KD.xls]]
:Version edited during office hours by Dr. Dahlquist to demonstrate statistics.
*[[Media: Master Spreadsheet Chlamydia V1 20131125 DW.xls]]
:Completed version of the spreadsheet edited during office hours by Dr. Dahlquist to demonstrate statistics; completed by Dillon Williams.
*[[Media: Copy of Master Spreadsheet Chlamydia V2 20131203 HD.xls]]
:Microarray data with separated IDs CTA_#### from _RRMH#####_...
*[[Media: For GenMAPP Chlamydia V4 20131203 DW.xls]]
:Excel spreadsheet with data formatted for GenMAPP (version run in GenMAPP).

'''For GenMAPP Files'''
*[[Media: Ctrachomatis EbtoRb ATP metabolic process.mapp]]
: EB to RB ATP Metabolic Process Mapp file.
*[[Media: Ctrachomatis EbtoRB glucose catabolic process 20131204 DW.mapp]]
: EB to RB glucose catabolic process map.
*[[Media: CTrachomatis EBtoRB Rifampicin 20131204 DW-Criterion1-GO.txt]]
: EB to RB Rifampicin go.txt file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.gex]]
: For_GenMAPP .gex file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.gmf]]
: For GenMAPP .gmf file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.zip]]
: Access file created with .gex file.
*[[Media: For GenMAPP Chlamydia V4 20131205 KS.gex]]
:Color set created in GenMAPP for EB to RB no rifampicin and increased/decreased logFC criteria

'''MAPPFinder Files'''
*[[Media:EB to RB No Rif V2-Criterion0-GO.txt]]
:Increased
*[[Media: EB to RB No Rif V2-Criterion1-GO.txt]]
:Decreased
*[[Media: For GenMAPP Chlamydia V4 20131205 KS.gmf]]
: File created by MAPPFinder after creating the latest color set.

==Wiki Formatting==

Template: [[Template:Team_H(oo)KD]]

[[Category: Group Projects]]
[[Category: Team H(oo)KD]]

Chlamydia trachomatis

2013-12-08T06:39:00Z

Ksherbina: /* Final Paper */ Started adding assignments

{{Team_H(oo)KD}}

==Deadlines and Intermediate Milestones==

{| style="width: 100%; margin: 1em 0" cellpadding="5ex"
! style="width: 8em; background: #ddd" | Monday
! style="width: 8em; background: #ddd" | Tuesday
! style="width: 8em; background: #ddd" | Wednesday
! style="width: 8em; background: #ddd" | Thursday
! style="width: 8em; background: #ddd" | Friday
|-
| style="vertical-align: top" | 11/11
| style="vertical-align: top" | 11/12
| style="vertical-align: top" | 11/13
| style="vertical-align: top" | 11/14
*Run an export/import of gene database cycle '''KS/HD'''
*Create species profile '''KS'''
*Obtain and format raw microarray data '''DW'''
| style="vertical-align: top" | 11/15
*Rerun export/import of gene database cycle '''KS'''
|-
| style="vertical-align: top; background: #eee" | 11/18
| style="vertical-align: top; background: #eee" | 11/19
*Vet the gene database exported last week '''KS/HD/DW'''
*Format the raw microarray data '''DW'''
| style="vertical-align: top; background: #eee" | 11/20
| style="vertical-align: top; background: #eee" | 11/21
*Perform microarray statistical analysis '''DW'''
*Work on custom species profile '''KS/HD'''
| style="vertical-align: top; background: #eee" | 11/22
|-
| style="vertical-align: top" | 11/25
| style="vertical-align: top" | 11/26
*Build and commit species profile '''KS'''
*Customize Tally Engine '''KS/HD'''
*Perform GenMAPP and MAPPFinder Analysis '''DW'''
| style="vertical-align: top" | 11/27
'''THANKSGIVING BREAK'''
| style="vertical-align: top" | 11/28
'''THANKSGIVING BREAK'''
| style="vertical-align: top" | 11/29
'''THANKSGIVING BREAK'''
|-
| style="vertical-align: top; background: #eee" | 12/02
| style="vertical-align: top; background: #eee" | 12/03
*Run GenMAPP and MAPPFinder '''DW'''
*Customize the Tally Engine for ''C. trachomatis'' '''DW'''
| style="vertical-align: top; background: #eee" | 12/04
*Run GenMAPP and MAPPFinder again if necessary '''DW'''
| style="vertical-align: top; background: #eee" | 12/05
*Begin working on the relational database schema '''HD'''
*Commit the changes to gmbuilder '''KS'''
| style="vertical-align: top; background: #eee" | 12/06
|-
| style="vertical-align: top; background: #eee" | 12/09
*Write the paper '''HD/KS/DW'''
*Work on the presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/10
*Write the paper '''HD/KS/DW'''
*Work on the presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/11
*Finish the presentation '''HD/KS/DW'''
*Practice the final presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/12
*'''Final Presentation'''
*Proofread the final paper '''HD/KS/DW'''
*Make sure all the deliverables are ready '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/13
|}

==Deliverable and Final Paper Assignments==

===Deliverables===

*GenMAPP Gene Database for assigned species (.gdb) '''Katrina'''
*ReadMe file to accompany the Gene Database (.pdf)
:*Include Gene Database Schema diagram in ReadMe '''Hilda'''
*Gene Database Testing Report for final submitted Gene Database (print from wiki to .pdf file) '''Katrina'''
*Processed and analyzed DNA microarray dataset (.xls) '''Dillon'''
*GenMAPP Expression Dataset file (.gex) '''Dillon'''
*Filtered MAPPFinder Results (.xls) '''Dillon'''
*Sample MAPP file of a relevant biological pathway for your species (.mapp) '''Dillon'''

===Final Paper===

#Introduction
#Methods
#*Download the UniProt XML proteome set and GOA (GO association) files for your species.
#*Download GO terms in the OBO-XML format.
#*Create the GenMAPP Builder tables in PostgreSQL.
#*Load files into PostgreSQL database via GenMAPP Builder.
#*Export into a GenMAPP Gene Database.
#*Inspect/vet/validate Gene Database.
#*Prepare microarray data (organize, normalize, perform statistical analysis)
#*Run GenMAPP and MAPPFinder using the Gene Database.
#Results
#*Gene Database Schema
#*Gene Database Testing Report on final version of Gene Database
#*Report on quantity and identity of gene IDs that did not make it into the database
#*Report on what changes need to be made to the GenMAPP Builder code in order to accommodate the second and third type of missing gene IDs
#*Report results of the DNA microarray statistical analysis
#*Table of MAPPFinder results (from .xls) and MAPP of a pathway
#Discussion
#*How well did the GenMAPP Builder process work for your species? '''Katrina'''
#*Discuss the statistical analysis and MAPPFinder results for your microarray dataset. '''Dillon'''
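The "prepare microarray data" step in the outline above can be illustrated with a minimal sketch. It assumes that the statistical analysis works on per-replicate log2 ratios of EB to RB and summarizes them with an average log fold change and a one-sample t statistic; the page does not state the exact method used in the team's spreadsheet, so the function names and the intensity values below are hypothetical.

```python
import math
from statistics import mean, stdev


def log2_fold_changes(eb, rb):
    """Return per-replicate log2(EB/RB) ratios for paired intensity values."""
    if len(eb) != len(rb):
        raise ValueError("EB and RB must have the same number of replicates")
    return [math.log2(e / r) for e, r in zip(eb, rb)]


def summarize(log_ratios):
    """Return (average log fold change, one-sample t statistic against 0)."""
    n = len(log_ratios)
    avg = mean(log_ratios)
    sd = stdev(log_ratios)  # sample standard deviation (n - 1 denominator)
    # With zero spread the t statistic is undefined, so report NaN.
    t_stat = avg / (sd / math.sqrt(n)) if sd > 0 else float("nan")
    return avg, t_stat


# Made-up intensities for one gene across four replicates (illustration only).
eb = [800.0, 1600.0, 800.0, 1600.0]
rb = [200.0, 200.0, 200.0, 200.0]
ratios = log2_fold_changes(eb, rb)
avg_log_fc, t_stat = summarize(ratios)
```

A positive average log fold change here would mean higher expression in EB relative to RB, matching the EB-to-RB ratio direction described in the annotated bibliography.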

=== References ===

* Please use the [[Guidelines for Literature Citations in a Scientific Paper]]. The [http://en.wikipedia.org/wiki/APA_style#Citation APA format] is also very similar to these guidelines and will be acceptable. Be consistent with your format for the in text citations and for your references list at the end.

==Annotated Bibliography of Genomics Papers for ''C. trachomatis''==

===Whole genome sequencing of ''C. trachomatis''===

#Journal article describing results of whole genome sequencing:
#*Database: PubMed
#*Search Terms: Chlamydia trachomatis [MeSH Term] AND genome [Title]
#*There were 29 results. Many of the articles that appear using the above search terms have to do with performing whole-genome analysis of ''C. trachomatis'' to look at genetic polymorphisms in the bacterium and molecular and genetic characteristics that are observed when the bacterium infects a host.
#*Stephens, R.S., Kalman, S., Lammel, C., Fan, J., Marathe, R., Aravind, L., Mitchell, W., Olinger, L., Tatusov, R., Zhao, Q., Koonin, E. V., Davis, R.W. (1998) Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis. ''Science'' '''282''': 754-759. 10.1126/science.282.5389.754.
#*The article can be viewed as an [http://www.sciencemag.org/content/282/5389/754.long HTML].
#*The article can be viewed as a [[Media: Science-1998-Stephens-754-9.pdf | PDF]].
#*This article includes [http://www.sciencemag.org/site/feature/data/982604.xhtml Supplementary Material].
#Use the genome sequencing article you found to perform a prospective search in the ISI Web of Science/Knowledge database.
#*How many results did you get?: 872
#*Based on the titles and abstracts of the papers, what type of research directions have been taken now that the genome for that organism has been sequenced?: Based on the titles and abstracts found in the results, it seems that research has been primarily in the direction of understanding how the disease functions and assessing the severity of the disease (and how to cure it).

===Microarray articles===
The EBI ArrayExpress database was used to locate each of the four following microarray articles. The search term used was ''Chlamydia trachomatis'' under 'By organism'; the rest of the drop-down menus were left unaltered.

The following articles appear in order of preference:

#Omsland, A., Sager, J., Nair, V., Sturdevant, D.E., Hackstadt, T. (2012) [http://www.pnas.org/content/109/48/19781.full Developmental stage-specific metabolic and transcriptional activity of Chlamydia trachomatis in an axenic medium]. ''PNAS'' '''109''': 19781-19785. doi: 10.1073/pnas.1212831109.
#*[[Media: PNAS-2012-Omsland-19781-5.pdf|PDF Version of Paper]]
#*[http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-39530/?keywords=&organism=Chlamydia+trachomatis&array=&exptype%5B%5D=&exptype%5B%5D= Microarray Data]
#*Chlamydiae undergo a biphasic developmental cycle characterized by an infectious cell type known as an elementary body (EB) and an intracellular replicative form called a reticulate body (RB). Chlamydia was incubated under microaerobic conditions to test the differences in preferred energy source between EBs and RBs.
#*There was not necessarily a "treatment" group and a "control" group. Based on Figure 4, EB can be considered the "treatment" group because the ratio was set up as EB to RB. RB would then be the "control" group, since the EB population is examined relative to the RB population.
#*Four biological replicates were performed for both the control and the treatment; the article states that "Density gradient-purified EBs and RBs were incubated in quadruplicate in four-well plates...".


==Journal Club Presentation on ''C. trachomatis'' Genome Sequencing Paper==

[[Media:Presentation on C.trachomatis Genome Paper KS HD DW 20131112.pdf|Here]] is the PDF version of the presentation given on November 12, 2013 on the Stephens ''et al.'' (1998) paper.


==Important Files==

*[[Media: Go_daily-termdb_v2_HD_20131107.obo-xml.gz|Go_daily-termdb_v2_HD_20131107.obo-xml.gz]]
:GO OBO-XML file for ''C. trachomatis'' serovar A.
*[[Media: Uniprot XML C.trachomatis serovar A KS 20131114.xml|Uniprot XML C.trachomatis serovar A KS 20131114.xml]]
:UniProt XML file for ''C. trachomatis'' serovar A.
*[[Media: 22183.C trachomatis A KS 20131114.goa|22183.C trachomatis A KS 20131114.goa]]
:GOA file for ''C. trachomatis'' serovar A.
*[[Media: Ct-Std v1 KS 20131119.gdb|Ct-Std v1 KS 20131119.gdb]]
:First iteration of the export/import gene database cycle.
*[[Media: Ct-Std KS 20131121.gdb|Ct-Std KS 20131121.gdb]]
:Updated gene database.
*[[Media: Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]
:New gene database created with the new version of gmbuilder built on 11/21/2013 using the custom species profile for ''C. trachomatis.''

'''Array Express Article Raw Files'''

*[[Media:Raw_Data_DW_V1_12112013_A-GEOD-4692.adf.txt|Raw_Data_DW_V1_12112013_A-GEOD-4692.adf.txt]]
:Microarray raw ADF file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.idf.txt|Raw_Data_DW_V1_12112013_E-GEOD-39530.idf.txt]]
:Microarray raw IDF file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.processed.1.zip|Raw_Data_DW_V1_12112013_E-GEOD-39530.processed.1.zip]]
:Microarray raw processed zip file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.raw.1.zip|Raw_Data_DW_V1_12112013_E-GEOD-39530.raw.1.zip]]
:Microarray raw zip file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.sdrf.txt|Raw_Data_DW_V1_12112013_E-GEOD-39530.sdrf.txt]]
:Microarray SDRF text file for ''C. trachomatis''.
*[[Media:Sdrf information excel.xls|Sdrf information excel.xls]]
:SDRF information in Excel spreadsheet format.

'''Excel Master Spreadsheet File Versions (Containing the Formatted Raw Data)'''

*[[Media:Master Spreadsheet.xls|master spreadsheet work in progress]]
:Initial Microarray raw data formatted and opened in Microsoft Excel.
*[[Media: C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.xls|C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.xls]]
:Sample C. trachomatis microarray data that was used to run GenMAPP with the latest ''C. trachomatis'' database.
*[[Media: C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt|C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt]]
:Tab-delimited version of the file above.
*[[Media: Final Master Spreadsheet.xls|Final Master Spreadsheet]]
:Microarray data with identified gene IDs and log base data.
*[[Media: Master_Spreadsheet_Chlamydia_20131125_KD.xls | Master_Spreadsheet_Chlamydia_20131125_KD.xls]]
:Version edited during office hours by Dr. Dahlquist to demonstrate statistics.
*[[Media: Master Spreadsheet Chlamydia V1 20131125 DW.xls]]
:Completed version of the spreadsheet edited during office hours by Dr. Dahlquist to demonstrate statistics; completed by Dillon Williams.
*[[Media: Copy of Master Spreadsheet Chlamydia V2 20131203 HD.xls]]
:Microarray data with separated IDs CTA_#### from _RRMH#####_...
*[[Media: For GenMAPP Chlamydia V4 20131203 DW.xls]]
:Excel spreadsheet with data formatted for GenMAPP (version run in GenMAPP).
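The ID separation mentioned for the V2 spreadsheet above could be sketched as follows. This is only an illustration, assuming the locus tag is "CTA_" followed by four digits, as the "CTA_####" pattern suggests; the full probe identifier format is not given on this page, so the example identifiers and the function name are hypothetical.

```python
import re

# Assumption: a locus tag is "CTA_" followed by exactly four digits,
# matching the "CTA_####" pattern in the file description.
LOCUS_TAG = re.compile(r"CTA_\d{4}")


def extract_locus_tag(probe_id):
    """Return the first CTA_#### locus tag found in probe_id, or None."""
    match = LOCUS_TAG.search(probe_id)
    return match.group(0) if match else None


# Hypothetical probe identifiers, invented for illustration.
examples = ["CTA_0123_RRMH00001_example", "no_tag_here"]
tags = [extract_locus_tag(p) for p in examples]
```

Returning None for identifiers without a tag makes it easy to count the rows that cannot be matched to a gene ID before the data are imported into GenMAPP.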

'''For GenMAPP Files'''
*[[Media: Ctrachomatis EbtoRb ATP metabolic process.mapp]]
: EB to RB ATP metabolic process MAPP file.
*[[Media: Ctrachomatis EbtoRB glucose catabolic process 20131204 DW.mapp]]
: EB to RB glucose catabolic process MAPP file.
*[[Media: CTrachomatis EBtoRB Rifampicin 20131204 DW-Criterion1-GO.txt]]
: EB to RB Rifampicin go.txt file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.gex]]
: For_GenMAPP .gex file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.gmf]]
: For GenMAPP .gmf file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.zip]]
: Access file created with .gex file.
*[[Media: For GenMAPP Chlamydia V4 20131205 KS.gex]]
:Color set created in GenMAPP for EB to RB no rifampicin and increased/decreased logFC criteria

'''MAPPFinder Files'''
*[[Media:EB to RB No Rif V2-Criterion0-GO.txt]]
:Increased
*[[Media: EB to RB No Rif V2-Criterion1-GO.txt]]
:Decreased
*[[Media: For GenMAPP Chlamydia V4 20131205 KS.gmf]]
: File created by MAPPFinder after creating the latest color set.

==Wiki Formatting==

Template: [[Template:Team_H(oo)KD]]

[[Category: Group Projects]]
[[Category: Team H(oo)KD]]

Chlamydia trachomatis

2013-12-08T06:38:09Z

Ksherbina: /* Final Paper */ Added outline of paper

{{Team_H(oo)KD}}

==Deadlines and Intermediate Milestones==

{| style="width: 100%; margin: 1em 0" cellpadding="5ex"
! style="width: 8em; background: #ddd" | Monday
! style="width: 8em; background: #ddd" | Tuesday
! style="width: 8em; background: #ddd" | Wednesday
! style="width: 8em; background: #ddd" | Thursday
! style="width: 8em; background: #ddd" | Friday
|-
| style="vertical-align: top" | 11/11
| style="vertical-align: top" | 11/12
| style="vertical-align: top" | 11/13
| style="vertical-align: top" | 11/14
*Run an export/import of gene database cycle '''KS/HD'''
*Create species profile '''KS'''
*Obtain and format raw microarray data '''DW'''
| style="vertical-align: top" | 11/15
*Rerun export/import of gene database cycle '''KS'''
|-
| style="vertical-align: top; background: #eee" | 11/18
| style="vertical-align: top; background: #eee" | 11/19
*Vet the gene database exported last week '''KS/HD/DW'''
*Format the raw microarray data '''DW'''
| style="vertical-align: top; background: #eee" | 11/20
| style="vertical-align: top; background: #eee" | 11/21
*Perform microarray statistical analysis '''DW'''
*Work on custom species profile '''KS/HD'''
| style="vertical-align: top; background: #eee" | 11/22
|-
| style="vertical-align: top" | 11/25
| style="vertical-align: top" | 11/26
*Build and commit species profile '''KS'''
*Customize Tally Engine '''KS/HD'''
*Perform GenMAPP and MAPPFinder Analysis '''DW'''
| style="vertical-align: top" | 11/27
'''THANKSGIVING BREAK'''
| style="vertical-align: top" | 11/28
'''THANKSGIVING BREAK'''
| style="vertical-align: top" | 11/29
'''THANKSGIVING BREAK'''
|-
| style="vertical-align: top; background: #eee" | 12/02
| style="vertical-align: top; background: #eee" | 12/03
*Run GenMAPP and MAPPFinder '''DW'''
*Customize the Tally Engine for ''C. trachomatis'' '''DW'''
| style="vertical-align: top; background: #eee" | 12/04
*Run GenMAPP and MAPPFinder again if necessary '''DW'''
| style="vertical-align: top; background: #eee" | 12/05
*Begin working on the relational database schema '''HD'''
*Commit the changes to gmbuilder '''KS'''
| style="vertical-align: top; background: #eee" | 12/06
|-
| style="vertical-align: top; background: #eee" | 12/09
*Write the paper '''HD/KS/DW'''
*Work on the presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/10
*Write the paper '''HD/KS/DW'''
*Work on the presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/11
*Finish the presentation '''HD/KS/DW'''
*Practice the final presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/12
*'''Final Presentation'''
*Proofread the final paper '''HD/KS/DW'''
*Make sure all the deliverables are ready '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/13
|}

==Deliverable and Final Paper Assignments==

===Deliverables===

*GenMAPP Gene Database for assigned species (.gdb) '''Katrina'''
*ReadMe file to accompany the Gene Database (.pdf)
:*Include Gene Database Schema diagram in ReadMe '''Hilda'''
*Gene Database Testing Report for final submitted Gene Database (print from wiki to .pdf file) '''Katrina'''
*Processed and analyzed DNA microarray dataset (.xls) '''Dillon'''
*GenMAPP Expression Dataset file (.gex) '''Dillon'''
*Filtered MAPPFinder Results (.xls) '''Dillon'''
*Sample MAPP file of a relevant biological pathway for your species (.mapp) '''Dillon'''

===Final Paper===

#Introduction
#Methods
#*Download the UniProt XML proteome set and GOA (GO association) files for your species.
#*Download GO terms in the OBO-XML format.
#*Create the GenMAPP Builder tables in PostgreSQL.
#*Load files into PostgreSQL database via GenMAPP Builder.
#*Export into a GenMAPP Gene Database.
#*Inspect/vet/validate Gene Database.
#*Prepare microarray data (organize, normalize, perform statistical analysis)
#*Run GenMAPP and MAPPFinder using the Gene Database.
#Results
#*Gene Database Schema
#*Gene Database Testing Report on final version of Gene Database
#*Report on quantity and identity of gene IDs that did not make it into the database
#*Report on what changes need to be made to the GenMAPP Builder code in order to accommodate the second and third type of missing gene IDs
#*Report results of the DNA microarray statistical analysis
#*Table of MAPPFinder results (from .xls) and MAPP of a pathway
#Discussion
#*How well did the GenMAPP Builder process work for your species (just comment on the technical aspects here, you will discuss the teamwork/process aspects in your individual assessment).
#*Discuss the statistical analysis and MAPPFinder results for your microarray dataset. Compare it to what was reported in the original paper from which you got the microarray data.

=== References ===

* Please use the [[Guidelines for Literature Citations in a Scientific Paper]]. The [http://en.wikipedia.org/wiki/APA_style#Citation APA format] is also very similar to these guidelines and will be acceptable. Be consistent with your format for the in text citations and for your references list at the end.

==Annotated Bibliography of Genomics Papers for ''C. trachomatis''==

===Whole genome sequencing of ''C. trachomatis''===

#Journal article describing results of whole genome sequencing:
#*Database: PubMed
#*Search Terms: Chlamydia trachomatis [MeSH Term] AND genome [Title]
#*There were 29 results. Many of the articles that appear using the above search terms have to do with performing whole-genome analysis of ''C. trachomatis'' to look at genetic polymorphisms in the bacterium and molecular and genetic characteristics that are observed when the bacterium infects a host.
#*Stephens, R.S., Kalman, S., Lammel, C., Fan, J., Marathe, R., Aravind, L., Mitchell, W., Olinger, L., Tatusov, R., Zhao, Q., Koonin, E. V., Davis, R.W. (1998) Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis. ''Science'' '''282''': 754-759. 10.1126/science.282.5389.754.
#*The article can be viewed as an [http://www.sciencemag.org/content/282/5389/754.long HTML].
#*The article can be viewed as a [[Media: Science-1998-Stephens-754-9.pdf | PDF]].
#*This article includes [http://www.sciencemag.org/site/feature/data/982604.xhtml Supplementary Material].
#Use the genome sequencing article you found to perform a prospective search in the ISI Web of Science/Knowledge database.
#*How many results did you get?: 872
#*Based on the titles and abstracts of the papers, what type of research directions have been taken now that the genome for that organism has been sequenced?: Based on the titles and abstracts found in the results, it seems that research has been primarily in the direction of understanding how the disease functions and assessing the severity of the disease (and how to cure it).

===Microarray articles===
The EBI ArrayExpress database was used to locate each of the four following microarray articles. The search term used was ''Chlamydia trachomatis'' under 'By organism'; the rest of the drop-down menus were left unaltered.

The following articles appear in order of preference:

#Omsland, A., Sager, J., Nair, V., Sturdevant, D.E., Hackstadt, T. (2012) [http://www.pnas.org/content/109/48/19781.full Developmental stage-specific metabolic and transcriptional activity of Chlamydia trachomatis in an axenic medium]. ''PNAS'' '''109''': 19781-19785. doi: 10.1073/pnas.1212831109.
#*[[Media: PNAS-2012-Omsland-19781-5.pdf|PDF Version of Paper]]
#*[http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-39530/?keywords=&organism=Chlamydia+trachomatis&array=&exptype%5B%5D=&exptype%5B%5D= Microarray Data]
#*Chlamydiae undergo a biphasic developmental cycle characterized by an infectious cell type known as an elementary body (EB) and an intracellular replicative form called a reticulate body (RB). Chlamydia was incubated under microaerobic conditions to test the differences in preferred energy source between EBs and RBs.
#*There was not necessarily a "treatment" group and a "control" group. Based on Figure 4, EB can be considered the "treatment" group because the ratio was set up as EB to RB. RB would then be the "control" group, since the EB population is examined relative to the RB population.
#*Four biological replicates were performed for both the control and the treatment; the article states that "Density gradient-purified EBs and RBs were incubated in quadruplicate in four-well plates...".


==Journal Club Presentation on ''C. trachomatis'' Genome Sequencing Paper==

[[Media:Presentation on C.trachomatis Genome Paper KS HD DW 20131112.pdf|Here]] is the PDF version of the presentation given on November 12, 2013 on the Stephens ''et al.'' (1998) paper.


==Important Files==

*[[Media: Go_daily-termdb_v2_HD_20131107.obo-xml.gz|Go_daily-termdb_v2_HD_20131107.obo-xml.gz]]
:GO OBO-XML file for ''C. trachomatis'' serovar A.
*[[Media: Uniprot XML C.trachomatis serovar A KS 20131114.xml|Uniprot XML C.trachomatis serovar A KS 20131114.xml]]
:UniProt XML file for ''C. trachomatis'' serovar A.
*[[Media: 22183.C trachomatis A KS 20131114.goa|22183.C trachomatis A KS 20131114.goa]]
:GOA file for ''C. trachomatis'' serovar A.
*[[Media: Ct-Std v1 KS 20131119.gdb|Ct-Std v1 KS 20131119.gdb]]
:First iteration of the export/import gene database cycle.
*[[Media: Ct-Std KS 20131121.gdb|Ct-Std KS 20131121.gdb]]
:Updated gene database.
*[[Media: Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]
:New gene database created with the new version of gmbuilder built on 11/21/2013 using the custom species profile for ''C. trachomatis.''

'''Array Express Article Raw Files'''

*[[Media:Raw_Data_DW_V1_12112013_A-GEOD-4692.adf.txt|Raw_Data_DW_V1_12112013_A-GEOD-4692.adf.txt]]
:Microarray raw ADF file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.idf.txt|Raw_Data_DW_V1_12112013_E-GEOD-39530.idf.txt]]
:Microarray raw IDF file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.processed.1.zip|Raw_Data_DW_V1_12112013_E-GEOD-39530.processed.1.zip]]
:Microarray raw processed zip file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.raw.1.zip|Raw_Data_DW_V1_12112013_E-GEOD-39530.raw.1.zip]]
:Microarray raw zip file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.sdrf.txt|Raw_Data_DW_V1_12112013_E-GEOD-39530.sdrf.txt]]
:Microarray SDRF text file for ''C. trachomatis''.
*[[Media:Sdrf information excel.xls|Sdrf information excel.xls]]
:SDRF information in Excel spreadsheet format.

'''Excel Master Spreadsheet File Versions (Containing the Formatted Raw Data)'''

*[[Media:Master Spreadsheet.xls|master spreadsheet work in progress]]
:Initial Microarray raw data formatted and opened in Microsoft Excel.
*[[Media: C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.xls|C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.xls]]
:Sample C. trachomatis microarray data that was used to run GenMAPP with the latest ''C. trachomatis'' database.
*[[Media: C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt|C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt]]
:Tab-delimited version of the file above.
*[[Media: Final Master Spreadsheet.xls|Final Master Spreadsheet]]
:Microarray data with identified gene IDs and log base data.
*[[Media: Master_Spreadsheet_Chlamydia_20131125_KD.xls | Master_Spreadsheet_Chlamydia_20131125_KD.xls]]
:Version edited during office hours by Dr. Dahlquist to demonstrate statistics.
*[[Media: Master Spreadsheet Chlamydia V1 20131125 DW.xls]]
:Completed version of the spreadsheet edited during office hours by Dr. Dahlquist to demonstrate statistics; completed by Dillon Williams.
*[[Media: Copy of Master Spreadsheet Chlamydia V2 20131203 HD.xls]]
:Microarray data with separated IDs CTA_#### from _RRMH#####_...
*[[Media: For GenMAPP Chlamydia V4 20131203 DW.xls]]
:Excel spreadsheet with data formatted for GenMAPP (version run in GenMAPP).

'''For GenMAPP Files'''
*[[Media: Ctrachomatis EbtoRb ATP metabolic process.mapp]]
: EB to RB ATP metabolic process MAPP file.
*[[Media: Ctrachomatis EbtoRB glucose catabolic process 20131204 DW.mapp]]
: EB to RB glucose catabolic process MAPP file.
*[[Media: CTrachomatis EBtoRB Rifampicin 20131204 DW-Criterion1-GO.txt]]
: EB to RB Rifampicin go.txt file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.gex]]
: For_GenMAPP .gex file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.gmf]]
: For GenMAPP .gmf file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.zip]]
: Access file created with .gex file.
*[[Media: For GenMAPP Chlamydia V4 20131205 KS.gex]]
:Color set created in GenMAPP for EB to RB no rifampicin and increased/decreased logFC criteria

'''MAPPFinder Files'''
*[[Media:EB to RB No Rif V2-Criterion0-GO.txt]]
:Increased
*[[Media: EB to RB No Rif V2-Criterion1-GO.txt]]
:Decreased
*[[Media: For GenMAPP Chlamydia V4 20131205 KS.gmf]]
: File created by MAPPFinder after creating the latest color set.

==Wiki Formatting==

Template: [[Template:Team_H(oo)KD]]

[[Category: Group Projects]]
[[Category: Team H(oo)KD]]

Chlamydia trachomatis

2013-12-08T06:32:37Z

Ksherbina: /* Deliverable and Final Paper Assignments */ Added a new section

{{Team_H(oo)KD}}

==Deadlines and Intermediate Milestones==

{| style="width: 100%; margin: 1em 0" cellpadding="5ex"
! style="width: 8em; background: #ddd" | Monday
! style="width: 8em; background: #ddd" | Tuesday
! style="width: 8em; background: #ddd" | Wednesday
! style="width: 8em; background: #ddd" | Thursday
! style="width: 8em; background: #ddd" | Friday
|-
| style="vertical-align: top" | 11/11
| style="vertical-align: top" | 11/12
| style="vertical-align: top" | 11/13
| style="vertical-align: top" | 11/14
*Run an export/import of gene database cycle '''KS/HD'''
*Create species profile '''KS'''
*Obtain and format raw microarray data '''DW'''
| style="vertical-align: top" | 11/15
*Rerun export/import of gene database cycle '''KS'''
|-
| style="vertical-align: top; background: #eee" | 11/18
| style="vertical-align: top; background: #eee" | 11/19
*Vet the gene database exported last week '''KS/HD/DW'''
*Format the raw microarray data '''DW'''
| style="vertical-align: top; background: #eee" | 11/20
| style="vertical-align: top; background: #eee" | 11/21
*Perform microarray statistical analysis '''DW'''
*Work on custom species profile '''KS/HD'''
| style="vertical-align: top; background: #eee" | 11/22
|-
| style="vertical-align: top" | 11/25
| style="vertical-align: top" | 11/26
*Build and commit species profile '''KS'''
*Customize Tally Engine '''KS/HD'''
*Perform GenMAPP and MAPPFinder Analysis '''DW'''
| style="vertical-align: top" | 11/27
'''THANKSGIVING BREAK'''
| style="vertical-align: top" | 11/28
'''THANKSGIVING BREAK'''
| style="vertical-align: top" | 11/29
'''THANKSGIVING BREAK'''
|-
| style="vertical-align: top; background: #eee" | 12/02
| style="vertical-align: top; background: #eee" | 12/03
*Run GenMAPP and MAPPFinder '''DW'''
*Customize the Tally Engine for ''C. trachomatis'' '''DW'''
| style="vertical-align: top; background: #eee" | 12/04
*Run GenMAPP and MAPPFinder again if necessary '''DW'''
| style="vertical-align: top; background: #eee" | 12/05
*Begin working on the relational database schema '''HD'''
*Commit the changes to gmbuilder '''KS'''
| style="vertical-align: top; background: #eee" | 12/06
|-
| style="vertical-align: top; background: #eee" | 12/09
*Write the paper '''HD/KS/DW'''
*Work on the presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/10
*Write the paper '''HD/KS/DW'''
*Work on the presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/11
*Finish the presentation '''HD/KS/DW'''
*Practice the final presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/12
*'''Final Presentation'''
*Proofread the final paper '''HD/KS/DW'''
*Make sure all the deliverables are ready '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/13
|}

==Deliverable and Final Paper Assignments==

===Deliverables===

*GenMAPP Gene Database for assigned species (.gdb) '''Katrina'''
*ReadMe file to accompany the Gene Database (.pdf)
:*Include Gene Database Schema diagram in ReadMe '''Hilda'''
*Gene Database Testing Report for final submitted Gene Database (print from wiki to .pdf file) '''Katrina'''
*Processed and analyzed DNA microarray dataset (.xls) '''Dillon'''
*GenMAPP Expression Dataset file (.gex) '''Dillon'''
*Filtered MAPPFinder Results (.xls) '''Dillon'''
*Sample MAPP file of a relevant biological pathway for your species (.mapp) '''Dillon'''

===Final Paper===

==Annotated Bibliography of Genomics Papers for ''C. trachomatis''==

===Whole genome sequencing of ''C. trachomatis''===

#Journal article describing results of whole genome sequencing:
#*Database: PubMed
#*Search Terms: Chlamydia trachomatis [MeSH Term] AND genome [Title]
#*There were 29 results. Many of the articles that appear using the above search terms have to do with performing whole-genome analysis of ''C. trachomatis'' to look at genetic polymorphisms in the bacterium and molecular and genetic characteristics that are observed when the bacterium infects a host.
#*Stephens, R.S., Kalman, S., Lammel, C., Fan, J., Marathe, R., Aravind, L., Mitchell, W., Olinger, L., Tatusov, R., Zhao, Q., Koonin, E. V., Davis, R.W. (1998) Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis. ''Science'' '''282''': 754-759. 10.1126/science.282.5389.754.
#*The article can be viewed as an [http://www.sciencemag.org/content/282/5389/754.long HTML].
#*The article can be viewed as a [[Media: Science-1998-Stephens-754-9.pdf | PDF]].
#*This article includes [http://www.sciencemag.org/site/feature/data/982604.xhtml Supplementary Material].
#Use the genome sequencing article you found to perform a prospective search in the ISI Web of Science/Knowledge database.
#*How many results did you get?: 872
#*Based on the titles and abstracts of the papers, what type of research directions have been taken now that the genome for that organism has been sequenced?: Based on the titles and abstracts found in the results, it seems that research has been primarily in the direction of understanding how the disease functions and assessing the severity of the disease (and how to cure it).

===Microarray articles===
The EBI ArrayExpress database was used to locate each of the four following microarray articles. The search term used was ''Chlamydia trachomatis'' under 'By organism'; the rest of the drop-down menus were left unaltered.

The following articles appear in order of preference:

#Omsland, A., Sager, J., Nair, V., Sturdevant, D.E., Hackstadt, T. (2012) [http://www.pnas.org/content/109/48/19781.full Developmental stage-specific metabolic and transcriptional activity of Chlamydia trachomatis in an axenic medium]. ''PNAS'' '''109''': 19781-19785. doi: 10.1073/pnas.1212831109.
#*[[Media: PNAS-2012-Omsland-19781-5.pdf|PDF Version of Paper]]
#*[http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-39530/?keywords=&organism=Chlamydia+trachomatis&array=&exptype%5B%5D=&exptype%5B%5D= Microarray Data]
#*Chlamydiae undergo a biphasic developmental cycle characterized by an infectious cell type known as an elementary body (EB) and an intracellular replicative form called a reticulate body (RB). Chlamydia was incubated under microaerobic conditions to test the differences in preferred energy source between EBs and RBs.
#*There was not necessarily a "treatment" group and a "control" group. Based on Figure 4, EB can be considered the "treatment" group because the ratio was set up as EB to RB. RB would then be the "control" group, since the EB population is examined relative to the RB population.
#*Four biological replicates were performed for both the control and the treatment; the article states that "Density gradient-purified EBs and RBs were incubated in quadruplicate in four-well plates...".


==Journal Club Presentation on ''C. trachomatis'' Genome Sequencing Paper==

[[Media:Presentation on C.trachomatis Genome Paper KS HD DW 20131112.pdf|Here]] is the PDF version of the presentation given on November 12, 2013 on the Stephens ''et al.'' (1998) paper.


==Important Files==

*[[Media: Go_daily-termdb_v2_HD_20131107.obo-xml.gz|Go_daily-termdb_v2_HD_20131107.obo-xml.gz]]
:GO OBO-XML file for ''C. trachomatis'' serovar A.
*[[Media: Uniprot XML C.trachomatis serovar A KS 20131114.xml|Uniprot XML C.trachomatis serovar A KS 20131114.xml]]
:UniProt XML file for ''C. trachomatis'' serovar A.
*[[Media: 22183.C trachomatis A KS 20131114.goa|22183.C trachomatis A KS 20131114.goa]]
:GOA file for ''C. trachomatis'' serovar A.
*[[Media: Ct-Std v1 KS 20131119.gdb|Ct-Std v1 KS 20131119.gdb]]
:First iteration of the export/import gene database cycle.
*[[Media: Ct-Std KS 20131121.gdb|Ct-Std KS 20131121.gdb]]
:Updated gene database.
*[[Media: Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]
:New gene database created with the new version of gmbuilder built on 11/21/2013 using the custom species profile for ''C. trachomatis.''

'''Array Express Article Raw Files'''

*[[Media:Raw_Data_DW_V1_12112013_A-GEOD-4692.adf.txt|Raw_Data_DW_V1_12112013_A-GEOD-4692.adf.txt]]
:Microarray raw ADF file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.idf.txt|Raw_Data_DW_V1_12112013_E-GEOD-39530.idf.txt]]
:Microarray raw IDF file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.processed.1.zip|Raw_Data_DW_V1_12112013_E-GEOD-39530.processed.1.zip]]
:Zip file of processed microarray data for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.raw.1.zip|Raw_Data_DW_V1_12112013_E-GEOD-39530.raw.1.zip]]
:Microarray raw zip file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.sdrf.txt|Raw_Data_DW_V1_12112013_E-GEOD-39530.sdrf.txt]]
:Microarray SDRF text file for ''C. trachomatis''.
*[[Media:Sdrf information excel.xls|Sdrf information excel.xls]]
:SDRF information in Excel spreadsheet format.

'''Excel Master Spreadsheet File Versions (Containing the Formatted Raw Data)'''

*[[Media:Master Spreadsheet.xls|master spreadsheet work in progress]]
:Initial Microarray raw data formatted and opened in Microsoft Excel.
*[[Media: C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.xls|C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.xls]]
:Sample ''C. trachomatis'' microarray data that was used to run GenMAPP with the latest ''C. trachomatis'' database.
*[[Media: C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt|C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt]]
:Tab-delimited version of the file above.
*[[Media: Final Master Spreadsheet.xls|Final Master Spreadsheet]]
:Microarray data with identified gene IDs and log-transformed data.
*[[Media: Master_Spreadsheet_Chlamydia_20131125_KD.xls | Master_Spreadsheet_Chlamydia_20131125_KD.xls]]
:Version edited during office hours by Dr. Dahlquist to demonstrate statistics.
*[[Media: Master Spreadsheet Chlamydia V1 20131125 DW.xls]]
:Completed version of the spreadsheet edited during office hours by Dr. Dahlquist to demonstrate statistics; completed by Dillon Williams.
*[[Media: Copy of Master Spreadsheet Chlamydia V2 20131203 HD.xls]]
:Microarray data with the CTA_#### IDs separated from _RRMH#####_...
*[[Media: For GenMAPP Chlamydia V4 20131203 DW.xls]]
:Excel spreadsheet with data formatted for GenMAPP (version run in GenMAPP).

'''For GenMAPP Files'''
*[[Media: Ctrachomatis EbtoRb ATP metabolic process.mapp]]
: EB to RB ATP metabolic process MAPP file.
*[[Media: Ctrachomatis EbtoRB glucose catabolic process 20131204 DW.mapp]]
: EB to RB glucose catabolic process MAPP file.
*[[Media: CTrachomatis EBtoRB Rifampicin 20131204 DW-Criterion1-GO.txt]]
: EB to RB Rifampicin go.txt file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.gex]]
: For_GenMAPP .gex file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.gmf]]
: For GenMAPP .gmf file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.zip]]
: Access file created with .gex file.
*[[Media: For GenMAPP Chlamydia V4 20131205 KS.gex]]
:Color set created in GenMAPP for EB to RB no rifampicin and increased/decreased logFC criteria

'''MAPPFinder Files'''
*[[Media:EB to RB No Rif V2-Criterion0-GO.txt]]
:Increased
*[[Media: EB to RB No Rif V2-Criterion1-GO.txt]]
:Decreased
*[[Media: For GenMAPP Chlamydia V4 20131205 KS.gmf]]
: File created by MAPPFinder after creating the latest color set.

==Wiki Formatting==

Template: [[Template:Team_H(oo)KD]]

[[Category: Group Projects]]
[[Category: Team H(oo)KD]]

Chlamydia trachomatis

2013-12-08T06:32:02Z

Ksherbina: /* Deliverable and Final Paper Assignments */ Created a new section

{{Team_H(oo)KD}}

==Deadlines and Intermediate Milestones==

{| style="width: 100%; margin: 1em 0" cellpadding="5ex"
! style="width: 8em; background: #ddd" | Monday
! style="width: 8em; background: #ddd" | Tuesday
! style="width: 8em; background: #ddd" | Wednesday
! style="width: 8em; background: #ddd" | Thursday
! style="width: 8em; background: #ddd" | Friday
|-
| style="vertical-align: top" | 11/11
| style="vertical-align: top" | 11/12
| style="vertical-align: top" | 11/13
| style="vertical-align: top" | 11/14
*Run an export/import cycle of the gene database '''KS/HD'''
*Create species profile '''KS'''
*Obtain and format raw microarray data '''DW'''
| style="vertical-align: top" | 11/15
*Rerun the export/import cycle of the gene database '''KS'''
|-
| style="vertical-align: top; background: #eee" | 11/18
| style="vertical-align: top; background: #eee" | 11/19
*Vet the gene database exported last week '''KS/HD/DW'''
*Format the raw microarray data '''DW'''
| style="vertical-align: top; background: #eee" | 11/20
| style="vertical-align: top; background: #eee" | 11/21
*Perform microarray statistical analysis '''DW'''
*Work on custom species profile '''KS/HD'''
| style="vertical-align: top; background: #eee" | 11/22
|-
| style="vertical-align: top" | 11/25
| style="vertical-align: top" | 11/26
*Build and commit species profile '''KS'''
*Customize Tally Engine '''KS/HD'''
*Perform GenMAPP and MAPPFinder Analysis '''DW'''
| style="vertical-align: top" | 11/27
'''THANKSGIVING BREAK'''
| style="vertical-align: top" | 11/28
'''THANKSGIVING BREAK'''
| style="vertical-align: top" | 11/29
'''THANKSGIVING BREAK'''
|-
| style="vertical-align: top; background: #eee" | 12/02
| style="vertical-align: top; background: #eee" | 12/03
*Run GenMAPP and MAPPFinder '''DW'''
*Customize the Tally Engine for ''C. trachomatis'' '''DW'''
| style="vertical-align: top; background: #eee" | 12/04
*Run GenMAPP and MAPPFinder again if necessary '''DW'''
| style="vertical-align: top; background: #eee" | 12/05
*Begin working on the relational database schema '''HD'''
*Commit the changes to gmbuilder '''KS'''
| style="vertical-align: top; background: #eee" | 12/06
|-
| style="vertical-align: top; background: #eee" | 12/09
*Write the paper '''HD/KS/DW'''
*Work on the presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/10
*Write the paper '''HD/KS/DW'''
*Work on the presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/11
*Finish the presentation '''HD/KS/DW'''
*Practice the final presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/12
*'''Final Presentation'''
*Proofread the final paper '''HD/KS/DW'''
*Make sure all the deliverables are ready '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/13
|}

==Deliverable and Final Paper Assignments==

===Deliverables===

*GenMAPP Gene Database for assigned species (.gdb) '''Katrina'''
*ReadMe file to accompany the Gene Database (.pdf)
:*Include Gene Database Schema diagram in ReadMe '''Hilda'''
*Gene Database Testing Report for final submitted Gene Database (print from wiki to .pdf file) '''Katrina'''
*Processed and analyzed DNA microarray dataset (.xls) '''Dillon'''
*GenMAPP Expression Dataset file (.gex) '''Dillon'''
*Filtered MAPPFinder Results (.xls) '''Dillon'''
*Sample MAPP file of a relevant biological pathway for your species (.mapp) '''Dillon'''

==Annotated Bibliography of Genomics Papers for ''C. trachomatis''==

===Whole genome sequencing of ''C. trachomatis''===

#Journal article describing results of whole genome sequencing:
#*Database: PubMed
#*Search Terms: Chlamydia trachomatis [MeSH Term] AND genome [Title]
#*There were 29 results. Many of the articles returned by these search terms perform whole-genome analysis of ''C. trachomatis'' to examine genetic polymorphisms in the bacterium and the molecular and genetic characteristics observed when the bacterium infects a host.
#*Stephens, R.S., Kalman, S., Lammel, C., Fan, J., Marathe, R., Aravind, L., Mitchell, W., Olinger, L., Tatusov, R., Zhao, Q., Koonin, E.V., Davis, R.W. (1998) Genome sequence of an obligate intracellular pathogen of humans: ''Chlamydia trachomatis''. ''Science'' '''282''': 754-759. doi: 10.1126/science.282.5389.754.
#*The article can be viewed as [http://www.sciencemag.org/content/282/5389/754.long HTML].
#*The article can be viewed as a [[Media: Science-1998-Stephens-754-9.pdf | PDF]].
#*This article includes [http://www.sciencemag.org/site/feature/data/982604.xhtml Supplementary Material].
#Use the genome sequencing article you found to perform a prospective search in the ISI Web of Science/Knowledge database.
#*How many results did you get?: 872
#*Based on the titles and abstracts of the papers, what type of research directions have been taken now that the genome for that organism has been sequenced?: Based on the titles and abstracts found in the results, it seems that research has been primarily in the direction of understanding how the disease functions and assessing the severity of the disease (and how to cure it).

===Microarray articles===
The EBI ArrayExpress database was used to locate each of the four following microarray articles. The search term used was ''Chlamydia trachomatis'' under 'By organism'; the rest of the drop-down menus were left unchanged.

The following articles appear in order of preference:

#Omsland, A., Sager, J., Nair, V., Sturdevant, D.E., Hackstadt, T. (2012) [http://www.pnas.org/content/109/48/19781.full Developmental stage-specific metabolic and transcriptional activity of Chlamydia trachomatis in an axenic medium]. ''PNAS'' '''109''': 19781-19785. doi: 10.1073/pnas.1212831109.
#*[[Media: PNAS-2012-Omsland-19781-5.pdf|PDF Version of Paper]]
#*[http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-39530/?keywords=&organism=Chlamydia+trachomatis&array=&exptype%5B%5D=&exptype%5B%5D= Microarray Data]
#*Chlamydiae undergo a biphasic developmental cycle characterized by an infectious cell type known as an elementary body (EB) and an intracellular replicative form called a reticulate body (RB). ''Chlamydia'' was incubated under microaerobic conditions to test the differences in preferred energy source between EBs and RBs.
#*The experiment did not have an explicit "treatment" group and "control" group. Based on Figure 4, EB can be considered the "treatment" group because the ratio was set up as EB to RB. RB would then be the "control" group, since the EB population is examined relative to the RB population.
#*Four biological replicates were performed for both the control and the treatment; the article states that "Density gradient-purified EBs and RBs were incubated in quadruplicate in four-well plates...".


==Journal Club Presentation on ''C. trachomatis'' Genome Sequencing Paper==

[[Media:Presentation on C.trachomatis Genome Paper KS HD DW 20131112.pdf|Here]] is the PDF version of the presentation given on November 12, 2013 on the Stephens ''et al.'' (1998) paper.


==Important Files==

*[[Media: Go_daily-termdb_v2_HD_20131107.obo-xml.gz|Go_daily-termdb_v2_HD_20131107.obo-xml.gz]]
:GO OBO-XML file for ''C. trachomatis'' serovar A.
*[[Media: Uniprot XML C.trachomatis serovar A KS 20131114.xml|Uniprot XML C.trachomatis serovar A KS 20131114.xml]]
:UniProt XML file for ''C. trachomatis'' serovar A.
*[[Media: 22183.C trachomatis A KS 20131114.goa|22183.C trachomatis A KS 20131114.goa]]
:GOA file for ''C. trachomatis'' serovar A.
*[[Media: Ct-Std v1 KS 20131119.gdb|Ct-Std v1 KS 20131119.gdb]]
:First iteration of the export/import gene database cycle.
*[[Media: Ct-Std KS 20131121.gdb|Ct-Std KS 20131121.gdb]]
:Updated gene database.
*[[Media: Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]
:New gene database created with the new version of gmbuilder built on 11/21/2013 using the custom species profile for ''C. trachomatis.''

'''ArrayExpress Article Raw Files'''

*[[Media:Raw_Data_DW_V1_12112013_A-GEOD-4692.adf.txt|Raw_Data_DW_V1_12112013_A-GEOD-4692.adf.txt]]
:Microarray raw ADF file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.idf.txt|Raw_Data_DW_V1_12112013_E-GEOD-39530.idf.txt]]
:Microarray raw IDF file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.processed.1.zip|Raw_Data_DW_V1_12112013_E-GEOD-39530.processed.1.zip]]
:Zip file of processed microarray data for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.raw.1.zip|Raw_Data_DW_V1_12112013_E-GEOD-39530.raw.1.zip]]
:Microarray raw zip file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.sdrf.txt|Raw_Data_DW_V1_12112013_E-GEOD-39530.sdrf.txt]]
:Microarray SDRF text file for ''C. trachomatis''.
*[[Media:Sdrf information excel.xls|Sdrf information excel.xls]]
:SDRF information in Excel spreadsheet format.

'''Excel Master Spreadsheet File Versions (Containing the Formatted Raw Data)'''

*[[Media:Master Spreadsheet.xls|master spreadsheet work in progress]]
:Initial Microarray raw data formatted and opened in Microsoft Excel.
*[[Media: C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.xls|C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.xls]]
:Sample ''C. trachomatis'' microarray data that was used to run GenMAPP with the latest ''C. trachomatis'' database.
*[[Media: C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt|C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt]]
:Tab-delimited version of the file above.
*[[Media: Final Master Spreadsheet.xls|Final Master Spreadsheet]]
:Microarray data with identified gene IDs and log-transformed data.
*[[Media: Master_Spreadsheet_Chlamydia_20131125_KD.xls | Master_Spreadsheet_Chlamydia_20131125_KD.xls]]
:Version edited during office hours by Dr. Dahlquist to demonstrate statistics.
*[[Media: Master Spreadsheet Chlamydia V1 20131125 DW.xls]]
:Completed version of the spreadsheet edited during office hours by Dr. Dahlquist to demonstrate statistics; completed by Dillon Williams.
*[[Media: Copy of Master Spreadsheet Chlamydia V2 20131203 HD.xls]]
:Microarray data with the CTA_#### IDs separated from _RRMH#####_...
*[[Media: For GenMAPP Chlamydia V4 20131203 DW.xls]]
:Excel spreadsheet with data formatted for GenMAPP (version run in GenMAPP).

'''For GenMAPP Files'''
*[[Media: Ctrachomatis EbtoRb ATP metabolic process.mapp]]
: EB to RB ATP metabolic process MAPP file.
*[[Media: Ctrachomatis EbtoRB glucose catabolic process 20131204 DW.mapp]]
: EB to RB glucose catabolic process MAPP file.
*[[Media: CTrachomatis EBtoRB Rifampicin 20131204 DW-Criterion1-GO.txt]]
: EB to RB Rifampicin go.txt file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.gex]]
: For_GenMAPP .gex file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.gmf]]
: For GenMAPP .gmf file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.zip]]
: Access file created with .gex file.
*[[Media: For GenMAPP Chlamydia V4 20131205 KS.gex]]
:Color set created in GenMAPP for EB to RB no rifampicin and increased/decreased logFC criteria

'''MAPPFinder Files'''
*[[Media:EB to RB No Rif V2-Criterion0-GO.txt]]
:Increased
*[[Media: EB to RB No Rif V2-Criterion1-GO.txt]]
:Decreased
*[[Media: For GenMAPP Chlamydia V4 20131205 KS.gmf]]
: File created by MAPPFinder after creating the latest color set.

==Wiki Formatting==

Template: [[Template:Team_H(oo)KD]]

[[Category: Group Projects]]
[[Category: Team H(oo)KD]]

Chlamydia trachomatis

2013-12-08T06:29:38Z

Ksherbina: /* Deliverable and Final Paper Assignments */ Added assignments for deliverables

{{Team_H(oo)KD}}

==Deadlines and Intermediate Milestones==

{| style="width: 100%; margin: 1em 0" cellpadding="5ex"
! style="width: 8em; background: #ddd" | Monday
! style="width: 8em; background: #ddd" | Tuesday
! style="width: 8em; background: #ddd" | Wednesday
! style="width: 8em; background: #ddd" | Thursday
! style="width: 8em; background: #ddd" | Friday
|-
| style="vertical-align: top" | 11/11
| style="vertical-align: top" | 11/12
| style="vertical-align: top" | 11/13
| style="vertical-align: top" | 11/14
*Run an export/import cycle of the gene database '''KS/HD'''
*Create species profile '''KS'''
*Obtain and format raw microarray data '''DW'''
| style="vertical-align: top" | 11/15
*Rerun the export/import cycle of the gene database '''KS'''
|-
| style="vertical-align: top; background: #eee" | 11/18
| style="vertical-align: top; background: #eee" | 11/19
*Vet the gene database exported last week '''KS/HD/DW'''
*Format the raw microarray data '''DW'''
| style="vertical-align: top; background: #eee" | 11/20
| style="vertical-align: top; background: #eee" | 11/21
*Perform microarray statistical analysis '''DW'''
*Work on custom species profile '''KS/HD'''
| style="vertical-align: top; background: #eee" | 11/22
|-
| style="vertical-align: top" | 11/25
| style="vertical-align: top" | 11/26
*Build and commit species profile '''KS'''
*Customize Tally Engine '''KS/HD'''
*Perform GenMAPP and MAPPFinder Analysis '''DW'''
| style="vertical-align: top" | 11/27
'''THANKSGIVING BREAK'''
| style="vertical-align: top" | 11/28
'''THANKSGIVING BREAK'''
| style="vertical-align: top" | 11/29
'''THANKSGIVING BREAK'''
|-
| style="vertical-align: top; background: #eee" | 12/02
| style="vertical-align: top; background: #eee" | 12/03
*Run GenMAPP and MAPPFinder '''DW'''
*Customize the Tally Engine for ''C. trachomatis'' '''DW'''
| style="vertical-align: top; background: #eee" | 12/04
*Run GenMAPP and MAPPFinder again if necessary '''DW'''
| style="vertical-align: top; background: #eee" | 12/05
*Begin working on the relational database schema '''HD'''
*Commit the changes to gmbuilder '''KS'''
| style="vertical-align: top; background: #eee" | 12/06
|-
| style="vertical-align: top; background: #eee" | 12/09
*Write the paper '''HD/KS/DW'''
*Work on the presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/10
*Write the paper '''HD/KS/DW'''
*Work on the presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/11
*Finish the presentation '''HD/KS/DW'''
*Practice the final presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/12
*'''Final Presentation'''
*Proofread the final paper '''HD/KS/DW'''
*Make sure all the deliverables are ready '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/13
|}

==Deliverable and Final Paper Assignments==

*GenMAPP Gene Database for assigned species (.gdb) '''Katrina'''
*ReadMe file to accompany the Gene Database (.pdf)
:*Include Gene Database Schema diagram in ReadMe '''Hilda'''
*Gene Database Testing Report for final submitted Gene Database (print from wiki to .pdf file) '''Katrina'''
*Processed and analyzed DNA microarray dataset (.xls) '''Dillon'''
*GenMAPP Expression Dataset file (.gex) '''Dillon'''
*Filtered MAPPFinder Results (.xls) '''Dillon'''
*Sample MAPP file of a relevant biological pathway for your species (.mapp) '''Dillon'''

==Annotated Bibliography of Genomics Papers for ''C. trachomatis''==

===Whole genome sequencing of ''C. trachomatis''===

#Journal article describing results of whole genome sequencing:
#*Database: PubMed
#*Search Terms: Chlamydia trachomatis [MeSH Term] AND genome [Title]
#*There were 29 results. Many of the articles returned by these search terms perform whole-genome analysis of ''C. trachomatis'' to examine genetic polymorphisms in the bacterium and the molecular and genetic characteristics observed when the bacterium infects a host.
#*Stephens, R.S., Kalman, S., Lammel, C., Fan, J., Marathe, R., Aravind, L., Mitchell, W., Olinger, L., Tatusov, R., Zhao, Q., Koonin, E.V., Davis, R.W. (1998) Genome sequence of an obligate intracellular pathogen of humans: ''Chlamydia trachomatis''. ''Science'' '''282''': 754-759. doi: 10.1126/science.282.5389.754.
#*The article can be viewed as [http://www.sciencemag.org/content/282/5389/754.long HTML].
#*The article can be viewed as a [[Media: Science-1998-Stephens-754-9.pdf | PDF]].
#*This article includes [http://www.sciencemag.org/site/feature/data/982604.xhtml Supplementary Material].
#Use the genome sequencing article you found to perform a prospective search in the ISI Web of Science/Knowledge database.
#*How many results did you get?: 872
#*Based on the titles and abstracts of the papers, what type of research directions have been taken now that the genome for that organism has been sequenced?: Based on the titles and abstracts found in the results, it seems that research has been primarily in the direction of understanding how the disease functions and assessing the severity of the disease (and how to cure it).

===Microarray articles===
The EBI ArrayExpress database was used to locate each of the four following microarray articles. The search term used was ''Chlamydia trachomatis'' under 'By organism'; the rest of the drop-down menus were left unchanged.

The following articles appear in order of preference:

#Omsland, A., Sager, J., Nair, V., Sturdevant, D.E., Hackstadt, T. (2012) [http://www.pnas.org/content/109/48/19781.full Developmental stage-specific metabolic and transcriptional activity of Chlamydia trachomatis in an axenic medium]. ''PNAS'' '''109''': 19781-19785. doi: 10.1073/pnas.1212831109.
#*[[Media: PNAS-2012-Omsland-19781-5.pdf|PDF Version of Paper]]
#*[http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-39530/?keywords=&organism=Chlamydia+trachomatis&array=&exptype%5B%5D=&exptype%5B%5D= Microarray Data]
#*Chlamydiae undergo a biphasic developmental cycle characterized by an infectious cell type known as an elementary body (EB) and an intracellular replicative form called a reticulate body (RB). ''Chlamydia'' was incubated under microaerobic conditions to test the differences in preferred energy source between EBs and RBs.
#*The experiment did not have an explicit "treatment" group and "control" group. Based on Figure 4, EB can be considered the "treatment" group because the ratio was set up as EB to RB. RB would then be the "control" group, since the EB population is examined relative to the RB population.
#*Four biological replicates were performed for both the control and the treatment; the article states that "Density gradient-purified EBs and RBs were incubated in quadruplicate in four-well plates...".


==Journal Club Presentation on ''C. trachomatis'' Genome Sequencing Paper==

[[Media:Presentation on C.trachomatis Genome Paper KS HD DW 20131112.pdf|Here]] is the PDF version of the presentation given on November 12, 2013 on the Stephens ''et al.'' (1998) paper.


==Important Files==

*[[Media: Go_daily-termdb_v2_HD_20131107.obo-xml.gz|Go_daily-termdb_v2_HD_20131107.obo-xml.gz]]
:GO OBO-XML file for ''C. trachomatis'' serovar A.
*[[Media: Uniprot XML C.trachomatis serovar A KS 20131114.xml|Uniprot XML C.trachomatis serovar A KS 20131114.xml]]
:UniProt XML file for ''C. trachomatis'' serovar A.
*[[Media: 22183.C trachomatis A KS 20131114.goa|22183.C trachomatis A KS 20131114.goa]]
:GOA file for ''C. trachomatis'' serovar A.
*[[Media: Ct-Std v1 KS 20131119.gdb|Ct-Std v1 KS 20131119.gdb]]
:First iteration of the export/import gene database cycle.
*[[Media: Ct-Std KS 20131121.gdb|Ct-Std KS 20131121.gdb]]
:Updated gene database.
*[[Media: Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]
:New gene database created with the new version of gmbuilder built on 11/21/2013 using the custom species profile for ''C. trachomatis.''

'''ArrayExpress Article Raw Files'''

*[[Media:Raw_Data_DW_V1_12112013_A-GEOD-4692.adf.txt|Raw_Data_DW_V1_12112013_A-GEOD-4692.adf.txt]]
:Microarray raw ADF file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.idf.txt|Raw_Data_DW_V1_12112013_E-GEOD-39530.idf.txt]]
:Microarray raw IDF file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.processed.1.zip|Raw_Data_DW_V1_12112013_E-GEOD-39530.processed.1.zip]]
:Zip file of processed microarray data for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.raw.1.zip|Raw_Data_DW_V1_12112013_E-GEOD-39530.raw.1.zip]]
:Microarray raw zip file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.sdrf.txt|Raw_Data_DW_V1_12112013_E-GEOD-39530.sdrf.txt]]
:Microarray SDRF text file for ''C. trachomatis''.
*[[Media:Sdrf information excel.xls|Sdrf information excel.xls]]
:SDRF information in Excel spreadsheet format.

'''Excel Master Spreadsheet File Versions (Containing the Formatted Raw Data)'''

*[[Media:Master Spreadsheet.xls|master spreadsheet work in progress]]
:Initial Microarray raw data formatted and opened in Microsoft Excel.
*[[Media: C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.xls|C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.xls]]
:Sample ''C. trachomatis'' microarray data that was used to run GenMAPP with the latest ''C. trachomatis'' database.
*[[Media: C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt|C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt]]
:Tab-delimited version of the file above.
*[[Media: Final Master Spreadsheet.xls|Final Master Spreadsheet]]
:Microarray data with identified gene IDs and log-transformed data.
*[[Media: Master_Spreadsheet_Chlamydia_20131125_KD.xls | Master_Spreadsheet_Chlamydia_20131125_KD.xls]]
:Version edited during office hours by Dr. Dahlquist to demonstrate statistics.
*[[Media: Master Spreadsheet Chlamydia V1 20131125 DW.xls]]
:Completed version of the spreadsheet edited during office hours by Dr. Dahlquist to demonstrate statistics; completed by Dillon Williams.
*[[Media: Copy of Master Spreadsheet Chlamydia V2 20131203 HD.xls]]
:Microarray data with the CTA_#### IDs separated from _RRMH#####_...
*[[Media: For GenMAPP Chlamydia V4 20131203 DW.xls]]
:Excel spreadsheet with data formatted for GenMAPP (version run in GenMAPP).

'''For GenMAPP Files'''
*[[Media: Ctrachomatis EbtoRb ATP metabolic process.mapp]]
: EB to RB ATP metabolic process MAPP file.
*[[Media: Ctrachomatis EbtoRB glucose catabolic process 20131204 DW.mapp]]
: EB to RB glucose catabolic process MAPP file.
*[[Media: CTrachomatis EBtoRB Rifampicin 20131204 DW-Criterion1-GO.txt]]
: EB to RB Rifampicin go.txt file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.gex]]
: For_GenMAPP .gex file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.gmf]]
: For GenMAPP .gmf file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.zip]]
: Access file created with .gex file.
*[[Media: For GenMAPP Chlamydia V4 20131205 KS.gex]]
:Color set created in GenMAPP for EB to RB no rifampicin and increased/decreased logFC criteria

'''MAPPFinder Files'''
*[[Media:EB to RB No Rif V2-Criterion0-GO.txt]]
:Increased
*[[Media: EB to RB No Rif V2-Criterion1-GO.txt]]
:Decreased
*[[Media: For GenMAPP Chlamydia V4 20131205 KS.gmf]]
: File created by MAPPFinder after creating the latest color set.

==Wiki Formatting==

Template: [[Template:Team_H(oo)KD]]

[[Category: Group Projects]]
[[Category: Team H(oo)KD]]

Chlamydia trachomatis

2013-12-08T06:26:10Z

Ksherbina: Added a new section

{{Team_H(oo)KD}}

==Deadlines and Intermediate Milestones==

{| style="width: 100%; margin: 1em 0" cellpadding="5ex"
! style="width: 8em; background: #ddd" | Monday
! style="width: 8em; background: #ddd" | Tuesday
! style="width: 8em; background: #ddd" | Wednesday
! style="width: 8em; background: #ddd" | Thursday
! style="width: 8em; background: #ddd" | Friday
|-
| style="vertical-align: top" | 11/11
| style="vertical-align: top" | 11/12
| style="vertical-align: top" | 11/13
| style="vertical-align: top" | 11/14
*Run an export/import cycle of the gene database '''KS/HD'''
*Create species profile '''KS'''
*Obtain and format raw microarray data '''DW'''
| style="vertical-align: top" | 11/15
*Rerun the export/import cycle of the gene database '''KS'''
|-
| style="vertical-align: top; background: #eee" | 11/18
| style="vertical-align: top; background: #eee" | 11/19
*Vet the gene database exported last week '''KS/HD/DW'''
*Format the raw microarray data '''DW'''
| style="vertical-align: top; background: #eee" | 11/20
| style="vertical-align: top; background: #eee" | 11/21
*Perform microarray statistical analysis '''DW'''
*Work on custom species profile '''KS/HD'''
| style="vertical-align: top; background: #eee" | 11/22
|-
| style="vertical-align: top" | 11/25
| style="vertical-align: top" | 11/26
*Build and commit species profile '''KS'''
*Customize Tally Engine '''KS/HD'''
*Perform GenMAPP and MAPPFinder Analysis '''DW'''
| style="vertical-align: top" | 11/27
'''THANKSGIVING BREAK'''
| style="vertical-align: top" | 11/28
'''THANKSGIVING BREAK'''
| style="vertical-align: top" | 11/29
'''THANKSGIVING BREAK'''
|-
| style="vertical-align: top; background: #eee" | 12/02
| style="vertical-align: top; background: #eee" | 12/03
*Run GenMAPP and MAPPFinder '''DW'''
*Customize the Tally Engine for ''C. trachomatis'' '''DW'''
| style="vertical-align: top; background: #eee" | 12/04
*Run GenMAPP and MAPPFinder again if necessary '''DW'''
| style="vertical-align: top; background: #eee" | 12/05
*Begin working on the relational database schema '''HD'''
*Commit the changes to gmbuilder '''KS'''
| style="vertical-align: top; background: #eee" | 12/06
|-
| style="vertical-align: top; background: #eee" | 12/09
*Write the paper '''HD/KS/DW'''
*Work on the presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/10
*Write the paper '''HD/KS/DW'''
*Work on the presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/11
*Finish the presentation '''HD/KS/DW'''
*Practice the final presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/12
*'''Final Presentation'''
*Proofread the final paper '''HD/KS/DW'''
*Make sure all the deliverables are ready '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/13
|}

==Deliverable and Final Paper Assignments==

==Annotated Bibliography of Genomics Papers for ''C. trachomatis''==

===Whole genome sequencing of ''C. trachomatis''===

#Journal article describing results of whole genome sequencing:
#*Database: PubMed
#*Search Terms: Chlamydia trachomatis [MeSH Term] AND genome [Title]
#*There were 29 results. Many of the articles returned by these search terms perform whole-genome analysis of ''C. trachomatis'' to examine genetic polymorphisms in the bacterium and the molecular and genetic characteristics observed when the bacterium infects a host.
#*Stephens, R.S., Kalman, S., Lammel, C., Fan, J., Marathe, R., Aravind, L., Mitchell, W., Olinger, L., Tatusov, R., Zhao, Q., Koonin, E.V., Davis, R.W. (1998) Genome sequence of an obligate intracellular pathogen of humans: ''Chlamydia trachomatis''. ''Science'' '''282''': 754-759. doi: 10.1126/science.282.5389.754.
#*The article can be viewed as [http://www.sciencemag.org/content/282/5389/754.long HTML].
#*The article can be viewed as a [[Media: Science-1998-Stephens-754-9.pdf | PDF]].
#*This article includes [http://www.sciencemag.org/site/feature/data/982604.xhtml Supplementary Material].
#Use the genome sequencing article you found to perform a prospective search in the ISI Web of Science/Knowledge database.
#*How many results did you get?: 872
#*Based on the titles and abstracts of the papers, what type of research directions have been taken now that the genome for that organism has been sequenced?: Based on the titles and abstracts found in the results, it seems that research has been primarily in the direction of understanding how the disease functions and assessing the severity of the disease (and how to cure it).

===Microarray articles===
The EBI ArrayExpress database was used to locate each of the four following microarray articles. The search term used was ''Chlamydia trachomatis'' under 'By organism'; the rest of the drop-down menus were left unchanged.

The following articles appear in order of preference:

#Omsland, A., Sager, J., Nair, V., Sturdevant, D.E., Hackstadt, T. (2012) [http://www.pnas.org/content/109/48/19781.full Developmental stage-specific metabolic and transcriptional activity of Chlamydia trachomatis in an axenic medium]. ''PNAS'' '''109''': 19781-19785. doi: 10.1073/pnas.1212831109.
#*[[Media: PNAS-2012-Omsland-19781-5.pdf|PDF Version of Paper]]
#*[http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-39530/?keywords=&organism=Chlamydia+trachomatis&array=&exptype%5B%5D=&exptype%5B%5D= Microarray Data]
#*Chlamydiae undergo a biphasic developmental cycle characterized by an infectious cell type known as an elementary body (EB) and an intracellular replicative form called a reticulate body (RB). ''Chlamydia'' was incubated under microaerobic conditions to test the differences in preferred energy source between EBs and RBs.
#*The experiment did not have an explicit "treatment" group and "control" group. Based on Figure 4, EB can be considered the "treatment" group because the ratio was set up as EB to RB. RB would then be the "control" group, since the EB population is examined relative to the RB population.
#*Four biological replicates were performed for both the control and the treatment; the article states that "Density gradient-purified EBs and RBs were incubated in quadruplicate in four-well plates...".


==Journal Club Presentation on ''C. trachomatis'' Genome Sequencing Paper==

[[Media:Presentation on C.trachomatis Genome Paper KS HD DW 20131112.pdf|Here]] is the PDF version of the presentation given on November 12, 2013 on the Stephens ''et al.'' (1998) paper.


==Important Files==

*[[Media: Go_daily-termdb_v2_HD_20131107.obo-xml.gz|Go_daily-termdb_v2_HD_20131107.obo-xml.gz]]
:GO OBO-XML file for ''C. trachomatis'' serovar A.
*[[Media: Uniprot XML C.trachomatis serovar A KS 20131114.xml|Uniprot XML C.trachomatis serovar A KS 20131114.xml]]
:UniProt XML file for ''C. trachomatis'' serovar A.
*[[Media: 22183.C trachomatis A KS 20131114.goa|22183.C trachomatis A KS 20131114.goa]]
:GOA file for ''C. trachomatis'' serovar A.
*[[Media: Ct-Std v1 KS 20131119.gdb|Ct-Std v1 KS 20131119.gdb]]
:First iteration of the export/import gene database cycle.
*[[Media: Ct-Std KS 20131121.gdb|Ct-Std KS 20131121.gdb]]
:Updated gene database.
*[[Media: Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]
:New gene database created with the new version of gmbuilder built on 11/21/2013 using the custom species profile for ''C. trachomatis.''

'''Array Express Article Raw Files'''

*[[Media:Raw_Data_DW_V1_12112013_A-GEOD-4692.adf.txt|Raw_Data_DW_V1_12112013_A-GEOD-4692.adf.txt]]
:Microarray raw ADF file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.idf.txt|Raw_Data_DW_V1_12112013_E-GEOD-39530.idf.txt]]
:Microarray raw IDF file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.processed.1.zip|Raw_Data_DW_V1_12112013_E-GEOD-39530.processed.1.zip]]
:Microarray raw processed zip file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.raw.1.zip|Raw_Data_DW_V1_12112013_E-GEOD-39530.raw.1.zip]]
:Microarray raw zip file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.sdrf.txt|Raw_Data_DW_V1_12112013_E-GEOD-39530.sdrf.txt]]
:Microarray SDRF text file for ''C. trachomatis''.
*[[Media:Sdrf information excel.xls|Sdrf information excel.xls]]
:SDRF information in Excel spreadsheet format.

'''Excel Master Spreadsheet File Versions (Containing the Formatted Raw Data)'''

*[[Media:Master Spreadsheet.xls|master spreadsheet work in progress]]
:Initial Microarray raw data formatted and opened in Microsoft Excel.
*[[Media: C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.xls|C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.xls]]
:Sample C. trachomatis microarray data that was used to run GenMAPP with the latest ''C. trachomatis'' database.
*[[Media: C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt|C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt]]
:Tab-delimited version of the file above.
*[[Media: Final Master Spreadsheet.xls|Final Master Spreadsheet]]
:Microarray data with identified gene IDs and log base data.
*[[Media: Master_Spreadsheet_Chlamydia_20131125_KD.xls | Master_Spreadsheet_Chlamydia_20131125_KD.xls]]
:Version edited during office hours by Dr. Dahlquist to demonstrate statistics.
*[[Media: Master Spreadsheet Chlamydia V1 20131125 DW.xls]]
:Completed version of the file edited during office hours by Dr. Dahlquist to demonstrate statistics; finished by Dillon Williams.
*[[Media: Copy of Master Spreadsheet Chlamydia V2 20131203 HD.xls]]
:Microarray data with separated IDs CTA_#### from _RRMH#####_...
*[[Media: For GenMAPP Chlamydia V4 20131203 DW.xls]]
:Excel spreadsheet with data formatted for GenMAPP (version run in GenMAPP).

'''For GenMAPP Files'''
*[[Media: Ctrachomatis EbtoRb ATP metabolic process.mapp]]
: EB to RB ATP Metabolic Process Mapp file.
*[[Media: Ctrachomatis EbtoRB glucose catabolic process 20131204 DW.mapp]]
: EB to RB glucose catabolic process map.
*[[Media: CTrachomatis EBtoRB Rifampicin 20131204 DW-Criterion1-GO.txt]]
: EB to RB Rifampicin go.txt file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.gex]]
: For_GenMAPP .gex file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.gmf]]
: For GenMAPP .gmf file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.zip]]
: Access file created with .gex file.
*[[Media: For GenMAPP Chlamydia V4 20131205 KS.gex]]
:Color set created in GenMAPP for EB to RB no rifampicin and increased/decreased logFC criteria

'''MAPPFinder Files'''
*[[Media:EB to RB No Rif V2-Criterion0-GO.txt]]
:Increased
*[[Media: EB to RB No Rif V2-Criterion1-GO.txt]]
:Decreased
*[[Media: For GenMAPP Chlamydia V4 20131205 KS.gmf]]
: File created by MAPPFinder after creating the latest color set.

==Wiki Formatting==

Template: [[Template:Team_H(oo)KD]]

[[Category: Group Projects]]
[[Category: Team H(oo)KD]]

Team H(oo)KD Week 15 Status Report

2013-12-06T07:02:04Z

Ksherbina: /* Reflection */ Finished reflection

{{Team H(oo)KD}}

'''Refer to the calendar on the team home page to see the milestones for this week.'''

==Coder Status Report==

The following was accomplished during Weeks 14-15:
*Determined why TallyEngine counted 917 genes in the gene database for ''C. trachomatis'' while the count was 919 when viewing the gene database in Access.
:*Apparently, one ordered locus ID (CTA_0406/CTA_0407/CTA_0408) is actually a combination of three ordered locus IDs, each of which was predicted to correspond to a different gene before it was found that all three IDs actually correspond to the same gene. TallyEngine does not separate the ordered locus IDs while Access does.
:*In consulting with Dr. Dahlquist and Dr. Dionisio, we decided to leave this as is and describe the discrepancy in the final testing report, which is recorded in my [[Ksherbina Project Notebook|project journal]] under November 27-28, 2013.
*Customized the Tally Engine for ''C. trachomatis'' and committed the changes to SourceForge.

[[User:Ksherbina|Ksherbina]] ([[User talk:Ksherbina|talk]]) 22:41, 5 December 2013 (PST)

===Reflection===

#The Quality Assurance person (Hilda) and I were able to finish working on the gene database and figure out how to separate the gene IDs from the Affymetrix IDs appended to the gene IDs in the microarray data. This allowed us to devote the rest of our time to working with Dillon to perform the GenMAPP and MAPPFinder analysis. Fortunately, no exceptions file was generated when running GenMAPP with the microarray data with the modified IDs and the gene database for ''C. trachomatis''.
#When we originally set the milestones, we had planned to start working on both the paper and the presentation this week. Unfortunately, we had not advanced far enough in finalizing the database and performing GenMAPP analysis at the beginning of the week to be able to do this.
#With next week being finals week, we will have several meetings leading up to the final presentation in order to work on the paper, the presentation, and deliverables to make sure we meet the final deadline.

[[User:Ksherbina|Ksherbina]] ([[User talk:Ksherbina|talk]]) 23:02, 5 December 2013 (PST)

==Quality Assurance Status Report==

===Reflection===

==GenMAPP User Status Report==

===Reflection===

Team H(oo)KD Week 15 Status Report

2013-12-06T06:56:14Z

Ksherbina: /* Coder Status Report */ Started reflection

{{Team H(oo)KD}}

'''Refer to the calendar on the team home page to see the milestones for this week.'''

==Coder Status Report==

The following was accomplished during Weeks 14-15:
*Determined why TallyEngine counted 917 genes in the gene database for ''C. trachomatis'' while the count was 919 when viewing the gene database in Access.
:*Apparently, one ordered locus ID (CTA_0406/CTA_0407/CTA_0408) is actually a combination of three ordered locus IDs, each of which was predicted to correspond to a different gene before it was found that all three IDs actually correspond to the same gene. TallyEngine does not separate the ordered locus IDs while Access does.
:*In consulting with Dr. Dahlquist and Dr. Dionisio, we decided to leave this as is and describe the discrepancy in the final testing report, which is recorded in my [[Ksherbina Project Notebook|project journal]] under November 27-28, 2013.
*Customized the Tally Engine for ''C. trachomatis'' and committed the changes to SourceForge.

[[User:Ksherbina|Ksherbina]] ([[User talk:Ksherbina|talk]]) 22:41, 5 December 2013 (PST)

===Reflection===

#The Quality Assurance person (Hilda) and I were able to finish working on the gene database and figure out how to separate the gene IDs from the Affymetrix IDs appended to the gene IDs in the microarray data. This allowed us to devote the rest of our time to working with Dillon to perform the GenMAPP and MAPPFinder analysis. Fortunately, no exceptions file was generated when running GenMAPP with the microarray data with the modified IDs and the gene database for ''C. trachomatis''.
#When we originally set the milestones, we had planned to start working on both the paper and the presentation this week. Unfortunately, we had not advanced far enough in finalizing the database and performing GenMAPP analysis at the beginning of the week to be able to do this.

==Quality Assurance Status Report==

===Reflection===

==GenMAPP User Status Report==

===Reflection===

Team H(oo)KD Week 15 Status Report

2013-12-06T06:42:23Z

Ksherbina: Added statement regarding where to find the milestones

Team H(oo)KD Week 15 Status Report

2013-12-06T06:41:54Z

Ksherbina: /* Coder Status Report */ Finished status report

{{Team H(oo)KD}}

==Coder Status Report==

The following was accomplished during Weeks 14-15:
*Determined why TallyEngine counted 917 genes in the gene database for ''C. trachomatis'' while the count was 919 when viewing the gene database in Access.
:*Apparently, one ordered locus ID (CTA_0406/CTA_0407/CTA_0408) is actually a combination of three ordered locus IDs, each of which was predicted to correspond to a different gene before it was found that all three IDs actually correspond to the same gene. TallyEngine does not separate the ordered locus IDs while Access does.
:*In consulting with Dr. Dahlquist and Dr. Dionisio, we decided to leave this as is and describe the discrepancy in the final testing report, which is recorded in my [[Ksherbina Project Notebook|project journal]] under November 27-28, 2013.
*Customized the Tally Engine for ''C. trachomatis'' and committed the changes to SourceForge.

[[User:Ksherbina|Ksherbina]] ([[User talk:Ksherbina|talk]]) 22:41, 5 December 2013 (PST)

===Reflection===

==Quality Assurance Status Report==

===Reflection===

==GenMAPP User Status Report==

===Reflection===

Team H(oo)KD Week 15 Status Report

2013-12-06T06:40:22Z

Ksherbina: /* Coder Status Report */ Began status report

{{Team H(oo)KD}}

==Coder Status Report==

*Determined why TallyEngine counted 917 genes in the gene database for ''C. trachomatis'' while the count was 919 when viewing the gene database in Access.
:*Apparently, one ordered locus ID (CTA_0406/CTA_0407/CTA_0408) is actually a combination of three ordered locus IDs, each of which was predicted to correspond to a different gene before it was found that all three IDs actually correspond to the same gene. TallyEngine does not separate the ordered locus IDs while Access does.
:*In consulting with Dr. Dahlquist and Dr. Dionisio, we decided to leave this as is and describe the discrepancy in the final testing report, which is recorded in my [[Ksherbina Project Notebook|project journal]] under November 27-28, 2013.

===Reflection===

==Quality Assurance Status Report==

===Reflection===

==GenMAPP User Status Report==

===Reflection===

Ksherbina Project Notebook

2013-12-06T06:35:02Z

Ksherbina: /* Customize Tally Engine for C. trachomatis */ Finished recording steps.

{{Ksherbina}}

==Week 12==

===Starting the Set Up of the Testing Environment===

#Downloaded the following software onto my personal computer:
#*Latest version of GenMAPP Builder (gmbuilder2.0-b71) from [http://sourceforge.net/projects/xmlpipedb/files/?source=navbar SourceForge]
#*[http://www.oracle.com/technetwork/java/javase/downloads/index.html Java SE 7u45]
#*[http://www.eclipse.org/downloads/ Eclipse IDE for Java EE Developers]
#Created an account on [http://sourceforge.net/ SourceForge].

===Import/Export of Gene Database===

#Downloaded the [http://www.uniprot.org/uniprot/?query=organism:315277+keyword:181 UniProt XML] for ''C. trachomatis'' serovar A strain HAR-13: [[Media:Uniprot XML C.trachomatis serovar A KS 20131114.xml|Uniprot XML C.trachomatis serovar A KS 20131114.xml]]
#Downloaded the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ GOA file] for ''C. trachomatis'' serovar A: [[Media:22183.C trachomatis A KS 20131114.goa|22183.C trachomatis A KS 20131114.goa]]
#Created a new database in pgAdminIII: CT_KS_20131114_gmb2b71
#*Created tables in the database by executing the script in the file ''gmbuilder.sql''.
#Launched ''gmbuilder-32bit.bat'':
#Configured the database in gmbuilder:
#*Host or address: localhost
#*Port number: 5432
#*Database name: CT_KS_20131114_gmb2b71
#*Username: postgres
#*Password: <password of the PostgreSQL database created above>
#Imported the UniProt XML file into the PostgreSQL database through gmbuilder.
#Imported the OBO-XML file into the PostgreSQL database through gmbuilder.
#*gmbuilder was not able to completely process the data. See the testing report below for details.
#Imported the GOA file into the PostgreSQL database through gmbuilder.
#*I did not begin the actual export of the database due to the problem with processing the OBO-XML file.

====Testing Report====

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131114_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_11
* Time taken to import: 1.70 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
* Time taken to import: 9.29 min
* Time taken to process: '''Could not complete the process. The following error came up when gmbuilder was processing the 81000th term:'''
ERROR edu.lmu.xmlpipedb.gmbuilder.GenMAPPBuilder - java.util.concurrent.ExecutionException:java.lang.OutOfMemoryError:
Java heap space

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
* Time taken to import: 0.02 min

Name of .gdb file:
* Time taken to export .gdb:
* Upload your file and link to it here.

Note:

===Creating Custom Species Profile===

#On one of the class computers in the first row, I launched Eclipse with Subclipse.
#Went to Window > Open Perspective > Other > SVN Repository Exploring.
#Defined a new Subversion repository by clicking on ''Add Repository'' and adding the URL https://svn.code.sf.net/p/xmlpipedb/code. Clicked Finish.
#Double-clicked on trunk, right-clicked on gmbuilder, and chose Checkout....
#Created a new project (chose Check out as a project configured using the New Project Wizard).
#Went to Window > Preferences > Java > Installed JREs. Added a new environment with JDK.
#Double-clicked on the project folder.
#Double-clicked on the lib folder.
#Shift-clicked and selected all files in the lib folder and then control-clicked on every file inside the lib folder whose name did not end in .jar. Right-clicked on one of them and chose Build Path > Add to Build Path from the popup menu that came up.
#Created a species profile called ChlamydiaTrachomatisSerovarA.
#*Dondi modified the name to ChlamydiaTrachomatisSerovarAUniProtSpeciesProfile.
#Customized the species profile.
#*Opened the new profile and added the lines of code as specified in [[Coder|"Customize the Species Profile"]] directions.
#**Still need to insert the species-specific URL that returns a web page describing a gene for that species.
#Committed the profile following the directions in [[Coder|"Updating and Committing Code"]].

===Reflection===

#What were the week’s key accomplishments?
#*One of this week's key accomplishments was downloading the files necessary to run an import/export gene database cycle and the microarray files that will be analyzed. Also this week, we were able to start an import/export cycle.
#What are next week’s target accomplishments?
#*Next week, I would like to complete an import/export cycle and vet it with the QA person. In addition, I hope that we can download the software necessary to open the microarray files, format the files, and perform statistical analysis.
#What team strengths were seen this week?
#*I believe our strength this week as a team was meeting outside of class to create milestones. This meeting resulted in a detailed calendar of intermediate milestones that we plan to accomplish in order to meet the final deadline.
#What team weaknesses were seen this week?
#*It was difficult to set up the aforementioned meeting due to our busy schedules. Additionally, we did not have a clear plan for working on the project until the middle/end of the week. In the coming weeks, our goal will be to improve our work plan for each class period so as to minimize the difficulty of scheduling work meetings outside of class.

==Week 13==

===November 15, 2013===

====Import/Export Gene Database====

After the error that came up last week regarding the Java heap space, I performed the following steps to increase the heap space:
#Opened ''gmbuilder-32bit.bat'' in Notepad.
#Found the line
"C:\Program Files (x86)\Java\jre7\bin\java" -D32 -Xmx1024m -jar gmbuilder.jar
#Changed "-Xmx1024m" to "-Xmx2048m", thereby increasing the maximum heap space from 1024 MB to 2048 MB.
#Saved the file and closed it.

*Then, I tried the import/export again. I could not open ''gmbuilder-32bit.bat'' after I changed the maximum heap space. I repeated the above steps for ''gmbuilder.bat'' and found that I could still open the program.
*Opening ''gmbuilder.bat'', I imported both the UniProt XML file and the OBO-XML file. However, I received a Java heap space error again when processing the OBO-XML file.
*I increased the heap space in ''gmbuilder.bat'' from 2048 MB to 4096 MB and tried the import/export cycle again. When this did not work, I increased the heap space in ''gmbuilder.bat'' to 8192 MB.
*Then, I was able to import the UniProt XML, OBO-XML, and GOA files as well as process the OBO-XML.
*However, I was not able to begin the export because an error popped up stating that an export to a GenMAPP database is only possible in ''gmbuilder-32bit.bat''.
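The heap-size edit described above amounts to rewriting the <code>-Xmx</code> option in the launcher's command line. The following is a minimal sketch of that edit, assuming the option always has the form <code>-Xmx&lt;N&gt;m</code>; the helper name <code>set_max_heap</code> is hypothetical and is not part of GenMAPP Builder.

```python
import re


def set_max_heap(command_line: str, megabytes: int) -> str:
    """Return command_line with its -Xmx<N>m option set to the given size in MB."""
    new_line, count = re.subn(r"-Xmx\d+[mM]", f"-Xmx{megabytes}m", command_line)
    if count == 0:
        # Hypothetical safeguard: refuse to guess where the option should go.
        raise ValueError("no -Xmx<N>m option found in the command line")
    return new_line


# Command line copied from gmbuilder-32bit.bat as recorded in this notebook.
original = r'"C:\Program Files (x86)\Java\jre7\bin\java" -D32 -Xmx1024m -jar gmbuilder.jar'
updated = set_max_heap(original, 2048)
print(updated)
```

A 32-bit JVM is limited by the address space of a 32-bit process, which would be consistent with the 32-bit launcher failing to start once the requested heap was raised, while the 64-bit launcher accepted 8192 MB.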

=====Testing Report=====

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131114_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]):
* Time taken to import: 0.73 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
* Time taken to import: 7.39 min
* Time taken to process: 47.15 min

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
* Time taken to import: 0.03 min

Name of .gdb file:
* Time taken to export .gdb:
* Upload your file and link to it here.

Note: '''Could not export the database. Received an error stating that an export to a GenMAPP database is only possible in ''gmbuilder-32bit.bat''.'''

====Troubleshooting====

#Ran cmd from the Start menu.
#Changed directory (cd) to where GenMAPP Builder is located.
#Typed the same command that is in the .bat file, but directly at the prompt:
"C:\Program Files (x86)\Java\jre7\bin\java" -D32 -Xmx1536m -jar gmbuilder.jar

Received the following error:
Error occurred during initialization of VM.
Could not reserve enough space for object heap.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit

===November 19, 2013===

====Rerunning the Gene Database Import/Export====

*On 11/15/2013, the UniProt XML, OBO-XML, and GOA files were imported and the OBO-XML file was processed in the 64-bit ''gmbuilder.bat'' with the maximum heap space set to 8192 MB.
*Performed the export in 32-bit ''gmbuilder.bat''.

=====Testing Report=====

Version of GenMAPP Builder: Same as in previous testing report

Computer on which export was run: Same as in previous testing report

Postgres Database name: Same as in previous testing report

UniProt XML filename: Same as in previous testing report
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]):
* Time taken to import: 1.10 min

GO OBO-XML filename: Same as in previous testing report
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
* Time taken to import: 8.96 min
* Time taken to process: Did not reprocess because received a message stating that the GO data had already been processed.

GOA filename: Same as in previous testing report
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
* Time taken to import: 0.03 min

Name of .gdb file: Ct-Std v1 KS 20131119.gdb
* Time taken to export .gdb:
:*Start Time: 4:01:10 AM
:*End Time: 6:51:58 AM
* Upload your file and link to it here: [[Media:Ct-Std v1 KS 20131119.gdb|Ct-Std v1 KS 20131119.gdb]]

Note:

====Checking the Quality of the Exported Database====

#In ''gmbuilder-32bit.bat'', chose Run XML > Database Tallies for UniProt and GO....
:*The XML count for the UniProt file was 917 while the database count for the UniProt file was 10087. This discrepancy is likely due to importing the UniProt, OBO-XML, and GOA files multiple times while experimenting with the heap space threshold.
:*A new database must now be created in pgAdminIII in order to repeat the export/import gene database cycle.
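As a quick arithmetic check on the repeated-import explanation, the sketch below divides the database count by the XML count. The two numbers come from the tallies above; the interpretation (that an exact multiple points to whole repeated imports) is an assumption, not something reported by gmbuilder.

```python
xml_count = 917         # records counted in the UniProt XML file
database_count = 10087  # records counted in the PostgreSQL database

# If the database holds whole repeated copies of the same file,
# the database count should be an exact multiple of the XML count.
copies, remainder = divmod(database_count, xml_count)
print(copies, remainder)  # prints: 11 0
```

The database count is exactly 11 times the XML count with no remainder, which is consistent with the same 917 records having been imported several times into one database.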

===November 20-21, 2013===

====Finishing Installing Software to Set Up the Development Environment====

#Navigated to the "Download and Install" tab on the [http://subclipse.tigris.org/ Subclipse home page]
#Copied the Eclipse update site URL for the 1.10.x Release.
#Opened up Eclipse.
#Help > Install New Software...
#Pasted the URL into the "Work with" field.
#Checked the box next to "Subclipse" and "SVNKit".
#Clicked on Next and went through the process of installing the software.

====Repeating the Gene Database Import/Export Cycle====

#Created a new database in pgAdminIII.
#*Created tables in the database by executing the script in the file gmbuilder.sql.
#Since the aforementioned error occurred because duplicates were introduced into the database through repeated imports, I decided to try this new import-export cycle using 32-bit ''gmbuilder''.
#Opened ''gmbuilder-32bit.bat'' with the heap space set to 1024 MB and configured the new database.
#*Host or address: localhost
#*Port number: 5432
#*Database name: CT_KS_20131119_32bit_gmb2b71
#*Username: postgres
#*Password: <password of the PostgreSQL database created above>
#Imported the UniProt XML file into PostgreSQL through ''gmbuilder-32bit.bat''.
#Imported the OBO-XML file into PostgreSQL through ''gmbuilder-32bit.bat''.
#Processed the OBO-XML file through ''gmbuilder-32bit.bat''.
#Imported the GOA file into PostgreSQL through ''gmbuilder-32bit.bat''.

=====Testing Report=====
Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131119_32bit_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_11
::*Original file name from [http://www.uniprot.org/uniprot/?query=organism:315277+keyword:181 UniProt site]: uniprot-organism%3A315277+keyword%3A181.xml
* Time taken to import: 1.20 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): 11/06/2013
::*Original file name: go_daily-termdb.obo-xml.gz
* Time taken to import: 13.05 min
* Time taken to process: 10.80 min

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): 11/12/13
::*Original file name: 22183.C_trachomatis_A.goa
* Time taken to import: 0.04 min

Name of .gdb file: Ct-Std KS 20131121.gdb
* Time taken to export .gdb:
:*Start Time: 2:37:42 AM
:*End Time: 2:55:45 AM
* Upload your file and link to it here: [[Media:Ct-Std KS 20131121.gdb|Ct-Std KS 20131121.gdb]]

Note: '''There were no heap space errors this time, which is good. The errors probably occurred in the first place because of repeatedly importing the files.'''
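The export durations can be worked out from the recorded start and end times. This is a small sketch that assumes each export started and ended on the same day; the helper names <code>to_seconds</code> and <code>format_hms</code> are hypothetical.

```python
def to_seconds(clock: str) -> int:
    """Convert a 12-hour clock string such as '4:01:10 AM' to seconds since midnight."""
    time_part, meridiem = clock.split()
    hours, minutes, seconds = (int(part) for part in time_part.split(":"))
    hours = hours % 12 + (12 if meridiem.upper() == "PM" else 0)
    return hours * 3600 + minutes * 60 + seconds


def format_hms(total_seconds: int) -> str:
    """Format a number of seconds as h:mm:ss."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# Times recorded in the testing reports for the two exports.
first_export = to_seconds("6:51:58 AM") - to_seconds("4:01:10 AM")
second_export = to_seconds("2:55:45 AM") - to_seconds("2:37:42 AM")

print(format_hms(first_export))   # prints: 2:50:48
print(format_hms(second_export))  # prints: 0:18:03
```

By this calculation, the export from the database containing repeated imports took about 2 hours 51 minutes, while the export from the fresh database took about 18 minutes.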

===November 21, 2013===

====Revised the Custom Species Profile====

#Ran Eclipse.
#Navigated to Window > Open Perspective > Other... and chose SVN Repository Exploring
#Defined a new Subversion repository by clicking on the Add Repository button.
#Set the URL to https://svn.code.sf.net/p/xmlpipedb/code then clicked Finish.
#Double-clicked on https://svn.code.sf.net/p/xmlpipedb/code in the list.
#Double-clicked on ''trunk''.
#Right-clicked on ''gmbuilder'' and chose ''Checkout....''
#Chose ''Check out as a project configured using the New Project Wizard'' then clicked ''Finish''.
#In the New Project dialog that opened, chose Java Project from the list and then clicked ''Next >''.
#Entered the new project name ''xmlpipedb-gmbuilder'' and clicked ''Finish''.
#Set Eclipse to use JDK.
#*Navigated to Window > Preferences and clicked on the ''Search'' button.
#*Expanded Computer > Local Disk (C:) > Program Files > Java > jdk1.7.0_45 and clicked OK.
#*Highlighted the row ''jdk'' and clicked on ''Edit''.
#*In "JRE home:" field, removed everything in the name after jdk1.7.0_45 and then clicked ''Finish''.
#Made sure that I was in the Java perspective by navigating to Window > Open Perspective > Other... and then choosing Java.
#Double-clicked on the ''xmlpipedb-gmbuilder'' folder.
#Double-clicked on the ''lib'' folder.
#Shift-clicked and selected all files in the lib folder and then control-clicked on every file inside the lib folder whose name did not end in .jar.
#Right-clicked on one of the files and chose Build Path > Add to Build Path.
#Checked that the ''src'' folder was set as the source folder by right-clicking on it and then clicking on ''Build Path''.
#Right-clicked on the ''test'' folder and chose Build Path > Use as Source Folder. Then, made sure that there were no red x's in the list of folders.
#Right-clicked on ''build.xml'' toward the bottom of the list. Then, navigated to Team > Synchronize with Repository.
#Double-clicked on src > edu.lmu.xmlpipedb.gmbuilder.databasetoolkit.profiles > ChlamydiaTrachomatisSerovarAUniProtSpeciesProfile.java.
#Within the code, added the gene database link pattern: "http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=~" and then saved the code.
#Right-clicked on ''build.xml'' and chose Run As > Ant Build... (the one with the ellipses).
#In the Edit Configuration dialog that appears, checked on the clean and dist items in the Targets tab.
#Clicked on the ''Order...'' button and rearranged the items so that clean is first and dist is second and then clicked OK.
#Clicked the Run button.
#Right-clicked on the ''xmlpipedb-gmbuilder'' project folder and chose Refresh (F5 is its keyboard shortcut).

====Reran the Database Export With the New Build for gmbuilder====

#Made sure that pgAdminIII and the database "CT_KS_20131119_32bit_gmb2b71" were open.
#Opened the newly built ''gmbuilder-32bit.bat'' within the folder [User Folder] > workspace > xmlpipedb-gmbuilder > dist.
#Made sure that the database was configured to "CT_KS_20131119_32bit_gmb2b71".
#Performed a database export.

=====Testing Report=====

*Name of .gdb file: Ct-Std_v2_KS_20131121.gdb
*Time taken to export .gdb:
:*Start Time: 10:56 AM (approximately)
:*End Time: 11:14:26 AM
*Upload your file and link to it here: [[Media:Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]

Note:

====Testing the New Database with GenMAPP====
#Opened GenMAPP.
#Clicked on Data > New Gene Database and selected the gene database that was created today (Ct-Std v2 KS 20131121.gdb).
#On the toolbar, clicked on the button that has the word gene in a box.
#Clicked and dragged the cursor on the blank canvas to create a new gene box.
#Right-clicked on the gene box.
#In the pop-up window, typed in "CTA_0588" in the Gene ID search field.
#Under Gene ID System, selected Ordered Locus Names.
#Then, hit the "Search" button. Interestingly, GenMAPP did not find this gene in the database.
#*To troubleshoot, I opened up the gene database in Access.
#*In the list of tables, I double-clicked on OrderedLocusNames.
#*I sorted the list of gene IDs from A-Z and then scrolled down to try to find "CTA_0588". I was able to find "CTA_0587" and "CTA_0589" but not "CTA_0588".
#Back in GenMAPP, I used the same steps as above to find "CTA_0587".
#"CTA_0587" was found in the database. In the window, I clicked on the link "CTA_0587" under the OrderedLocusNames section.
#*This link took me to the entry for this gene on the Model Organism Database (http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=CTA_0587).
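The relationship between the link pattern added to the species profile and the URL that GenMAPP opened can be sketched as a simple substitution. This assumes that the trailing <code>~</code> in the pattern is a placeholder replaced by the gene ID, which matches the pattern and the resulting URL recorded in this notebook; the function name <code>gene_url</code> is hypothetical.

```python
# Link pattern recorded in the species profile step of this notebook.
LINK_PATTERN = "http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=~"


def gene_url(gene_id: str, pattern: str = LINK_PATTERN) -> str:
    """Fill the '~' placeholder of a link pattern with a gene ID (assumed behavior)."""
    return pattern.replace("~", gene_id)


print(gene_url("CTA_0587"))
# prints: http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=CTA_0587
```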

====Performing Quality Assurance on the New Database====

*Performed counts with Tally Engine.
[[Image:Tally Engine Results for Ct-Std v2 KS 20131121.jpg|thumb|none|upright=4]]
*Counted the number of unique gene IDs using xmlpipedb ''match'':
java -jar xmlpipedb-match-1.1.1.jar "CTA_[0-9][0-9][0-9][0-9]" < Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
:*'''Total unique matches = 911'''
*Counted the number of unique gene IDs using an SQL query:
select count(*) from genenametype where type = 'ordered locus' and value ~ 'CTA_[0-9][0-9][0-9][0-9]';
:*'''Count - 917'''
*Opened the database in Access. Then, double-clicked on the table "OriginalRowCounts".
{| style = "width:20%; font-size:100%"
| '''Table'''
| '''Rows'''
|-
| OrderedLocusNames
| 919
|-
| UniProt
| 917
|-
| UniProt-OrderedLocusNames
| 919
|}
*Double-clicked on the table "OrderedLocusNames" and scrolled all the way to the bottom. We found that some gene IDs had the format pCTA_####.
*Went back to the command prompt and ran the following command:
java -jar xmlpipedb-match-1.1.1.jar "[p|]CTA_[0-9][0-9][0-9][0-9]" < Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
:*'''Total unique matches = 8'''
:*Adding the results of the two runs with xmlpipedb match, we got 919 unique gene IDs, which is the same number as that in Access.
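The logic of the two pattern counts can be sketched on a toy input. This is only an illustration in Python, not a reimplementation of xmlpipedb ''match''; the sample text and the function name <code>unique_matches</code> are made up for the example.

```python
import re


def unique_matches(pattern: str, text: str) -> set:
    """Return the set of distinct substrings of text that match pattern."""
    return set(re.findall(pattern, text))


# Made-up sample: three chromosomal-style IDs (one repeated) and two plasmid-style IDs.
sample = "CTA_0001 CTA_0002 CTA_0002 CTA_0003 pCTA_0001 pCTA_0002"

# The unanchored pattern also matches inside pCTA_#### IDs, but those
# matches collapse into the same unique strings as the CTA_#### IDs.
cta_style = unique_matches(r"CTA_[0-9]{4}", sample)
pcta_style = unique_matches(r"pCTA_[0-9]{4}", sample)

print(len(cta_style), len(pcta_style), len(cta_style) + len(pcta_style))  # prints: 3 2 5
```

In regular expression syntax, <code>[p|]</code> is a character class that matches either the letter p or a literal vertical bar, so the second command above effectively counts the pCTA_#### IDs separately; summing the two unique counts then gives the total number of distinct IDs.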

==Week 14==

===November 24, 2013===

====Ran GenMAPP Using ''C. trachomatis'' Gene Database====

#Opened GenMAPP.
#Made sure that the database was set to Ct-Std_v2_KS_20131121.gdb.
#Imported [[Media:C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt|sample microarray data]] into GenMAPP.
#*Loaded the microarray data by navigating to Expression Dataset Manager > New Dataset.
#*There were 23015 errors.
#Opened the exceptions file that was created to inspect the errors and scrolled down to where the IDs CTA_#### begin.
#*The gene IDs are actually in the format CTA_####_RRMH#####_at.
#Tried to search for a sample ID in the format found in the exceptions file in the UniProt database or any NCBI databases.
#*Could not find CTA_0806_RRMH00303, RRMH00303, CTA_0806_RRMH00303_at, or RRMH00303_at in the UniProt database.
#*Could not find CTA_0806_RRMH00303, RRMH00303, CTA_0806_RRMH00303_at, or RRMH00303_at in any database through NCBI.
#*My suspicion is that the RRMH#####_at portion of the ID is an identifier that is appended to each probe on the Affymetrix microarray when it is read by the company's software.
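If the suspicion above is right, the gene ID can be recovered by dropping the probe suffix. The following is a minimal Python sketch under that assumption; the regular expression, the function name, and the non-matching example ID are hypothetical and are not taken from the Affymetrix documentation.

```python
import re

# Assumed layout: gene ID, then an RRMH probe tag, then an optional "_at" suffix.
PROBE_ID = re.compile(r"^(p?CTA_\d{4})_RRMH\d{5}(?:_at)?$")

def gene_id_from_probe(probe_id):
    """Return the CTA_#### part of a probe ID, or None if the layout differs."""
    match = PROBE_ID.match(probe_id)
    return match.group(1) if match else None

print(gene_id_from_probe("CTA_0806_RRMH00303_at"))
# A made-up ID that does not follow the assumed layout.
print(gene_id_from_probe("AFFX-control"))
```

Returning None for unrecognized IDs keeps control probes and other rows from being silently mapped to a wrong gene.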

===November 27-28, 2013===

====Running SQL Queries to Determine Discrepancy in the Count Between the UniProt XML and the Gene Database====

*Ran an SQL query to determine if there were any ordered locus names in the database that do not match the ID "CTA_####".
select * from genenametype where type = 'ordered locus' and not value ~ 'CTA_[0-9][0-9][0-9][0-9]';
:*There were no hits.
*Ran an SQL query to show all the ordered locus names in the database that have the ID "CTA_####".
select * from genenametype where type = 'ordered locus' and value ~ 'CTA_[0-9][0-9][0-9][0-9]';
:*Scrolling through the results, I saw the ID "CTA_0406/CTA_0407/CTA_0408".
:*The [http://www.uniprot.org/manual/gene_name UniProt manual] stated that ordered locus names that are connected by a slash indicate two predicted genes that are actually one gene.
:*The combination of these IDs explains why two fewer IDs show up in the count by the Tally Engine in comparison to the count in Access.
*Opened the gene database within Access and filtered the ordered locus names to show only those that match the ID "CTA_0407".
:*The ID appeared on its own without being joined to other IDs with slashes.
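The effect of the slash-joined entry on the counts can be illustrated with a small Python sketch. The neighboring IDs in the sample list are placeholders chosen for illustration, not values read from the database.

```python
def count_entries_and_ids(values):
    """Count rows as stored and individual IDs after splitting on '/'."""
    entries = len(values)
    ids = sum(len(value.split("/")) for value in values)
    return entries, ids

# Illustrative sample: one combined entry between two ordinary entries.
sample = ["CTA_0405", "CTA_0406/CTA_0407/CTA_0408", "CTA_0409"]

entries, ids = count_entries_and_ids(sample)
print(entries, ids, ids - entries)
```

One combined entry of three IDs adds two to the split count, which matches the gap between 917 and 919 reported above.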

====Testing Report for Finalized Gene Database====

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131119_32bit_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_11
::*Original file name from [http://www.uniprot.org/uniprot/?query=organism:315277+keyword:181 UniProt site]: uniprot-organism%3A315277+keyword%3A181.xml
* Time taken to import: 1.20 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): 11/06/2013
::*Original file name: go_daily-termdb.obo-xml.gz
* Time taken to import: 13.05 min
* Time taken to process: 10.80 min

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): 11/12/13
::*Original file name: 22183.C_trachomatis_A.goa
* Time taken to import: 0.04 min

*Name of .gdb file: Ct-Std_v2_KS_20131121.gdb
*Time taken to export .gdb:
:*Start Time: 10:56 AM (approximately)
:*End Time: 11:14:26 AM
*Upload your file and link to it here: [[Media:Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]

Note: There is one ID "CTA_0406/CTA_0407/CTA_0408" that is a combination of three predicted genes that are actually one gene. The Tally Engine counts this as one gene that contributes to the total count of 917. However, through Access and xmlpipedb ''match'', the ID is separated into three separate genes bringing the total count to 919.

==Week 15==

===December 3, 2013===

*Worked with Hilda to separate the gene IDs from the Affymetrix IDs appended to them in the microarray data.
*Worked with Dillon to run GenMAPP and MAPPFinder with the microarray data and the gene database created for ''C. trachomatis''.

===December 5, 2013===

====Customize Tally Engine for ''C. trachomatis''====

*Opened Eclipse.
*Double-clicked xmlpipedb-gmbuilder > src > edu.lmu.xmlpipedb.gmbuilder.resource.properties > gmbuilder.properties.
*Found the following part of the code:
#
# wizard.properties
#
*Before this part of the code, added the following lines of code to specify which gene IDs to find in the UniProt XML file:
# Chlamydia trachomatis
chlamydiatrachomatis_level_amount=1

chlamydiatrachomatis_element_level0=uniprot/entry/gene/name&type&ordered locus

chlamydiatrachomatis_query_level0=select count(*) from genenametype where type = 'ordered locus';

chlamydiatrachomatis_table_name_level0=Ordered Locus
*Saved the changes and built a new version of gmbuilder.
*Opened gmbuilder and set the database to CT_KS_20131119_32bit_gmb2b71.
*Ran TallyEngine.
*Got the same counts as those from November 21, 2013 despite the fact that I received an error that the species specified in the added code could not be found. However, the error also specified that the correct species ID is chlamydiatrachomatisserovara.
*Accordingly, I went back to Eclipse and modified the code added to gmbuilder.properties:
# Chlamydia trachomatis
chlamydiatrachomatisserovara_level_amount=1

chlamydiatrachomatisserovara_element_level0=uniprot/entry/gene/name&type&ordered locus

chlamydiatrachomatisserovara_query_level0=select count(*) from genenametype where type = 'ordered locus';

chlamydiatrachomatisserovara_table_name_level0=Ordered Locus
*Built a distribution version of gmbuilder and then ran Tally Engine again. I got the same counts but no error message about the species ID as before.
*Went back to Eclipse. Synchronized the code on my computer with the code on SourceForge.
*Built a new distribution of gmbuilder.
*Synchronized again.
*Committed the changes I made to the gmbuilder.properties code.
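The four property lines follow a fixed naming pattern built from the species key. A minimal Python sketch of that pattern, assuming only what is shown in the lines above (the helper name is hypothetical and is not part of gmbuilder):

```python
def tally_properties(species_key, element_path, query, table_name):
    """Build the four Tally Engine property lines for a single level (level0)."""
    lines = [
        f"{species_key}_level_amount=1",
        f"{species_key}_element_level0={element_path}",
        f"{species_key}_query_level0={query}",
        f"{species_key}_table_name_level0={table_name}",
    ]
    return "\n".join(lines)

print(tally_properties(
    "chlamydiatrachomatisserovara",
    "uniprot/entry/gene/name&type&ordered locus",
    "select count(*) from genenametype where type = 'ordered locus';",
    "Ordered Locus",
))
```

Generating the lines from one key makes it harder to end up with a prefix that does not match the species ID that gmbuilder expects, which was the cause of the error described above.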

[[Category:Journal Entry]]
[[Category:Individual Homework]]

Ksherbina Project Notebook

2013-12-06T06:29:48Z

Ksherbina: /* December 5, 2013 */ Recorded some of the steps taken to customize the Tally Engine

{{Ksherbina}}

==Week 12==

===Starting the Set Up of the Testing Environment===

#Downloaded the following software onto my personal computer:
#*Latest version of GenMAPP Builder (gmbuilder2.0-b71) from [http://sourceforge.net/projects/xmlpipedb/files/?source=navbar SourceForge]
#*[http://www.oracle.com/technetwork/java/javase/downloads/index.html Java SE 7u45]
#*[http://www.eclipse.org/downloads/ Eclipse IDE for Java EE Developers]
#Created an account on [http://sourceforge.net/ SourceForge].

===Import/Export of Gene Database===

#Downloaded the [http://www.uniprot.org/uniprot/?query=organism:315277+keyword:181 UniProt XML] for ''C. trachomatis'' serovar A strain HAR-13: [[Media:Uniprot XML C.trachomatis serovar A KS 20131114.xml|Uniprot XML C.trachomatis serovar A KS 20131114.xml]]
#Downloaded the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ GOA file] for ''C. trachomatis'' serovar A: [[Media:22183.C trachomatis A KS 20131114.goa|22183.C trachomatis A KS 20131114.goa]]
#Created a new database in pgAdminIII: CT_KS_20131114_gmb2b71
#*Created tables in the database by executing the script in the file ''gmbuilder.sql''.
#Launched ''gmbuilder-32bit.bat'':
#Configured the database in gmbuilder:
#*Host or address: localhost
#*Port number: 5432
#*Database name: CT_KS_20131114_gmb2b71
#*Username: postgres
#*Password: <password of the PostgreSQL database created above>
#Imported the UniProt XML file into the PostgreSQL database through gmbuilder.
#Imported the OBO-XML file into the PostgreSQL database through gmbuilder.
#*gmbuilder was not able to completely process the data. See the testing report below for details.
#Imported the GOA file into the PostgreSQL database through gmbuilder.
#*I did not begin the actual export of the database due to the problem with processing the OBO-XML file.

====Testing Report====

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131114_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_11
* Time taken to import: 1.70 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
* Time taken to import: 9.29 min
* Time taken to process: '''Could not complete the process. The following error came up when gmbuilder was processing the 81000th term:'''
ERROR edu.lmu.xmlpipedb.gmbuilder.GenMAPPBuilder - java.util.concurrent.ExecutionException:java.lang.OutOfMemoryError:
Java heap space

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
* Time taken to import: 0.02 min

Name of .gdb file:
* Time taken to export .gdb:
* Upload your file and link to it here.

Note:

===Creating Custom Species Profile===

#On one of the class computers in the first row, I launched Eclipse with Subclipse.
#Went to Window > Open Perspective > Other > SVN Repository Exploring.
#Defined a new subversion by clicking on ''Add Repository'' and adding the URL https://svn.code.sf.net/p/xmlpipedb/code. Clicked Finish.
#Double-clicked on trunk, right-clicked on gmbuilder, and chose Checkout....
#Created a new project (chose Check out as a project configured using the New Project Wizard).
#Went to Window > Preferences > Java > Installed JREs. Added a new environment with JDK.
#Double-clicked on the project folder.
#Double-clicked on the lib folder.
#Shift-clicked to select all files in the lib folder and then control-clicked on every file whose name did not end in .jar to deselect it. Right-clicked on one of the selected files and chose Build Path > Add to Build Path from the popup menu.
#Created a species profile called ChlamydiaTrachomatisSerovarA.
#*Dondi modified the name to ChlamydiaTrachomatisSerovarAUniProtSpeciesProfile.
#Customized the species profile.
#*Opened the new profile and added the lines of code as specified in [[Coder|"Customize the Species Profile"]] directions.
#**Still need to insert the species-specific URL that returns a web page describing a gene for that species.
#Committed the profile following the directions in [[Coder|"Updating and Committing Code"]].

===Reflection===

#What were the week’s key accomplishments?
#*One of this week's key accomplishments was downloading the files necessary to run an import/export gene database cycle and the microarray files that will be analyzed. Also this week, we were able to start an import/export cycle.
#What are next week’s target accomplishments?
#*Next week, I would like to complete an import/export cycle and vet it with the QA person. In addition, I hope that we can download the software necessary to open the microarray files, format the files, and perform statistical analysis.
#What team strengths were seen this week?
#*I believe our strength this week as a team was meeting outside of class to create milestones. This meeting resulted in a detailed calendar of intermediate milestones that we plan to accomplish in order to meet the final deadline.
#What team weaknesses were seen this week?
#*It was difficult to set up the aforementioned meeting due to our busy schedules. Additionally, we did not have a clear plan for working on the project until the middle/end of the week. In the coming weeks, our goal will be to improve our work plan for each class period so as to minimize the difficulty of scheduling work meetings outside of class.

==Week 13==

===November 15, 2013===

====Import/Export Gene Database====

After the error that came up last week regarding the Java heap space, I performed the following steps to increase the heap space:
#Opened ''gmbuilder-32bit.bat'' in Notepad.
#Found the line:
"C:\Program Files (x86)\Java\jre7\bin\java" -D32 -Xmx1024m -jar gmbuilder.jar
#Changed "-Xmx1024m" to "-Xmx2048m" thereby increasing the maximum heap space from 1024 MB to 2048 MB.
#Saved the file and closed it.

*Then, I tried the import/export again. I could not open ''gmbuilder-32bit.bat'' after I changed the maximum heap space. I repeated the above steps for ''gmbuilder.bat'' and found that I could then still open the program.
*Opening ''gmbuilder.bat'', I imported both the UniProt XML file and the OBO-XML file. However, I received a Java heap space error again when processing the OBO-XML file.
*I increased the heap space in ''gmbuilder.bat'' from 2048 MB to 4096 MB and tried the import/export cycle again. When this did not work, I increased the heap space in ''gmbuilder.bat'' to 8192 MB.
*Then, I was able to import the UniProt XML, OBO-XML, and GOA files as well as process the OBO-XML.
*However, I was not able to begin the export because an error popped up stating that an export to a GenMAPP database is only possible in ''gmbuilder-32bit.bat''.
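The manual edit of the <code>-Xmx</code> option can also be expressed as a small Python sketch. The function name is hypothetical; the command line is the one quoted from ''gmbuilder-32bit.bat'' above.

```python
import re

def set_max_heap(command, megabytes):
    """Replace the -Xmx value (in MB) in a java command line."""
    new_command, replaced = re.subn(r"-Xmx\d+[mM]", f"-Xmx{megabytes}m", command)
    if replaced != 1:
        raise ValueError("expected exactly one -Xmx<number>m option")
    return new_command

line = r'"C:\Program Files (x86)\Java\jre7\bin\java" -D32 -Xmx1024m -jar gmbuilder.jar'
print(set_max_heap(line, 2048))
```

A 32-bit JVM is constrained by the 32-bit address space, which is a likely reason why the larger values only worked in the 64-bit ''gmbuilder.bat''.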

=====Testing Report=====

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131114_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]):
* Time taken to import: 0.73 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
* Time taken to import: 7.39 min
* Time taken to process: 47.15 min

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
* Time taken to import: 0.03 min

Name of .gdb file:
* Time taken to export .gdb:
* Upload your file and link to it here.

Note: '''Could not export the database. Received an error stating that an export to a GenMAPP database is only possible in ''gmbuilder-32bit.bat''.'''

====Troubleshooting====

#Ran cmd from the Start menu.
#cd to where GenMAPP Builder is located.
#Typed the same command that is in the .bat file, but directly at the prompt:
"C:\Program Files (x86)\Java\jre7\bin\java" -D32 -Xmx1536m -jar gmbuilder.jar

Received the following error:
Error occurred during initialization of VM.
Could not reserve enough space for object heap.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit

===November 19, 2013===

====Rerunning the Gene Database Import/Export====

*On November 15, 2013, the UniProt XML, OBO-XML, and GOA files were imported and the OBO-XML file was processed in the 64-bit ''gmbuilder.bat'' with the maximum heap space set to 8192 MB.
*Performed the export in 32-bit ''gmbuilder.bat''.

=====Testing Report=====

Version of GenMAPP Builder: Same as in previous testing report

Computer on which export was run: Same as in previous testing report

Postgres Database name: Same as in previous testing report

UniProt XML filename: Same as in previous testing report
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]):
* Time taken to import: 1.10 min

GO OBO-XML filename: Same as in previous testing report
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
* Time taken to import: 8.96 min
* Time taken to process: Did not reprocess because received a message stating that the GO data had already been processed.

GOA filename: Same as in previous testing report
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
* Time taken to import: 0.03 min

Name of .gdb file: Ct-Std v1 KS 20131119.gdb
* Time taken to export .gdb:
:*Start Time: 4:01:10 AM
:*End Time: 6:51:58 AM
* Upload your file and link to it here: [[Media:Ct-Std v1 KS 20131119.gdb|Ct-Std v1 KS 20131119.gdb]]

Note:

====Checking the Quality of the Exported Database====

#In ''gmbuilder-32bit.bat'', chose Run XML > Database Tallies for UniProt and GO....
:*The XML count for the UniProt file was 917 while the database count for the UniProt file was 10087. This discrepancy is likely due to importing the UniProt, OBO-XML, and GOA files multiple times while experimenting with the heap space threshold.
:*A new database must now be created in pgAdminIII in order to repeat the export/import gene database cycle.
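A quick arithmetic check on the two counts supports the repeated-import explanation. The sketch below only divides the numbers reported above; it does not query the database.

```python
xml_count = 917         # XML count reported by the Tally Engine
database_count = 10087  # database count reported by the Tally Engine

copies, remainder = divmod(database_count, xml_count)
print(copies, remainder)  # 11 0
```

The database count is exactly 11 times the XML count, which is consistent with the same UniProt file having been imported 11 times, although it does not prove that this is what happened.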

===November 20-21, 2013===

====Finishing Installing Software to Set Up the Development Environment====

#Navigated to the "Download and Install" tab on the [http://subclipse.tigris.org/ Subclipse home page]
#Copied the Eclipse update site URL for the 1.10.x Release.
#Opened up Eclipse.
#Help > Install New Software...
#Pasted the URL into the "Work with" field.
#Checked the box next to "Subclipse" and "SVNKit".
#Clicked on Next and went through the process of installing the software.

====Repeating the Gene Database Import/Export Cycle====

#Created a new database in pgAdminIII.
#*Created tables in the database by executing the script in the file gmbuilder.sql.
#Since the aforementioned error occurred because duplicates were introduced into the database through repeated imports, I decided to try this new import-export cycle using 32-bit ''gmbuilder''.
#Opened ''gmbuilder-32bit.bat'' with the heap space set to 1024 MB and configured the new database.
#*Host or address: localhost
#*Port number: 5432
#*Database name: CT_KS_20131119_32bit_gmb2b71
#*Username: postgres
#*Password: <password of the PostgreSQL database created above>
#Imported the UniProt XML file into PostgreSQL through ''gmbuilder-32bit.bat''.
#Imported the OBO-XML file into PostgreSQL through ''gmbuilder-32bit.bat''.
#Processed the OBO-XML file through ''gmbuilder-32bit.bat''.
#Imported the GOA file into PostgreSQL through ''gmbuilder-32bit.bat''.

=====Testing Report=====
Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131119_32bit_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_11
::*Original file name from [http://www.uniprot.org/uniprot/?query=organism:315277+keyword:181 UniProt site]: uniprot-organism%3A315277+keyword%3A181.xml
* Time taken to import: 1.20 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): 11/06/2013
::*Original file name: go_daily-termdb.obo-xml.gz
* Time taken to import: 13.05 min
* Time taken to process: 10.80 min

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): 11/12/13
::*Original file name: 22183.C_trachomatis_A.goa
* Time taken to import: 0.04 min

Name of .gdb file: Ct-Std KS 20131121.gdb
* Time taken to export .gdb:
:*Start Time: 2:37:42 AM
:*End Time: 2:55:45 AM
* Upload your file and link to it here: [[Media:Ct-Std KS 20131121.gdb|Ct-Std KS 20131121.gdb]]

Note: '''There were no heap space errors this time, which is good. The errors probably occurred in the first place because of repeatedly importing the files.'''
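For comparison, the export durations can be computed from the recorded start and end times. This is a minimal Python sketch that assumes each export started and ended on the same morning; the times are the ones recorded in the November 19 testing report and in this one.

```python
from datetime import datetime

def elapsed(start, end, fmt="%H:%M:%S"):
    """Return end - start as a timedelta for two same-day clock times."""
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)

first_export = elapsed("4:01:10", "6:51:58")   # Ct-Std v1 KS 20131119.gdb
second_export = elapsed("2:37:42", "2:55:45")  # Ct-Std KS 20131121.gdb

print(first_export)   # 2:50:48
print(second_export)  # 0:18:03
```

The export from the clean database took about 18 minutes, compared with almost 3 hours for the database that contained repeated imports.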

===November 21, 2013===

====Revised the Custom Species Profile====

#Ran Eclipse.
#Navigated to Window > Open Perspective > Other... and chose SVN Repository Exploring
#Defined a new Subversion repository by clicking on the Add Repository button.
#Set the URL to https://svn.code.sf.net/p/xmlpipedb/code then clicked Finish.
#Double-clicked on https://svn.code.sf.net/p/xmlpipedb/code in the list.
#Double-clicked on ''trunk''.
#Right-clicked on ''gmbuilder'' and chose ''Checkout....''
#Chose ''Check out as a project configured using the New Project Wizard'' then clicked ''Finish''.
#In the New Project dialog that opened, chose Java Project from the list and then clicked ''Next >''.
#Entered the new project name ''xmlpipedb-gmbuilder'' and clicked ''Finish''.
#Set Eclipse to use JDK.
#*Navigated to Window > Preferences and clicked on the ''Search'' button.
#*Expanded Computer > Local Disk (C:) > Program Files > Java > jdk1.7.0_45 and clicked OK.
#*Highlighted the row ''jdk'' and clicked on ''Edit''.
#*In "JRE home:" field, removed everything in the name after jdk1.7.0_45 and then clicked ''Finish''.
#Made sure that I was in the Java perspective by navigating to Window > Open Perspective > Other... and then choosing Java.
#Double-clicked on the ''xmlpipedb-gmbuilder'' folder.
#Double-clicked on the ''lib'' folder.
#Shift-clicked and selected all files in the lib folder and then control-clicked on every file inside the lib folder whose name did not end in .jar.
#Right-clicked on one of the files and chose Build Path > Add to Build Path.
#Checked that the ''src'' folder was set as the source folder by right-clicking on it and then clicking on ''Build Path''.
#Right-clicked on the ''test'' folder and chose Build Path > Use as Source Folder. Then, made sure that there were no red x's in the list of folders.
#Right-clicked on ''build.xml'' toward the bottom of the list. Then, navigated to Team > Synchronize with Repository.
#Double-clicked on src > edu.lmu.xmlpipedb.gmbuilder.databasetoolkit.profiles > ChlamydiaTrachomatisSerovarAUniProtSpeciesProfile.java.
#Within the code, added the gene database link pattern: "http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=~" and then saved the code.
#Right-clicked on ''build.xml'' and chose Run As > Ant Build... (the one with the ellipses).
#In the Edit Configuration dialog that appears, checked on the clean and dist items in the Targets tab.
#Clicked on the ''Order...'' button and rearranged the items so that clean is first and dist is second and then clicked OK.
#Clicked the Run button.
#Right-clicked on the ''xmlpipedb-gmbuilder'' project folder and chose Refresh (F5 is its keyboard shortcut).
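The gene database link pattern added above ends in a <code>~</code> character. The sketch below assumes that this character is the placeholder that gets replaced by the gene ID when a link is followed; the function name is hypothetical and the substitution is written in Python only for illustration, not taken from the gmbuilder or GenMAPP code.

```python
LINK_PATTERN = (
    "http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13"
    "/Gene/Summary?g=~"
)

def gene_url(gene_id, pattern=LINK_PATTERN):
    """Fill the assumed '~' placeholder in the link pattern with a gene ID."""
    return pattern.replace("~", gene_id)

print(gene_url("CTA_0587"))
```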

====Reran the Database Export With the New Build for gmbuilder====

#Made sure that pgAdminIII and the database "CT_KS_20131119_32bit_gmb2b71" were open.
#Opened the newly built ''gmbuilder-32bit.bat'' within the folder [User Folder] > workspace > xmlpipedb-gmbuilder > dist.
#Made sure that the database was configured to "CT_KS_20131119_32bit_gmb2b71".
#Performed a database export.

=====Testing Report=====

*Name of .gdb file: Ct-Std_v2_KS_20131121.gdb
*Time taken to export .gdb:
:*Start Time: 10:56 AM (approximately)
:*End Time: 11:14:26 AM
*Upload your file and link to it here: [[Media:Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]

Note:

====Testing the New Database with GenMAPP====
#Opened GenMAPP.
#Clicked on Data > New Gene Database and selected the gene database that was created today (Ct-Std v2 KS 20131121.gdb).
#On the toolbar, clicked on the button that has the word gene in a box.
#Clicked and dragged the cursor on the blank canvas to create a new gene box.
#Right-clicked on the gene box.
#In the pop-up window, typed in "CTA_0588" in the Gene ID search field.
#Under Gene ID System, selected Ordered Locus Names.
#Then, clicked the "Search" button. Interestingly, GenMAPP did not find this gene in the database.
#*To troubleshoot, I opened up the gene database in Access.
#*In the list of tables, I double-clicked on OrderedLocusNames.
#*I sorted the list of gene IDs from A-Z and then scrolled down to try to find "CTA_0588". I was able to find "CTA_0587" and "CTA_0589" but not "CTA_0588".
#Back in GenMAPP, I used the same steps as above to find "CTA_0587".
#"CTA_0587" was found in the database. In the window, I clicked on the link "CTA_0587" under the OrderedLocusNames section.
#*This link took me to the entry for this gene on the Model Organism Database (http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=CTA_0587).

====Performing Quality Assurance on the New Database====

*Performed counts with Tally Engine.
[[Image:Tally Engine Results for Ct-Std v2 KS 20131121.jpg|thumb|none|upright=4]]
*Counted the number of unique gene IDs using xmlpipedb ''match'':
java -jar xmlpipedb-match-1.1.1.jar "CTA_[0-9][0-9][0-9][0-9]" < Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
:*'''Total unique matches = 911'''
*Counted the number of unique gene IDs using an SQL query:
select count(*) from genenametype where type = 'ordered locus' and value ~ 'CTA_[0-9][0-9][0-9][0-9]';
:*'''Count - 917'''
*Opened the database in Access. Then, double-clicked on the table "OriginalRowCounts".
{| style = "width:20%; font-size:100%"
| '''Table'''
| '''Rows'''
|-
| OrderedLocusNames
| 919
|-
| UniProt
| 917
|-
| UniProt-OrderedLocusNames
| 919
|}
*Double-clicked on the table "OrderedLocusNames" and scrolled all the way to the bottom. We found that some gene IDs had the format pCTA_####.
*Went back to the command prompt and ran the following command:
java -jar xmlpipedb-match-1.1.1.jar "[p|]CTA_[0-9][0-9][0-9][0-9]" < Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
:*'''Total unique matches = 8'''
:*Adding the results of the two runs with xmlpipedb match, we got 919 unique gene IDs, which is the same number as that in Access.

==Week 14==

===November 24, 2013===

====Ran GenMAPP Using ''C. trachomatis'' Gene Database====

#Opened GenMAPP.
#Made sure that the database was set to Ct-Std_v2_KS_20131121.gdb.
#Imported [[Media:C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt|sample microarray data]] into GenMAPP.
#*Loaded the microarray data by navigating to Expression Dataset Manager > New Dataset.
#*There were 23015 errors.
#Opened the exceptions file that was created to inspect the errors and scrolled down to where the IDs CTA_#### begin.
#*The gene IDs are actually in the format CTA_####_RRMH#####_at.
#Tried to search for a sample ID in the format found in the exceptions file in the UniProt database or any NCBI databases.
#*Could not find CTA_0806_RRMH00303, RRMH00303, CTA_0806_RRMH00303_at, or RRMH00303_at in the UniProt database.
#*Could not find CTA_0806_RRMH00303, RRMH00303, CTA_0806_RRMH00303_at, or RRMH00303_at in any database through NCBI.
#*My suspicion is that the RRMH#####_at portion of the ID is an identifier that is appended to each probe on the Affymetrix microarray when it is read by the company's software.

===November 27-28, 2013===

====Running SQL Queries to Determine Discrepancy in the Count Between the UniProt XML and the Gene Database====

*Ran an SQL query to determine if there were any ordered locus names in the database that do not match the ID "CTA_####".
select * from genenametype where type = 'ordered locus' and not value ~ 'CTA_[0-9][0-9][0-9][0-9]';
:*There were no hits.
*Ran an SQL query to show all the ordered locus names in the database that have the ID "CTA_####".
select * from genenametype where type = 'ordered locus' and value ~ 'CTA_[0-9][0-9][0-9][0-9]';
:*Scrolling through the results, I saw the ID "CTA_0406/CTA_0407/CTA_0408".
:*The [http://www.uniprot.org/manual/gene_name UniProt manual] stated that ordered locus names that are connected by a slash indicate two predicted genes that are actually one gene.
:*The combination of these IDs explains why two fewer IDs show up in the count by the Tally Engine in comparison to the count in Access.
*Opened the gene database within Access and filtered the ordered locus names to show only those that match the ID "CTA_0407".
:*The ID appeared on its own without being joined to other IDs with slashes.

====Testing Report for Finalized Gene Database====

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131119_32bit_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_11
::*Original file name from [http://www.uniprot.org/uniprot/?query=organism:315277+keyword:181 UniProt site]: uniprot-organism%3A315277+keyword%3A181.xml
* Time taken to import: 1.20 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): 11/06/2013
::*Original file name: go_daily-termdb.obo-xml.gz
* Time taken to import: 13.05 min
* Time taken to process: 10.80 min

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): 11/12/13
::*Original file name: 22183.C_trachomatis_A.goa
* Time taken to import: 0.04 min

*Name of .gdb file: Ct-Std_v2_KS_20131121.gdb
*Time taken to export .gdb:
:*Start Time: 10:56 AM (approximately)
:*End Time: 11:14:26 AM
*Upload your file and link to it here: [[Media:Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]

Note: There is one ID "CTA_0406/CTA_0407/CTA_0408" that is a combination of three predicted genes that are actually one gene. The Tally Engine counts this as one gene that contributes to the total count of 917. However, through Access and xmlpipedb ''match'', the ID is separated into three separate genes bringing the total count to 919.

==Week 15==

===December 3, 2013===

*Worked with Hilda to separate the gene IDs from the Affymetrix IDs appended to them in the microarray data.
*Worked with Dillon to run GenMAPP and MAPPFinder with the microarray data and the gene database created for ''C. trachomatis''.

===December 5, 2013===

====Customize Tally Engine for ''C. trachomatis''====

*Opened Eclipse.
*Double-clicked xmlpipedb-gmbuilder > src > edu.lmu.xmlpipedb.gmbuilder.resource.properties > gmbuilder.properties.
*Found the following part of the code:
#
# wizard.properties
#
*Before this part of the code, added the following lines of code to specify which gene IDs to find in the UniProt XML file:
# Chlamydia trachomatis
chlamydiatrachomatis_level_amount=1

chlamydiatrachomatis_element_level0=uniprot/entry/gene/name&type&ordered locus

chlamydiatrachomatis_query_level0=select count(*) from genenametype where type = 'ordered locus';

chlamydiatrachomatis_table_name_level0=Ordered Locus
*Saved the changes and built a new version of gmbuilder.
*Opened gmbuilder and set the database to CT_KS_20131119_32bit_gmb2b71.
*Ran TallyEngine.
*Got the same results as those from November 21, 2013, even though I received an error stating that the species specified in the added code could not be found. However, the error also specified that the correct species ID is chlamydiatrachomatisserovara.
*Accordingly, I went back to Eclipse and modified the code added to gmbuilder.properties:
# Chlamydia trachomatis
chlamydiatrachomatisserovara_level_amount=1

chlamydiatrachomatisserovara_element_level0=uniprot/entry/gene/name&type&ordered locus

chlamydiatrachomatisserovara_query_level0=select count(*) from genenametype where type = 'ordered locus';

chlamydiatrachomatisserovara_table_name_level0=Ordered Locus
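
The corrected key prefix looks like the species name in lowercase with spaces removed. The sketch below reproduces the four property lines under that assumption. The helper names <code>species_key</code> and <code>tally_properties</code> are hypothetical, the species name string is my guess at what gmbuilder uses, and the normalization rule is only inferred from the ID in the error message, not taken from the gmbuilder source.

```python
import re

def species_key(species_name):
    """Lowercase the name and drop everything except letters and digits.

    Assumption: this mirrors the ID reported in the gmbuilder error message;
    it is not taken from the gmbuilder code itself.
    """
    return re.sub(r"[^a-z0-9]", "", species_name.lower())

def tally_properties(species_name, element, query, table_name):
    """Build the four Tally Engine property lines for a single-level species entry."""
    key = species_key(species_name)
    return [
        f"{key}_level_amount=1",
        f"{key}_element_level0={element}",
        f"{key}_query_level0={query}",
        f"{key}_table_name_level0={table_name}",
    ]

lines = tally_properties(
    "Chlamydia trachomatis serovar A",  # assumed species name string
    "uniprot/entry/gene/name&type&ordered locus",
    "select count(*) from genenametype where type = 'ordered locus';",
    "Ordered Locus",
)
print("\n".join(lines))
```

Generating the keys from one name avoids the mismatch between the first attempt (chlamydiatrachomatis) and the ID that gmbuilder expected.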

[[Category:Journal Entry]]
[[Category:Individual Homework]]

Ksherbina Project Notebook

2013-12-06T05:58:20Z

Ksherbina: /* Week 15 */ Created another section

{{Ksherbina}}

==Week 12==

===Starting the Set Up of the Testing Environment===

#Downloaded the following software onto my personal computer:
#*Latest version of GenMAPP Builder (gmbuilder2.0-b71) from [http://sourceforge.net/projects/xmlpipedb/files/?source=navbar SourceForge]
#*[http://www.oracle.com/technetwork/java/javase/downloads/index.html Java SE 7u45]
#*[http://www.eclipse.org/downloads/ Eclipse IDE for Java EE Developers]
#Created an account on [http://sourceforge.net/ SourceForge].

===Import/Export of Gene Database===

#Downloaded the [http://www.uniprot.org/uniprot/?query=organism:315277+keyword:181 UniProt XML] for ''C. trachomatis'' serovar A strain HAR-13: [[Media:Uniprot XML C.trachomatis serovar A KS 20131114.xml|Uniprot XML C.trachomatis serovar A KS 20131114.xml]]
#Downloaded the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ GOA file] for ''C. trachomatis'' serovar A: [[Media:22183.C trachomatis A KS 20131114.goa|22183.C trachomatis A KS 20131114.goa]]
#Created a new database in pgAdminIII: CT_KS_20131114_gmb2b71
#*Created tables in the database by executing the script in the file ''gmbuilder.sql''.
#Launched ''gmbuilder-32bit.bat'':
#Configured the database in gmbuilder:
#*Host or address: localhost
#*Port number: 5432
#*Database name: CT_KS_20131114_gmb2b71
#*Username: postgres
#*Password: <password of the PostgreSQL database created above>
#Imported the UniProt XML file into the PostgreSQL database through gmbuilder.
#Imported the OBO-XML file into the PostgreSQL database through gmbuilder.
#*gmbuilder was not able to completely process the data. See the testing report below for details.
#Imported the GOA file into the PostgreSQL database through gmbuilder.
#*I did not begin the actual export of the database due to the problem with processing the OBO-XML file.

====Testing Report====

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131114_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_11
* Time taken to import: 1.70 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
* Time taken to import: 9.29 min
* Time taken to process: '''Could not complete the process. The following error came up when gmbuilder was processing the 81000th term:'''
ERROR edu.lmu.xmlpipedb.gmbuilder.GenMAPPBuilder - java.util.concurrent.ExecutionException:java.lang.OutOfMemoryError:
Java heap space

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
* Time taken to import: 0.02 min

Name of .gdb file:
* Time taken to export .gdb:
* Upload your file and link to it here.

Note:

===Creating Custom Species Profile===

#On one of the class computers in the first row, I launched Eclipse with Subclipse.
#Went to Window > Open Perspective > Other > SVN Repository Exploring.
#Defined a new Subversion repository by clicking on ''Add Repository'' and adding the URL https://svn.code.sf.net/p/xmlpipedb/code. Clicked Finish.
#Double-clicked on trunk, right-clicked on gmbuilder, and chose Checkout....
#Created a new project (chose Check out as a project configured using the New Project Wizard).
#Went to Window > Preferences > Java > Installed JREs. Added a new environment with JDK.
#Double-clicked on the project folder.
#Double-clicked on the lib folder.
#Shift-clicked to select all files in the lib folder and then control-clicked on every file inside the lib folder whose name did not end in .jar to deselect it. Right-clicked on one of the selected files and chose Build Path > Add to Build Path from the pop-up menu.
#Created a species profile called ChlamydiaTrachomatisSerovarA.
#*Dondi modified the name to ChlamydiaTrachomatisSerovarAUniProtSpeciesProfile.
#Customized the species profile.
#*Opened the new profile and added the lines of code as specified in [[Coder|"Customize the Species Profile"]] directions.
#**Still need to insert the species-specific URL that returns a web page describing a gene for that species.
#Committed the profile following the directions in [[Coder|"Updating and Committing Code"]].

===Reflection===

#What were the week’s key accomplishments?
#*One of this week's key accomplishments was downloading the files necessary to run an import/export gene database cycle and the microarray files that will be analyzed. Also this week, we were able to start an import/export cycle.
#What are next week’s target accomplishments?
#*Next week, I would like to complete an import/export cycle and vet it with the QA person. In addition, I hope that we can download the software necessary to open the microarray files, format the files, and perform statistical analysis.
#What team strengths were seen this week?
#*I believe our strength this week as a team was meeting outside of class to create milestones. This meeting resulted in a detailed calendar of intermediate milestones that we plan to accomplish in order to meet the final deadline.
#What team weaknesses were seen this week?
#*It was difficult to set up the aforementioned meeting due to our busy schedules. Additionally, we did not have a clear plan for working on the project until the middle/end of the week. In the coming weeks, our goal will be to improve our work plan for each class period so as to minimize the difficulty of scheduling work meetings outside of class.

==Week 13==

===November 15, 2013===

====Import/Export Gene Database====

After the error that came up last week regarding the Java heap space, I performed the following steps to increase the heap space:
#Opened ''gmbuilder-32bit.bat'' in Notepad.
#Found the line
"C:\Program Files (x86)\Java\jre7\bin\java" -D32 -Xmx1024m -jar gmbuilder.jar
#Changed "-Xmx1024m" to "-Xmx2048m" thereby increasing the maximum heap space from 1024 MB to 2048 MB.
#Saved the file and closed it.

*Then, I tried the import/export again. I could not open ''gmbuilder-32bit.bat'' after I changed the maximum heap space. I repeated the above steps for ''gmbuilder.bat'' and found that I could still open the program afterward.
*Opening ''gmbuilder.bat'', I imported both the UniProt XML file and the OBO-XML file. However, I received a Java heap space error again when processing the OBO-XML file.
*I increased the heap space in ''gmbuilder.bat'' from 2048 MB to 4096 MB and tried the import/export cycle again. When this did not work, I increased the heap space in ''gmbuilder.bat'' to 8192 MB.
*Then, I was able to import the UniProt XML, OBO-XML, and GOA files as well as process the OBO-XML.
*However, I was not able to begin the export because an error popped up stating that an export to a GenMAPP database is only possible in ''gmbuilder-32bit.bat''.
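
The heap-space edit described above amounts to replacing the value of the <code>-Xmx</code> option in the launch command. Below is a minimal sketch of that substitution; the helper name <code>set_max_heap</code> is hypothetical and the command string is the one quoted from the .bat file. It only rewrites the text of the command. It does not check whether the JVM can actually reserve the requested heap, which the 32-bit launcher could not do at 2048 MB.

```python
import re

def set_max_heap(command_line, megabytes):
    """Return the launch command with its -Xmx value replaced by the given size in MB."""
    new_line, count = re.subn(r"-Xmx\d+[kKmMgG]?", f"-Xmx{megabytes}m", command_line)
    if count != 1:
        raise ValueError("expected exactly one -Xmx option in the command")
    return new_line

# Launch line as quoted from gmbuilder-32bit.bat.
original = r'"C:\Program Files (x86)\Java\jre7\bin\java" -D32 -Xmx1024m -jar gmbuilder.jar'

for size in (2048, 4096, 8192):  # the values tried in this entry
    print(set_max_heap(original, size))
```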

=====Testing Report=====

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131114_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]):
* Time taken to import: 0.73 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
* Time taken to import: 7.39 min
* Time taken to process: 47.15 min

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
* Time taken to import: 0.03 min

Name of .gdb file:
* Time taken to export .gdb:
* Upload your file and link to it here.

Note: '''Could not export the database. Received an error stating that an export to a GenMAPP database is only possible in ''gmbuilder-32bit.bat''.'''

====Troubleshooting====

#Ran cmd from the Start menu.
#cd to where GenMAPP Builder is located.
#Typed the same command that is in the .bat file, but directly at the prompt:
"C:\Program Files (x86)\Java\jre7\bin\java" -D32 -Xmx1536m -jar gmbuilder.jar

Received the following error:
Error occurred during initialization of VM.
Could not reserve enough space for object heap.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit

===November 19, 2013===

====Rerunning the Gene Database Import/Export====

*On 11/15/2013, the UniProt XML, OBO-XML, and GOA files were imported and the OBO-XML file was processed in the 64-bit ''gmbuilder.bat'' with the maximum heap space set to 8192 MB.
*Performed the export in 32-bit ''gmbuilder.bat''.

=====Testing Report=====

Version of GenMAPP Builder: Same as in previous testing report

Computer on which export was run: Same as in previous testing report

Postgres Database name: Same as in previous testing report

UniProt XML filename: Same as in previous testing report
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]):
* Time taken to import: 1.10 min

GO OBO-XML filename: Same as in previous testing report
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
* Time taken to import: 8.96 min
* Time taken to process: Did not reprocess because received a message stating that the GO data had already been processed.

GOA filename: Same as in previous testing report
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
* Time taken to import: 0.03 min

Name of .gdb file: Ct-Std v1 KS 20131119.gdb
* Time taken to export .gdb:
:*Start Time: 4:01:10 AM
:*End Time: 6:51:58 AM
* Upload your file and link to it here: [[Media:Ct-Std v1 KS 20131119.gdb|Ct-Std v1 KS 20131119.gdb]]

Note:

====Checking the Quality of the Exported Database====

#In ''gmbuilder-32bit.bat'', chose Run XML > Database Tallies for UniProt and GO....
:*The XML count for the UniProt file was 917 while the Database Count for the UniProt file was 10087. This discrepancy is likely due to importing the UniProt, OBO-XML, and GOA files multiple times while experimenting with the heap space threshold.
:*A new database must now be created in pgAdminIII in order to repeat the import/export gene database cycle.
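
A quick arithmetic check supports the repeated-import explanation: the database count is an exact multiple of the XML count. This is consistent with the same 917 entries having been imported several times, although it does not prove it.

```python
xml_count = 917         # XML count reported by the Tally Engine
database_count = 10087  # Database Count reported by the Tally Engine

# divmod returns (quotient, remainder); a zero remainder means an exact multiple.
copies, remainder = divmod(database_count, xml_count)
print(copies, remainder)
```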

===November 20-21, 2013===

====Finishing Installing Software to Set Up the Development Environment====

#Navigated to the "Download and Install" tab on the [http://subclipse.tigris.org/ Subclipse home page]
#Copied the Eclipse update site URL for the 1.10.x Release.
#Opened up Eclipse.
#Help > Install New Software...
#Pasted the URL into the "Work with" field.
#Checked the box next to "Subclipse" and "SVNKit".
#Clicked on Next and went through the process of installing the software.

====Repeating the Gene Database Import/Export Cycle====

#Created a new database in pgAdminIII.
#*Created tables in the database by executing the script in the file gmbuilder.sql.
#Since the aforementioned error occurred because duplicates were introduced into the database through repeated imports, I decided to try this new import-export cycle using 32-bit ''gmbuilder''.
#Opened ''gmbuilder-32bit.bat'' with the heap space set to 1024 MB and configured the new database.
#*Host or address: localhost
#*Port number: 5432
#*Database name: CT_KS_20131119_32bit_gmb2b71
#*Username: postgres
#*Password: <password of the PostgreSQL database created above>
#Imported the UniProt XML file into PostgreSQL through ''gmbuilder-32bit.bat''.
#Imported the OBO-XML file into PostgreSQL through ''gmbuilder-32bit.bat''.
#Processed the OBO-XML file through ''gmbuilder-32bit.bat''.
#Imported the GOA file into PostgreSQL through ''gmbuilder-32bit.bat''.

=====Testing Report=====
Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131119_32bit_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_11
::*Original file name from [http://www.uniprot.org/uniprot/?query=organism:315277+keyword:181 UniProt site]: uniprot-organism%3A315277+keyword%3A181.xml
* Time taken to import: 1.20 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): 11/06/2013
::*Original file name: go_daily-termdb.obo-xml.gz
* Time taken to import: 13.05 min
* Time taken to process: 10.80 min

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): 11/12/13
::*Original file name: 22183.C_trachomatis_A.goa
* Time taken to import: 0.04 min

Name of .gdb file: Ct-Std KS 20131121.gdb
* Time taken to export .gdb:
:*Start Time: 2:37:42 AM
:*End Time: 2:55:45 AM
* Upload your file and link to it here: [[Media:Ct-Std KS 20131121.gdb|Ct-Std KS 20131121.gdb]]
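
The export times are recorded as start and end clock times, so the elapsed time has to be computed. The sketch below does this for the export above and for the November 19 export (4:01:10 AM to 6:51:58 AM). The helper name <code>elapsed_minutes</code> is hypothetical, and it assumes both times fall on the same day.

```python
from datetime import datetime

def elapsed_minutes(start, end):
    """Minutes between two same-day clock times such as '2:37:42 AM'."""
    fmt = "%I:%M:%S %p"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60

clean_export = elapsed_minutes("2:37:42 AM", "2:55:45 AM")     # this export
earlier_export = elapsed_minutes("4:01:10 AM", "6:51:58 AM")   # November 19 export
print(round(clean_export, 2), round(earlier_export, 2))
```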

Note: '''There were no heap space errors this time, which is good. The errors probably occurred in the first place because of repeatedly importing the files.'''

===November 21, 2013===

====Revised the Custom Species Profile====

#Ran Eclipse.
#Navigated to Window > Open Perspective > Other... and chose SVN Repository Exploring
#Defined a new Subversion repository by clicking on the Add Repository button.
#Set the URL to https://svn.code.sf.net/p/xmlpipedb/code then clicked Finish.
#Double-clicked on https://svn.code.sf.net/p/xmlpipedb/code in the list.
#Double-clicked on ''trunk''.
#Right-clicked on ''gmbuilder'' and chose ''Checkout....''
#Chose ''Check out as a project configured using the New Project Wizard'' then clicked ''Finish''.
#In the New Project dialog that opened, chose Java Project from the list and then clicked ''Next >''.
#Entered the new project name ''xmlpipedb-gmbuilder'' and clicked ''Finish''.
#Set Eclipse to use JDK.
#*Navigated to Window > Preferences and clicked on the ''Search'' button.
#*Expanded Computer > Local Disk (C:) > Program Files > Java > jdk1.7.0_45 and clicked OK.
#*Highlighted the row ''jdk'' and clicked on ''Edit''.
#*In "JRE home:" field, removed everything in the name after jdk1.7.0_45 and then clicked ''Finish''.
#Made sure that I was in the Java perspective by navigating to Window > Open Perspective > Other... and then choosing Java.
#Double-clicked on the ''xmlpipedb-gmbuilder'' folder.
#Double-clicked on the ''lib'' folder.
#Shift-clicked and selected all files in the lib folder and then control-clicked on every file inside the lib folder whose name did not end in .jar.
#Right-clicked on one of the files and chose Build Path > Add to Build Path.
#Checked that the ''src'' folder was set to the source folder by right-clicking on it and then clicking on ''Build Path''.
#Right-clicked on the ''test'' folder and chose Build Path > Use as Source Folder. Then, made sure that there were no red x's in the list of folders.
#Right-clicked on ''build.xml'' toward the bottom of the list. Then, navigated to Team > Synchronize with Repository.
#Double-clicked on src > edu.lmu.xmlpipedb.gmbuilder.databasetoolkit.profiles > ChlamydiaTrachomatisSerovarAUniProtSpeciesProfile.java.
#Within the code, added the gene database link pattern: "http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=~" and then saved the code.
#Right-clicked on ''build.xml'' and chose Run As > Ant Build... (the one with the ellipses).
#In the Edit Configuration dialog that appears, checked on the clean and dist items in the Targets tab.
#Clicked on the ''Order...'' button and rearranged the items so that clean is first and dist is second and then clicked OK.
#Clicked the Run button.
#Right-clicked on the ''xmlpipedb-gmbuilder'' project folder and chose Refresh (F5 is its keyboard shortcut).

====Reran the Database Export With the New Build for gmbuilder====

#Made sure that pgAdminIII and the database "CT_KS_20131119_32bit_gmb2b71" were open.
#Opened the newly built ''gmbuilder-32bit.bat'' within the folder [User Folder] > workspace > xmlpipedb-gmbuilder > dist.
#Made sure that the database was configured to "CT_KS_20131119_32bit_gmb2b71".
#Performed a database export.

=====Testing Report=====

*Name of .gdb file: Ct-Std_v2_KS_20131121.gdb
*Time taken to export .gdb:
:*Start Time: 10:56 AM (approximately)
:*End Time: 11:14:26 AM
*Upload your file and link to it here: [[Media:Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]

Note:

====Testing the New Database with GenMAPP====
#Opened GenMAPP.
#Clicked on Data > New Gene Database and selected the gene database that was created today (Ct-Std v2 KS 20131121.gdb).
#On the toolbar, clicked on the button that has the word gene in a box.
#Clicked and dragged the cursor on the blank canvas to create a new gene box.
#Right-clicked on the gene box.
#In the pop-up window, typed in "CTA_0588" in the Gene ID search field.
#Under Gene ID System, selected Ordered Locus Names.
#Then, hit the "Search" button. Interestingly, GenMAPP did not find this gene in the database.
#*To troubleshoot, I opened up the gene database in Access.
#*In the list of tables, I double-clicked on OrderedLocusNames.
#*I sorted the list of gene IDs from A-Z and then scrolled down to try to find "CTA_0588". I was able to find "CTA_0587" and "CTA_0589" but not "CTA_0588".
#Back in GenMAPP, I used the same steps as above to find "CTA_0587".
#"CTA_0587" was found in the database. In the window, I clicked on the link "CTA_0587" under the OrderedLocusNames section.
#*This link took me to the entry for this gene on the Model Organism Database (http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=CTA_0587).

====Performing Quality Assurance on the New Database====

*Performed counts with Tally Engine.
[[Image:Tally Engine Results for Ct-Std v2 KS 20131121.jpg|thumb|none|upright=4]]
*Counted the number of unique gene IDs using xmlpipedb ''match'':
java -jar xmlpipedb-match-1.1.1.jar "CTA_[0-9][0-9][0-9][0-9]" < Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
:*'''Total unique matches = 911'''
*Counted the number of unique gene IDs using an SQL query:
select count(*) from genenametype where type = 'ordered locus' and value ~ 'CTA_[0-9][0-9][0-9][0-9]';
:*'''Count - 917'''
*Opened the database in Access. Then, double-clicked on the table "OriginalRowCounts".
{| style = "width:20%; font-size:100%"
| '''Table'''
| '''Rows'''
|-
| OrderedLocusNames
| 919
|-
| UniProt
| 917
|-
| UniProt-OrderedLocusNames
| 919
|}
*Double-clicked on the table "OrderedLocusNames" and scrolled all the way to the bottom. We found that some gene IDs had the format pCTA_####.
*Went back to the command prompt and ran the following command:
java -jar xmlpipedb-match-1.1.1.jar "[p|]CTA_[0-9][0-9][0-9][0-9]" < Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
:*'''Total unique matches = 8'''
:*Adding the results of the two runs with xmlpipedb match, we got 919 unique gene IDs, which is the same number as that in Access.
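
The two patterns can be tried on a small made-up string to see what each one picks up. This is a sketch using Python's <code>re</code> module rather than xmlpipedb ''match'' itself, so it only illustrates the regular expressions. The sample text is invented and the helper name <code>unique_matches</code> is hypothetical. Note that <code>[p|]</code> is a character class, so it matches either the letter p or a literal vertical bar, and that the plain pattern also matches the CTA_#### portion inside a pCTA_#### ID.

```python
import re

def unique_matches(pattern, text):
    """Return the set of distinct substrings of text that match pattern."""
    return set(re.findall(pattern, text))

# Invented sample text, not taken from the UniProt XML file.
sample = "CTA_0001 CTA_0002 CTA_0002 pCTA_0003 x|CTA_0004"

plain = unique_matches(r"CTA_[0-9][0-9][0-9][0-9]", sample)
prefixed = unique_matches(r"[p|]CTA_[0-9][0-9][0-9][0-9]", sample)

print(sorted(plain))
print(sorted(prefixed))

# Sum of the two counts reported above.
print(911 + 8)
```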

==Week 14==

===November 24, 2013===

====Ran GenMAPP Using ''C. trachomatis'' Gene Database====

#Opened GenMAPP.
#Made sure that the database was set to Ct-Std_v2_KS_20131121.gdb.
#Imported [[Media:C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt|sample microarray data]] into GenMAPP.
#*Loaded the microarray data by navigating to Expression Dataset Manager > New Dataset.
#*There were 23015 errors.
#Opened the exceptions file that was created to inspect the errors and scrolled down to where the IDs CTA_#### begin.
#*The gene IDs are actually in the format CTA_####_RRMH#####_at.
#Tried to search for a sample ID in the format found in the exceptions file in the UniProt database or any NCBI databases.
#*Could not find CTA_0806_RRMH00303, RRMH00303, CTA_0806_RRMH00303_at, or RRMH00303_at in the UniProt database.
#*Could not find CTA_0806_RRMH00303, RRMH00303, CTA_0806_RRMH00303_at, or RRMH00303_at in any database through NCBI.
#*My suspicion is that the RRMH#####_at portion of the ID is an identifier that is appended to each probe on the Affymetrix microarray when it is read by the company's software.

===November 27-28, 2013===

====Running SQL Queries to Determine Discrepancy in the Count Between the UniProt XML and the Gene Database====

*Ran an SQL query to determine if there were any ordered locus names in the database that do not match the ID "CTA_####".
select * from genenametype where type = 'ordered locus' and not value ~ 'CTA_[0-9][0-9][0-9][0-9]';
:*There were no hits.
*Ran an SQL query to show all the ordered locus names in the database that have the ID "CTA_####".
select * from genenametype where type = 'ordered locus' and value ~ 'CTA_[0-9][0-9][0-9][0-9]';
:*Scrolling through the results, I saw the ID "CTA_0406/CTA_0407/CTA_0408".
:*The [http://www.uniprot.org/manual/gene_name UniProt manual] stated that ordered locus names that are connected by a slash indicate two predicted genes that are actually one gene.
:*The combination of these IDs explains why two fewer IDs show up in the count by the Tally Engine in comparison to the count in Access.
*Opened the gene database within Access and filtered the ordered locus names to show only those that match the ID "CTA_0407".
:*The ID appeared on its own without being joined to other IDs with slashes.
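
The effect of the slash-joined entry on the counts can be reproduced with a few rows. This sketch assumes that the export simply splits each value on the slash; the helper name <code>expand_locus_names</code> is hypothetical, and the neighboring IDs in the sample are only placeholders.

```python
def expand_locus_names(values):
    """Split slash-joined ordered locus names into individual gene IDs."""
    expanded = []
    for value in values:
        expanded.extend(part.strip() for part in value.split("/"))
    return expanded

# Three rows as they might appear in genenametype (neighbors are placeholders).
rows = ["CTA_0405", "CTA_0406/CTA_0407/CTA_0408", "CTA_0409"]
expanded = expand_locus_names(rows)

extra_ids = len(expanded) - len(rows)
print(len(rows), len(expanded), extra_ids)

# One joined row contributes two extra IDs, matching 917 rows versus 919 IDs.
print(917 + extra_ids)
```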

====Testing Report for Finalized Gene Database====

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131119_32bit_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_11
::*Original file name from [http://www.uniprot.org/uniprot/?query=organism:315277+keyword:181 UniProt site]: uniprot-organism%3A315277+keyword%3A181.xml
* Time taken to import: 1.20 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): 11/06/2013
::*Original file name: go_daily-termdb.obo-xml.gz
* Time taken to import: 13.05 min
* Time taken to process: 10.80 min

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): 11/12/13
::*Original file name: 22183.C_trachomatis_A.goa
* Time taken to import: 0.04 min

*Name of .gdb file: Ct-Std_v2_KS_20131121.gdb
*Time taken to export .gdb:
:*Start Time: 10:56 AM (approximately)
:*End Time: 11:14:26 AM
*Upload your file and link to it here: [[Media:Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]

Note: There is one ID "CTA_0406/CTA_0407/CTA_0408" that is a combination of three predicted genes that are actually one gene. The Tally Engine counts this as one gene that contributes to the total count of 917. However, through Access and xmlpipedb ''match'', the ID is separated into three separate genes bringing the total count to 919.

==Week 15==

===December 3, 2013===

*Worked with Hilda to separate the gene IDs from the Affymetrix IDs appended to them in the microarray data.
*Worked with Dillon to run GenMAPP and MAPPFinder with the microarray data and the gene database created for ''C. trachomatis''.

===December 5, 2013===

[[Category:Journal Entry]]
[[Category:Individual Homework]]

Ksherbina Project Notebook

2013-12-06T05:57:35Z

Ksherbina: /* Week 15 */ Recorded work done on December 3, 2013

{{Ksherbina}}

==Week 12==

===Starting the Set Up of the Testing Environment===

#Downloaded the following software onto my personal computer:
#*Latest version of GenMAPP Builder (gmbuilder2.0-b71) from [http://sourceforge.net/projects/xmlpipedb/files/?source=navbar SourceForge]
#*[http://www.oracle.com/technetwork/java/javase/downloads/index.html Java SE 7u45]
#*[http://www.eclipse.org/downloads/ Eclipse IDE for Java EE Developers]
#Created an account on [http://sourceforge.net/ SourceForge].

===Import/Export of Gene Database===

#Downloaded the [http://www.uniprot.org/uniprot/?query=organism:315277+keyword:181 UniProt XML] for ''C. trachomatis'' serovar A strain HAR-13: [[Media:Uniprot XML C.trachomatis serovar A KS 20131114.xml|Uniprot XML C.trachomatis serovar A KS 20131114.xml]]
#Downloaded the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ GOA file] for ''C. trachomatis'' serovar A: [[Media:22183.C trachomatis A KS 20131114.goa|22183.C trachomatis A KS 20131114.goa]]
#Created a new database in pgAdminIII: CT_KS_20131114_gmb2b71
#*Created tables in the database by executing the script in the file ''gmbuilder.sql''.
#Launched ''gmbuilder-32bit.bat'':
#Configured the database in gmbuilder:
#*Host or address: localhost
#*Port number: 5432
#*Database name: CT_KS_20131114_gmb2b71
#*Username: postgres
#*Password: <password of the PostgreSQL database created above>
#Imported the UniProt XML file into the PostgreSQL database through gmbuilder.
#Imported the OBO-XML file into the PostgreSQL database through gmbuilder.
#*gmbuilder was not able to completely process the data. See the testing report below for details.
#Imported the GOA file into the PostgreSQL database through gmbuilder.
#*I did not begin the actual export of the database due to the problem with processing the OBO-XML file.

====Testing Report====

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131114_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_11
* Time taken to import: 1.70 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
* Time taken to import: 9.29 min
* Time taken to process: '''Could not complete the process. The following error came up when gmbuilder was processing the 81000th term:'''
ERROR edu.lmu.xmlpipedb.gmbuilder.GenMAPPBuilder - java.util.concurrent.ExecutionException:java.lang.OutOfMemoryError:
Java heap space

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
* Time taken to import: 0.02 min

Name of .gdb file:
* Time taken to export .gdb:
* Upload your file and link to it here.

Note:

===Creating Custom Species Profile===

#On one of the class computers in the first row, I launched Eclipse with Subclipse.
#Went to Window > Open Perspective > Other > SVN Repository Exploring.
#Defined a new Subversion repository by clicking on ''Add Repository'' and adding the URL https://svn.code.sf.net/p/xmlpipedb/code. Clicked Finish.
#Double-clicked on trunk, right-clicked on gmbuilder, and chose Checkout....
#Created a new project (chose Check out as a project configured using the New Project Wizard).
#Went to Window > Preferences > Java > Installed JREs. Added a new environment with JDK.
#Double-clicked on the project folder.
#Double-clicked on the lib folder.
#Shift-clicked to select all files in the lib folder and then control-clicked on every file inside the lib folder whose name did not end in .jar to deselect it. Right-clicked on one of the selected files and chose Build Path > Add to Build Path from the pop-up menu.
#Created a species profile called ChlamydiaTrachomatisSerovarA.
#*Dondi modified the name to ChlamydiaTrachomatisSerovarAUniProtSpeciesProfile.
#Customized the species profile.
#*Opened the new profile and added the lines of code as specified in [[Coder|"Customize the Species Profile"]] directions.
#**Still need to insert the species-specific URL that returns a web page describing a gene for that species.
#Committed the profile following the directions in [[Coder|"Updating and Committing Code"]].

===Reflection===

#What were the week’s key accomplishments?
#*One of this week's key accomplishments was downloading the files necessary to run an import/export gene database cycle and the microarray files that will be analyzed. Also this week, we were able to start an import/export cycle.
#What are next week’s target accomplishments?
#*Next week, I would like to complete an import/export cycle and vet it with the QA person. In addition, I hope that we can download the software necessary to open the microarray files, format the files, and perform statistical analysis.
#What team strengths were seen this week?
#*I believe our strength this week as a team was meeting outside of class to create milestones. This meeting resulted in a detailed calendar of intermediate milestones that we plan to accomplish in order to meet the final deadline.
#What team weaknesses were seen this week?
#*It was difficult to set up the aforementioned meeting due to our busy schedules. Additionally, we did not have a clear plan for working on the project until the middle/end of the week. In the coming weeks, our goal will be to improve our work plan for each class period so as to minimize the difficulty of scheduling work meetings outside of class.

==Week 13==

===November 15, 2013===

====Import/Export Gene Database====

After the error that came up last week regarding the Java heap space, I performed the following steps to increase the heap space:
#Opened ''gmbuilder-32bit.bat'' in Notepad.
#Found the line
"C:\Program Files (x86)\Java\jre7\bin\java" -D32 -Xmx1024m -jar gmbuilder.jar
#Changed "-Xmx1024m" to "-Xmx2048m" thereby increasing the maximum heap space from 1024 MB to 2048 MB.
#Saved the file and closed it.
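
The same edit can be scripted instead of done by hand in Notepad. The following is a minimal sketch, assuming the launch line has the form shown above; <code>set_max_heap</code> is a hypothetical helper and is not part of GenMAPP Builder.

```python
import re


def set_max_heap(command_line: str, megabytes: int) -> str:
    """Return command_line with its -Xmx<N>m flag set to the given size in MB."""
    new_line, count = re.subn(r"-Xmx\d+m", f"-Xmx{megabytes}m", command_line)
    if count != 1:
        # Refuse to guess if the flag is missing or appears more than once.
        raise ValueError("expected exactly one -Xmx<N>m flag")
    return new_line


# The launch line as it appears in the .bat file.
line = r'"C:\Program Files (x86)\Java\jre7\bin\java" -D32 -Xmx1024m -jar gmbuilder.jar'
print(set_max_heap(line, 2048))
```

The helper only rewrites the flag; whether the JVM can actually reserve the requested heap still depends on the Java installation being used.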

*Then, I tried the import/export again. I could not open ''gmbuilder-32bit.bat'' after I changed the maximum heap space. I repeated the above steps for ''gmbuilder.bat'' and found that I could still open the program.
*Opening ''gmbuilder.bat'', I imported both the UniProt XML file and the OBO-XML file. However, I received a Java heap space error again when processing the OBO-XML file.
*I increased the heap space in ''gmbuilder.bat'' from 2048 MB to 4096 MB and tried the import/export cycle again. When this did not work, I increased the heap space in ''gmbuilder.bat'' to 8192 MB.
*Then, I was able to import the UniProt XML, OBO-XML, and GOA files as well as process the OBO-XML.
*However, I was not able to begin the export because an error popped up stating that an export to a GenMAPP database is only possible in ''gmbuilder-32bit.bat''.

=====Testing Report=====

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131114_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]):
* Time taken to import: 0.73 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
* Time taken to import: 7.39 min
* Time taken to process: 47.15 min

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
* Time taken to import: 0.03 min

Name of .gdb file:
* Time taken to export .gdb:
* Upload your file and link to it here.

Note: '''Could not export the database. Received an error stating that an export to a GenMAPP database is only possible in ''gmbuilder-32bit.bat''.'''

====Troubleshooting====

#Ran cmd from the Start menu.
#cd to where GenMAPP Builder is located.
#Typed the same command that is in the .bat file, but directly at the prompt:
"C:\Program Files (x86)\Java\jre7\bin\java" -D32 -Xmx1536m -jar gmbuilder.jar

Received the following error:
Error occurred during initialization of VM.
Could not reserve enough space for object heap.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit

===November 19, 2013===

====Rerunning the Gene Database Import/Export====

*On 11/15/2013, the UniProt XML, OBO-XML, and GOA files were imported and the OBO-XML file was processed in the 64-bit ''gmbuilder.bat'' with the maximum heap space set to 8192 MB.
*Performed the export in 32-bit ''gmbuilder.bat''.

=====Testing Report=====

Version of GenMAPP Builder: Same as in previous testing report

Computer on which export was run: Same as in previous testing report

Postgres Database name: Same as in previous testing report

UniProt XML filename: Same as in previous testing report
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]):
* Time taken to import: 1.10 min

GO OBO-XML filename: Same as in previous testing report
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
* Time taken to import: 8.96 min
* Time taken to process: Did not reprocess because received a message stating that the GO data had already been processed.

GOA filename: Same as in previous testing report
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
* Time taken to import: 0.03 min

Name of .gdb file: Ct-Std v1 KS 20131119.gdb
* Time taken to export .gdb:
:*Start Time: 4:01:10 AM
:*End Time: 6:51:58 AM
* Upload your file and link to it here: [[Media:Ct-Std v1 KS 20131119.gdb|Ct-Std v1 KS 20131119.gdb]]
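
The export duration implied by these timestamps can be computed as follows. This is a minimal sketch that assumes both times fall on the same day; <code>elapsed</code> is a hypothetical helper.

```python
from datetime import datetime


def elapsed(start: str, end: str) -> str:
    """Return the time between two same-day clock times as H:MM:SS."""
    fmt = "%I:%M:%S %p"  # 12-hour clock with an AM/PM marker
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return str(delta)


print(elapsed("4:01:10 AM", "6:51:58 AM"))  # 2:50:48
```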

Note:

====Checking the Quality of the Exported Database====

#In ''gmbuilder-32bit.bat'', chose Run XML > Database Tallies for UniProt and GO....
:*The XML count for the UniProt file was 917 while the Database Count for the UniProt file was 10087. This discrepancy is likely due to importing the UniProt, OBO-XML, and GOA files multiple times while experimenting with the heap space setting.
:*A new database must now be created in pgAdminIII in order to repeat the export/import gene database cycle.
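
One quick check of the repeated-import explanation is whether the database count is a whole multiple of the XML count. The sketch below only does that arithmetic with the two counts reported above; the variable names are illustrative.

```python
xml_count = 917        # XML count reported by the Tally Engine
database_count = 10087  # Database Count reported by the Tally Engine

# If every record was imported the same number of times, the remainder is 0.
copies, remainder = divmod(database_count, xml_count)
print(copies, remainder)  # 11 0
```

A remainder of zero is consistent with the same 917 records having been imported 11 times, although it does not by itself prove that this is what happened.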

===November 20-21, 2013===

====Finishing Installing Software to Set Up the Development Environment====

#Navigated to the "Download and Install" tab on the [http://subclipse.tigris.org/ Subclipse home page]
#Copied the Eclipse update site URL for the 1.10.x Release.
#Opened up Eclipse.
#Help > Install New Software...
#Pasted the URL into the "Work with" field.
#Checked the box next to "Subclipse" and "SVNKit".
#Clicked on Next and went through the process of installing the software.

====Repeating the Gene Database Import/Export Cycle====

#Created a new database in pgAdminIII.
#*Created tables in the database by executing the script in the file gmbuilder.sql.
#Since the aforementioned error occurred because duplicates were introduced into the database through repeated imports, I decided to try this new import-export cycle using 32-bit ''gmbuilder''.
#Opened ''gmbuilder-32bit.bat'' with the heap space set to 1024 MB and configured the new database.
#*Host or address: localhost
#*Port number: 5432
#*Database name: CT_KS_20131119_32bit_gmb2b71
#*Username: postgres
#*Password: <password of the PostgreSQL database created above>
#Imported the UniProt XML file into PostgreSQL through ''gmbuilder-32bit.bat''.
#Imported the OBO-XML file into PostgreSQL through ''gmbuilder-32bit.bat''.
#Processed the OBO-XML file through ''gmbuilder-32bit.bat''.
#Imported the GOA file into PostgreSQL through ''gmbuilder-32bit.bat''.

=====Testing Report=====
Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131119_32bit_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_11
::*Original file name from [http://www.uniprot.org/uniprot/?query=organism:315277+keyword:181 UniProt site]: uniprot-organism%3A315277+keyword%3A181.xml
* Time taken to import: 1.20 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): 11/06/2013
::*Original file name: go_daily-termdb.obo-xml.gz
* Time taken to import: 13.05 min
* Time taken to process: 10.80 min

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): 11/12/13
::*Original file name: 22183.C_trachomatis_A.goa
* Time taken to import: 0.04 min

Name of .gdb file: Ct-Std KS 20131121.gdb
* Time taken to export .gdb:
:*Start Time: 2:37:42 AM
:*End Time: 2:55:45 AM
* Upload your file and link to it here: [[Media:Ct-Std KS 20131121.gdb|Ct-Std KS 20131121.gdb]]

Note: '''There were no heap space errors this time, which is good. The errors probably occurred in the first place because of repeatedly importing the files.'''

===November 21, 2013===

====Revised the Custom Species Profile====

#Ran Eclipse.
#Navigated to Window > Open Perspective > Other... and chose SVN Repository Exploring
#Defined a new Subversion repository by clicking on the Add Repository button.
#Set the URL to https://svn.code.sf.net/p/xmlpipedb/code then clicked Finish.
#Double-clicked on https://svn.code.sf.net/p/xmlpipedb/code in the list.
#Double-clicked on ''trunk''.
#Right-clicked on ''gmbuilder'' and chose ''Checkout....''
#Chose ''Check out as a project configured using the New Project Wizard'' then clicked ''Finish''.
#In the New Project dialog that opened, chose Java Project from the list and then clicked ''Next >''.
#Entered the new project name ''xmlpipedb-gmbuilder'' and clicked ''Finish''.
#Set Eclipse to use JDK.
#*Navigated to Window > Preferences and clicked on the ''Search'' button.
#*Expanded Computer > Local Disk (C:) > Program Files > Java > jdk1.7.0_45 and clicked OK.
#*Highlighted the row ''jdk'' and clicked on ''Edit''.
#*In "JRE home:" field, removed everything in the name after jdk1.7.0_45 and then clicked ''Finish''.
#Made sure that I was in the Java perspective by navigating to Window > Open Perspective > Other... and then choosing Java.
#Double-clicked on the ''xmlpipedb-gmbuilder'' folder.
#Double-clicked on the ''lib'' folder.
#Shift-clicked and selected all files in the lib folder and then control-clicked on every file inside the lib folder whose name did not end in .jar.
#Right-clicked on one of the files and chose Build Path > Add to Build Path.
#Checked that the ''src'' folder was set to the source folder by right-clicking on it and then clicking on ''Build Path''.
#Right-clicked on the ''test'' folder and chose Build Path > Use as Source Folder. Then, made sure that there were no red x's in the list of folders.
#Right-clicked on ''build.xml'' toward the bottom of the list. Then, navigated to Team > Synchronize with Repository.
#Double-clicked on src > edu.lmu.xmlpipedb.gmbuilder.databasetoolkit.profiles > ChlamydiaTrachomatisSerovarAUniProtSpeciesProfile.java.
#Within the code, added the gene database link pattern: "http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=~" and then saved the code.
#Right-clicked on ''build.xml'' and chose Run As > Ant Build... (the one with the ellipses).
#In the Edit Configuration dialog that appears, checked on the clean and dist items in the Targets tab.
#Clicked on the ''Order...'' button and rearranged the items so that clean is first and dist is second and then clicked OK.
#Clicked the Run button.
#Right-clicked on the ''xmlpipedb-gmbuilder'' project folder and chose Refresh (F5 is its keyboard shortcut).

====Reran the Database Export With the New Build for gmbuilder====

#Made sure that pgAdminIII and the database "CT_KS_20131119_32bit_gmb2b71" were open.
#Opened the newly built ''gmbuilder-32bit.bat'' within the folder [User Folder] > workspace > xmlpipedb-gmbuilder > dist.
#Made sure that the database was configured to "CT_KS_20131119_32bit_gmb2b71".
#Performed a database export.

=====Testing Report=====

*Name of .gdb file: Ct-Std_v2_KS_20131121.gdb
*Time taken to export .gdb:
:*Start Time: 10:56 AM (approximately)
:*End Time: 11:14:26 AM
*Upload your file and link to it here: [[Media:Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]

Note:

====Testing the New Database with GenMAPP====
#Opened GenMAPP.
#Clicked on Data > New Gene Database and selected the gene database that was created today (Ct-Std v2 KS 20131121.gdb).
#On the toolbar, clicked on the button that has the word gene in a box.
#Clicked and dragged the cursor on the blank canvas to create a new gene box.
#Right-clicked on the gene box.
#In the pop-up window, typed in "CTA_0588" in the Gene ID search field.
#Under Gene ID System, selected Ordered Locus Names.
#Then, clicked the "Search" button. Interestingly, GenMAPP did not find this gene in the database.
#*To troubleshoot, I opened up the gene database in Access.
#*In the list of tables, I double-clicked on OrderedLocusNames.
#*I sorted the list of gene IDs from A-Z and then scrolled down to try to find "CTA_0588". I was able to find "CTA_0587" and "CTA_0589" but not "CTA_0588".
#Back in GenMAPP, I used the same steps as above to find "CTA_0587".
#"CTA_0587" was found in the database. In the window, I clicked on the link "CTA_0587" under the OrderedLocusNames section.
#*This link took me to the entry for this gene on the Model Organism Database (http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=CTA_0587).
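
The URL reached from GenMAPP is what results from substituting the gene ID for the <code>~</code> in the link pattern added to the species profile. The sketch below illustrates that substitution under the assumption that <code>~</code> acts as a simple placeholder; <code>gene_url</code> is a hypothetical helper, not GenMAPP's own code.

```python
# Link pattern that was added to the species profile.
LINK_PATTERN = (
    "http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=~"
)


def gene_url(gene_id: str, pattern: str = LINK_PATTERN) -> str:
    """Fill the '~' placeholder in the link pattern with a gene ID."""
    return pattern.replace("~", gene_id)


print(gene_url("CTA_0587"))
```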

====Performing Quality Assurance on the New Database====

*Performed counts with Tally Engine.
[[Image:Tally Engine Results for Ct-Std v2 KS 20131121.jpg|thumb|none|upright=4]]
*Counted the number of unique gene IDs using xmlpipedb ''match'':
java -jar xmlpipedb-match-1.1.1.jar "CTA_[0-9][0-9][0-9][0-9]" < Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
:*'''Total unique matches = 911'''
*Counted the number of unique gene IDs using an SQL query:
select count(*) from genenametype where type = 'ordered locus' and value ~ 'CTA_[0-9][0-9][0-9][0-9]';
:*'''Count - 917'''
*Opened the database in Access. Then, double-clicked on the table "OriginalRowCounts".
{| style = "width:20%; font-size:100%"
| '''Table'''
| '''Rows'''
|-
| OrderedLocusNames
| 919
|-
| UniProt
| 917
|-
| UniProt-OrderedLocusNames
| 919
|}
*Double-clicked on the table "OrderedLocusNames" and scrolled all the way to the bottom. We found that some gene IDs had the format pCTA_####.
*Went back to the command prompt and ran the following command:
java -jar xmlpipedb-match-1.1.1.jar "[p|]CTA_[0-9][0-9][0-9][0-9]" < Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
:*'''Total unique matches = 8'''
:*Adding the results of the two runs with xmlpipedb match, we got 919 unique gene IDs, which is the same number as that in Access.
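
The way the two patterns interact can be illustrated on a small, made-up sample. This sketch uses Python's <code>re</code> module rather than xmlpipedb ''match'' itself, writes <code>[0-9]{4}</code> as shorthand for four digit classes, and uses <code>pCTA_</code> in place of <code>[p|]CTA_</code> (a character class that matches either "p" or a literal "|"). The sample string and <code>unique_matches</code> are hypothetical.

```python
import re


def unique_matches(pattern: str, text: str) -> set:
    """Return the set of distinct substrings of text that match pattern."""
    return set(re.findall(pattern, text))


# Made-up sample: two unprefixed IDs (one repeated) and one p-prefixed ID.
sample = (
    "<name>CTA_0001</name><name>CTA_0002</name>"
    "<name>pCTA_0001</name><name>CTA_0002</name>"
)

unprefixed = unique_matches(r"CTA_[0-9]{4}", sample)
prefixed = unique_matches(r"pCTA_[0-9]{4}", sample)

# "CTA_0001" also matches inside "pCTA_0001", so the first pattern alone
# cannot tell the two apart; the second run picks up the prefixed IDs.
print(len(unprefixed), len(prefixed), len(unprefixed) + len(prefixed))  # 2 1 3
```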

==Week 14==

===November 24, 2013===

====Ran GenMAPP Using ''C. trachomatis'' Gene Database====

#Opened GenMAPP.
#Made sure that the database was set to Ct-Std_v2_KS_20131121.gdb.
#Imported [[Media:C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt|sample microarray data]] into GenMAPP.
#*Loaded the microarray data by navigating to Expression Dataset Manager > New Dataset.
#*There were 23015 errors.
#Opened the exceptions file that was created to inspect the errors and scrolled down to where the IDs CTA_#### begin.
#*The gene IDs are actually in the format CTA_####_RRMH#####_at.
#Tried to search for a sample ID in the format found in the exceptions file in the UniProt database or any NCBI databases.
#*Could not find CTA_0806_RRMH00303, RRMH00303, CTA_0806_RRMH00303_at, or RRMH00303_at in the UniProt database.
#*Could not find CTA_0806_RRMH00303, RRMH00303, CTA_0806_RRMH00303_at, or RRMH00303_at in any database through NCBI.
#*My suspicion is that the RRMH#####_at portion of the ID is an identifier that is appended to each probe on the Affymetrix microarray when it is read by the company's software.
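
If the suffix is indeed separate from the gene ID, the gene ID can be recovered with a regular expression. This is a minimal sketch assuming every probe ID follows the CTA_####_RRMH#####_at format seen in the exceptions file; the optional "p" prefix and the <code>gene_id_from_probe</code> helper are assumptions added for illustration.

```python
import re

# Gene ID, then the appended RRMH#####_at portion (format taken from the
# exceptions file). The optional leading "p" is an assumption.
PROBE_ID = re.compile(r"^(p?CTA_\d{4})_RRMH\d{5}_at$")


def gene_id_from_probe(probe_id: str):
    """Return the gene ID embedded in a probe ID, or None if it does not fit."""
    match = PROBE_ID.match(probe_id)
    return match.group(1) if match else None


print(gene_id_from_probe("CTA_0806_RRMH00303_at"))  # CTA_0806
```

IDs that do not fit the assumed format return <code>None</code>, so they can be listed and inspected separately rather than silently altered.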

===November 27-28, 2013===

====Running SQL Queries to Determine Discrepancy in the Count Between the UniProt XML and the Gene Database====

*Ran an SQL query to determine if there were any ordered locus names in the database that do not match the ID "CTA_####".
select * from genenametype where type = 'ordered locus' and not value ~ 'CTA_[0-9][0-9][0-9][0-9]';
:*There were no hits.
*Ran an SQL query to show all the ordered locus names in the database that have the ID "CTA_####".
select * from genenametype where type = 'ordered locus' and value ~ 'CTA_[0-9][0-9][0-9][0-9]';
:*Scrolling through the results, I saw the ID "CTA_0406/CTA_0407/CTA_0408".
:*The [http://www.uniprot.org/manual/gene_name UniProt manual] stated that ordered locus names that are connected by a slash indicate two predicted genes that are actually one gene.
:*The combination of these IDs explains why two fewer IDs show up in the count by the Tally Engine in comparison to the SQL query.
*Opened the gene database within Access and filtered the ordered locus names to show only those that match the ID "CTA_0407".
:*The ID appeared on its own without being joined to other IDs with slashes.
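
The effect of the slash-joined entry on the counts can be sketched as follows. The three-record list is a made-up sample (only "CTA_0406/CTA_0407/CTA_0408" comes from the query results), and <code>split_locus_names</code> is a hypothetical helper.

```python
def split_locus_names(value: str) -> list:
    """Split a slash-joined ordered locus name into its individual IDs."""
    return value.split("/")


# Made-up sample of three records, one of which joins three IDs.
records = ["CTA_0405", "CTA_0406/CTA_0407/CTA_0408", "CTA_0409"]
ids = [name for value in records for name in split_locus_names(value)]

# One joined record contributes three IDs, i.e. two more IDs than records.
print(len(records), len(ids))  # 3 5
```

Applied to the real data, the same arithmetic gives 917 records plus 2 extra IDs, which equals the 919 rows seen in Access.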

====Testing Report for Finalized Gene Database====

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131119_32bit_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_11
::*Original file name from [http://www.uniprot.org/uniprot/?query=organism:315277+keyword:181 UniProt site]: uniprot-organism%3A315277+keyword%3A181.xml
* Time taken to import: 1.20 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): 11/06/2013
::*Original file name: go_daily-termdb.obo-xml.gz
* Time taken to import: 13.05 min
* Time taken to process: 10.80 min

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): 11/12/13
::*Original file name: 22183.C_trachomatis_A.goa
* Time taken to import: 0.04 min

*Name of .gdb file: Ct-Std_v2_KS_20131121.gdb
*Time taken to export .gdb:
:*Start Time: 10:56 AM (approximately)
:*End Time: 11:14:26 AM
*Upload your file and link to it here: [[Media:Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]

Note: There is one ID "CTA_0406/CTA_0407/CTA_0408" that is a combination of three predicted genes that are actually one gene. The Tally Engine counts this as one gene that contributes to the total count of 917. However, through Access and xmlpipedb ''match'', the ID is separated into three separate genes bringing the total count to 919.

==Week 15==

===December 3, 2013===

*Worked with Hilda to separate the gene IDs from the Affymetrix IDs appended to them in the microarray data.
*Worked with Dillon to run GenMAPP and MAPPFinder with the microarray data and the gene database created for ''C. trachomatis''.

[[Category:Journal Entry]]
[[Category:Individual Homework]]

Ksherbina Project Notebook

2013-12-06T05:55:39Z

Ksherbina: Added section for Week 15

{{Ksherbina}}

==Week 12==

===Starting the Set Up of the Testing Environment===

#Downloaded the following software onto my personal computer:
#*Latest version of GenMAPP Builder (gmbuilder2.0-b71) from [http://sourceforge.net/projects/xmlpipedb/files/?source=navbar SourceForge]
#*[http://www.oracle.com/technetwork/java/javase/downloads/index.html Java SE 7u45]
#*[http://www.eclipse.org/downloads/ Eclipse IDE for Java EE Developers]
#Created an account on [http://sourceforge.net/ SourceForge].

===Import/Export of Gene Database===

#Downloaded the [http://www.uniprot.org/uniprot/?query=organism:315277+keyword:181 UniProt XML] for ''C. trachomatis'' serovar A strain HAR-13: [[Media:Uniprot XML C.trachomatis serovar A KS 20131114.xml|Uniprot XML C.trachomatis serovar A KS 20131114.xml]]
#Downloaded the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ GOA file] for ''C. trachomatis'' serovar A: [[Media:22183.C trachomatis A KS 20131114.goa|22183.C trachomatis A KS 20131114.goa]]
#Created a new database in pgAdminIII: CT_KS_20131114_gmb2b71
#*Created tables in the database by executing the script in the file ''gmbuilder.sql''.
#Launched ''gmbuilder-32bit.bat''.
#Configured the database in gmbuilder:
#*Host or address: localhost
#*Port number: 5432
#*Database name: CT_KS_20131114_gmb2b71
#*Username: postgres
#*Password: <password of the PostgreSQL database created above>
#Imported the UniProt XML file into the PostgreSQL database through gmbuilder.
#Imported the OBO-XML file into the PostgreSQL database through gmbuilder.
#*gmbuilder was not able to completely process the data. See the testing report below for details.
#Imported the GOA file into the PostgreSQL database through gmbuilder.
#*I did not begin the actual export of the database due to the problem with processing the OBO-XML file.

====Testing Report====

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131114_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_11
* Time taken to import: 1.70 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
* Time taken to import: 9.29 min
* Time taken to process: '''Could not complete the process. The following error came up when gmbuilder was processing the 81000th term:'''
ERROR edu.lmu.xmlpipedb.gmbuilder.GenMAPPBuilder - java.util.concurrent.ExecutionException:java.lang.OutOfMemoryError:
Java heap space

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
* Time taken to import: 0.02 min

Name of .gdb file:
* Time taken to export .gdb:
* Upload your file and link to it here.

Note:

===Creating Custom Species Profile===

#On one of the class computers in the first row, I launched Eclipse with Subclipse.
#Went to Window > Open Perspective > Other > SVN Repository Exploring.
#Defined a new Subversion repository by clicking on ''Add Repository'' and adding the URL https://svn.code.sf.net/p/xmlpipedb/code. Clicked Finish.
#Double-clicked on trunk, right-clicked on gmbuilder, and chose Checkout....
#Created a new project (chose Check out as a project configured using the New Project Wizard).
#Went to Window > Preferences > Java > Installed JREs. Added a new environment with JDK.
#Double-clicked on the project folder.
#Double-clicked on the lib folder.
#Shift-clicked and selected all files in the lib folder and then control-clicked on every file inside the lib folder whose name did not end in .jar. Right-clicked on one of them and chose Build Path > Add to Build Path from the popup menu that came up.
#Created a species profile called ChlamydiaTrachomatisSerovarA.
#*Dondi modified the name to ChlamydiaTrachomatisSerovarAUniProtSpeciesProfile.
#Customized the species profile.
#*Opened the new profile and added the lines of code as specified in [[Coder|"Customize the Species Profile"]] directions.
#**Still need to insert the species-specific URL that returns a web page describing a gene for that species.
#Committed the profile following the directions in [[Coder|"Updating and Committing Code"]].

===Reflection===

#What were the week’s key accomplishments?
#*One of this week's key accomplishments was downloading the files necessary to run an import/export gene database cycle and the microarray files that will be analyzed. Also this week, we were able to start an import/export cycle.
#What are next week’s target accomplishments?
#*Next week, I would like to complete an import/export cycle and vet it with the QA person. In addition, I hope that we can download the software necessary to open the microarray files, format the files, and perform statistical analysis.
#What team strengths were seen this week?
#*I believe our strength this week as a team was meeting outside of class to create milestones. This meeting resulted in a detailed calendar of intermediate milestones that we plan to accomplish in order to meet the final deadline.
#What team weaknesses were seen this week?
#*It was difficult to set up the aforementioned meeting due to our busy schedules. Additionally, we did not have a clear plan for working on the project until the middle/end of the week. In the coming weeks, our goal will be to improve our work plan for each class period so as to minimize the difficulty of scheduling work meetings outside of class.

==Week 13==

===November 15, 2013===

====Import/Export Gene Database====

After the error that came up last week regarding the Java heap space, I performed the following steps to increase the heap space:
#Opened ''gmbuilder-32bit.bat'' in Notepad.
#Found the line
"C:\Program Files (x86)\Java\jre7\bin\java" -D32 -Xmx1024m -jar gmbuilder.jar
#Changed "-Xmx1024m" to "-Xmx2048m" thereby increasing the maximum heap space from 1024 MB to 2048 MB.
#Saved the file and closed it.

*Then, I tried the import/export again. I could not open ''gmbuilder-32bit.bat'' after I changed the maximum heap space. I repeated the above steps for ''gmbuilder.bat'' and found that I could still open the program.
*Opening ''gmbuilder.bat'', I imported both the UniProt XML file and the OBO-XML file. However, I received a Java heap space error again when processing the OBO-XML file.
*I increased the heap space in ''gmbuilder.bat'' from 2048 MB to 4096 MB and tried the import/export cycle again. When this did not work, I increased the heap space in ''gmbuilder.bat'' to 8192 MB.
*Then, I was able to import the UniProt XML, OBO-XML, and GOA files as well as process the OBO-XML.
*However, I was not able to begin the export because an error popped up stating that an export to a GenMAPP database is only possible in ''gmbuilder-32bit.bat''.

=====Testing Report=====

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131114_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]):
* Time taken to import: 0.73 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
* Time taken to import: 7.39 min
* Time taken to process: 47.15 min

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
* Time taken to import: 0.03 min

Name of .gdb file:
* Time taken to export .gdb:
* Upload your file and link to it here.

Note: '''Could not export the database. Received an error stating that an export to a GenMAPP database is only possible in ''gmbuilder-32bit.bat''.'''

====Troubleshooting====

#Ran cmd from the Start menu.
#cd to where GenMAPP Builder is located.
#Typed the same command that is in the .bat file, but directly at the prompt:
"C:\Program Files (x86)\Java\jre7\bin\java" -D32 -Xmx1536m -jar gmbuilder.jar

Received the following error:
Error occurred during initialization of VM.
Could not reserve enough space for object heap.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit

===November 19, 2013===

====Rerunning the Gene Database Import/Export====

*On 11/15/2013, the UniProt XML, OBO-XML, and GOA files were imported and the OBO-XML file was processed in the 64-bit ''gmbuilder.bat'' with the maximum heap space set to 8192 MB.
*Performed the export in 32-bit ''gmbuilder.bat''.

=====Testing Report=====

Version of GenMAPP Builder: Same as in previous testing report

Computer on which export was run: Same as in previous testing report

Postgres Database name: Same as in previous testing report

UniProt XML filename: Same as in previous testing report
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]):
* Time taken to import: 1.10 min

GO OBO-XML filename: Same as in previous testing report
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
* Time taken to import: 8.96 min
* Time taken to process: Did not reprocess because received a message stating that the GO data had already been processed.

GOA filename: Same as in previous testing report
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
* Time taken to import: 0.03 min

Name of .gdb file: Ct-Std v1 KS 20131119.gdb
* Time taken to export .gdb:
:*Start Time: 4:01:10 AM
:*End Time: 6:51:58 AM
* Upload your file and link to it here: [[Media:Ct-Std v1 KS 20131119.gdb|Ct-Std v1 KS 20131119.gdb]]

Note:

====Checking the Quality of the Exported Database====

#In ''gmbuilder-32bit.bat'', chose Run XML > Database Tallies for UniProt and GO....
:*The XML count for the UniProt file was 917 while the Database Count for the UniProt file was 10087. This discrepancy is likely due to importing the UniProt, OBO-XML, and GOA files multiple times while experimenting with the heap space setting.
:*A new database must now be created in pgAdminIII in order to repeat the export/import gene database cycle.

===November 20-21, 2013===

====Finishing Installing Software to Set Up the Development Environment====

#Navigated to the "Download and Install" tab on the [http://subclipse.tigris.org/ Subclipse home page]
#Copied the Eclipse update site URL for the 1.10.x Release.
#Opened up Eclipse.
#Help > Install New Software...
#Pasted the URL into the "Work with" field.
#Checked the box next to "Subclipse" and "SVNKit".
#Clicked on Next and went through the process of installing the software.

====Repeating the Gene Database Import/Export Cycle====

#Created a new database in pgAdminIII.
#*Created tables in the database by executing the script in the file gmbuilder.sql.
#Since the aforementioned error occurred because duplicates were introduced into the database through repeated imports, I decided to try this new import-export cycle using 32-bit ''gmbuilder''.
#Opened ''gmbuilder-32bit.bat'' with the heap space set to 1024 MB and configured the new database.
#*Host or address: localhost
#*Port number: 5432
#*Database name: CT_KS_20131119_32bit_gmb2b71
#*Username: postgres
#*Password: <password of the PostgreSQL database created above>
#Imported the UniProt XML file into PostgreSQL through ''gmbuilder-32bit.bat''.
#Imported the OBO-XML file into PostgreSQL through ''gmbuilder-32bit.bat''.
#Processed the OBO-XML file through ''gmbuilder-32bit.bat''.
#Imported the GOA file into PostgreSQL through ''gmbuilder-32bit.bat''.

=====Testing Report=====
Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131119_32bit_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_11
::*Original file name from [http://www.uniprot.org/uniprot/?query=organism:315277+keyword:181 UniProt site]: uniprot-organism%3A315277+keyword%3A181.xml
* Time taken to import: 1.20 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): 11/06/2013
::*Original file name: go_daily-termdb.obo-xml.gz
* Time taken to import: 13.05 min
* Time taken to process: 10.80 min

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): 11/12/13
::*Original file name: 22183.C_trachomatis_A.goa
* Time taken to import: 0.04 min

Name of .gdb file: Ct-Std KS 20131121.gdb
* Time taken to export .gdb:
:*Start Time: 2:37:42 AM
:*End Time: 2:55:45 AM
* Upload your file and link to it here: [[Media:Ct-Std KS 20131121.gdb|Ct-Std KS 20131121.gdb]]
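
The import times above are reported in minutes, so the export time can be expressed the same way from the start and end times. A minimal Python sketch of that arithmetic follows; the function name elapsed_minutes is my own choice for illustration and is not part of gmbuilder.

```python
from datetime import datetime


def elapsed_minutes(start: str, end: str) -> float:
    """Return the minutes between two same-day clock times such as '2:37:42 AM'."""
    fmt = "%I:%M:%S %p"  # 12-hour clock with seconds and AM/PM
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


# Start and end times recorded for the export of Ct-Std KS 20131121.gdb
export_minutes = elapsed_minutes("2:37:42 AM", "2:55:45 AM")
print(f"{export_minutes:.2f} min")  # 18.05 min
```

The sketch assumes both times fall on the same day, which was the case for this export.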

Note: '''There were no heap space errors this time, which is good. The errors probably occurred in the first place because of repeatedly importing the files.'''

===November 21, 2013===

====Revised the Custom Species Profile====

#Ran Eclipse.
#Navigated to Window > Open Perspective > Other... and chose SVN Repository Exploring
#Defined a new Subversion repository by clicking on the Add Repository button.
#Set the URL to https://svn.code.sf.net/p/xmlpipedb/code then clicked Finish.
#Double-clicked on https://svn.code.sf.net/p/xmlpipedb/code in the list.
#Double-clicked on ''trunk''.
#Right-clicked on ''gmbuilder'' and chose ''Checkout....''
#Chose ''Check out as a project configured using the New Project Wizard'' then clicked ''Finish''.
#In the New Project dialog that opened, chose Java Project from the list and then clicked ''Next >''.
#Entered the new project name ''xmlpipedb-gmbuilder'' and clicked ''Finish''.
#Set Eclipse to use JDK.
#*Navigated to Window > Preferences and clicked on the ''Search'' button.
#*Expanded Computer > Local Disk (C:) > Program Files > Java > jdk1.7.0_45 and clicked OK.
#*Highlighted the row ''jdk'' and clicked on ''Edit''.
#*In "JRE home:" field, removed everything in the name after jdk1.7.0_45 and then clicked ''Finish''.
#Made sure that I was in the Java perspective by navigating to Window > Open Perspective > Other... and then choosing Java.
#Double-clicked on the ''xmlpipedb-gmbuilder'' folder.
#Double-clicked on the ''lib'' folder.
#Shift-clicked to select all files in the lib folder and then control-clicked on every file inside the lib folder whose name did not end in .jar to deselect it.
#Right-clicked on one of the selected files and chose Build Path > Add to Build Path.
#Checked that the ''src'' folder was set as the source folder by right-clicking on it and then clicking on ''Build Path''.
#Right-clicked on the ''test'' folder and chose Build Path > Use as Source Folder. Then, made sure that there were no red x's in the list of folders.
#Right-clicked on ''build.xml'' toward the bottom of the list. Then, navigated to Team > Synchronize with Repository.
#Double-clicked on src > edu.lmu.xmlpipedb.gmbuilder.databasetoolkit.profiles > ChlamydiaTrachomatisSerovarAUniProtSpeciesProfile.java.
#Within the code, added the gene database link pattern: "http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=~" and then saved the code.
#Right-clicked on ''build.xml'' and chose Run As > Ant Build... (the one with the ellipsis).
#In the Edit Configuration dialog that appeared, checked the clean and dist items in the Targets tab.
#Clicked on the ''Order...'' button and rearranged the items so that clean is first and dist is second and then clicked OK.
#Clicked the Run button.
#Right-clicked on the ''xmlpipedb-gmbuilder'' project folder and chose Refresh (F5 is its keyboard shortcut).

====Reran the Database Export With the New Build for gmbuilder====

#Made sure that pgAdminIII and the database "CT_KS_20131119_32bit_gmb2b71" were open.
#Opened the newly built ''gmbuilder-32bit.bat'' within the folder [User Folder] > workspace > xmlpipedb-gmbuilder > dist.
#Made sure that the database was configured to "CT_KS_20131119_32bit_gmb2b71".
#Performed a database export.

=====Testing Report=====

*Name of .gdb file: Ct-Std_v2_KS_20131121.gdb
*Time taken to export .gdb:
:*Start Time: 10:56 AM (approximately)
:*End Time: 11:14:26 AM
*Upload your file and link to it here: [[Media:Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]

Note:

====Testing the New Database with GenMAPP====
#Opened GenMAPP.
#Clicked on Data > New Gene Database and selected the gene database that was created today (Ct-Std v2 KS 20131121.gdb).
#On the toolbar, clicked on the button that has the word gene in a box.
#Clicked and dragged the cursor on the blank canvas to create a new gene box.
#Right-clicked on the gene box.
#In the pop-up window, typed in "CTA_0588" in the Gene ID search field.
#Under Gene ID System, selected Ordered Locus Names.
#Then, hit the "Search" button. Interestingly, GenMAPP did not find this gene in the database.
#*To troubleshoot, I opened up the gene database in Access.
#*In the list of tables, I double-clicked on OrderedLocusNames.
#*I sorted the list of gene IDs from A-Z and then scrolled down to try to find "CTA_0588". I was able to find "CTA_0587" and "CTA_0589" but not "CTA_0588".
#Back in GenMAPP, I used the same steps as above to find "CTA_0587".
#"CTA_0587" was found in the database. In the window, I clicked on the link "CTA_0587" under the OrderedLocusNames section.
#*This link took me to the entry for this gene on the Model Organism Database (http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=CTA_0587).
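
The URL reached from the gene box is consistent with the "~" in the link pattern added to the species profile being replaced by the gene ID. A minimal Python sketch of that substitution follows. It assumes that simple placeholder behavior; the helper name build_gene_link is hypothetical and is not a GenMAPP or gmbuilder function.

```python
# Link pattern added to the custom species profile; "~" stands in for the gene ID.
LINK_PATTERN = (
    "http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=~"
)


def build_gene_link(pattern: str, gene_id: str) -> str:
    """Fill the '~' placeholder with a gene ID (assumed behavior, for illustration)."""
    return pattern.replace("~", gene_id)


link = build_gene_link(LINK_PATTERN, "CTA_0587")
print(link)
```

For CTA_0587 this yields the same address that the link in GenMAPP opened.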

====Performing Quality Assurance on the New Database====

*Performed counts with Tally Engine.
[[Image:Tally Engine Results for Ct-Std v2 KS 20131121.jpg|thumb|none|upright=4]]
*Counted the number of unique gene IDs using xmlpipedb ''match'':
java -jar xmlpipedb-match-1.1.1.jar "CTA_[0-9][0-9][0-9][0-9]" < Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
:*'''Total unique matches = 911'''
*Counted the number of unique gene IDs using an SQL query:
select count(*) from genenametype where type = 'ordered locus' and value ~ 'CTA_[0-9][0-9][0-9][0-9]';
:*'''Count - 917'''
*Opened the database in Access. Then, double-clicked on the table "OriginalRowCounts".
{| style = "width:20%; font-size:100%"
| '''Table'''
| '''Rows'''
|-
| OrderedLocusNames
| 919
|-
| UniProt
| 917
|-
| UniProt-OrderedLocusNames
| 919
|}
*Double-clicked on the table "OrderedLocusNames" and scrolled all the way to the bottom. We found that some gene IDs had the format pCTA_####.
*Went back to the command prompt and ran the following command:
java -jar xmlpipedb-match-1.1.1.jar "[p|]CTA_[0-9][0-9][0-9][0-9]" < Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
:*'''Total unique matches = 8'''
:*Adding the results of the two runs with xmlpipedb match, we got 919 unique gene IDs, which is the same number as that in Access.
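
The idea of counting unique chromosomal (CTA_####) and plasmid (pCTA_####) IDs separately and then adding them can be sketched in Python. This is only an illustration under stated assumptions: the sample text is made up rather than taken from the UniProt XML file, the helper unique_matches is hypothetical, and the lookbehind (?<!p) is my own addition to keep the two sets disjoint; whether xmlpipedb ''match'' counts in exactly this way is not confirmed here. Note also that inside square brackets "|" is a literal character, so a class such as [p|] matches either "p" or "|".

```python
import re


def unique_matches(pattern, text):
    """Return the set of distinct substrings of text that match pattern."""
    return set(re.findall(pattern, text))


# Made-up sample in the spirit of the UniProt XML; not taken from the real file.
sample = """
<name type="ordered locus">CTA_0001</name>
<name type="ordered locus">CTA_0002</name>
<name type="ordered locus">CTA_0002</name>
<name type="ordered locus">pCTA_0001</name>
"""

# IDs not preceded by "p" (chromosomal) versus IDs with the "p" prefix (plasmid)
chromosomal = unique_matches(r"(?<!p)CTA_[0-9]{4}", sample)
plasmid = unique_matches(r"pCTA_[0-9]{4}", sample)
total_unique = len(chromosomal) + len(plasmid)
print(len(chromosomal), len(plasmid), total_unique)
```

Duplicated IDs are counted once because the matches are collected into a set before counting.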

==Week 14==

===November 24, 2013===

====Ran GenMAPP Using ''C. trachomatis'' Gene Database====

#Opened GenMAPP.
#Made sure that the database was set to Ct-Std_v2_KS_20131121.gdb.
#Imported [[Media:C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt|sample microarray data]] into GenMAPP.
#*Loaded the microarray data by navigating to Expression Dataset Manager > New Dataset.
#*There were 23015 errors.
#Opened the exceptions file that was created to inspect the errors and scrolled down to where the IDs CTA_#### begin.
#*The gene IDs are actually in the format CTA_####_RRMH#####_at.
#Tried to search for a sample ID in the format found in the exceptions file in the UniProt database or any NCBI databases.
#*Could not find CTA_0806_RRMH00303, RRMH00303, CTA_0806_RRMH00303_at, or RRMH00303_at in the UniProt database.
#*Could not find CTA_0806_RRMH00303, RRMH00303, CTA_0806_RRMH00303_at, or RRMH00303_at in any database through NCBI.
#*My suspicion is that the RRMH#####_at portion of the ID is an identifier that is appended to each probe on the Affymetrix microarray when it is read by the company's software.
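
If the IDs in the microarray data really do follow the pattern CTA_####_RRMH#####_at, the ordered locus name could be separated from the probe suffix before import. The Python sketch below shows one possible way to do that; the regular expression and the helper name split_probe_id are assumptions for illustration, based only on the one sample ID seen in the exceptions file.

```python
import re
from typing import Optional, Tuple

# Assumed layout: ordered locus name, then an RRMH probe identifier, then optional "_at"
PROBE_ID = re.compile(r"^(CTA_[0-9]{4})_(RRMH[0-9]{5})(?:_at)?$")


def split_probe_id(probe_id: str) -> Optional[Tuple[str, str]]:
    """Return (ordered locus name, probe identifier), or None if the ID does not fit."""
    match = PROBE_ID.match(probe_id)
    if match is None:
        return None
    return match.group(1), match.group(2)


parts = split_probe_id("CTA_0806_RRMH00303_at")
print(parts)
```

IDs that do not fit the assumed layout return None, so they can be reviewed by hand rather than silently altered.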

===November 27-28, 2013===

====Running SQL Queries to Determine Discrepancy in the Count Between the UniProt XML and the Gene Database====

*Ran an SQL query to determine if there were any ordered locus names in the database that do not match the ID "CTA_####".
select * from genenametype where type = 'ordered locus' and not value ~ 'CTA_[0-9][0-9][0-9][0-9]';
:*There were no hits.
*Ran an SQL query to show all the ordered locus names in the database that have the ID "CTA_####".
select * from genenametype where type = 'ordered locus' and value ~ 'CTA_[0-9][0-9][0-9][0-9]';
:*Scrolling through the results, I saw the ID "CTA_0406/CTA_0407/CTA_0408".
:*The [http://www.uniprot.org/manual/gene_name UniProt manual] stated that ordered locus names that are connected by a slash indicate two predicted genes that are actually one gene.
:*The combination of these IDs explains why two fewer IDs show up in the count by the Tally Engine in comparison to the SQL query.
*Opened the gene database within Access and filtered the ordered locus names to show only those that match the ID "CTA_0407".
:*The ID appeared on its own without being joined to other IDs with slashes.
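
The effect of a slash-joined entry on the counts can be illustrated with a short Python sketch. The sample list is made up apart from the one joined ID observed above, and the helper count_ids is hypothetical; it simply contrasts counting rows as stored with counting the IDs after splitting on "/".

```python
def count_ids(values):
    """Return (number of rows as stored, number of unique IDs after splitting on '/')."""
    rows = len(values)
    split_ids = [part for value in values for part in value.split("/")]
    return rows, len(set(split_ids))


# Made-up neighbors around the one slash-joined ID seen in the query results
sample = ["CTA_0405", "CTA_0406/CTA_0407/CTA_0408", "CTA_0409"]
rows, unique_ids = count_ids(sample)
print(rows, unique_ids, unique_ids - rows)
```

One entry that joins three IDs adds two to the split count, which matches the size of the discrepancy seen between the counts.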

====Testing Report for Finalized Gene Database====

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131119_32bit_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_11
::*Original file name from [http://www.uniprot.org/uniprot/?query=organism:315277+keyword:181 UniProt site]: uniprot-organism%3A315277+keyword%3A181.xml
* Time taken to import: 1.20 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): 11/06/2013
::*Original file name: go_daily-termdb.obo-xml.gz
* Time taken to import: 13.05 min
* Time taken to process: 10.80 min

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): 11/12/13
::*Original file name: 22183.C_trachomatis_A.goa
* Time taken to import: 0.04 min

*Name of .gdb file: Ct-Std_v2_KS_20131121.gdb
*Time taken to export .gdb:
:*Start Time: 10:56 AM (approximately)
:*End Time: 11:14:26 AM
*Upload your file and link to it here: [[Media:Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]

Note: There is one ID "CTA_0406/CTA_0407/CTA_0408" that is a combination of three predicted genes that are actually one gene. The Tally Engine counts this as one gene that contributes to the total count of 917. However, through Access and xmlpipedb ''match'', the ID is separated into three separate genes bringing the total count to 919.

==Week 15==

[[Category:Journal Entry]]
[[Category:Individual Homework]]

Chlamydia trachomatis

2013-12-05T18:27:19Z

Ksherbina: /* Deadlines and Intermediate Milestones */Fixed initials

{{Team_H(oo)KD}}

==Deadlines and Intermediate Milestones==

{| style="width: 100%; margin: 1em 0" cellpadding="5ex"
! style="width: 8em; background: #ddd" | Monday
! style="width: 8em; background: #ddd" | Tuesday
! style="width: 8em; background: #ddd" | Wednesday
! style="width: 8em; background: #ddd" | Thursday
! style="width: 8em; background: #ddd" | Friday
|-
| style="vertical-align: top" | 11/11
| style="vertical-align: top" | 11/12
| style="vertical-align: top" | 11/13
| style="vertical-align: top" | 11/14
*Run an export/import of gene database cycle '''KS/HD'''
*Create species profile '''KS'''
*Obtain and format raw microarray data '''DW'''
| style="vertical-align: top" | 11/15
*Rerun export/import of gene database cycle '''KS'''
|-
| style="vertical-align: top; background: #eee" | 11/18
| style="vertical-align: top; background: #eee" | 11/19
*Vet the gene database exported last week '''KS/HD/DW'''
*Format the raw microarray data '''DW'''
| style="vertical-align: top; background: #eee" | 11/20
| style="vertical-align: top; background: #eee" | 11/21
*Perform microarray statistical analysis '''DW'''
*Work on custom species profile '''KS/HD'''
| style="vertical-align: top; background: #eee" | 11/22
|-
| style="vertical-align: top" | 11/25
| style="vertical-align: top" | 11/26
*Build and commit species profile '''KS'''
*Customize Tally Engine '''KS/HD'''
*Perform GenMAPP and MAPPFinder Analysis '''DW'''
| style="vertical-align: top" | 11/27
'''THANKSGIVING BREAK'''
| style="vertical-align: top" | 11/28
'''THANKSGIVING BREAK'''
| style="vertical-align: top" | 11/29
'''THANKSGIVING BREAK'''
|-
| style="vertical-align: top; background: #eee" | 12/02
| style="vertical-align: top; background: #eee" | 12/03
*Run GenMAPP and MAPPFinder '''DW'''
*Customize the Tally Engine for ''C. trachomatis'' '''DW'''
| style="vertical-align: top; background: #eee" | 12/04
*Run GenMAPP and MAPPFinder again if necessary '''DW'''
| style="vertical-align: top; background: #eee" | 12/05
*Begin working on the relational database schema '''HD'''
*Commit the changes to gmbuilder '''KS'''
| style="vertical-align: top; background: #eee" | 12/06
|-
| style="vertical-align: top; background: #eee" | 12/09
*Write the paper '''HD/KS/DW'''
*Work on the presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/10
*Write the paper '''HD/KS/DW'''
*Work on the presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/11
*Finish the presentation '''HD/KS/DW'''
*Practice the final presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/12
*'''Final Presentation'''
*Proofread the final paper '''HD/KS/DW'''
*Make sure all the deliverables are ready '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/13
|}

==Annotated Bibliography of Genomics Papers for ''C. trachomatis''==

===Whole genome sequencing of ''C. trachomatis''===

#Journal article describing results of whole genome sequencing:
#*Database: PubMed
#*Search Terms: Chlamydia trachomatis [MeSH Term] AND genome [Title]
#*There were 29 results. Many of the articles that appear using the above search terms have to do with performing whole-genome analysis of ''C. trachomatis'' to look at genetic polymorphisms in the bacterium and molecular and genetic characteristics that are observed when the bacterium infects a host.
#*Stephens, R.S., Kalman, S., Lammel, C., Fan, J., Marathe, R., Aravind, L., Mitchell, W., Olinger, L., Tatusov, R., Zhao, Q., Koonin, E. V., Davis, R.W. (1998) Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis. ''Science'' '''282''': 754-759. 10.1126/science.282.5389.754.
#*The article can be viewed as an [http://www.sciencemag.org/content/282/5389/754.long HTML].
#*The article can be viewed as a [[Media: Science-1998-Stephens-754-9.pdf | PDF]].
#*This article includes [http://www.sciencemag.org/site/feature/data/982604.xhtml Supplementary Material].
#Use the genome sequencing article you found to perform a prospective search in the ISI Web of Science/Knowledge database.
#*How many results did you get?: 872
#*Based on the titles and abstracts of the papers, what type of research directions have been taken now that the genome for that organism has been sequenced?: Based on the titles and abstracts found in the results, it seems that research has been primarily in the direction of understanding how the disease functions and assessing the severity of the disease (and how to cure it).

===Microarray articles===
The EBI ArrayExpress database was used to locate each of the four following microarray articles. The search terms used were Chlamydia trachomatis under 'By organism' while the rest of the drop down menus were not altered.

The following articles appear in order of preference:

#Omsland, A., Sager, J., Nair, V., Sturdevant, D.E., Hackstadt, T. (2012) [http://www.pnas.org/content/109/48/19781.full Developmental stage-specific metabolic and transcriptional activity of Chlamydia trachomatis in an axenic medium]. ''PNAS'' '''109''': 19781-19785. doi: 10.1073/pnas.1212831109.
#*[[Media: PNAS-2012-Omsland-19781-5.pdf|PDF Version of Paper]]
#*[http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-39530/?keywords=&organism=Chlamydia+trachomatis&array=&exptype%5B%5D=&exptype%5B%5D= Microarray Data]
#*Chlamydiae undergo a biphasic developmental cycle characterized by an infectious cell type known as an elementary body (EB) and an intracellular replicative form called a reticulate body (RB). Chlamydia was incubated under microaerobic conditions to test the differences in preferred energy source between EBs and RBs.
#*There was not necessarily a "treatment" group and a "control" group. Referencing Figure 4, it can be inferred that the EBs would be considered the "treatment" group because the ratio was set up as EB to RB. The RBs would then be the "control" group, since the EB population is examined relative to the RB population.
#*Four biological replicates were performed for both the control and the treatment; the article states that "Density gradient-purified EBs and RBs were incubated in quadruplicate in four-well plates...".


==Journal Club Presentation on ''C. trachomatis'' Genome Sequencing Paper==

[[Media:Presentation on C.trachomatis Genome Paper KS HD DW 20131112.pdf|Here]] is the PDF version of the presentation given on November 12, 2013 on the Stephens ''et al.'' (1998) paper.


==Important Files==

*[[Media: Go_daily-termdb_v2_HD_20131107.obo-xml.gz|Go_daily-termdb_v2_HD_20131107.obo-xml.gz]]
:GO OBO-XML file for ''C. trachomatis'' serovar A.
*[[Media: Uniprot XML C.trachomatis serovar A KS 20131114.xml|Uniprot XML C.trachomatis serovar A KS 20131114.xml]]
:UniProt XML file for ''C. trachomatis'' serovar A.
*[[Media: 22183.C trachomatis A KS 20131114.goa|22183.C trachomatis A KS 20131114.goa]]
:GOA file for ''C. trachomatis'' serovar A.
*[[Media: Ct-Std v1 KS 20131119.gdb|Ct-Std v1 KS 20131119.gdb]]
:First iteration of the export/import gene database cycle.
*[[Media: Ct-Std KS 20131121.gdb|Ct-Std KS 20131121.gdb]]
:Updated gene database.
*[[Media: Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]
:New gene database created with the new version of gmbuilder built on 11/21/2013 using the custom species profile for ''C. trachomatis.''

'''Array Express Article Raw Files'''

*[[Media:Raw_Data_DW_V1_12112013_A-GEOD-4692.adf.txt|Raw_Data_DW_V1_12112013_A-GEOD-4692.adf.txt]]
:Microarray raw ADF file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.idf.txt|Raw_Data_DW_V1_12112013_E-GEOD-39530.idf.txt]]
:Microarray raw IDF file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.processed.1.zip|Raw_Data_DW_V1_12112013_E-GEOD-39530.processed.1.zip]]
:Microarray raw processed zip file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.raw.1.zip|Raw_Data_DW_V1_12112013_E-GEOD-39530.raw.1.zip]]
:Microarray raw zip file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.sdrf.txt|Raw_Data_DW_V1_12112013_E-GEOD-39530.sdrf.txt]]
:Microarray SDRF text file for ''C. trachomatis''.
*[[Media:Sdrf information excel.xls|Sdrf information excel.xls]]
:SDRF information in Excel spreadsheet format.

'''Excel Master Spreadsheet File Versions (Containing the Formatted Raw Data)'''

*[[Media:Master Spreadsheet.xls|master spreadsheet work in progress]]
:Initial Microarray raw data formatted and opened in Microsoft Excel.
*[[Media: C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.xls|C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.xls]]
:Sample C. trachomatis microarray data that was used to run GenMAPP with the latest ''C. trachomatis'' database.
*[[Media: C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt|C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt]]
:Tab-delimited version of the file above.
*[[Media: Final Master Spreadsheet.xls|Final Master Spreadsheet]]
:Microarray data with identified gene IDs and log base data.
*[[Media: Master_Spreadsheet_Chlamydia_20131125_KD.xls | Master_Spreadsheet_Chlamydia_20131125_KD.xls]]
:Version edited during office hours by Dr. Dahlquist to demonstrate statistics.
*[[Media: Master Spreadsheet Chlamydia V1 20131125 DW.xls]]
:Finished version of the spreadsheet edited during office hours by Dr. Dahlquist to demonstrate statistics; finished by Dillon Williams.
*[[Media: Copy of Master Spreadsheet Chlamydia V2 20131203 HD.xls]]
:Microarray data with separated IDs CTA_#### from _RRMH#####_...
*[[Media: For GenMAPP Chlamydia V4 20131203 DW.xls]]
:Excel spreadsheet with data formatted for GenMAPP (version run in GenMAPP).

'''For GenMAPP Files'''
*[[Media: Ctrachomatis EbtoRb ATP metabolic process.mapp]]
: EB to RB ATP Metabolic Process Mapp file.
*[[Media: Ctrachomatis EbtoRB glucose catabolic process 20131204 DW.mapp]]
: EB to RB glucose catabolic process map.
*[[Media: CTrachomatis EBtoRB Rifampicin 20131204 DW-Criterion1-GO.txt]]
: EB to RB Rifampicin go.txt file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.gex]]
: For_GenMAPP .gex file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.gmf]]
: For GenMAPP .gmf file.
*[[Media: For GenMAPP Chlamydia V4 20131204 KS.zip]]
: Access file created with .gex file.

==Wiki Formatting==

Template: [[Template:Team_H(oo)KD]]

[[Category: Group Projects]]
[[Category: Team H(oo)KD]]

Chlamydia trachomatis

2013-12-05T18:26:48Z

Ksherbina: /* Deadlines and Intermediate Milestones */ Changed milestones for week 15


Team H(oo)KD Week 15 Status Report

2013-12-05T18:22:32Z

Ksherbina: Added sections

{{Team H(oo)KD}}

==Coder Status Report==

===Reflection===

==Quality Assurance Status Report==

===Reflection===

==GenMAPP User Status Report==

===Reflection===

Team H(oo)KD Week 15 Status Report

2013-12-05T18:17:14Z

Ksherbina: Added template

Template:Team H(oo)KD

2013-12-05T18:16:50Z

Ksherbina: Added link to group status report for week 15

Chlamydia trachomatis

2013-12-03T04:03:19Z

Ksherbina: /* Important Files */ Moved around some links and renamed some links

{{Team_H(oo)KD}}

==Deadlines and Intermediate Milestones==

{| style="width: 100%; margin: 1em 0" cellpadding="5ex"
! style="width: 8em; background: #ddd" | Monday
! style="width: 8em; background: #ddd" | Tuesday
! style="width: 8em; background: #ddd" | Wednesday
! style="width: 8em; background: #ddd" | Thursday
! style="width: 8em; background: #ddd" | Friday
|-
| style="vertical-align: top" | 11/11
| style="vertical-align: top" | 11/12
| style="vertical-align: top" | 11/13
| style="vertical-align: top" | 11/14
*Run an export/import of gene database cycle '''KS/HD'''
*Create species profile '''KS'''
*Obtain and format raw microarray data '''DW'''
| style="vertical-align: top" | 11/15
*Rerun export/import of gene database cycle '''KS'''
|-
| style="vertical-align: top; background: #eee" | 11/18
| style="vertical-align: top; background: #eee" | 11/19
*Vet the gene database exported last week '''KS/HD/DW'''
*Format the raw microarray data '''DW'''
| style="vertical-align: top; background: #eee" | 11/20
| style="vertical-align: top; background: #eee" | 11/21
*Perform microarray statistical analysis '''DW'''
*Work on custom species profile '''KS/HD'''
| style="vertical-align: top; background: #eee" | 11/22
|-
| style="vertical-align: top" | 11/25
| style="vertical-align: top" | 11/26
*Build and commit species profile '''KS'''
*Customize Tally Engine '''KS/HD'''
*Perform GenMAPP and MAPPFinder Analysis '''DW'''
| style="vertical-align: top" | 11/27
'''THANKSGIVING BREAK'''
| style="vertical-align: top" | 11/28
'''THANKSGIVING BREAK'''
| style="vertical-align: top" | 11/29
'''THANKSGIVING BREAK'''
|-
| style="vertical-align: top; background: #eee" | 12/02
| style="vertical-align: top; background: #eee" | 12/03
*Work on the final paper outline '''DW'''
*Document the relational database schema '''HD'''
*Commit changes to GenMAPP '''KS'''
| style="vertical-align: top; background: #eee" | 12/04
| style="vertical-align: top; background: #eee" | 12/05
*Begin writing the paper '''DW/KS/HD'''
*Begin working on the presentation '''DW/KS/HD'''
| style="vertical-align: top; background: #eee" | 12/06
|-
| style="vertical-align: top; background: #eee" | 12/09
| style="vertical-align: top; background: #eee" | 12/10
*Continue writing the paper '''HD/KS/DW'''
*Continue working on the presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/11
*Practice the final presentation '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/12
*'''Final Presentation'''
*Proofread the final paper '''HD/KS/DW'''
| style="vertical-align: top; background: #eee" | 12/13
|}

==Annotated Bibliography of Genomics Papers for ''C. trachomatis''==

===Whole genome sequencing of ''C. trachomatis''===

#Journal article describing results of whole genome sequencing:
#*Database: PubMed
#*Search Terms: Chlamydia trachomatis [MeSH Term] AND genome [Title]
#*There were 29 results. Many of the articles that appear using the above search terms have to do with performing whole-genome analysis of ''C. trachomatis'' to look at genetic polymorphisms in the bacterium and molecular and genetic characteristics that are observed when the bacterium infects a host.
#*Stephens, R.S., Kalman, S., Lammel, C., Fan, J., Marathe, R., Aravind, L., Mitchell, W., Olinger, L., Tatusov, R., Zhao, Q., Koonin, E. V., Davis, R.W. (1998) Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis. ''Science'' '''282''': 754-759. 10.1126/science.282.5389.754.
#*The article can be viewed as an [http://www.sciencemag.org/content/282/5389/754.long HTML].
#*The article can be viewed as a [[Media: Science-1998-Stephens-754-9.pdf | PDF]].
#*This article includes [http://www.sciencemag.org/site/feature/data/982604.xhtml Supplementary Material].
#Use the genome sequencing article you found to perform a prospective search in the ISI Web of Science/Knowledge database.
#*How many results did you get?: 872
#*Based on the titles and abstracts of the papers, what type of research directions have been taken now that the genome for that organism has been sequenced?: Based on the titles and abstracts found in the results, it seems that research has been primarily in the direction of understanding how the disease functions and assessing the severity of the disease (and how to cure it).

===Microarray articles===
The EBI ArrayExpress database was used to locate each of the four following microarray articles. The search terms used were Chlamydia trachomatis under 'By organism' while the rest of the drop down menus were not altered.

The following articles appear in order of preference:

#Omsland, A., Sager, J., Nair, V., Sturdevant, D.E., Hackstadt, T. (2012) [http://www.pnas.org/content/109/48/19781.full Developmental stage-specific metabolic and transcriptional activity of Chlamydia trachomatis in an axenic medium]. ''PNAS'' '''109''': 19781-19785. doi: 10.1073/pnas.1212831109.
#*[[Media: PNAS-2012-Omsland-19781-5.pdf|PDF Version of Paper]]
#*[http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-39530/?keywords=&organism=Chlamydia+trachomatis&array=&exptype%5B%5D=&exptype%5B%5D= Microarray Data]
#*Chlamydiae undergo a biphasic developmental cycle characterized by an infectious cell type known as an elementary body (EB) and an intracellular replicative form called a reticulate body (RB). Chlamydia was incubated under microaerobic conditions to test the differences in preferred energy source between EBs and RBs.
#*There was not necessarily a "treatment" group and a "control" group. Referencing figure 4, it can be inferred that EB would be considered the "treatment" group because the ratio was set up as EB to RB. In this way the RB would be the "control" group, as the authors are looking at the EB population relative to the RB population.
#*Four biological replicates were performed for both the control and the treatment; the article states that "Density gradient-purified EBs and RBs were incubated in quadruplicate in four-well plates...".


==Journal Club Presentation on ''C. trachomatis'' Genome Sequencing Paper==

[[Media:Presentation on C.trachomatis Genome Paper KS HD DW 20131112.pdf|Here]] is the PDF version of the presentation given on November 12, 2013 on the Stephens ''et al.'' (1998) paper.


==Important Files==

*[[Media: Go_daily-termdb_v2_HD_20131107.obo-xml.gz|Go_daily-termdb_v2_HD_20131107.obo-xml.gz]]
:GO OBO-XML file for ''C. trachomatis'' serovar A.
*[[Media: Uniprot XML C.trachomatis serovar A KS 20131114.xml|Uniprot XML C.trachomatis serovar A KS 20131114.xml]]
:UniProt XML file for ''C. trachomatis'' serovar A.
*[[Media: 22183.C trachomatis A KS 20131114.goa|22183.C trachomatis A KS 20131114.goa]]
:GOA file for ''C. trachomatis'' serovar A.
*[[Media: Ct-Std v1 KS 20131119.gdb|Ct-Std v1 KS 20131119.gdb]]
:First iteration of the export/import gene database cycle.
*[[Media: Ct-Std KS 20131121.gdb|Ct-Std KS 20131121.gdb]]
:Updated gene database.
*[[Media: Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]
:New gene database created with the new version of gmbuilder built on 11/21/2013 using the custom species profile for ''C. trachomatis.''

'''Array Express Article Raw Files'''

*[[Media:Raw_Data_DW_V1_12112013_A-GEOD-4692.adf.txt|Raw_Data_DW_V1_12112013_A-GEOD-4692.adf.txt]]
:Microarray raw ADF file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.idf.txt|Raw_Data_DW_V1_12112013_E-GEOD-39530.idf.txt]]
:Microarray raw IDF file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.processed.1.zip|Raw_Data_DW_V1_12112013_E-GEOD-39530.processed.1.zip]]
:Microarray raw processed zip file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.raw.1.zip|Raw_Data_DW_V1_12112013_E-GEOD-39530.raw.1.zip]]
:Microarray raw zip file for ''C. trachomatis''.
*[[Media:Raw_Data_DW_V1_12112013_E-GEOD-39530.sdrf.txt|Raw_Data_DW_V1_12112013_E-GEOD-39530.sdrf.txt]]
:Microarray SDRF text file for ''C. trachomatis''.
*[[Media:Sdrf information excel.xls|Sdrf information excel.xls]]
:sdrf information in excel spreadsheet format.

'''Excel Master Spreadsheet'''

*[[Media:Master Spreadsheet.xls|master spreadsheet work in progress]]
:Initial Microarray raw data formatted and opened in Microsoft Excel.
*[[Media: C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.xls|C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.xls]]
:Sample C. trachomatis microarray data that was used to run GenMAPP with the latest ''C. trachomatis'' database.
*[[Media: C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt|C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt]]
:Tab-delimited version of the file above.
*[[Media: Final Master Spreadsheet.xls|Final Master Spreadsheet]]
:Microarray data with identified gene IDs and Log base data.
*[[Media: Master_Spreadsheet_Chlamydia_20131125_KD.xls | Master_Spreadsheet_Chlamydia_20131125_KD.xls]]
** version edited during office hours by Dr. Dahlquist to demonstrate statistics

==Wiki Formatting==

Template: [[Template:Team_H(oo)KD]]

[[Category: Group Projects]]
[[Category: Team H(oo)KD]]

Ksherbina Project Notebook

2013-11-28T23:49:11Z

Ksherbina: /* Testing Report for Finalized Gene Database */ Filled out the testing report regarding the gene count

{{Ksherbina}}

==Week 12==

===Starting the Set Up of the Testing Environment===

#Downloaded the following software onto my personal computer:
#*Latest version of GenMAPP Builder (gmbuilder2.0-b71) from [http://sourceforge.net/projects/xmlpipedb/files/?source=navbar SourceForge]
#*[http://www.oracle.com/technetwork/java/javase/downloads/index.html Java SE 7u45]
#*[http://www.eclipse.org/downloads/ Eclipse IDE for Java EE Developers]
#Created an account on [http://sourceforge.net/ SourceForge].

===Import/Export of Gene Database===

#Downloaded the [http://www.uniprot.org/uniprot/?query=organism:315277+keyword:181 UniProt XML] for ''C. trachomatis'' serovar A strain HAR-13: [[Media:Uniprot XML C.trachomatis serovar A KS 20131114.xml|Uniprot XML C.trachomatis serovar A KS 20131114.xml]]
#Downloaded the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ GOA file] for ''C. trachomatis'' serovar A: [[Media:22183.C trachomatis A KS 20131114.goa|22183.C trachomatis A KS 20131114.goa]]
#Created a new database in pgAdminIII: CT_KS_20131114_gmb2b71
#*Created tables in the database by executing the script in the file ''gmbuilder.sql''.
#Launched ''gmbuilder-32bit.bat'':
#Configured the database in gmbuilder:
#*Host or address: localhost
#*Port number: 5432
#*Database name: CT_KS_20131114_gmb2b71
#*Username: postgres
#*Password: <password of the PostgreSQL database created above>
#Imported the UniProt XML file into the PostgreSQL database through gmbuilder.
#Imported the OBO-XML file into the PostgreSQL database through gmbuilder.
#*gmbuilder was not able to completely process the data. See the testing report below for details.
#Imported the GOA file into the PostgreSQL database through gmbuilder.
#*I did not begin the actual export of the database due to the problem with processing the OBO-XML file.

====Testing Report====

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131114_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_11
* Time taken to import: 1.70 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
* Time taken to import: 9.29 min
* Time taken to process: '''Could not complete the process. The following error came up when gmbuilder was processing term 81000:'''
ERROR edu.lmu.xmlpipedb.gmbuilder.GenMAPPBuilder - java.util.concurrent.ExecutionException:java.lang.OutOfMemoryError:
Java heap space

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
* Time taken to import: 0.02 min

Name of .gdb file:
* Time taken to export .gdb:
* Upload your file and link to it here.

Note:

===Creating Custom Species Profile===

#On one of the class computers in the first row, I launched Eclipse with Subclipse.
#Went to Window > Open Perspective > Other > SVN Repository Exploring.
#Defined a new Subversion repository by clicking on ''Add Repository'' and adding the URL https://svn.code.sf.net/p/xmlpipedb/code. Clicked Finish.
#Double-clicked on trunk, right-clicked on gmbuilder, and chose Checkout....
#Created a new project (chose Check out as a project configured using the New Project Wizard).
#Went to Window > Preferences > Java > Installed JREs. Added a new environment with JDK.
#Double-clicked on the project folder.
#Double-clicked on the lib folder.
#Shift-clicked and selected all files in the lib folder and then control-clicked on every file inside the lib folder whose name does not end in .jar. Right-clicked on one of them and chose Build Path > Add to Build Path from the popup menu that came up.
#Created a species profile called ChlamydiaTrachomatisSerovarA.
#*Dondi modified the name to ChlamydiaTrachomatisSerovarAUniProtSpeciesProfile.
#Customized the species profile.
#*Opened the new profile and added the lines of code as specified in [[Coder|"Customize the Species Profile"]] directions.
#**Still need to insert the species-specific URL that returns a web page describing a gene for that species.
#Committed the profile following the directions in [[Coder|"Updating and Committing Code"]].

===Reflection===

#What were the week’s key accomplishments?
#*One of this week's key accomplishments was downloading the files necessary to run an import/export gene database cycle and the microarray files that will be analyzed. Also this week, we were able to start an import/export cycle.
#What are next week’s target accomplishments?
#*Next week, I would like to complete an import/export cycle and vet it with the QA person. In addition, I hope that we can download the software necessary to open the microarray files, format the files, and perform statistical analysis.
#What team strengths were seen this week?
#*I believe our strength this week as a team was meeting outside of class to create milestones. This meeting resulted in a detailed calendar of intermediate milestones that we plan to accomplish in order to meet the final deadline.
#What team weaknesses were seen this week?
#*It was difficult to set up the aforementioned meeting due to our busy schedules. Additionally, we did not have a clear plan for working on the project until the middle/end of the week. In the coming weeks, our goal will be to improve our work plan for each class period so as to minimize the difficulty of scheduling work meetings outside of class.

==Week 13==

===November 15, 2013===

====Import/Export Gene Database====

After the error that came up last week regarding the Java heap space, I performed the following steps to increase the heap space:
#Opened ''gmbuilder-32bit.bat'' in Notepad.
#Found the line
"C:\Program Files (x86)\Java\jre7\bin\java" -D32 -Xmx1024m -jar gmbuilder.jar
#Changed "-Xmx1024m" to "-Xmx2048m" thereby increasing the maximum heap space from 1024 MB to 2048 MB.
#Saved the file and closed it.
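
The manual edit above amounts to replacing the <code>-Xmx</code> flag in the launcher line. A minimal Python sketch of that substitution follows; the helper name <code>set_max_heap</code> is hypothetical and is not part of GenMAPP Builder, and the sketch only rewrites the text of the command line without launching Java.

```python
import re

def set_max_heap(command_line: str, megabytes: int) -> str:
    """Return command_line with its -Xmx<N>m flag replaced by the new size.

    Hypothetical helper for illustration; it only edits the text.
    """
    new_line, count = re.subn(r"-Xmx\d+[mM]", f"-Xmx{megabytes}m", command_line)
    if count == 0:
        raise ValueError("no -Xmx<N>m flag found in the command line")
    return new_line

# The launcher line quoted in the steps above.
original = r'"C:\Program Files (x86)\Java\jre7\bin\java" -D32 -Xmx1024m -jar gmbuilder.jar'
updated = set_max_heap(original, 2048)
print(updated)
```

Whether the Java virtual machine can actually reserve the requested heap depends on the machine and on whether the 32-bit or 64-bit Java runtime is used, so a larger value is not guaranteed to start.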

*Then, I tried the import/export again. I could not open ''gmbuilder-32bit.bat'' after I changed the maximum heap space. I repeated the above steps for ''gmbuilder.bat'' and found that I could still open that program.
*Opening ''gmbuilder.bat'', I imported both the UniProt XML file and the OBO-XML file. However, I received a Java heap space error again when processing the OBO-XML file.
*I increased the heap space in ''gmbuilder.bat'' from 2048 MB to 4096 MB and tried the import/export cycle again. When this did not work, I increased the heap space in ''gmbuilder.bat'' to 8192 MB.
*Then, I was able to import the UniProt XML, OBO-XML, and GOA files as well as process the OBO-XML.
*However, I was not able to begin the export because an error popped up stating that an export to a GenMAPP database is only possible in ''gmbuilder-32bit.bat''.

=====Testing Report=====

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131114_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]):
* Time taken to import: 0.73 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
* Time taken to import: 7.39 min
* Time taken to process: 47.15 min

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
* Time taken to import: 0.03 min

Name of .gdb file:
* Time taken to export .gdb:
* Upload your file and link to it here.

Note: '''Could not export the database. Received an error stating that an export to a GenMAPP database is only possible in ''gmbuilder-32bit.bat''.'''

====Troubleshooting====

#Ran cmd from the Start menu.
#cd to where GenMAPP Builder is located.
#Typed the same command that is in the .bat file, but directly at the prompt:
"C:\Program Files (x86)\Java\jre7\bin\java" -D32 -Xmx1536m -jar gmbuilder.jar

Received the following error:
Error occurred during initialization of VM.
Could not reserve enough space for object heap.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit

===November 19, 2013===

====Rerunning the Gene Database Import/Export====

*On 11/15/2013, the UniProt XML, OBO-XML, and GOA files were imported and the OBO-XML file was processed in the 64-bit ''gmbuilder.bat'' with the maximum heap space set to 8192 MB.
*Performed the export in 32-bit ''gmbuilder.bat''.

=====Testing Report=====

Version of GenMAPP Builder: Same as in previous testing report

Computer on which export was run: Same as in previous testing report

Postgres Database name: Same as in previous testing report

UniProt XML filename: Same as in previous testing report
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]):
* Time taken to import: 1.10 min

GO OBO-XML filename: Same as in previous testing report
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
* Time taken to import: 8.96 min
* Time taken to process: Did not reprocess because received a message stating that the GO data had already been processed.

GOA filename: Same as in previous testing report
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
* Time taken to import: 0.03 min

Name of .gdb file: Ct-Std v1 KS 20131119.gdb
* Time taken to export .gdb:
:*Start Time: 4:01:10 AM
:*End Time: 6:51:58 AM
* Upload your file and link to it here: [[Media:Ct-Std v1 KS 20131119.gdb|Ct-Std v1 KS 20131119.gdb]]

Note:

====Checking the Quality of the Exported Database====

#In ''gmbuilder-32bit.bat'', chose Run XML > Database Tallies for UniProt and GO....
:*The XML count for the UniProt file was 917 while the database count for the UniProt file was 10087. This discrepancy is likely due to importing the UniProt, OBO-XML, and GOA files multiple times while experimenting with the heap space threshold.
:*A new database must now be created in pgAdminIII in order to repeat the export/import gene database cycle.
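
A quick arithmetic check fits the repeated-import explanation: the database count is an exact multiple of the XML count. The sketch below uses only the two counts reported above. It does not prove how many imports took place, only that the numbers are consistent with whole copies of the file.

```python
xml_count = 917         # UniProt entries counted in the XML file
database_count = 10087  # UniProt entries counted in the PostgreSQL database

# If the same file was imported several times, the database count should be
# a whole multiple of the XML count with nothing left over.
copies, remainder = divmod(database_count, xml_count)
print(copies, remainder)  # 11 0
```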

===November 20-21, 2013===

====Finishing Installing Software to Set Up the Development Environment====

#Navigated to the "Download and Install" tab on the [http://subclipse.tigris.org/ Subclipse home page]
#Copied the Eclipse update site URL for the 1.10.x Release.
#Opened up Eclipse.
#Help > Install New Software...
#Pasted the URL into the "Work with" field.
#Checked the box next to "Subclipse" and "SVNKit".
#Clicked on Next and went through the process of installing the software.

====Repeating the Gene Database Import/Export Cycle====

#Created a new database in pgAdminIII.
#*Created tables in the database by executing the script in the file gmbuilder.sql.
#Since the aforementioned error occurred because duplicates were introduced into the database through repeated imports, I decided to try this new import-export cycle using 32-bit ''gmbuilder''.
#Opened ''gmbuilder-32bit.bat'' with the heap space set to 1024 MB and configured the new database.
#*Host or address: localhost
#*Port number: 5432
#*Database name: CT_KS_20131119_32bit_gmb2b71
#*Username: postgres
#*Password: <password of the PostgreSQL database created above>
#Imported the UniProt XML file into PostgreSQL through ''gmbuilder-32bit.bat''.
#Imported the OBO-XML file into PostgreSQL through ''gmbuilder-32bit.bat''.
#Processed the OBO-XML file through ''gmbuilder-32bit.bat''.
#Imported the GOA file into PostgreSQL through ''gmbuilder-32bit.bat''.

=====Testing Report=====
Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131119_32bit_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_11
::*Original file name from [http://www.uniprot.org/uniprot/?query=organism:315277+keyword:181 UniProt site]: uniprot-organism%3A315277+keyword%3A181.xml
* Time taken to import: 1.20 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): 11/06/2013
::*Original file name: go_daily-termdb.obo-xml.gz
* Time taken to import: 13.05 min
* Time taken to process: 10.80 min

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): 11/12/13
::*Original file name: 22183.C_trachomatis_A.goa
* Time taken to import: 0.04 min

Name of .gdb file: Ct-Std KS 20131121.gdb
* Time taken to export .gdb:
:*Start Time: 2:37:42 AM
:*End Time: 2:55:45 AM
* Upload your file and link to it here: [[Media:Ct-Std KS 20131121.gdb|Ct-Std KS 20131121.gdb]]

Note: '''There were no heap space errors this time, which is good. The errors probably occurred in the first place because of repeatedly importing the files.'''
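
For comparison, the export durations can be worked out from the start and end times recorded in this testing report and in the November 19 report. This is a small sketch that assumes each export started and ended on the same day; the function name <code>elapsed</code> is only illustrative.

```python
from datetime import datetime

def elapsed(start: str, end: str):
    """Return end - start for two clock times on the same (assumed) day."""
    fmt = "%I:%M:%S %p"
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)

# Times copied from the two testing reports.
first_export = elapsed("4:01:10 AM", "6:51:58 AM")   # Ct-Std v1 KS 20131119.gdb
second_export = elapsed("2:37:42 AM", "2:55:45 AM")  # Ct-Std KS 20131121.gdb

print(first_export)   # 2:50:48
print(second_export)  # 0:18:03
```

The first export took more than nine times as long as the second. That would be consistent with the first database holding repeated copies of the imported data, although other factors may also have contributed.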

===November 21, 2013===

====Revised the Custom Species Profile====

#Ran Eclipse.
#Navigated to Window > Open Perspective > Other... and chose SVN Repository Exploring
#Defined a new Subversion repository by clicking on the Add Repository button.
#Set the URL to https://svn.code.sf.net/p/xmlpipedb/code then clicked Finish.
#Double-clicked on https://svn.code.sf.net/p/xmlpipedb/code in the list.
#Double-clicked on ''trunk''.
#Right-clicked on ''gmbuilder'' and chose ''Checkout....''
#Chose ''Check out as a project configured using the New Project Wizard'' then clicked ''Finish''.
#In the New Project dialog that opened, chose Java Project from the list and then clicked ''Next >''.
#Entered the new project name ''xmlpipedb-gmbuilder'' and clicked ''Finish''.
#Set Eclipse to use JDK.
#*Navigated to Window > Preferences > Java > Installed JREs and clicked on the ''Search'' button.
#*Expanded Computer > Local Disk (C:) > Program Files > Java > jdk1.7.0_45 and clicked OK.
#*Highlighted the row ''jdk'' and clicked on ''Edit''.
#*In "JRE home:" field, removed everything in the name after jdk1.7.0_45 and then clicked ''Finish''.
#Made sure that I was in the Java perspective by navigating to Window > Open Perspective > Other... and then choosing Java.
#Double-clicked on the ''xmlpipedb-gmbuilder'' folder.
#Double-clicked on the ''lib'' folder.
#Shift-clicked and selected all files in the lib folder and then control-clicked on every file inside the lib folder whose name did not end in .jar.
#Right-clicked on one of the files and chose Build Path > Add to Build Path.
#Checked that the ''src'' folder was set as the source folder by right-clicking on it and then clicking on ''Build Path''.
#Right-clicked on the ''test'' folder and chose Build Path > Use as Source Folder. Then, made sure that there were no red x's in the list of folders.
#Right-clicked on ''build.xml'' toward the bottom of the list. Then, navigated to Team > Synchronize with Repository.
#Double-clicked on src > edu.lmu.xmlpipedb.gmbuilder.databasetoolkit.profiles > ChlamydiaTrachomatisSerovarAUniProtSpeciesProfile.java.
#Within the code, added the gene database link pattern: "http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=~" and then saved the code.
#Right-clicked on ''build.xml'' and chose Run As > Ant Build... (the one with the ellipses).
#In the Edit Configuration dialog that appears, checked on the clean and dist items in the Targets tab.
#Clicked on the ''Order...'' button and rearranged the items so that clean is first and dist is second and then clicked OK.
#Clicked the Run button.
#Right-clicked on the ''xmlpipedb-gmbuilder'' project folder and chose Refresh (F5 is its keyboard shortcut).

====Reran the Database Export With the New Build for gmbuilder====

#Made sure that pgAdminIII and the database "CT_KS_20131119_32bit_gmb2b71" were open.
#Opened the newly built ''gmbuilder-32bit.bat'' within the folder [User Folder] > workspace > xmlpipedb-gmbuilder > dist.
#Made sure that the database was configured to "CT_KS_20131119_32bit_gmb2b71".
#Performed a database export.

=====Testing Report=====

*Name of .gdb file: Ct-Std_v2_KS_20131121.gdb
*Time taken to export .gdb:
:*Start Time: 10:56 AM (approximately)
:*End Time: 11:14:26 AM
*Upload your file and link to it here: [[Media:Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]

Note:

====Testing the New Database with GenMAPP====
#Opened GenMAPP.
#Clicked on Data > New Gene Database and selected the gene database that was created today (Ct-Std v2 KS 20131121.gdb).
#On the toolbar, clicked on the button that has the word gene in a box.
#Clicked and dragged the cursor on the blank canvas to create a new gene box.
#Right-clicked on the gene box.
#In the pop-up window, typed in "CTA_0588" in the Gene ID search field.
#Under Gene ID System, selected Ordered Locus Names.
#Then, hit the "Search" button. Interestingly, GenMAPP did not find this gene in the database.
#*To troubleshoot, I opened up the gene database in Access.
#*In the list of tables, I double-clicked on OrderedLocusNames.
#*I sorted the list of gene IDs from A-Z and then scrolled down to try to find "CTA_0588". I was able to find "CTA_0587" and "CTA_0589" but not "CTA_0588".
#Back in GenMAPP, I used the same steps as above to find "CTA_0587".
#"CTA_0587" was found in the database. In the window, I clicked on the link "CTA_0587" under the OrderedLocusNames section.
#*This link took me to the entry for this gene on the Model Organism Database (http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=CTA_0587).
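
The link that GenMAPP opened matches the gene database link pattern added to the species profile, with the <code>~</code> replaced by the gene ID. The sketch below reproduces that substitution in Python. It assumes that <code>~</code> is simply a placeholder for the ID, which is what the observed URL suggests; the function name <code>gene_url</code> is hypothetical.

```python
# Link pattern added to the species profile; "~" is assumed to be the
# placeholder that is replaced by the gene ID.
LINK_PATTERN = "http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=~"

def gene_url(gene_id: str, pattern: str = LINK_PATTERN) -> str:
    """Fill the gene ID into the link pattern (illustrative helper)."""
    return pattern.replace("~", gene_id)

print(gene_url("CTA_0587"))
```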

====Performing Quality Assurance on the New Database====

*Performed counts with Tally Engine.
[[Image:Tally Engine Results for Ct-Std v2 KS 20131121.jpg|thumb|none|upright=4]]
*Counted the number of unique gene IDs using xmlpipedb ''match'':
java -jar xmlpipedb-match-1.1.1.jar "CTA_[0-9][0-9][0-9][0-9]" < Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
:*'''Total unique matches = 911'''
*Counted the number of unique gene IDs using an SQL query:
select count(*) from genenametype where type = 'ordered locus' and value ~ 'CTA_[0-9][0-9][0-9][0-9]';
:*'''Count - 917'''
*Opened the database in Access. Then, double-clicked on the table "OriginalRowCounts".
{| style = "width:20%; font-size:100%"
| '''Table'''
| '''Rows'''
|-
| OrderedLocusNames
| 919
|-
| UniProt
| 917
|-
| UniProt-OrderedLocusNames
| 919
|}
*Double-clicked on the table "OrderedLocusNames" and scrolled all the way to the bottom. We found that some gene IDs had the format pCTA_####.
*Went back to the command prompt and ran the following command:
java -jar xmlpipedb-match-1.1.1.jar "[p|]CTA_[0-9][0-9][0-9][0-9]" < Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
:*'''Total unique matches = 8'''
:*Adding the results of the two runs with xmlpipedb match, we got 919 unique gene IDs, which is the same number as that in Access.
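
The idea of counting unique pattern matches can be sketched in Python. This is not the implementation of xmlpipedb ''match'', and the sample text is made up for illustration. Two details of regular expressions matter here. First, <code>[p|]</code> is a character class that matches either the letter p or a literal vertical bar, so the sketch uses a plain <code>p</code> instead. Second, an unanchored <code>CTA_[0-9]{4}</code> also matches the <code>CTA_####</code> portion of a <code>pCTA_####</code> ID; the negative lookbehind <code>(?<!p)</code> is one way to keep the two sets separate.

```python
import re

def unique_matches(pattern, text):
    """Return the set of distinct substrings of text that match pattern."""
    return set(re.findall(pattern, text))

# Made-up sample text, not taken from the UniProt XML file.
sample = "CTA_0001 CTA_0002 CTA_0002 pCTA_0001 pCTA_0003"

# IDs not preceded by "p", and IDs that carry the "p" prefix.
chromosomal = unique_matches(r"(?<!p)CTA_[0-9]{4}", sample)
plasmid = unique_matches(r"pCTA_[0-9]{4}", sample)
print(len(chromosomal), len(plasmid))  # 2 2

# Without the lookbehind, the CTA_0003 inside pCTA_0003 is counted as well.
unanchored = unique_matches(r"CTA_[0-9]{4}", sample)
print(len(unanchored))  # 3
```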

==Week 14==

===November 24, 2013===

====Ran GenMAPP Using ''C. trachomatis'' Gene Database====

#Opened GenMAPP.
#Made sure that the database was set to Ct-Std_v2_KS_20131121.gdb.
#Imported [[Media:C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt|sample microarray data]] into GenMAPP.
#*Loaded the microarray data by navigating to Expression Dataset Manager > New Dataset.
#*There were 23015 errors.
#Opened the exceptions file that was created to inspect the errors and scrolled down to where the IDs CTA_#### begin.
#*The gene IDs are actually in the format CTA_####_RRMH#####_at.
#Tried to search for a sample ID in the format found in the exceptions file in the UniProt database or any NCBI databases.
#*Could not find CTA_0806_RRMH00303, RRMH00303, CTA_0806_RRMH00303_at, or RRMH00303_at in the UniProt database.
#*Could not find CTA_0806_RRMH00303, RRMH00303, CTA_0806_RRMH00303_at, or RRMH00303_at in any database through NCBI.
#*My suspicion is that the RRMH#####_at portion of the ID is an identifier that is appended to each probe on the Affymetrix microarray when it is read by the company's software.
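
If the probe IDs all follow the format CTA_####_RRMH#####_at, the ordered locus name can be separated from the rest of the ID with a regular expression. The sketch below assumes that format holds and returns <code>None</code> for anything else; the names <code>PROBE_ID</code> and <code>locus_name</code> are hypothetical.

```python
import re

# Assumed probe ID layout: locus name, then an RRMH identifier, then "_at".
PROBE_ID = re.compile(r"^(CTA_[0-9]{4})_(RRMH[0-9]{5})_at$")

def locus_name(probe_id):
    """Return the CTA_#### part of a probe ID, or None if the format differs."""
    match = PROBE_ID.match(probe_id)
    return match.group(1) if match else None

print(locus_name("CTA_0806_RRMH00303_at"))  # CTA_0806
```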

===November 27-28, 2013===

====Running SQL Queries to Determine Discrepancy in the Count Between the UniProt XML and the Gene Database====

*Ran an SQL query to determine if there were any ordered locus names in the database that do not match the ID "CTA_####".
select * from genenametype where type = 'ordered locus' and not value ~ 'CTA_[0-9][0-9][0-9][0-9]';
:*There were no hits.
*Ran an SQL query to show all the ordered locus names in the database that have the ID "CTA_####".
select * from genenametype where type = 'ordered locus' and value ~ 'CTA_[0-9][0-9][0-9][0-9]';
:*Scrolling through the results, I saw the ID "CTA_0406/CTA_0407/CTA_0408".
:*The [http://www.uniprot.org/manual/gene_name UniProt manual] stated that ordered locus names that are connected by a slash indicate two predicted genes that are actually one gene.
:*The combination of these IDs explains why two fewer IDs show up in the count by the Tally Engine in comparison to the SQL query.
*Opened the gene database within Access and filtered the ordered locus names to show only those that match the ID "CTA_0407".
:*The ID appeared on its own without being joined to other IDs with slashes.
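The effect of a slash-joined entry on the counts can be sketched as follows. The sample list is hypothetical (only the joined ID comes from the query results), and the function is an illustration, not how the Tally Engine or Access actually count.

```python
def count_ids(ordered_locus_values, split_joined):
    """Count ordered locus names, optionally splitting slash-joined entries."""
    total = 0
    for value in ordered_locus_values:
        # "A/B/C" is one entry in the XML but three IDs once split.
        total += len(value.split("/")) if split_joined else 1
    return total


if __name__ == "__main__":
    # Hypothetical sample; the neighbouring IDs are placeholders.
    sample = ["CTA_0405", "CTA_0406/CTA_0407/CTA_0408", "CTA_0409"]
    joined = count_ids(sample, split_joined=False)
    split = count_ids(sample, split_joined=True)
    print(joined, split, split - joined)
```

One joined entry of three IDs raises the split count by two, which is the same size as the gap between 917 and 919.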

====Testing Report for Finalized Gene Database====

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131119_32bit_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_11
::*Original file name from [http://www.uniprot.org/uniprot/?query=organism:315277+keyword:181 UniProt site]: uniprot-organism%3A315277+keyword%3A181.xml
* Time taken to import: 1.20 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): 11/06/2013
::*Original file name: go_daily-termdb.obo-xml.gz
* Time taken to import: 13.05 min
* Time taken to process: 10.80 min

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): 11/12/13
::*Original file name: 22183.C_trachomatis_A.goa
* Time taken to import: 0.04 min

*Name of .gdb file: Ct-Std_v2_KS_20131121.gdb
*Time taken to export .gdb:
:*Start Time: 10:56 AM (approximately)
:*End Time: 11:14:26 AM
*Upload your file and link to it here: [[Media:Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]

Note: There is one ID "CTA_0406/CTA_0407/CTA_0408" that is a combination of three predicted genes that are actually one gene. The Tally Engine counts this as one gene that contributes to the total count of 917. However, in Access and xmlpipedb ''match'', the ID is split into three separate genes, bringing the total count to 919.
[[Category:Journal Entry]]
[[Category:Individual Homework]]

Ksherbina Project Notebook

2013-11-28T23:43:39Z

Ksherbina: /* Testing Report for Finalized Gene Database */ Started the testing report

{{Ksherbina}}

==Week 12==

===Starting the Set Up of the Testing Environment===

#Downloaded the following software onto my personal computer:
#*Latest version of GenMAPP Builder (gmbuilder2.0-b71) from [http://sourceforge.net/projects/xmlpipedb/files/?source=navbar SourceForge]
#*[http://www.oracle.com/technetwork/java/javase/downloads/index.html Java SE 7u45]
#*[http://www.eclipse.org/downloads/ Eclipse IDE for Java EE Developers]
#Created an account on [http://sourceforge.net/ SourceForge].

===Import/Export of Gene Database===

#Downloaded the [http://www.uniprot.org/uniprot/?query=organism:315277+keyword:181 UniProt XML] for ''C. trachomatis'' serovar A strain HAR-13: [[Media:Uniprot XML C.trachomatis serovar A KS 20131114.xml|Uniprot XML C.trachomatis serovar A KS 20131114.xml]]
#Downloaded the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ GOA file] for ''C. trachomatis'' serovar A: [[Media:22183.C trachomatis A KS 20131114.goa|22183.C trachomatis A KS 20131114.goa]]
#Created a new database in pgAdminIII: CT_KS_20131114_gmb2b71
#*Created tables in the database by executing the script in the file ''gmbuilder.sql''.
#Launched ''gmbuilder-32bit.bat'':
#Configured the database in gmbuilder:
#*Host or address: localhost
#*Port number: 5432
#*Database name: CT_KS_20131114_gmb2b71
#*Username: postgres
#*Password: <password of the PostgreSQL database created above>
#Imported the UniProt XML file into the PostgreSQL database through gmbuilder.
#Imported the OBO-XML file into the PostgreSQL database through gmbuilder.
#*gmbuilder was not able to completely process the data. See the testing report below for details.
#Imported the GOA file into the PostgreSQL database through gmbuilder.
#*I did not begin the actual export of the database due to the problem with processing the OBO-XML file.

====Testing Report====

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131114_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_11
* Time taken to import: 1.70 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
* Time taken to import: 9.29 min
* Time taken to process: '''Could not complete the process. The following error came up when gmbuilder was processing term 81000:'''
ERROR edu.lmu.xmlpipedb.gmbuilder.GenMAPPBuilder - java.util.concurrent.ExecutionException:java.lang.OutOfMemoryError:
Java heap space

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
* Time taken to import: 0.02 min

Name of .gdb file:
* Time taken to export .gdb:
* Upload your file and link to it here.

Note:

===Creating Custom Species Profile===

#On one of the class computers in the first row, I launched Eclipse with Subclipse.
#Went to Window > Open Perspective > Other > SVN Repository Exploring.
#Defined a new Subversion repository by clicking on ''Add Repository'' and adding the URL https://svn.code.sf.net/p/xmlpipedb/code. Clicked Finish.
#Double-clicked on trunk, right-clicked on gmbuilder, and chose Checkout....
#Created a new project (chose Check out as a project configured using the New Project Wizard).
#Went to Window > Preferences > Java > Installed JREs. Added a new environment with JDK.
#Double-clicked on the project folder.
#Double-clicked on the lib folder.
#Shift-clicked to select all files in the lib folder and then control-clicked on every file inside the lib folder whose name did not end in .jar to deselect it. Right-clicked on one of the selected files and chose Build Path > Add to Build Path from the popup menu that came up.
#Created a species profile called ChlamydiaTrachomatisSerovarA.
#*Dondi modified the name to ChlamydiaTrachomatisSerovarAUniProtSpeciesProfile.
#Customized the species profile.
#*Opened the new profile and added the lines of code as specified in [[Coder|"Customize the Species Profile"]] directions.
#**Still need to insert the species-specific URL that returns a web page describing a gene for that species.
#Committed the profile following the directions in [[Coder|"Updating and Committing Code"]].

===Reflection===

#What were the week’s key accomplishments?
#*One of this week's key accomplishments was downloading the files necessary to run an import/export gene database cycle and the microarray files that will be analyzed. Also this week, we were able to start an import/export cycle.
#What are next week’s target accomplishments?
#*Next week, I would like to complete an import/export cycle and vet it with the QA person. In addition, I hope that we can download the software necessary to open the microarray files, format the files, and perform statistical analysis.
#What team strengths were seen this week?
#*I believe our strength this week as a team was meeting outside of class to create milestones. This meeting resulted in a detailed calendar of intermediate milestones that we plan to accomplish in order to meet the final deadline.
#What team weaknesses were seen this week?
#*It was difficult to set up the aforementioned meeting due to our busy schedules. Additionally, we did not have a clear plan for working on the project until the middle/end of the week. In the coming weeks, our goal will be to improve our work plan for each class period so as to minimize the difficulty of scheduling work meetings outside of class.

==Week 13==

===November 15, 2013===

====Import/Export Gene Database====

After the error that came up last week regarding the Java heap space, I performed the following steps to increase the heap space:
#Opened ''gmbuilder-32bit.bat'' in Notepad.
#Found the line
"C:\Program Files (x86)\Java\jre7\bin\java" -D32 -Xmx1024m -jar gmbuilder.jar
#Changed "-Xmx1024m" to "-Xmx2048m" thereby increasing the maximum heap space from 1024 MB to 2048 MB.
#Saved the file and closed it.

*Then, I tried the import/export again. I could not open ''gmbuilder-32bit.bat'' after I changed the maximum heap space. I repeated the above steps for ''gmbuilder.bat'' and found that I could still open the program.
*Opening ''gmbuilder.bat'', I imported both the UniProt XML file and the OBO-XML file. However, I received a Java heap space error again when processing the OBO-XML file.
*I increased the heap space in ''gmbuilder.bat'' from 2048 MB to 4096 MB and tried the import/export cycle again. When this did not work, I increased the heap space in ''gmbuilder.bat'' to 8192 MB.
*Then, I was able to import the UniProt XML, OBO-XML, and GOA files as well as process the OBO-XML.
*However, I was not able to begin the export because an error popped up stating that an export to a GenMAPP database is only possible in ''gmbuilder-32bit.bat''.
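The manual edit of the launch line can be sketched as a small helper. This is only an illustration of the edit described above; <code>set_max_heap</code> is a hypothetical name and the launch line is copied from the .bat file quoted earlier.

```python
import re


def set_max_heap(command_line, megabytes):
    """Replace the -Xmx value in a java launch line with the given size in MB."""
    return re.sub(r"-Xmx\d+[kKmMgG]?", "-Xmx{}m".format(megabytes), command_line)


if __name__ == "__main__":
    line = '"C:\\Program Files (x86)\\Java\\jre7\\bin\\java" -D32 -Xmx1024m -jar gmbuilder.jar'
    print(set_max_heap(line, 2048))
```

A 32-bit JVM on Windows generally cannot reserve a heap as large as 2048 MB, which is consistent with ''gmbuilder-32bit.bat'' failing to open after the change while the 64-bit ''gmbuilder.bat'' still opened.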

=====Testing Report=====

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131114_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]):
* Time taken to import: 0.73 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
* Time taken to import: 7.39 min
* Time taken to process: 47.15 min

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
* Time taken to import: 0.03 min

Name of .gdb file:
* Time taken to export .gdb:
* Upload your file and link to it here.

Note: '''Could not export the database. Received an error stating that an export to a GenMAPP database is only possible in ''gmbuilder-32bit.bat''.'''

====Troubleshooting====

#Ran cmd from the Start menu.
#cd to where GenMAPP Builder is located.
#Typed the same command that is in the .bat file, but directly at the prompt:
"C:\Program Files (x86)\Java\jre7\bin\java" -D32 -Xmx1536m -jar gmbuilder.jar

Received the following error:
Error occurred during initialization of VM.
Could not reserve enough space for object heap.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit

===November 19, 2013===

====Rerunning the Gene Database Import/Export====

*On 11/15/2013, the UniProt XML, OBO-XML, and GOA files were imported and the OBO-XML file was processed in the 64-bit ''gmbuilder.bat'' with the maximum heap space set to 8192 MB.
*Performed the export in 32-bit ''gmbuilder.bat''.

=====Testing Report=====

Version of GenMAPP Builder: Same as in previous testing report

Computer on which export was run: Same as in previous testing report

Postgres Database name: Same as in previous testing report

UniProt XML filename: Same as in previous testing report
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]):
* Time taken to import: 1.10 min

GO OBO-XML filename: Same as in previous testing report
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
* Time taken to import: 8.96 min
* Time taken to process: Did not reprocess because received a message stating that the GO data had already been processed.

GOA filename: Same as in previous testing report
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
* Time taken to import: 0.03 min

Name of .gdb file: Ct-Std v1 KS 20131119.gdb
* Time taken to export .gdb:
:*Start Time: 4:01:10 AM
:*End Time: 6:51:58 AM
* Upload your file and link to it here: [[Media:Ct-Std v1 KS 20131119.gdb|Ct-Std v1 KS 20131119.gdb]]

Note:

====Checking the Quality of the Exported Database====

#In ''gmbuilder-32bit.bat'', chose Run XML > Database Tallies for UniProt and GO....
:*The XML count for the UniProt file was 917 while the Database Count for the UniProt file was 10087. This discrepancy is likely due to importing the UniProt, OBO-XML, and GOA files multiple times while experimenting with the heap space threshold.
:*A new database must now be created in pgAdminIII in order to repeat the export/import gene database cycle.
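A quick arithmetic check of the two counts, assuming that every repeated import adds all 917 UniProt records to the database again (an assumption, not something the Tally Engine reports):

```python
xml_count = 917         # XML count reported by the Tally Engine
database_count = 10087  # Database Count reported by the Tally Engine

# If each import adds a full copy, the database count should be a whole multiple.
copies, remainder = divmod(database_count, xml_count)
print(copies, remainder)
```

The database count is an exact multiple of the XML count (11 copies, no remainder), which fits the repeated-import explanation.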

===November 20-21, 2013===

====Finishing Installing Software to Set Up the Development Environment====

#Navigated to the "Download and Install" tab on the [http://subclipse.tigris.org/ Subclipse home page]
#Copied the Eclipse update site URL for the 1.10.x Release.
#Opened up Eclipse.
#Help > Install New Software...
#Pasted the URL into the "Work with" field.
#Checked the box next to "Subclipse" and "SVNKit".
#Clicked on Next and went through the process of installing the software.

====Repeating the Gene Database Import/Export Cycle====

#Created a new database in pgAdminIII.
#*Created tables in the database by executing the script in the file gmbuilder.sql.
#Since the aforementioned error occurred because duplicates were introduced into the database through repeated imports, I decided to try this new import-export cycle using 32-bit ''gmbuilder''.
#Opened ''gmbuilder-32bit.bat'' with the heap space set to 1024 MB and configured the new database.
#*Host or address: localhost
#*Port number: 5432
#*Database name: CT_KS_20131119_32bit_gmb2b71
#*Username: postgres
#*Password: <password of the PostgreSQL database created above>
#Imported the UniProt XML file into PostgreSQL through ''gmbuilder-32bit.bat''.
#Imported the OBO-XML file into PostgreSQL through ''gmbuilder-32bit.bat''.
#Processed the OBO-XML file through ''gmbuilder-32bit.bat''.
#Imported the GOA file into PostgreSQL through ''gmbuilder-32bit.bat''.

=====Testing Report=====
Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131119_32bit_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_11
::*Original file name from [http://www.uniprot.org/uniprot/?query=organism:315277+keyword:181 UniProt site]: uniprot-organism%3A315277+keyword%3A181.xml
* Time taken to import: 1.20 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): 11/06/2013
::*Original file name: go_daily-termdb.obo-xml.gz
* Time taken to import: 13.05 min
* Time taken to process: 10.80 min

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): 11/12/13
::*Original file name: 22183.C_trachomatis_A.goa
* Time taken to import: 0.04 min

Name of .gdb file: Ct-Std KS 20131121.gdb
* Time taken to export .gdb:
:*Start Time: 2:37:42 AM
:*End Time: 2:55:45 AM
* Upload your file and link to it here: [[Media:Ct-Std KS 20131121.gdb|Ct-Std KS 20131121.gdb]]

Note: '''There were no heap space errors this time, which is good. The errors probably occurred in the first place because of repeatedly importing the files.'''

===November 21, 2013===

====Revised the Custom Species Profile====

#Ran Eclipse.
#Navigated to Window > Open Perspective > Other... and chose SVN Repository Exploring
#Defined a new Subversion repository by clicking on the Add Repository button.
#Set the URL to https://svn.code.sf.net/p/xmlpipedb/code then clicked Finish.
#Double-clicked on https://svn.code.sf.net/p/xmlpipedb/code in the list.
#Double-clicked on ''trunk''.
#Right-clicked on ''gmbuilder'' and chose ''Checkout....''
#Chose ''Check out as a project configured using the New Project Wizard'' then clicked ''Finish''.
#In the New Project dialog that opened, chose Java Project from the list and then clicked ''Next >''.
#Entered the new project name ''xmlpipedb-gmbuilder'' and clicked ''Finish''.
#Set Eclipse to use JDK.
#*Navigated to Window > Preferences and clicked on the ''Search'' button.
#*Expanded Computer > Local Disk (C:) > Program Files > Java > jdk1.7.0_45 and clicked OK.
#*Highlighted the row ''jdk'' and clicked on ''Edit''.
#*In "JRE home:" field, removed everything in the name after jdk1.7.0_45 and then clicked ''Finish''.
#Made sure that I was in the Java perspective by navigating to Window > Open Perspective > Other... and then choosing Java.
#Double-clicked on the ''xmlpipedb-gmbuilder'' folder.
#Double-clicked on the ''lib'' folder.
#Shift-clicked and selected all files in the lib folder and then control-clicked on every file inside the lib folder whose name did not end in .jar.
#Right-clicked on one of the files and chose Build Path > Add to Build Path.
#Checked that the ''src'' folder was set to the source folder by right-clicking on it and then clicking on ''Build Path''.
#Right-clicked on the ''test'' folder and chose Build Path > Use as Source Folder. Then, made sure that there were no red x's in the list of folders.
#Right-clicked on ''build.xml'' toward the bottom of the list. Then, navigated to Team > Synchronize with Repository.
#Double-clicked on src > edu.lmu.xmlpipedb.gmbuilder.databasetoolkit.profiles > ChlamydiaTrachomatisSerovarAUniProtSpeciesProfile.java.
#Within the code, added the gene database link pattern: "http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=~" and then saved the code.
#Right-clicked on ''build.xml'' and chose Run As > Ant Build... (the one with the ellipses).
#In the Edit Configuration dialog that appears, checked on the clean and dist items in the Targets tab.
#Clicked on the ''Order...'' button and rearranged the items so that clean is first and dist is second and then clicked OK.
#Clicked the Run button.
#Right-clicked on the ''xmlpipedb-gmbuilder'' project folder and chose Refresh (F5 is its keyboard shortcut).

====Reran the Database Export With the New Build for gmbuilder====

#Made sure that pgAdminIII and the database "CT_KS_20131119_32bit_gmb2b71" were open.
#Opened the newly built ''gmbuilder-32bit.bat'' within the folder [User Folder] > workspace > xmlpipedb-gmbuilder > dist.
#Made sure that the database was configured to "CT_KS_20131119_32bit_gmb2b71".
#Performed a database export.

=====Testing Report=====

*Name of .gdb file: Ct-Std_v2_KS_20131121.gdb
*Time taken to export .gdb:
:*Start Time: 10:56 AM (approximately)
:*End Time: 11:14:26 AM
*Upload your file and link to it here: [[Media:Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]

Note:

====Testing the New Database with GenMAPP====
#Opened GenMAPP.
#Clicked on Data > New Gene Database and selected the gene database that was created today (Ct-Std v2 KS 20131121.gdb).
#On the toolbar, clicked on the button that has the word gene in a box.
#Clicked and dragged the cursor on the blank canvas to create a new gene box.
#Right-clicked on the gene box.
#In the pop-up window, typed in "CTA_0588" in the Gene ID search field.
#Under Gene ID System, selected Ordered Locus Names.
#Then, clicked the "Search" button. Interestingly, GenMAPP did not find this gene in the database.
#*To troubleshoot, I opened up the gene database in Access.
#*In the list of tables, I double-clicked on OrderedLocusNames.
#*I sorted the list of gene IDs from A-Z and then scrolled down to try to find "CTA_0588". I was able to find "CTA_0587" and "CTA_0589" but not "CTA_0588".
#Back in GenMAPP, I used the same steps as above to find "CTA_0587".
#"CTA_0587" was found in the database. In the window, I clicked on the link "CTA_0587" under the OrderedLocusNames section.
#*This link took me to the entry for this gene on the Model Organism Database (http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=CTA_0587).
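The relationship between the link pattern added to the species profile and the URL that opened here can be sketched as a simple placeholder substitution. This mirrors only what is visible in this notebook (the "~" in the pattern and the gene ID in the resulting URL); how GenMAPP performs the substitution internally is assumed, and <code>gene_url</code> is a hypothetical helper.

```python
# Link pattern added to the species profile; "~" marks where the gene ID goes.
LINK_PATTERN = "http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=~"


def gene_url(gene_id, pattern=LINK_PATTERN):
    """Fill the "~" placeholder of a link pattern with a gene ID."""
    return pattern.replace("~", gene_id)


if __name__ == "__main__":
    print(gene_url("CTA_0587"))
```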

====Performing Quality Assurance on the New Database====

*Performed counts with Tally Engine.
[[Image:Tally Engine Results for Ct-Std v2 KS 20131121.jpg|thumb|none|upright=4]]
*Counted the number of unique gene IDs using xmlpipedb ''match'':
java -jar xmlpipedb-match-1.1.1.jar "CTA_[0-9][0-9][0-9][0-9]" < Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
:*'''Total unique matches = 911'''
*Counted the number of unique gene IDs using an SQL query:
select count(*) from genenametype where type = 'ordered locus' and value ~ 'CTA_[0-9][0-9][0-9][0-9]';
:*'''Count - 917'''
*Opened the database in Access. Then, double-clicked on the table "OriginalRowCounts".
{| style = "width:20%; font-size:100%"
| '''Table'''
| '''Rows'''
|-
| OrderedLocusNames
| 919
|-
| UniProt
| 917
|-
| UniProt-OrderedLocusNames
| 919
|}
*Double-clicked on the table "OrderedLocusNames" and scrolled all the way to the bottom. We found that some gene IDs had the format pCTA_####.
*Went back to the command prompt and ran the following command:
java -jar xmlpipedb-match-1.1.1.jar "[p|]CTA_[0-9][0-9][0-9][0-9]" < Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
:*'''Total unique matches = 8'''
:*Adding the results of the two runs with xmlpipedb match, we got 919 unique gene IDs, which is the same number as that in Access.
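For reference, the patterns used above behave as follows under ordinary regular-expression rules. The sketch uses Python's <code>re</code> module as a stand-in for xmlpipedb ''match'', whose exact matching rules are not verified here, and the sample text is invented. It also shows a single pattern, <code>p?CTA_[0-9]{4}</code>, that would pick up both ID formats in one pass.

```python
import re

# Invented sample text; not taken from the UniProt XML file.
text = "CTA_0001 pCTA_0002 |CTA_0003 CTA_0001"


def unique_matches(pattern, text):
    """Return the set of distinct substrings matched by the pattern."""
    return set(re.findall(pattern, text))


plain = unique_matches(r"CTA_[0-9]{4}", text)         # also matches inside pCTA_####
char_class = unique_matches(r"[p|]CTA_[0-9]{4}", text)  # [p|] means "p" or a literal "|"
optional_p = unique_matches(r"p?CTA_[0-9]{4}", text)    # "p" is optional

if __name__ == "__main__":
    print(sorted(plain))
    print(sorted(char_class))
    print(sorted(optional_p))
```

Inside square brackets the "|" is an ordinary character rather than "or", so <code>[p|]</code> requires either a "p" or a "|" in front of the ID.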

==Week 14==

===November 24, 2013===

====Ran GenMAPP Using ''C. trachomatis'' Gene Database====

#Opened GenMAPP.
#Made sure that the database was set to Ct-Std_v2_KS_20131121.gdb.
#Imported [[Media:C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt|sample microarray data]] into GenMAPP.
#*Loaded the microarray data by navigating to Expression Dataset Manager > New Dataset.
#*There were 23015 errors.
#Opened the exceptions file that was created to inspect the errors and scrolled down to where the IDs CTA_#### begin.
#*The gene IDs are actually in the format CTA_####_RRMH#####_at.
#Tried to search for a sample ID in the format found in the exceptions file in the UniProt database or any NCBI databases.
#*Could not find CTA_0806_RRMH00303, RRMH00303, CTA_0806_RRMH00303_at, or RRMH00303_at in the UniProt database.
#*Could not find CTA_0806_RRMH00303, RRMH00303, CTA_0806_RRMH00303_at, or RRMH00303_at in any database through NCBI.
#*My suspicion is that the RRMH#####_at portion of the ID is an identifier that is appended to each probe on the Affymetrix microarray when it is read by the company's software.

===November 27-28, 2013===

====Running SQL Queries to Determine Discrepancy in the Count Between the UniProt XML and the Gene Database====

*Ran an SQL query to determine if there were any ordered locus names in the database that do not match the ID "CTA_####".
select * from genenametype where type = 'ordered locus' and not value ~ 'CTA_[0-9][0-9][0-9][0-9]';
:*There were no hits.
*Ran an SQL query to show all the ordered locus names in the database that have the ID "CTA_####".
select * from genenametype where type = 'ordered locus' and value ~ 'CTA_[0-9][0-9][0-9][0-9]';
:*Scrolling through the results, I saw the ID "CTA_0406/CTA_0407/CTA_0408".
:*The [http://www.uniprot.org/manual/gene_name UniProt manual] stated that ordered locus names that are connected by a slash indicate two predicted genes that are actually one gene.
:*The combination of these IDs explains why two fewer IDs show up in the count by the Tally Engine in comparison to the SQL query.
*Opened the gene database within Access and filtered the ordered locus names to show only those that match the ID "CTA_0407".
:*The ID appeared on its own without being joined to other IDs with slashes.

====Testing Report for Finalized Gene Database====

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131119_32bit_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_11
::*Original file name from [http://www.uniprot.org/uniprot/?query=organism:315277+keyword:181 UniProt site]: uniprot-organism%3A315277+keyword%3A181.xml
* Time taken to import: 1.20 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): 11/06/2013
::*Original file name: go_daily-termdb.obo-xml.gz
* Time taken to import: 13.05 min
* Time taken to process: 10.80 min

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): 11/12/13
::*Original file name: 22183.C_trachomatis_A.goa
* Time taken to import: 0.04 min

Name of .gdb file:
* Time taken to export .gdb:
:*Start Time:
:*End Time:
* Upload your file and link to it here:

[[Category:Journal Entry]]
[[Category:Individual Homework]]

Ksherbina Project Notebook

2013-11-28T23:42:36Z

Ksherbina: /* November 27-28, 2013 */ Added a section to include the testing report for the finalized gene database

{{Ksherbina}}

==Week 12==

===Starting the Set Up of the Testing Environment===

#Downloaded the following software onto my personal computer:
#*Latest version of GenMAPP Builder (gmbuilder2.0-b71) from [http://sourceforge.net/projects/xmlpipedb/files/?source=navbar SourceForge]
#*[http://www.oracle.com/technetwork/java/javase/downloads/index.html Java SE 7u45]
#*[http://www.eclipse.org/downloads/ Eclipse IDE for Java EE Developers]
#Created an account on [http://sourceforge.net/ SourceForge].

===Import/Export of Gene Database===

#Downloaded the [http://www.uniprot.org/uniprot/?query=organism:315277+keyword:181 UniProt XML] for ''C. trachomatis'' serovar A strain HAR-13: [[Media:Uniprot XML C.trachomatis serovar A KS 20131114.xml|Uniprot XML C.trachomatis serovar A KS 20131114.xml]]
#Downloaded the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ GOA file] for ''C. trachomatis'' serovar A: [[Media:22183.C trachomatis A KS 20131114.goa|22183.C trachomatis A KS 20131114.goa]]
#Created a new database in pgAdminIII: CT_KS_20131114_gmb2b71
#*Created tables in the database by executing the script in the file ''gmbuilder.sql''.
#Launched ''gmbuilder-32bit.bat'':
#Configured the database in gmbuilder:
#*Host or address: localhost
#*Port number: 5432
#*Database name: CT_KS_20131114_gmb2b71
#*Username: postgres
#*Password: <password of the PostgreSQL database created above>
#Imported the UniProt XML file into the PostgreSQL database through gmbuilder.
#Imported the OBO-XML file into the PostgreSQL database through gmbuilder.
#*gmbuilder was not able to completely process the data. See the testing report below for details.
#Imported the GOA file into the PostgreSQL database through gmbuilder.
#*I did not begin the actual export of the database due to the problem with processing the OBO-XML file.

====Testing Report====

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131114_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_11
* Time taken to import: 1.70 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
* Time taken to import: 9.29 min
* Time taken to process: '''Could not complete the process. The following error came up when gmbuilder was processing term 81000:'''
ERROR edu.lmu.xmlpipedb.gmbuilder.GenMAPPBuilder - java.util.concurrent.ExecutionException:java.lang.OutOfMemoryError:
Java heap space

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
* Time taken to import: 0.02 min

Name of .gdb file:
* Time taken to export .gdb:
* Upload your file and link to it here.

Note:

===Creating Custom Species Profile===

#On one of the class computers in the first row, I launched Eclipse with Subclipse.
#Went to Window > Open Perspective > Other > SVN Repository Exploring.
#Defined a new Subversion repository by clicking on ''Add Repository'' and adding the URL https://svn.code.sf.net/p/xmlpipedb/code. Clicked Finish.
#Double-clicked on trunk, right-clicked on gmbuilder, and chose Checkout....
#Created a new project (chose Check out as a project configured using the New Project Wizard).
#Went to Window > Preferences > Java > Installed JREs. Added a new environment with JDK.
#Double-clicked on the project folder.
#Double-clicked on the lib folder.
#Shift-clicked to select all files in the lib folder and then control-clicked on every file inside the lib folder whose name did not end in .jar to deselect it. Right-clicked on one of the selected files and chose Build Path > Add to Build Path from the popup menu that came up.
#Created a species profile called ChlamydiaTrachomatisSerovarA.
#*Dondi modified the name to ChlamydiaTrachomatisSerovarAUniProtSpeciesProfile.
#Customized the species profile.
#*Opened the new profile and added the lines of code as specified in [[Coder|"Customize the Species Profile"]] directions.
#**Still need to insert the species-specific URL that returns a web page describing a gene for that species.
#Committed the profile following the directions in [[Coder|"Updating and Committing Code"]].

===Reflection===

#What were the week’s key accomplishments?
#*One of this week's key accomplishments was downloading the files necessary to run an import/export gene database cycle and the microarray files that will be analyzed. Also this week, we were able to start an import/export cycle.
#What are next week’s target accomplishments?
#*Next week, I would like to complete an import/export cycle and vet it with the QA person. In addition, I hope that we can download the software necessary to open the microarray files, format the files, and perform statistical analysis.
#What team strengths were seen this week?
#*I believe our strength this week as a team was meeting outside of class to create milestones. This meeting resulted in a detailed calendar of intermediate milestones that we plan to accomplish in order to meet the final deadline.
#What team weaknesses were seen this week?
#*It was difficult to set up the aforementioned meeting due to our busy schedules. Additionally, we did not have a clear plan for working on the project until the middle/end of the week. In the coming weeks, our goal will be to improve our work plan for each class period so as to minimize the difficulty of scheduling work meetings outside of class.

==Week 13==

===November 15, 2013===

====Import/Export Gene Database====

After the error that came up last week regarding the Java heap space, I performed the following steps to increase the heap space:
#Opened ''gmbuilder-32bit.bat'' in Notepad.
#Found the line
"C:\Program Files (x86)\Java\jre7\bin\java" -D32 -Xmx1024m -jar gmbuilder.jar
#Changed "-Xmx1024m" to "-Xmx2048m", thereby increasing the maximum heap space from 1024 MB to 2048 MB.
#Saved the file and closed it.
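The manual edit above can also be sketched in code. This is a minimal illustration in Python, not part of GenMAPP Builder; the helper name <code>set_max_heap</code> is hypothetical, and the launcher line is the one quoted in the steps above.

```python
import re

def set_max_heap(launcher_text: str, megabytes: int) -> str:
    """Return launcher_text with every -Xmx<N>m option set to the given size in MB."""
    return re.sub(r"-Xmx\d+m", f"-Xmx{megabytes}m", launcher_text)

# The launcher line as quoted from gmbuilder-32bit.bat in the steps above.
line = r'"C:\Program Files (x86)\Java\jre7\bin\java" -D32 -Xmx1024m -jar gmbuilder.jar'

print(set_max_heap(line, 2048))
```

Only the <code>-Xmx</code> value changes; the rest of the command is left untouched, which mirrors what was done by hand in Notepad.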

*Then, I tried the import/export again. I could not open ''gmbuilder-32bit.bat'' after I changed the maximum heap space. I repeated the above steps for ''gmbuilder.bat'' and found that I could still open the program.
*Opening ''gmbuilder.bat'', I imported both the UniProt XML file and the OBO-XML file. However, I received a Java heap space error again when processing the OBO-XML file.
*I increased the heap space in ''gmbuilder.bat'' from 2048 MB to 4096 MB and tried the import/export cycle again. When this did not work, I increased the heap space in ''gmbuilder.bat'' to 8192 MB.
*Then, I was able to import the UniProt XML, OBO-XML, and GOA files as well as process the OBO-XML.
*However, I was not able to begin the export because an error popped up stating that an export to a GenMAPP database is only possible in ''gmbuilder-32bit.bat''.

=====Testing Report=====

Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131114_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]):
* Time taken to import: 0.73 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
* Time taken to import: 7.39 min
* Time taken to process: 47.15 min

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
* Time taken to import: 0.03 min

Name of .gdb file:
* Time taken to export .gdb:
* Upload your file and link to it here.

Note: '''Could not export the database. Received an error stating that an export to a GenMAPP database is only possible in ''gmbuilder-32bit.bat''.'''

====Troubleshooting====

#Ran cmd from the Start menu.
#cd to where GenMAPP Builder is located.
#Typed the same command that is in the .bat file, but directly at the prompt:
"C:\Program Files (x86)\Java\jre7\bin\java" -D32 -Xmx1536m -jar gmbuilder.jar

Received the following error:
Error occurred during initialization of VM.
Could not reserve enough space for object heap.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit

===November 19, 2013===

====Rerunning the Gene Database Import/Export====

*On 11/15/2013, the UniProt XML, OBO-XML, and GOA files were imported and the OBO-XML file was processed in the 64-bit ''gmbuilder.bat'' with the maximum heap space set to 8192 MB.
*Performed the export in the 32-bit ''gmbuilder-32bit.bat''.

=====Testing Report=====

Version of GenMAPP Builder: Same as in previous testing report

Computer on which export was run: Same as in previous testing report

Postgres Database name: Same as in previous testing report

UniProt XML filename: Same as in previous testing report
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]):
* Time taken to import: 1.10 min

GO OBO-XML filename: Same as in previous testing report
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped):
* Time taken to import: 8.96 min
* Time taken to process: Did not reprocess because received a message stating that the GO data had already been processed.

GOA filename: Same as in previous testing report
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]):
* Time taken to import: 0.03 min

Name of .gdb file: Ct-Std v1 KS 20131119.gdb
* Time taken to export .gdb:
:*Start Time: 4:01:10 AM
:*End Time: 6:51:58 AM
* Upload your file and link to it here: [[Media:Ct-Std v1 KS 20131119.gdb|Ct-Std v1 KS 20131119.gdb]]

Note:

====Checking the Quality of the Exported Database====

#In ''gmbuilder-32bit.bat'', chose Run XML > Database Tallies for UniProt and GO....
:*The XML count for the UniProt file was 917, while the database count for the UniProt file was 10087. This discrepancy (10087 is exactly 11 × 917) is likely due to importing the UniProt, OBO-XML, and GOA files multiple times while experimenting with the heap space threshold.
:*A new database must now be created in pgAdminIII in order to repeat the export/import gene database cycle.

===November 20-21, 2013===

====Finishing Installing Software to Set Up the Development Environment====

#Navigated to the "Download and Install" tab on the [http://subclipse.tigris.org/ Subclipse home page].
#Copied the Eclipse update site URL for the 1.10.x Release.
#Opened up Eclipse.
#Help > Install New Software...
#Pasted the URL into the "Work with" field.
#Checked the box next to "Subclipse" and "SVNKit".
#Clicked on Next and went through the process of installing the software.

====Repeating the Gene Database Import/Export Cycle====

#Created a new database in pgAdminIII.
#*Created tables in the database by executing the script in the file gmbuilder.sql.
#Since the aforementioned error occurred because duplicates were introduced into the database through repeated imports, I decided to try this new import-export cycle using 32-bit ''gmbuilder''.
#Opened ''gmbuilder-32bit.bat'' with the heap space set to 1024 MB and configured the new database.
#*Host or address: localhost
#*Port number: 5432
#*Database name: CT_KS_20131119_32bit_gmb2b71
#*Username: postgres
#*Password: <password of the PostgreSQL database created above>
#Imported the UniProt XML file into PostgreSQL through ''gmbuilder-32bit.bat''.
#Imported the OBO-XML file into PostgreSQL through ''gmbuilder-32bit.bat''.
#Processed the OBO-XML file through ''gmbuilder-32bit.bat''.
#Imported the GOA file into PostgreSQL through ''gmbuilder-32bit.bat''.

=====Testing Report=====
Version of GenMAPP Builder: 2.0b71

Computer on which export was run: Personal computer

Postgres Database name: CT_KS_20131119_32bit_gmb2b71

UniProt XML filename: Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
* UniProt XML version (The version information can be found at [http://uniprot.org/news the UniProt News Page]): UniProt release 2013_11
::*Original file name from [http://www.uniprot.org/uniprot/?query=organism:315277+keyword:181 UniProt site]: uniprot-organism%3A315277+keyword%3A181.xml
* Time taken to import: 1.20 min

GO OBO-XML filename: Go_daily-termdb_v2_HD_20131107.obo-xml
* GO OBO-XML version (The version information can be found in the file properties after the file downloaded from the [http://beta.geneontology.org/page/download-ontology GO Download page] has been unzipped): 11/06/2013
::*Original file name: go_daily-termdb.obo-xml.gz
* Time taken to import: 13.05 min
* Time taken to process: 10.80 min

GOA filename: 22183.C_trachomatis_A_KS_20131114.goa
* GOA version (News on [http://www.ebi.ac.uk/GOA/ this page] records past releases; current information can be found in the Last modified field on the [ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes/ FTP site]): 11/12/13
::*Original file name: 22183.C_trachomatis_A.goa
* Time taken to import: 0.04 min

Name of .gdb file: Ct-Std KS 20131121.gdb
* Time taken to export .gdb:
:*Start Time: 2:37:42 AM
:*End Time: 2:55:45 AM
* Upload your file and link to it here: [[Media:Ct-Std KS 20131121.gdb|Ct-Std KS 20131121.gdb]]
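The export start and end times can be converted into elapsed minutes, the unit used for the import times in these reports. The following is a small sketch in Python; the helper name <code>elapsed_minutes</code> is hypothetical, and it assumes that both times fall on the same day.

```python
from datetime import datetime

def elapsed_minutes(start: str, end: str) -> float:
    """Return the minutes between two same-day clock times such as '4:01:10 AM'."""
    fmt = "%I:%M:%S %p"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60

# Export recorded on November 19 (after the 64-bit imports).
print(round(elapsed_minutes("4:01:10 AM", "6:51:58 AM"), 2))  # 170.8
# Export recorded in this report (clean database, 32-bit imports).
print(round(elapsed_minutes("2:37:42 AM", "2:55:45 AM"), 2))  # 18.05
```

By this calculation the export from the clean database took about 18 minutes, compared with about 171 minutes for the earlier export.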

Note: '''There were no heap space errors this time, which is good. The errors probably occurred in the first place because of repeatedly importing the files.'''

===November 21, 2013===

====Revised the Custom Species Profile====

#Ran Eclipse.
#Navigated to Window > Open Perspective > Other... and chose SVN Repository Exploring
#Defined a new Subversion repository by clicking on the Add Repository button.
#Set the URL to https://svn.code.sf.net/p/xmlpipedb/code then clicked Finish.
#Double-clicked on https://svn.code.sf.net/p/xmlpipedb/code in the list.
#Double-clicked on ''trunk''.
#Right-clicked on ''gmbuilder'' and chose ''Checkout....''
#Chose ''Check out as a project configured using the New Project Wizard'' then clicked ''Finish''.
#In the New Project dialog that opened, chose Java Project from the list and then clicked ''Next >''.
#Entered the new project name ''xmlpipedb-gmbuilder'' and clicked ''Finish''.
#Set Eclipse to use JDK.
#*Navigated to Window > Preferences and clicked on the ''Search'' button.
#*Expanded Computer > Local Disk (C:) > Program Files > Java > jdk1.7.0_45 and clicked OK.
#*Highlighted the row ''jdk'' and clicked on ''Edit''.
#*In the "JRE home:" field, removed everything in the name after jdk1.7.0_45 and then clicked ''Finish''.
#Made sure that I was in the Java perspective by navigating to Window > Open Perspective > Other... and then choosing Java.
#Double-clicked on the ''xmlpipedb-gmbuilder'' folder.
#Double-clicked on the ''lib'' folder.
#Shift-clicked and selected all files in the lib folder and then control-clicked on every file inside the lib folder whose name did not end in .jar.
#Right-clicked on one of the files and chose Build Path > Add to Build Path.
#Checked that the ''src'' folder was set as the source folder by right-clicking on it and then clicking on ''Build Path''.
#Right-clicked on the ''test'' folder and chose Build Path > Use as Source Folder. Then, made sure that there were no red x's in the list of folders.
#Right-clicked on ''build.xml'' toward the bottom of the list. Then, navigated to Team > Synchronize with Repository.
#Double-clicked on src > edu.lmu.xmlpipedb.gmbuilder.databasetoolkit.profiles > ChlamydiaTrachomatisSerovarAUniProtSpeciesProfile.java.
#Within the code, added the gene database link pattern: "http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=~" and then saved the code.
#Right-clicked on ''build.xml'' and chose Run As > Ant Build... (the one with the ellipses).
#In the Edit Configuration dialog that appeared, checked the clean and dist items in the Targets tab.
#Clicked on the ''Order...'' button and rearranged the items so that clean is first and dist is second and then clicked OK.
#Clicked the Run button.
#Right-clicked on the ''xmlpipedb-gmbuilder'' project folder and chose Refresh (F5 is its keyboard shortcut).

====Reran the Database Export With the New Build for gmbuilder====

#Made sure that pgAdminIII and the database "CT_KS_20131119_32bit_gmb2b71" were open.
#Opened the newly built ''gmbuilder-32bit.bat'' within the folder [User Folder] > workspace > xmlpipedb-gmbuilder > dist.
#Made sure that the database was configured to "CT_KS_20131119_32bit_gmb2b71".
#Performed a database export.

=====Testing Report=====

*Name of .gdb file: Ct-Std_v2_KS_20131121.gdb
*Time taken to export .gdb:
:*Start Time: 10:56 AM (approximately)
:*End Time: 11:14:26 AM
*Upload your file and link to it here: [[Media:Ct-Std v2 KS 20131121.gdb|Ct-Std v2 KS 20131121.gdb]]

Note:

====Testing the New Database with GenMAPP====
#Opened GenMAPP.
#Clicked on Data > New Gene Database and selected the gene database that was created today (Ct-Std v2 KS 20131121.gdb).
#On the toolbar, clicked on the button that has the word gene in a box.
#Clicked and dragged the cursor on the blank canvas to create a new gene box.
#Right-clicked on the gene box.
#In the pop-up window, typed in "CTA_0588" in the Gene ID search field.
#Under Gene ID System, selected Ordered Locus Names.
#Then, hit the "Search" button. Interestingly, GenMAPP did not find this gene in the database.
#*To troubleshoot, I opened up the gene database in Access.
#*In the list of tables, I double-clicked on OrderedLocusNames.
#*I sorted the list of gene IDs from A-Z and then scrolled down to try to find "CTA_0588". I was able to find "CTA_0587" and "CTA_0589" but not "CTA_0588".
#Back in GenMAPP, I used the same steps as above to find "CTA_0587".
#"CTA_0587" was found in the database. In the window, I clicked on the link "CTA_0587" under the OrderedLocusNames section.
#*This link took me to the entry for this gene on the Model Organism Database (http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=CTA_0587).
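The URL reached above is consistent with the link pattern added to the species profile, in which the trailing <code>~</code> appears to stand in for the gene ID. The following Python sketch illustrates that substitution; it is an assumption about how the pattern is expanded, not GenMAPP's actual code, and the helper name <code>gene_url</code> is hypothetical.

```python
# Link pattern as entered in the species profile; "~" is assumed to be the
# placeholder that is replaced by the gene ID.
LINK_PATTERN = "http://bacteria.ensembl.org/chlamydia_trachomatis_a_har_13/Gene/Summary?g=~"

def gene_url(gene_id: str, pattern: str = LINK_PATTERN) -> str:
    """Return the pattern with its "~" placeholder replaced by gene_id."""
    return pattern.replace("~", gene_id)

print(gene_url("CTA_0587"))
```

For "CTA_0587" this reproduces the address of the page that the link opened.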

====Performing Quality Assurance on the New Database====

*Performed counts with Tally Engine.
[[Image:Tally Engine Results for Ct-Std v2 KS 20131121.jpg|thumb|none|upright=4]]
*Counted the number of unique gene IDs using xmlpipedb ''match'':
java -jar xmlpipedb-match-1.1.1.jar "CTA_[0-9][0-9][0-9][0-9]" < Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
:*'''Total unique matches = 911'''
*Counted the number of unique gene IDs using an SQL query:
select count(*) from genenametype where type = 'ordered locus' and value ~ 'CTA_[0-9][0-9][0-9][0-9]';
:*'''Count - 917'''
*Opened the database in Access. Then, double-clicked on the table "OriginalRowCounts".
{| style = "width:20%; font-size:100%"
| '''Table'''
| '''Rows'''
|-
| OrderedLocusNames
| 919
|-
| UniProt
| 917
|-
| UniProt-OrderedLocusNames
| 919
|}
*Double-clicked on the table "OrderedLocusNames" and scrolled all the way to the bottom. We found that some gene IDs had the format pCTA_####.
*Went back to the command prompt and ran the following command:
java -jar xmlpipedb-match-1.1.1.jar "[p|]CTA_[0-9][0-9][0-9][0-9]" < Uniprot_XML_C.trachomatis_serovar_A_KS_20131114.xml
:*'''Total unique matches = 8'''
:*Adding the results of the two runs with xmlpipedb match, we got 919 unique gene IDs, which is the same number as that in Access.
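The two patterns used above behave slightly differently from how they read at first glance. The sketch below uses Python's <code>re</code> module as a stand-in for xmlpipedb ''match'', assuming both treat these simple patterns the same way. The sample text is hypothetical and is not taken from the UniProt XML file.

```python
import re

def unique_matches(pattern: str, text: str) -> set:
    """Return the set of distinct substrings of text that match pattern."""
    return set(re.findall(pattern, text))

# Hypothetical sample text, not taken from the UniProt XML file.
sample = "CTA_0001 CTA_0002 pCTA_0001 pCTA_0003 CTA_0002 |CTA_0009"

# Unanchored: also matches the CTA_#### part inside pCTA_#### IDs.
print(sorted(unique_matches(r"CTA_[0-9]{4}", sample)))
# [p|] is a character class: one character that is either 'p' or '|'.
print(sorted(unique_matches(r"[p|]CTA_[0-9]{4}", sample)))
# A literal 'p' prefix matches only the pCTA_-prefixed IDs.
print(sorted(unique_matches(r"pCTA_[0-9]{4}", sample)))
```

Because the first pattern is unanchored, a pCTA_#### ID also contributes a CTA_#### match. Whether that affects the 911 + 8 total would need to be checked against the real file.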

==Week 14==

===November 24, 2013===

====Ran GenMAPP Using ''C. trachomatis'' Gene Database====

#Opened GenMAPP.
#Made sure that the database was set to Ct-Std_v2_KS_20131121.gdb.
#Imported [[Media:C.trachomatis_Sample_Microarray_Data_for_GenMAPP_KS_20131124.txt|sample microarray data]] into GenMAPP.
#*Loaded the microarray data by navigating to Expression Dataset Manager > New Dataset.
#*There were 23015 errors.
#Opened the exceptions file that was created to inspect the errors and scrolled down to where the IDs CTA_#### begin.
#*The gene IDs are actually in the format CTA_####_RRMH#####_at.
#Tried to search for a sample ID in the format found in the exceptions file in the UniProt database or any NCBI databases.
#*Could not find CTA_0806_RRMH00303, RRMH00303, CTA_0806_RRMH00303_at, or RRMH00303_at in the UniProt database.
#*Could not find CTA_0806_RRMH00303, RRMH00303, CTA_0806_RRMH00303_at, or RRMH00303_at in any database through NCBI.
#*My suspicion is that the RRMH#####_at portion of the ID is an identifier that is appended to each probe on the Affymetrix microarray when it is read by the company's software.
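If the suspicion above is right, the ordered locus name could be recovered by dropping the RRMH#####_at suffix from each ID. The following Python sketch shows one way to do that. It assumes every probe ID follows the CTA_####_RRMH#####_at format seen in the exceptions file, and the helper name <code>locus_from_probe</code> is hypothetical.

```python
import re
from typing import Optional

# Assumed probe ID layout: CTA_####_RRMH#####_at (as seen in the exceptions file).
PROBE_ID = re.compile(r"^(CTA_[0-9]{4})_RRMH[0-9]{5}_at$")

def locus_from_probe(probe_id: str) -> Optional[str]:
    """Return the leading CTA_#### part of a probe ID, or None if the ID does not fit."""
    match = PROBE_ID.match(probe_id)
    return match.group(1) if match else None

print(locus_from_probe("CTA_0806_RRMH00303_at"))
```

IDs that do not fit the assumed format return <code>None</code>, so they can be listed and inspected separately rather than being silently altered.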

===November 27-28, 2013===

====Running SQL Queries to Determine Discrepancy in the Count Between the UniProt XML and the Gene Database====

*Ran an SQL query to determine if there were any ordered locus names in the database that do not match the ID "CTA_####".
select * from genenametype where type = 'ordered locus' and not value ~ 'CTA_[0-9][0-9][0-9][0-9]';
:*There were no hits.
*Ran an SQL query to show all the ordered locus names in the database that have the ID "CTA_####".
select * from genenametype where type = 'ordered locus' and value ~ 'CTA_[0-9][0-9][0-9][0-9]';
:*Scrolling through the results, I saw the ID "CTA_0406/CTA_0407/CTA_0408".
:*The [http://www.uniprot.org/manual/gene_name UniProt manual] stated that ordered locus names that are connected by a slash indicate two predicted genes that are actually one gene.
:*The combination of these IDs explains why two fewer IDs show up in the count by the Tally Engine in comparison to the SQL query.
*Opened the gene database within Access and filtered the ordered locus names to show only those that match the ID "CTA_0407".
:*The ID appeared on its own without being joined to other IDs with slashes.
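The effect of a slash-joined entry on the counts can be illustrated with a short Python sketch. It assumes that the joined value is split at each slash when the gene database is built, which is consistent with "CTA_0407" appearing on its own in Access. The neighboring IDs in the example rows are hypothetical, and the helper name <code>split_locus_names</code> is made up for illustration.

```python
def split_locus_names(value: str) -> list:
    """Split a slash-joined ordered locus name into its individual IDs."""
    return [part for part in value.split("/") if part]

# One joined entry from the query results plus two hypothetical neighbors.
rows = ["CTA_0405", "CTA_0406/CTA_0407/CTA_0408", "CTA_0409"]
ids = [name for row in rows for name in split_locus_names(row)]

print(len(rows), len(ids))  # 3 5
```

One joined row yields three IDs, a difference of two. If this is the only such entry, that matches the gap between 917 and 919.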

====Testing Report for Finalized Gene Database====

[[Category:Journal Entry]]
[[Category:Individual Homework]]