Difference between revisions of "Ajvree Week 13"
(began week 13 counting info) |
(→11/21/13: table analysis) |
||
(21 intermediate revisions by one user not shown) | |||
Line 1: | Line 1: | ||
==Week 12 Information== | ==Week 12 Information== | ||
− | Export Counting: -open Access, open file (remember to change all to all files) TIGR4 file: | + | Export Counting: -open Access, open file (remember to change all to all files) TIGR4 file:<br> |
− | ids used: SP_#### | + | ids used: SP_####<br> |
− | orderedlocusnames count total: 2126 entries | + | orderedlocusnames count total: 2126 entries<br> |
− | R6 file: | + | R6 file:<br> |
− | orderedlocusnames count total: 2115 entries | + | orderedlocusnames count total: 2115 entries<br> |
− | ids used: SPG_#### | + | ids used: SPG_####<br> |
− | G54 file: | + | G54 file:<br> |
− | ids used: SPG_#### | + | ids used: SPG_####<br> |
− | orderedlocusnames count total: 2115 entries | + | orderedlocusnames count total: 2115 entries<br> |
− | After finding identical results for R6 and G54 files, realized that R6 file was actually for the G54 strain. Checked a few times (reopened files multiple times to confirm) | + | After finding identical results for R6 and G54 files, realized that R6 file was actually for the G54 strain. Checked a few times (reopened files multiple times to confirm)<br> |
− | First try on Tally Engine for TIGR4: | + | First try on Tally Engine for TIGR4:<br> |
− | XML Count: | + | XML Count:<br> |
− | orderedlocus: 2127 | + | orderedlocus: 2127<br> |
− | refseq: 2106 | + | refseq: 2106<br> |
− | Database Count: | + | Database Count:<br> |
− | ordered locus: 3831 | + | ordered locus: 3831<br> |
− | refseq: 3403 | + | refseq: 3403<br> |
− | ==Week 13== | + | =='''Week 13'''== |
'''Tally Engine:''' | '''Tally Engine:''' | ||
*created new database in pgadmin III | *created new database in pgadmin III | ||
*in sql, opened gmbuilder.sql | *in sql, opened gmbuilder.sql | ||
*ran query, database tables were inserted | *ran query, database tables were inserted | ||
− | *went in to tally engine and imported files (xml, | + | *went in to tally engine and imported files |
− | * | + | **Xml import took 5.41 min |
+ | **GOA import took 0.07 min | ||
+ | *unzipped go-xml file | ||
+ | *OBO-XML import time: 19.92 min | ||
+ | *additional gene ontology information was processed, this took 14.96 min | ||
+ | *ran tally, came up with error | ||
+ | *refreshed gmbuilder and tried again successfully | ||
+ | Results: | ||
+ | [[Image:TallyEngineTrial2.PNG]] | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | '''XMLpipedb Match''' | ||
+ | *downloaded program from sourceforge | ||
+ | *opened cmd program | ||
+ | *cd Downloads file | ||
+ | *moved xmlmatch jar file to download folder | ||
+ | *used match to look for pattern SP_[0-9][0-9][0-9][0-9] | ||
+ | *Total unique matches: 2126 | ||
+ | *almost identical to tally engine results of 2127, minus one result | ||
+ | Results: | ||
+ | [[Image:20131107 XMLmatch tATK TIGR4 AJV.PNG]] | ||
+ | |||
+ | |||
+ | |||
+ | '''OriginalRowCounts''' | ||
+ | *Looked at TIGR4 gdb file and benchmark VD file for table similarities/differences | ||
+ | *seemed to have same tables/same information | ||
+ | *took screenshots of both, included here: | ||
+ | TIGR4: [[Image:20131119 ogrowcounts tATK TIGR4 AJV.PNG]] | ||
+ | Benchmark: [[Image:20131119 benchmarkrowcounts tATK TIGR4 AJV.PNG]] | ||
+ | |||
+ | *Note: a few of the rows are missing in the benchmark screenshot- could not fit all of them on screen. | ||
+ | |||
+ | |||
+ | |||
+ | '''SQL''' | ||
+ | *used following query to search for matches: | ||
+ | **select count(*) from genenametype where type = 'ordered locus' and value ~ 'SP_[0-9][0-9][0-9][0-9]'; | ||
+ | *Result given was 2126 | ||
+ | [[Image:20131119 SQLcountresults tATK TIGR4 AJV.PNG]] | ||
+ | |||
+ | ==11/21/13== | ||
+ | |||
+ | '''Tally Engine for Export 3''' | ||
+ | *downloaded Tauras's version of gmbuilder to redo tally engine counting | ||
+ | *used export 3 files instead of previous export 1 files | ||
+ | *connected to avreelan database in pgadminIII, inserted new gmbuilder tables | ||
+ | *opened new version of gmbuilder/tally engine, connected to avreelan database | ||
+ | *XML file import took: 2.02 min | ||
+ | *OBO-XML file import took: 6.25 min | ||
+ | **additional gene ontology data processing took: 4.81 min | ||
+ | *GOA file import took: 0.04 min | ||
+ | *Results: | ||
+ | **GeneId's now visible, total of 2105 in both xml and database counts | ||
+ | **orderedlocusnames still at same value of 2126 in both xml and database counts | ||
+ | **screenshot: [[Image:20131121 TallyEngineE3 tATK TIGR4 AJV.PNG]] | ||
+ | |||
+ | |||
+ | '''Original Row Counts for Export 3''' | ||
+ | *redid row counts using the export 3 file | ||
+ | *compared with benchmark file | ||
+ | *both files had identical number of tables with same categories in each, although some were out of order. | ||
+ | *Screenshots: | ||
+ | E3 TIGR4: [[Image:20131121 E3rowcounts tATK TIGR4 AJV.PNG]] | ||
+ | Benchmark:[[Image:20131121 benchmarkrowcounts tATK TIGR4 AJV.PNG]] | ||
+ | |||
+ | '''Table Analysis''' | ||
+ | *looked at tables within E3 gdb file to find inconsistencies in data | ||
+ | |||
+ | '''Systems Table'''<br> | ||
+ | [[Image:20131121 E3Systemstable tATK TIGR4 AJV.PNG]] | ||
+ | *There are missing dates for quite a few gene ID systems | ||
+ | |||
+ | '''OrderedLocusNames Table''' | ||
+ | *All ID's took the expected form, SP_#### | ||
+ | |||
+ | '''UniProt Table''' | ||
+ | *ID's are scattered. Have a general pattern of beginning with P or Q, following with five characters (mix of numbers and letters) | ||
+ | |||
+ | '''RefSeq Table''' | ||
+ | *all ID's in form NP_###### | ||
+ | |||
+ | ==Links== | ||
+ | {{Team ATK}} |
Latest revision as of 18:56, 21 November 2013
Week 12 Information
Export Counting: open Access, then open the file (remember to change the file type to all files).
TIGR4 file:
ids used: SP_####
orderedlocusnames count total: 2126 entries
R6 file:
orderedlocusnames count total: 2115 entries
ids used: SPG_####
G54 file:
ids used: SPG_####
orderedlocusnames count total: 2115 entries
After finding identical results for the R6 and G54 files, realized that the R6 file was actually for the G54 strain. Reopened the files multiple times to confirm.
First try on Tally Engine for TIGR4:
XML Count:
orderedlocus: 2127
refseq: 2106
Database Count:
ordered locus: 3831
refseq: 3403
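The gap between the XML and database counts above can be made explicit with a short comparison. This is a minimal sketch that only uses the numbers recorded in this entry; the function and variable names are hypothetical and are not part of Tally Engine.

```python
# Counts copied from the first Tally Engine run recorded above.
xml_counts = {"ordered locus": 2127, "refseq": 2106}
db_counts = {"ordered locus": 3831, "refseq": 3403}


def count_differences(xml, db):
    """Return {id_type: db_count - xml_count} for each type whose counts differ."""
    return {k: db[k] - xml[k] for k in xml if k in db and db[k] != xml[k]}


diffs = count_differences(xml_counts, db_counts)
for id_type, extra in diffs.items():
    print(f"{id_type}: database count differs from XML count by {extra}")
```

A non-empty result means the database holds a different number of IDs than the XML file for that type, which is the signal that prompted the Week 13 rerun.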
Week 13
Tally Engine:
- created a new database in pgAdmin III
- in the SQL window, opened gmbuilder.sql
- ran the query, which inserted the database tables
- went into Tally Engine and imported the files
- XML import took 5.41 min
- GOA import took 0.07 min
- unzipped the go-xml file
- OBO-XML import took 19.92 min
- processing of additional gene ontology information took 14.96 min
- ran the tally, which came up with an error
- refreshed gmbuilder and ran the tally again successfully
XMLpipedb Match
- downloaded the program from SourceForge
- opened the cmd program
- used cd to change to the Downloads folder
- moved the xmlmatch jar file to the Downloads folder
- used Match to look for the pattern SP_[0-9][0-9][0-9][0-9]
- Total unique matches: 2126
- one fewer than the Tally Engine XML count of 2127
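The unique-match count can be approximated outside the Match tool with a regular expression. The following is a rough sketch of the idea, not the XMLPipeDB Match implementation; the sample string and the function name are made up for illustration.

```python
import re

# Same pattern as used with Match above: "SP_" followed by four digits.
ID_PATTERN = re.compile(r"SP_[0-9][0-9][0-9][0-9]")


def count_unique_matches(text):
    """Count the distinct strings in text that match ID_PATTERN."""
    return len(set(ID_PATTERN.findall(text)))


# Hypothetical XML fragment: one repeated ID and one ID from another strain.
sample = (
    "<name>SP_0001</name><name>SP_0002</name>"
    "<name>SP_0001</name><name>SPG_0003</name>"
)
print(count_unique_matches(sample))
```

Collecting the matches into a set is what makes the count "unique": an ID that appears several times in the file is counted once.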
OriginalRowCounts
- Looked at the TIGR4 gdb file and the benchmark VD file for table similarities/differences
- seemed to have the same tables and the same information
- took screenshots of both, included here:
- Note: a few rows are missing from the benchmark screenshot because they could not all fit on screen.
SQL
- used following query to search for matches:
- select count(*) from genenametype where type = 'ordered locus' and value ~ 'SP_[0-9][0-9][0-9][0-9]';
- Result given was 2126
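The same kind of count can be tried without a PostgreSQL server. The sketch below uses Python's built-in sqlite3 module with a user-defined REGEXP function as a stand-in for PostgreSQL's `~` operator. The two-column table and the sample rows are assumptions for illustration; the real genenametype table may have other columns.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite calls this for "value REGEXP pattern".
    # re.search is unanchored, which mirrors how ~ matches in PostgreSQL.
    if value is None:
        return 0
    return 1 if re.search(pattern, value) else 0


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE genenametype (type TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO genenametype (type, value) VALUES (?, ?)",
    [
        ("ordered locus", "SP_0001"),   # counted
        ("ordered locus", "SP_0002"),   # counted
        ("ordered locus", "SPG_0003"),  # wrong prefix, not counted
        ("refseq", "SP_0004"),          # wrong type, not counted
    ],
)
(count,) = conn.execute(
    "SELECT count(*) FROM genenametype "
    "WHERE type = 'ordered locus' AND value REGEXP 'SP_[0-9][0-9][0-9][0-9]'"
).fetchone()
print(count)
```

Note that this query counts rows rather than distinct values, and the unanchored pattern would also accept a value with extra characters around the ID; adding `^` and `$` to the pattern would tighten it.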
11/21/13
Tally Engine for Export 3
- downloaded Tauras's version of gmbuilder to redo the Tally Engine counting
- used export 3 files instead of the previous export 1 files
- connected to the avreelan database in pgAdmin III and inserted the new gmbuilder tables
- opened the new version of gmbuilder/Tally Engine and connected to the avreelan database
- XML file import took: 2.02 min
- OBO-XML file import took: 6.25 min
- additional gene ontology data processing took: 4.81 min
- GOA file import took: 0.04 min
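- Results:
- GeneIDs now visible, total of 2105 in both the XML and database counts
- OrderedLocusNames still at 2126 in both the XML and database counts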
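The import times recorded under Week 13 (export 1 files) and under 11/21/13 (export 3 files) can be compared step by step. This is a small sketch using only the times written in this entry; the dictionary keys are labels chosen here, and the ratios do not show whether the files, the gmbuilder version, or something else caused the change.

```python
# Import times in minutes, copied from the Week 13 and 11/21/13 notes.
export1 = {"XML": 5.41, "OBO-XML": 19.92, "GO processing": 14.96, "GOA": 0.07}
export3 = {"XML": 2.02, "OBO-XML": 6.25, "GO processing": 4.81, "GOA": 0.04}


def speedups(before, after):
    """Return how many times faster each step ran in the later import."""
    return {step: before[step] / after[step] for step in before}


ratios = speedups(export1, export3)
for step, ratio in ratios.items():
    print(f"{step}: {ratio:.2f}x faster")
print(f"total: {sum(export1.values()):.2f} min -> {sum(export3.values()):.2f} min")
```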
Original Row Counts for Export 3
- redid row counts using the export 3 file
- compared with benchmark file
- both files had identical number of tables with same categories in each, although some were out of order.
- Screenshots:
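Because some tables appeared in a different order, comparing the two files by eye is error-prone. A minimal sketch of an order-independent comparison follows; the function name is hypothetical and the table lists are illustrative only, using four table names mentioned in these notes rather than the full contents of either file.

```python
def compare_table_lists(export_tables, benchmark_tables):
    """Compare two lists of table names while ignoring their order."""
    export_set = set(export_tables)
    benchmark_set = set(benchmark_tables)
    return {
        "same_tables": export_set == benchmark_set,
        "only_in_export": sorted(export_set - benchmark_set),
        "only_in_benchmark": sorted(benchmark_set - export_set),
    }


# Illustrative input: the same names listed in two different orders.
e3_tables = ["Systems", "OrderedLocusNames", "UniProt", "RefSeq"]
benchmark = ["RefSeq", "Systems", "UniProt", "OrderedLocusNames"]
result = compare_table_lists(e3_tables, benchmark)
print(result)
```

The two "only_in" lists make it easy to see which table is missing when the sets do not match.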
Table Analysis
- looked at tables within E3 gdb file to find inconsistencies in data
Systems Table
- Dates are missing for quite a few gene ID systems
OrderedLocusNames Table
- All IDs took the expected form, SP_####
UniProt Table
- IDs are varied. They generally begin with P or Q, followed by five characters (a mix of numbers and letters)
RefSeq Table
- All IDs take the form NP_######
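The ID formats noted for each table can be checked automatically instead of by scrolling through the tables. The sketch below encodes the patterns as observed in these notes; the UniProt pattern in particular is the loose description above, not the official UniProt accession rule, and the sample IDs are made up.

```python
import re

# Patterns taken from the observations in the table analysis.
ID_FORMATS = {
    "OrderedLocusNames": re.compile(r"SP_[0-9]{4}"),
    "RefSeq": re.compile(r"NP_[0-9]{6}"),
    # Assumption: P or Q followed by five uppercase letters or digits.
    "UniProt": re.compile(r"[PQ][A-Z0-9]{5}"),
}


def malformed_ids(table, ids):
    """Return the IDs that do not fully match the expected format for table."""
    pattern = ID_FORMATS[table]
    return [i for i in ids if pattern.fullmatch(i) is None]


# Hypothetical sample values for illustration.
print(malformed_ids("OrderedLocusNames", ["SP_0001", "SP_12", "SPG_0003"]))
```

Using `fullmatch` rather than `search` means an ID with extra leading or trailing characters is reported as malformed.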
Links
Alina's User Page | Kevin's User Page | Tauras's User Page |
Biological Databases Class Page | Gene Database Project | Gene Database Project Report Guidelines |
- Import Export Cycle 1: tATK Export One: TIGR4 Testing Report
- Import Export Cycle 2: tATK E2: TIGR4 Testing Report
- Import Export Cycle 3: tATK E3: TIGR4 Testing Report
- Import Export Cycle 4: tATK E4: TIGR4 Testing Report
Project Roles: | Project Manager | Coder | GenMAPP User | Quality Assurance |