Troque Week 14

From LMU BioDB 2015
Revision as of 06:04, 4 December 2015 by Troque (Talk | contribs) (Linking the Testing report page)

User Page        Bio Databases Main Page       


Running New Build

Name of .gdb file (give filename and upload and link to compressed file): Sf-Std_20151201.gdb

  • Time taken to export:
    • Start time: 4:19:22 PM PDT
    • End time: 8:30:08 PM PDT
    • Note:

Important Files

Identifying the Gene IDs

  • Regular expression: (CP|SF?)[0-9][0-9][0-9][0-9](\.[0-9])?(/|</name>)
  • Observations:
    • To reduce the number of false matches, we added the end tag "</name>" to our regular expression. This brought the number of matches down from over 8000 to 7517. Since TallyEngine's result was 7567, this means that 150 IDs were not being caught. To account for them, we added the genes with IDs of the form CP#### (there were 50 instances of these) and those of the form SF####.# or S####.#. This brought us to 7566 gene IDs.
    • When I looked at the IDs in Microsoft Access, they totaled 7569. To account for this last piece of gene formatting, we also had to handle genes of the form SF?####/SF?####. The 2 extra genes that TallyEngine did not account for are not actually supposed to be separated, since the formatting suggests that the two IDs are interchangeable. When the gdb file was created, it seems that these IDs were split at the "/".
    • In other words, there are 3 ordered locus names with formatting that is different from the rest: SF2223/SF2224, S2352/S2353, and S3359/S3360.
    • I wasn't able to match the number reported by TallyEngine exactly, since there are other genes with the same format that were already caught by the patterns SF#### or S####.
    • Note: It turns out the ShiBASE database only uses the pattern SF#### instead of S#### or CP####, so the regular expression would really only need to be SF?[0-9][0-9][0-9][0-9](\.[0-9])?(/|</name>)
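
As a rough illustration of how the two regular expressions above behave, here is a minimal Python sketch. The sample lines are hypothetical (only SF2223/SF2224 comes from the observations above) and stand in for the real XML source, which is not reproduced here. The helper name find_gene_ids is also made up for this example, and [0-9]{4} is simply shorthand for the four repeated [0-9] classes.

```python
import re

# Pattern from the journal entry; [0-9]{4} abbreviates [0-9][0-9][0-9][0-9].
GENE_ID = re.compile(r"(CP|SF?)[0-9]{4}(\.[0-9])?(/|</name>)")
# Narrowed pattern from the final note (no CP#### alternative).
NARROWED = re.compile(r"SF?[0-9]{4}(\.[0-9])?(/|</name>)")

# Hypothetical sample lines, NOT taken from the real export.
SAMPLE = """\
<name>SF0001</name>
<name>S0002</name>
<name>CP0003</name>
<name>SF1234.1</name>
<name>SF2223/SF2224</name>
<name>thrA</name>
"""


def find_gene_ids(text, pattern=GENE_ID):
    """Return each matched ID with the trailing '/' or '</name>' removed."""
    ids = []
    for m in pattern.finditer(text):
        # In both patterns the last capture group is the terminator.
        terminator = m.group(pattern.groups)
        ids.append(m.group(0)[: -len(terminator)])
    return ids


full = find_gene_ids(SAMPLE)
narrow = find_gene_ids(SAMPLE, NARROWED)
print(full)    # the slash-separated pair is reported as two separate IDs
print(narrow)  # same list without the CP#### entry
print(len(full), "matches,", len(set(full)), "unique")
```

On this sample, the slash-separated name yields two matches (SF2223 and SF2224), which mirrors the splitting at "/" described above, and the narrowed pattern simply drops the CP#### entry.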

FOR THE FULL REPORT ON IDENTIFYING THE ID, VISIT THE GENE DATABASE TESTING REPORT PAGE.

Assignment Links

Weekly Assignments

Individual Journal Entries

Shared Journal Entries