Difference between revisions of "Troque Week 14"

From LMU BioDB 2015

Revision as of 01:22, 4 December 2015


Running New Build

Name of .gdb file (give filename and upload and link to compressed file): Sf-Std_20151201.gdb

  • Time taken to export:
    • Start time: 4:19:22 PM PDT
    • End time: 8:30:08 PM PDT
    • Note:

Important Files

Regular Expression

  • (CP|SF?)([0-9][0-9][0-9][0-9])(\.[0-9])?(/|</name>)
  • Observations:
    • To reduce the number of false matches, we added the closing tag "</name>" to the regular expression. This brought the match count down from over 8000 to 7517. TallyEngine reported 7567, so 50 IDs were still not being caught. To account for them, we added the genes with IDs of the form CP#### (there were 50 instances of these) and those of the form SF####.# or S####.#. This gave us 7566 gene IDs.
  • When I looked at the IDs in Microsoft Access, they totaled 7569. To account for the remaining IDs, we also had to match genes written in the form SF?####/SF?####, which is why the expression accepts "/" as an alternative to "</name>".
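
The ID forms described above can be checked with a short Python sketch. The pattern is the one from this entry, unchanged. The sample lines and the helper name find_gene_ids are made up for illustration and are not taken from the real export. The sketch assumes each ID sits directly before "/" or "</name>", as the observations suggest.

```python
import re

# The pattern from this entry, unchanged.
GENE_ID = re.compile(r"(CP|SF?)([0-9][0-9][0-9][0-9])(\.[0-9])?(/|</name>)")


def find_gene_ids(xml_text):
    """Return each matched ID without its trailing '/' or '</name>' delimiter.

    Hypothetical helper for illustration; not part of the original workflow.
    """
    return [
        m.group(1) + m.group(2) + (m.group(3) or "")
        for m in GENE_ID.finditer(xml_text)
    ]


# Made-up sample lines covering each ID form mentioned in the observations.
sample = """
<name>SF0001</name>
<name>S0002.1</name>
<name>CP0003</name>
<name>SF0004/SF0005</name>
<name>SF0006 hypothetical protein</name>
"""

ids = find_gene_ids(sample)
print(ids)
# ['SF0001', 'S0002.1', 'CP0003', 'SF0004', 'SF0005']
```

The last sample line is skipped because its ID is followed by a space rather than "/" or "</name>", which is the effect the closing tag was added for. The pair SF0004/SF0005 yields two IDs because the first is closed by "/" and the second by "</name>".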

Assignment Links

Weekly Assignments

Individual Journal Entries

Shared Journal Entries