Difference between revisions of "Influenza Research Database"

From LMU BioDB 2017

Revision as of 21:46, 4 October 2017

General information about the database

The database we chose is the Influenza Research Database (IRD).

Type of Database

  • Contains avian and non-human mammalian influenza surveillance data, human clinical data associated with virus extracts, phenotypic characteristics of viruses isolated from extracts, and all genomic and proteomic data available in public repositories for influenza viruses.
  • Includes both primary and secondary data that appear to be curated by the community.

Maintenance

  • Maintained privately by a team of 29 individuals belonging to Northrop Grumman Health IT, Vecna Technologies, DMID/NIAID/NIH/DHHS, and J. Craig Venter Institute.
  • The public is encouraged to submit their own data.

Funding

  • IRD is funded by the National Institute of Allergy and Infectious Diseases, National Institutes of Health, and Department of Health and Human Services.
  • It is a collaboration between Northrop Grumman Health and Human Services, J. Craig Venter Institute, and Vecna Technologies.

Scientific quality of the database

  1. Does the content appear to completely cover its content domain?
    • How many records does the database contain?
    • What claims do the database owners make about coverage in the corresponding paper?
  2. What species are covered in the database?
  3. Is the database content useful? I.e., what biological questions can it be used to answer?
  4. Is the database content timely?
    • Is there a need in the scientific community for such a database at this time?
    • Is the content covered by other databases already?
    • When did the database first go online?
    • How often is the database updated?
    • When was the last update?

General utility of the database to the scientific community

  1. Are there links to other databases? Which ones?
    Although most of the data is generated by the IRD team, the database also imports data from the following databases:
  2. Is it convenient to browse the data?
    IRD offers multiple ways to browse and search its data. It also provides a number of tools for refining and analyzing a search, and you can save the data you are working on to your "workbench" so you can come back to it later.
    1. You can use their quick search tool to "search for sequence records using any text terms in key text fields and public IDs (e.g. accession numbers) of nucleotide and protein sequence records, strain data, surveillance data, and human clinical metadata."
    2. Or you can search using any of the following focused search tools:
      • Sequences & Strains
      • Animal Surveillance
      • 3D Protein Structure Files
      • Human Clinical Metadata
      • Serology Experiments (Beta)
      • Host Factor Data
      • Antiviral Drugs
      • Immune Epitopes
      • Phenotypes
      • PCR Primer Probe Data
      • Sequence Feature Variant Types
      • Human Clinical Studies and Lab Experiments (Beta)
    With all of these tools, you have many options for accessing any single piece of data, which makes it convenient to locate anything within their database.
  3. Is it convenient to download the data?
    Yes. Once you've used one of their search tools to find the data you need, you can download it in one of the following formats:
    1. GFF3
    2. Segment FASTA
    3. Gene FASTA
    4. CDS FASTA
    5. Protein FASTA
    • In what file formats are the data provided?
    Listed above.
    • Are they standard or non-standard formats?
    All file formats that they provide are standard in bioinformatics.
  4. Evaluate the “user-friendliness” of the database: can a naive user quickly navigate the website and gather useful information?
    It is a fairly user-friendly database. They have done a good job making all of their search and download tools obvious and easy to use. Someone with no biological background would obviously have a hard time searching the data, but that doesn't make the site any less user-friendly.
    • Is the website well-organized?
    For the most part, everything is clearly labeled and well organized. They have five main sections right under their logo, which help direct you to the part of the site that you are looking for. The dropdown menus under each of these sections have clear labels, so they take you to exactly the part of the site that you expect.
    • Does it have a help section or tutorial?
    Yes, one of the five main sections is labeled Help (easily spotted at the top-center of the page) and has the following sub-sections:
    1. Help Manual
    2. Tutorials & Training Materials
    3. Frequently Asked Questions
    4. IRD Computational Protocols
    5. IRD Glossary
    6. Contact Us
    7. Cite IRD
    Their help manual is very extensive, with detailed written instructions on how to access and use any part of the site. Their Tutorials & Training Materials page is also very helpful because it links to video instructions for the most common tasks within IRD.
    • Are the search options sensible?
    • Run a sample query. Do the results make sense?
  5. Access: Is there a license agreement or any restrictions on access to the database?
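
To illustrate why the standard download formats noted above (FASTA and GFF3) are convenient, here is a minimal Python sketch that reads both using only the standard library. The sample records are made up for illustration and are not real IRD data, and parse_fasta and parse_gff3 are hypothetical helper names, not part of any IRD tool.

```python
from typing import Dict, Iterable, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (without the leading '>') to its full sequence."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped, so concatenate them.
            records[header] += line
    return records


def parse_gff3(lines: Iterable[str]) -> List[Tuple[str, str, int, int, str]]:
    """Return (seqid, type, start, end, strand) for each GFF3 feature line."""
    features = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # GFF3 feature lines have exactly nine tab-separated columns
        seqid, _source, ftype, start, end, _score, strand, _phase, _attrs = cols
        features.append((seqid, ftype, int(start), int(end), strand))
    return features


# Made-up sample data (assumption: not taken from IRD).
fasta_text = """>example_segment_4 hypothetical record
ATGAAGGCAATACTAGTAGTT
CTGCTATATACATTTGCAACC
"""
gff3_text = (
    "##gff-version 3\n"
    "example_segment_4\texample\tCDS\t1\t42\t.\t+\t0\tID=cds1\n"
)

records = parse_fasta(fasta_text.splitlines())
features = parse_gff3(gff3_text.splitlines())
```

Because both formats are plain text with a simple, widely documented layout, files downloaded in them can also be read by most existing bioinformatics libraries without custom code like this.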

Summary judgment

  1. Would you direct a colleague unfamiliar with the field to use it?
  2. Is this a professional or hobby database?

Some Definitions

  • Electronic curation occurs when someone writes a program to add information to a database record from another database.
  • Manual curation occurs when a human reviews the information being added to a record to validate it as true.
    • In-house is when the human works for the database organization.
    • Community is when the database allows members of the scientific community that don't work for the database organization to add information to the record.
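
As a rough sketch of the difference between the two curation styles defined above, the following Python example fills in a record from a second data source automatically (electronic curation) and then marks it as reviewed by a person (manual curation). The record fields, the accession value, and both function names are hypothetical and chosen only for illustration; they do not describe how IRD or any other database implements curation.

```python
def electronic_curation(record, external_db, key="accession"):
    """Return a copy of record with missing fields filled in from external_db."""
    enriched = dict(record)
    external = external_db.get(record.get(key), {})
    for field, value in external.items():
        # Only add fields the record lacks; never overwrite existing values.
        enriched.setdefault(field, value)
    enriched["curation"] = "electronic"
    return enriched


def manual_curation(record, reviewer):
    """Return a copy of record marked as validated by a human reviewer.

    reviewer is "in-house" for database staff or "community" for outside scientists.
    """
    reviewed = dict(record)
    reviewed["curation"] = f"manual ({reviewer})"
    return reviewed


# Hypothetical record and external source (assumption: made-up values).
record = {"accession": "EX000001", "strain": "hypothetical strain"}
external_db = {"EX000001": {"host": "avian", "strain": "other name"}}

enriched = electronic_curation(record, external_db)
reviewed = manual_curation(enriched, reviewer="in-house")
```

In this sketch the program adds the host field on its own, while the manual step records only that a person has checked the result.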