Difference between revisions of "Influenza Research Database"

From LMU BioDB 2017

Revision as of 05:17, 5 October 2017

General information about the Influenza Research Database

Content

  • IRD comprises three features (described in the corresponding NAR paper):
    • Influenza virus-related data (surveillance, clinical, phenotypic, genomic, and proteomic)
    • Analytical and visualization tools (e.g. sequence comparison and analysis tools, BLAST, protein structure visualization tools, etc.)
    • Personal workbench (data storage and sharing)
  • IRD contains avian and non-human mammalian influenza surveillance data, human clinical data associated with virus extracts, phenotypic characteristics of viruses isolated from extracts, and all genomic and proteomic data available in public repositories for influenza viruses (Mission).
  • It includes both primary and secondary data that appear to be curated both electronically and manually in-house. The Data Sources page mentions that IRD uses algorithms to generate different data types. In the NAR paper, the authors also mention that some of the data are integrated from IRD in-house curation and annotation pipelines.

Maintenance

  • IRD is maintained privately by a team of 29 individuals belonging to Northrop Grumman Health IT, Vecna Technologies, DMID/NIAID/NIH/DHHS, and the J. Craig Venter Institute.
  • The public is encouraged to submit its own data using the Workbench and Submit Data options.

Funding

  • IRD is funded by the National Institute of Allergy and Infectious Diseases, National Institutes of Health, and Department of Health and Human Services.
  • It is a collaboration between Northrop Grumman Health and Human Services, J. Craig Venter Institute, and Vecna Technologies.

Scientific quality of the database

Content

  • The content appears to cover its content domain completely:
    • IRD has 2,408,618 aggregated records and 1,434,077 derived records (Data Summary).
    • In the corresponding paper, the owners of the database claim:
"IRD facilitates the research and development of vaccines, diagnostics and therapeutics against influenza virus by providing a comprehensive collection of influenza-related data integrated from various sources, a growing suite of analysis and visualization tools for data mining and hypothesis generation, personal workbench spaces for data storage and sharing, and active user community support."
  • IRD has data coverage on both birds and mammals.

Usefulness

"The objective of the IRD resource is to provide a one-stop shop for influenza virus data and analysis tools to drive new discoveries about influenza virus transmission, virulence, host range and pathogenesis, and to develop novel strategies for diagnosis, prevention and therapeutic intervention."
  • This database can be used to answer questions about genetic sequences, animal surveillance, immune epitopes, 3D protein structures, phenotype, human clinical metadata, antiviral drugs, and much more.
  • It contains analysis tools that assist biologists in analyzing their own data.

Relevance

  • IRD content is very timely because there is a need in the scientific community for such a database: the influenza virus is a major global public health threat with complex processes. IRD provides an easily accessible compilation of data on the influenza virus while also assisting biologists in analyzing their own data.
  • IRD is not completely unique as NCBI covers similar content in their Influenza Virus Database.

Upkeep

  • It is unclear when the database was first released online. However, its earliest listed publication dates from 2006 (IRD Resource Publications and Presentations).
  • While sequences are downloaded from GenBank, curated, and added to the IRD database daily, updates to generated and imported public data are made less often. The most recent updates for this information occurred between May 2017 and October 2017 (IRD Data Updates).

General utility of the database to the scientific community

  1. Are there links to other databases? Which ones?
    Although most of the data is generated by the IRD team, the database also imports data from the following databases:
  2. Is it convenient to browse the data?
    IRD offers multiple ways to browse and search its data. It also has a number of tools that make it convenient to refine and analyze a search, and to save the data you are working on to your "workbench" so you can come back to it later.
    1. You can use the quick search tool to "search for sequence records using any text terms in key text fields and public IDs (e.g. accession numbers) of nucleotide and protein sequence records, strain data, surveillance data, and human clinical metadata."
    2. Or you can search using any of the following focused search tools:
      • Sequences & Strains
      • Animal Surveillance
      • 3D Protein Structure Files
      • Human Clinical Metadata
      • Serology Experiments (Beta)
      • Host Factor Data
      • Antiviral Drugs
      • Immune Epitopes
      • Phenotypes
      • PCR Primer Probe Data
      • Sequence Feature Variant Types
      • Human Clinical Studies and Lab Experiments (Beta)
    With all of these tools, you have many options for accessing any single piece of data. This makes it very convenient to locate anything within the database.
  3. Is it convenient to download the data?
    Yes. Once you've used one of the search tools to find the data you need, you can download it in one of the following formats:
    1. GFF3
    2. Segment FASTA
    3. Gene FASTA
    4. CDS FASTA
    5. Protein FASTA
    • In what file formats are the data provided?
    Listed above.
    • Are they standard or non-standard formats?
    All file formats that they provide are standard in bioinformatics.
  4. Evaluate the “user-friendliness” of the database: can a naive user quickly navigate the website and gather useful information?
    It is a fairly user-friendly database. The search and download tools are obvious and easy to use. Someone with no biological background would have a hard time searching the data, but that doesn't make the site any less user-friendly.
    • Is the website well-organized?
    For the most part, everything is clearly labeled and well organized. There are five main sections right under the logo that direct you to the part of the site you are looking for. The dropdown menus under each of these sections have clear labels, so they take you to exactly the part of the site you expect.
    • Does it have a help section or tutorial?
    Yes, one of the five main sections is labeled help (easily spotted and right at the top-center of the page) and has the following sub-sections:
    1. Help Manual
    2. Tutorials & Training Materials
    3. Frequently Asked Questions
    4. IRD Computational Protocols
    5. IRD Glossary
    6. Contact Us
    7. Cite IRD
    Their help manual is very extensive with detailed written instructions on how to access/use any part of the site. Their Tutorials and Training Materials page is also very helpful because it provides links to video instructions on how to do the most used tasks within IRD.
    • Are the search options sensible?
    Yes; see the answer to question 2 above for details.
    • Run a sample query. Do the results make sense?
  5. Access: Is there a license agreement or any restrictions on access to the database?
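
As a rough illustration of why the standard download formats listed under question 3 are convenient, the following is a minimal sketch of reading FASTA and GFF3 text in Python. It is not taken from IRD: the function names (parse_fasta, parse_gff3_line) and all sample sequences, headers, and feature values are hypothetical placeholders, and real IRD downloads may use different header layouts.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# The nine tab-separated columns defined by the GFF3 format.
GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_gff3_line(line: str) -> Dict[str, object]:
    """Split one GFF3 feature line into its nine standard columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError("expected 9 tab-separated GFF3 columns")
    record: Dict[str, object] = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(fields[3])
    record["end"] = int(fields[4])
    # Attributes are semicolon-separated key=value pairs.
    record["attributes"] = dict(
        pair.split("=", 1) for pair in fields[8].split(";") if "=" in pair
    )
    return record


# Hypothetical sample data, invented for this sketch (not real IRD records).
SAMPLE_FASTA = """>example_segment_4 hypothetical header
ATGAAGGCAATACTAGTAG
TTCTGCTATATACATTTG
>example_segment_6 hypothetical header
ATGAATCCAAACCAAAAG
"""
SAMPLE_GFF3_LINE = "example_segment_4\texample\tCDS\t1\t37\t.\t+\t0\tID=cds1;Name=HA"

records = list(parse_fasta(SAMPLE_FASTA.splitlines()))
feature = parse_gff3_line(SAMPLE_GFF3_LINE)

for header, sequence in records:
    print(header.split()[0], len(sequence))
print(feature["type"], feature["start"], feature["end"], feature["attributes"])
```

Because both formats are plain text with simple rules, a downloaded file can be read with a few lines of code like this or with an established library; for real analyses, a maintained parser would be a safer choice than this sketch.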

Summary judgment

  1. Would you direct a colleague unfamiliar with the field to use it?
  2. Is this a professional or hobby database?

Electronic notebook

For this assignment, we chose the Influenza Research Database after class and each researched it individually. We then met in the library and began working on the database's wiki page. First, we titled the page "Influenza Research database" and linked both of our user pages to it. We then made an outline with a heading for each of the main sections (this allowed us to work on the wiki at the same time, since each of us could edit an individual section). We read the corresponding paper in NAR to get general background on the use of the database and then began exploring the database itself. Emma worked on the first two sections (general information and scientific quality) while Eddie focused on the final one (general utility of the database to the scientific community). We gathered information by clicking on different tabs in the database and browsing freely. Once we finished our individual sections of the analysis, we started constructing our presentation in Google Slides. We used the same basic format and layout in our presentation as in our wiki page.

Acknowledgements

  1. Emma and Eddie thank each other for working on this assignment together. We met outside of class to complete the assignment and divide up the presentation. We also plan to meet up in the future to practice.
  2. We would like to thank Dr. Dahlquist and Dr. Dionisio for their guidance and support.
  3. While we worked with the people noted above, this journal entry was completed by Emma and Eddie and not copied from another source.

Emmatyrnauer (talk) 16:48, 4 October 2017 (PDT)

References

  1. LMU BioDB 2017. (2017). Week 5. Retrieved October 4, 2017, from https://xmlpipedb.cs.lmu.edu/biodb/fall2017/index.php/Week_5.
  2. Influenza Research Database: update 2017. Retrieved October 4, 2017, from https://www.fludb.org/brc/home.spg?decorator=influenza.
  3. Yun Zhang, Brian D. Aevermann, Tavis K. Anderson, David F. Burke, Gwenaelle Dauphin, Zhiping Gu, Sherry He, Sanjeev Kumar, Christopher N. Larsen, Alexandra J. Lee, Xiaomei Li, Catherine Macken, Colin Mahaffey, Brett E. Pickett, Brian Reardon, Thomas Smith, Lucy Stewart, Christian Suloway, Guangyu Sun, Lei Tong, Amy L. Vincent, Bryan Walters, Sam Zaremba, Hongtao Zhao, Liwei Zhou, Christian Zmasek, Edward B. Klem, Richard H. Scheuermann; Influenza Research Database: An integrated bioinformatics resource for influenza virus research, Nucleic Acids Research, Volume 45, Issue D1, 4 January 2017, Pages D466–D474. Retrieved October 4, 2017 from https://doi.org/10.1093/nar/gkw857.