IMG/VR Week 5
Presentation
File:CDAMpresentationweek5.pdf
It is uploaded in both PowerPoint and PDF form.
General information about the database
1. What is the name of the database? (link to the home page)
IMG/VR [https://img.jgi.doe.gov/cgi-bin/vr/main.cgi]
2. What type (or types) of database is it? [https://img.jgi.doe.gov/cgi-bin/vr/main.cgi]
a. What biological information (type of data) does it contain? (sequence, structure, model organism, or specialty [what?])
IMG/VR is a sequence database of viral DNA for analysis and comparison. The site organizes these sequences according to the part of the human body the virus infects, the ecosystem the virus inhabits, and the host associated with the virus.
b. What type of data source does it have? primary versus secondary ("meta")
IMG/VR is a secondary ("meta") database, as its genomic information was collected from outside sources that sequenced the DNA.
c. Curated versus non-curated? If curated, is it electronic versus human curation? If human curation, is it in-house staff versus community curation?
The database appears to be curated, as data must be submitted through the IMG submission system and processed by the IMG annotation pipeline before it can be integrated into IMG/VR.
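3. What individual or organization maintains the database? [https://img.jgi.doe.gov/cgi-bin/vr/main.cgi]
IMG/VR is a public database maintained by the U.S. Department of Energy Joint Genome Institute (JGI). The Regents of the University of California, credited on the site, manage JGI's parent laboratory, Lawrence Berkeley National Laboratory, on behalf of the Department of Energy.
4. What is their funding source(s)? [https://img.jgi.doe.gov/cgi-bin/vr/main.cgi]
IMG/VR is publicly funded. JGI is a Department of Energy user facility, so its funding comes from the U.S. Department of Energy Office of Science rather than from the state of California.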
Scientific quality of the database
1. Does the content appear to completely cover its content domain?
Content domain: “annotation, analysis, and distribution” of genome and microbiome datasets [https://img.jgi.doe.gov]
How many records does the database contain?
8,389 cultivated reference viruses [https://img.jgi.doe.gov/cgi-bin/vr/main.cgi]
What claims do the database owners make about coverage in the corresponding paper?
Data from GenBank are processed through the IMG submission system [[1]] and the IMG annotation pipeline before being integrated into the IMG data warehouse.
2. What species are covered in the database? (If it is a very long list, summarize.)
Viruses [https://img.jgi.doe.gov/cgi-bin/vr/main.cgi]
3. Is the database content useful? I.e., what biological questions can it be used to answer?
It can be used for comparative analysis across genome datasets. With thousands of datasets and millions of genes, its analysis tools can show how species are similar and how they differ. The database also exposes specific characteristics of each genome that can be used for comparison.
4. Is the database content timely?
Is there a need in the scientific community for such a database at this time?
Yes. The scientific community needs public access to the genomes of different species. Such access can lead to more discoveries about how these organisms are related to one another and, potentially, to groundbreaking research if a model organism is discovered.
Is the content covered by other databases already?
Other websites also host genome datasets for various microorganisms, including viruses, and a search for virus genome databases returns several that offer this information. However, IMG/VR is the largest database of viral genomes. [[2]]
5. How current is the database? [https://img.jgi.doe.gov/cgi-bin/vr/main.cgi]
When did the database first go online?
The database first went online in 2016. [https://academic.oup.com/nar/article/45/D1/D457/2333907]
How often is the database updated?
It is updated on a quarterly basis. [[3]]
When was the last update?
The last update was in September 2019 (shown at the bottom of the home page).
General utility of the database to the scientific community
1. Are there links to other databases? Which ones? [https://img.jgi.doe.gov/]
The database links to NCBI BLAST, IMG/M, IMG/M ER, and IMG/ABC for analyzing, sequencing, and comparing the genes and genomes of other organisms.
2. Is it convenient to browse the data? [https://img.jgi.doe.gov/cgi-bin/m/main.cgi?section=TreeFile&page=domain&domain=all]
Because the single-gene browser was malfunctioning, the data can be difficult for new users to navigate unless they are gathering information about entire genomes. The site does offer creative ways to browse the data, such as a map of the human body that lets users isolate viral DNA fragments according to the part of the body they infect.
3. Is it convenient to download the data? In what file formats are the data provided? What type of files, indicated by the file extension (e.g., .txt, .xml, etc.)? Are they standard or non-standard formats (i.e., are they following an approved standard for that type of data)? [https://img.jgi.doe.gov/cgi-bin/vr/main.cgi]
Once the desired information has been gathered on the website, the data can be exported relatively easily as a tab-delimited text file, a standard format that opens directly in Excel.
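Because the export is tab-delimited text, it can also be read outside of Excel. The following is a minimal sketch in Python, not something taken from the IMG/VR documentation. The column names ("Genome ID", "Host", "Ecosystem") and the sample rows are hypothetical placeholders, since we did not record the actual headers of an exported file.

```python
import csv
import io


def read_tab_delimited(handle):
    """Return the rows of a tab-delimited export as a list of dicts keyed by column header."""
    reader = csv.DictReader(handle, delimiter="\t")
    return list(reader)


# Hypothetical sample standing in for an exported file.
# The headers and values below are invented for illustration only.
sample = (
    "Genome ID\tHost\tEcosystem\n"
    "example_1\tHuman\tNasal\n"
    "example_2\tHuman\tOral\n"
)

rows = read_tab_delimited(io.StringIO(sample))

# Keep only the records whose (hypothetical) Ecosystem column is "Nasal".
nasal = [row for row in rows if row["Ecosystem"] == "Nasal"]

print(len(rows), "rows read,", len(nasal), "nasal record(s)")
```

For a real export, the same function could be given an open file instead, for example `open("export.txt", newline="")`, where `export.txt` is an assumed file name.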
4. Evaluate the “user-friendliness” of the database: can a naive user quickly navigate the website and gather useful information? Is the website well-organized? Does it have a help section or tutorial? Are the search options sensible? Run a sample query. Do the results make sense? [https://img.jgi.doe.gov/cgi-bin/m/main.cgi?section=PhylogenProfiler&page=phyloProfileForm]
We do not consider the website to be very user-friendly. It presents new users with many unfamiliar, niche terms, and we had a lot of difficulty navigating the database and simply trying to access the data that IMG/VR offers. Many of the tabs and pages require the user to understand bioinformatics and DNA sequencing, and we found ourselves repeatedly turning to the "Help" tab for guidance.
Nonetheless, the website is relatively well organized. Each tab on the home page gives the user an idea of the database's functionality and branches into subtabs that let users choose how they would prefer to compare, analyze, or sequence certain genes. For new users, the website offers a document that details how to use the database. The database also allows users to search for genomes and genes according to the taxonomy and ecosystem of the organism in question.
We ran a query for viral strains that infect the nasal region of the human body and found genomic information on various strains relatively easily. The single-gene function on the website was not working, so we could not run a query on a single gene.
5. Access: Is there a license agreement or any restrictions on access to the database?
There is no license agreement for the database. However, the JGI policy states that if you publish any articles that use a genome dataset, "the principal investigator" must give you permission. [https://jgi.doe.gov/user-programs/pmo-overview/policies/] In practice, our access was also limited because the single-gene browser was not functioning correctly, so we could not reach data on single genes within a genome.
Summary judgment
1. Would you direct a colleague unfamiliar with the field to use it?
We would not recommend this database to a colleague unfamiliar with the field. While useful for analyzing and comparing DNA sequences, the database contains a great deal of niche information and pages that are hard to follow, particularly for someone with little experience in bioinformatics and general biology. We had difficulty navigating the website because certain pages were not functioning, and we had to rely on the “Help” tab to understand how to use it.
2. Is this a professional or "hobby" database? The "hobby" analogy means that it was that person's hobby to make the database. It could mean that it is limited in scope, done by one or a few persons, and seems amateur.
IMG/VR appears to be a professional database that offers genomic information to scientists around the world. It encourages researchers to analyze, distribute, and annotate the information it provides and requests that the website be properly cited, suggesting that it is meant for use in scientific writing.
Acknowledgments
- Dr. Dahlquist; professor
- We worked together on answering the questions and met outside of class to go over the questions and work on our presentation.
- "Except for what is noted above, this individual journal entry was completed by me and not copied from another source."
Cdomin12 (talk) 16:50, 28 September 2019 (PDT)
Ymesfin (talk) 17:01, 28 September 2019 (PDT)
References
Integrated Microbial Genomes & Microbiomes/VR. (2019). IMG/VR. Retrieved October 1, 2019, from https://img.jgi.doe.gov/cgi-bin/vr/main.cgi
LMU BioDB 2019. (2019). Week 5. Retrieved October 1, 2019, from https://xmlpipedb.cs.lmu.edu/biodb/fall2019/index.php/Week_5