Difference between revisions of "IMG/VR Database"

From LMU BioDB 2017

Revision as of 05:21, 3 October 2017

Arash Lari

Antonio Porras

General Information

  1. Name: [IMG/VR]
  2. Type of database: Integrated & Primary Database
  3. Biological information: Domain (Microbiome, Bacteria, Archaea, Eukarya, Plasmids, GFragment, Viruses), Genome, Genome Composition, Habitat Type, DNA Nucleotide Composition, Ecosystem, Protein Coding Genes, Families, Chromosome Map
  4. Type of data source: Both primary data and secondary data drawn from other curated databases (e.g. [UniProt]).
  5. Organization: The [U.S. Department of Energy Joint Genome Institute] is a government-funded (DOE) organization that provides publicly available information.
  6. Funding sources: Primarily the DOE Office of Biological and Environmental Research
JGI's expenses breakdown

Scientific Quality

How data is typically represented on a page
  1. Does it appear to completely cover its content domain? It appears to be comprehensive: its authors describe it as the largest publicly available database of isolate reference DNA viruses and identified viral contigs. The database contains 3,908 viral isolates and 264,413 viral contigs (Paez-Espino et al., 2017).
  2. What species are covered in the database? The database covers viruses, which are further categorized by their associated host organisms.
  3. Is the database content useful? In the right hands, e.g. those of professional researchers, it can be useful because it provides in-depth information and data from JGI itself and from other comprehensive databases. It can be used to answer questions about viral genomic information.
  4. Is the database content timely? The content itself appears to be updated regularly, with viral genomic data as recent as 2017. However, the most recent version of the underlying data management and analysis system, Integrated Microbial Genomes (IMG), was released in 2008; it first went online in March 2005. We believe there is a current need for this database in the scientific community because it provides connections to putative hosts and habitat types and allows visualization of metadata and primary data on viruses. Its content overlaps with that of many other databases, since it draws on their data, but it also adds data from JGI's own research and makes it publicly available.

General Utility

  1. Are there links to other databases? Which ones? This database links to several other databases, the full list of which can be found here.
  2. Is it convenient to browse the data? While it is not difficult to browse the database, it is not intuitive for non-specialized users.
  3. Is it convenient to download the data? It is convenient and easy to download data once you register an account with JGI. The site provides a plethora of files with different information and sequence data, typically in multi-FASTA or tab-delimited format. This is detailed more on this page. Multi-FASTA files hold genetic sequences, and tab-delimited files are simple text files that store data in a tabular structure. These are not common file formats for the general public, but they are standard in gene sequencing.
  4. Evaluate the “user-friendliness” of the database: can a naive user quickly navigate the website and gather useful information? The database is organized so that experienced, knowledgeable users who know what they are looking for can find it with relative ease, but a naive user would have some trouble because the site is quite technical and specific. It does offer help pages and tutorials for navigating the website. The search options are sensible and helpful, but only if you know what you are looking for; the site was not meant for users who know little about biology.
  5. Access: Is there a license agreement or any restrictions on access to the database? According to IMG/M: "IMG/M can be accessed without a login and password for searching and analyzing public datasets; dataset downloads, data exports and other advanced tools are provided via IMG/M ER which can be accessed with login/password."
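
To make the two download formats mentioned in item 3 more concrete, here is a minimal Python sketch of how such files could be read. The sample sequences, contig names, and column names (contig_id, habitat) are made up for illustration and are not taken from actual IMG/VR downloads; real files may be laid out differently.

```python
import csv
import io


def parse_multi_fasta(text):
    """Return a dict mapping each header (without '>') to its full sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A '>' line starts a new record; the rest of the line is its name.
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines may be wrapped, so collect them and join later.
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def parse_tab_delimited(text):
    """Return a list of dicts, one per data row, keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Hypothetical sample data, not real IMG/VR content.
fasta_text = ">contig_1\nATGCGT\nTTAA\n>contig_2\nGGCC\n"
table_text = "contig_id\thabitat\ncontig_1\tmarine\ncontig_2\tsoil\n"

sequences = parse_multi_fasta(fasta_text)
rows = parse_tab_delimited(table_text)

for row in rows:
    name = row["contig_id"]
    print(name, row["habitat"], len(sequences[name]))
```

The sketch uses only the Python standard library; for larger files, a dedicated parser such as the one in Biopython would likely be a better fit.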

Summary Judgement

  1. Would you direct a colleague unfamiliar with the field to use it? We would not direct a colleague who is unfamiliar with the field to this database, as it is highly specific and technical. For example, Arash, the non-biology partner in this assignment, had very little understanding of the content of the website and therefore was not able to make use of it. The other partner, Antonio, who understands biology much better, still had some confusion about the relevance of certain data points, as he does not have a background in viral genomic research.
  2. Is this a professional or hobby database? This is most definitely a professional database; it is very unlikely to be used outside professional or research work.

Electronic Lab Notebook

  1. We first looked at which databases had articles in the year 2017 and cross-referenced which databases we were interested in.
  2. We chose IMG/VR because of our interest in viruses.
  3. We found the name of the database and searched for names of viruses to see what would come up.
  4. We tried popular names, e.g. Ebola, and nothing appeared.
  5. We then tried names they suggested when using the search engine.
  6. We then found it has both primary data from IMG and secondary from other databases such as UniProt.
  7. It was difficult to verify how often the database content is updated; the only update dates we could find were for the system itself.
  8. Seeing the dates of the studies showed us that genomes were added in 2017, which confirmed that the database is actively curated.

Acknowledgements

  1. We, Arash Lari and Antonio Porras, met outside of class multiple times to assess the scientific quality and general utility of IMG/VR. We also worked together on the presentation and practiced together prior to class.

While we worked with the people noted above, this individual journal entry was completed by Arash Lari and Antonio Porras and not copied from another source.

Aporras1 (talk) 22:03, 2 October 2017 (PDT)

References

  1. DOE Joint Genome Institute. (2017). DOE Joint Genome Institute: A DOE Office of Science User Facility of Lawrence Berkeley National Laboratory. [online] Available at: https://jgi.doe.gov/ [Accessed 1 Oct. 2017].
  2. Img.jgi.doe.gov. (2017). JGI IMG Home. [online] Available at: https://img.jgi.doe.gov/cgi-bin/vr/main.cgi [Accessed 1 Oct. 2017].
  3. LMU BioDB 2017. (2017). Week 5. Retrieved October 01, 2017, from https://xmlpipedb.cs.lmu.edu/biodb/fall2017/index.php/Week_5
  4. Paez-Espino, D., Chen, I., Palaniappan, K., Ratner, A., Chu, K., Szeto, E., Pillay, M., Huang, J., Markowitz, V., Nielsen, T., Huntemann, M., K. Reddy, T., Pavlopoulos, G., Sullivan, M., Campbell, B., Chen, F., McMahon, K., Hallam, S., Denef, V., Cavicchioli, R., Caffrey, S., Streit, W., Webster, J., Handley, K., Salekdeh, G., Tsesmetzis, N., Setubal, J., Pope, P., Liu, W., Rivers, A., Ivanova, N. and Kyrpides, N. (2017). IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses.