Monarch Initiative Week 4
To User Page: User: Asandle1, User: Kmill104
To Assignment Page: Week 4
https://academic.oup.com/nar/article/52/D1/D938/7449493
Database Evaluation
Andrew doing 1 and 3, Katie 2 and 4.
For your assignment, create a new wiki page to profile your database. For this week, there will be one page per set of partners; both partners will contribute content and notes for their electronic lab notebook to the same page; you do not need to have separate individual journal entries for this week.
- The name of your page should be "Database name Week 4".
Read the article about the database from the Nucleic Acids Research journal and then go online to the database itself. In keeping with Academic Honesty and citation practices, when you answer the questions below, provide a hyperlink to the page that you got the information from. There should be at least one hyperlink per answer.
General information about the database
- What is the name of the database? Monarch Initiative Monarch Initiative Front Page
- What type (or types) of database is it? The Monarch Initiative integrates gene, disease, and phenotype data. The database combines knowledge from across sources to reveal how they are connected. The database intends to show how these connections can tell us the causes and mechanisms of human disease. Nucleic Acids Research Article
- What biological information (type of data) does it contain? (sequence, structure, model organism, or specialty [what?]) Each page gives a general description of a specific gene, disease, or phenotype, along with a section listing other names it may be known by. Each page also links to associated entries of the other data types housed in the database, organized in association tables. Monarch Initiative Explore Page
- What type of data source does it have? The Monarch Initiative draws its data from 33 other databases.
- primary versus secondary ("meta")? Secondary, because the data come from other databases and are then collected and organized into Monarch's own format. Nucleic Acids Research Article. Pages have links showing where the data are sourced from. Monarch Initiative Explore Page
- curated versus non-curated? Curated Nucleic Acids Research Article
- if curated, is it electronic versus human curation? Electronically curated. The Monarch Initiative has several programs that work together to collect data from across sources. These data are collected in what is called the Monarch Knowledge Graph, or Monarch KG. Ingested data are transformed into the Monarch KG schema: when source files are first downloaded, they pass through a program called Koza, where they are transformed to Biolink and KGX format. The programs also perform entity mapping and merging, which means that if multiple sources use different identifiers for the same entity, those identifiers are merged so that the entity is represented only once. The KG then integrates the gene, disease, and phenotype data, which is then served to the user. Nucleic Acids Research Article
- if human curation, is it in-house staff versus community curation? N/A
- What individual or organization maintains the database?
- public versus private
- large national or multinational entity or small lab group
- What is their funding source(s)?
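To make the entity mapping and merging idea from the electronic curation answer more concrete, here is a minimal Python sketch. It is not Monarch's actual pipeline (the paper describes that as using Koza, Biolink, and KGX); the mapping table, identifiers, source names, and the merge_records function are all hypothetical placeholders made up for illustration.

```python
from collections import defaultdict

# Hypothetical mapping table: source-specific identifier -> one canonical identifier.
# These identifiers are placeholders, not real database records.
ID_MAP = {
    "SRC_A:001": "CANON:1",
    "SRC_B:xyz": "CANON:1",  # same entity as SRC_A:001, different source ID
    "SRC_C:77": "CANON:2",
}


def merge_records(records, id_map):
    """Group records under their canonical ID, keeping all names and sources."""
    merged = defaultdict(lambda: {"names": set(), "sources": set()})
    for rec in records:
        # Fall back to the record's own ID if no mapping is known.
        canonical = id_map.get(rec["id"], rec["id"])
        merged[canonical]["names"].add(rec["name"])
        merged[canonical]["sources"].add(rec["source"])
    return dict(merged)


# Three made-up records from three made-up sources.
records = [
    {"id": "SRC_A:001", "name": "example gene", "source": "source_a"},
    {"id": "SRC_B:xyz", "name": "example gene (alias)", "source": "source_b"},
    {"id": "SRC_C:77", "name": "other gene", "source": "source_c"},
]

merged = merge_records(records, ID_MAP)
for canonical_id, info in sorted(merged.items()):
    print(canonical_id, sorted(info["names"]), sorted(info["sources"]))
```

In this sketch the two records that refer to the same entity end up under a single canonical identifier, while the alternative names and the source of each record are preserved, which is roughly what the "additional names" and source links on a Monarch page suggest.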
Scientific quality of the database
- Does the content appear to completely cover its content domain?
- How many records does the database contain?
It is not completely clear how many records the database contains; this could be better communicated. That said, opening the search bar shows 845,539 results, which may be the total number of entries in the database. Search Page
- What claims do the database owners make about coverage in the corresponding paper?
- What species are covered in the database? (If it is a very long list, summarize.)
The database covers and connects "phenotypes to genotypes across species" (About). The exact number and types of species are not clearly stated, but it definitely covers humans.
- Is the database content useful? I.e., what biological questions can it be used to answer?
It is useful, but how useful is difficult to determine without a particular use case to test it against. The database merges data from different scientific research fields onto one platform, which helps inform research and improves data organization so that more informed clinical decisions can be made. About
- Is the database content timely?
It is unclear. The banner at the bottom of the page shows that they updated the site itself in 2024 and that the tool is a work in progress, but they do not provide any information about the data themselves, other than that they have an API, or more accurately multiple APIs, pulling data from other databases. The number of source databases given in their research paper differs from the number on the site's page that lists them. You can find the list of databases they mention on their page here.
- Is there a need in the scientific community for such a database at this time?
User: Kmill104
- Is the content covered by other databases already?
- How current is the database?
- When did the database first go online?
- How often is the database updated?
- When was the last update?
They are not very forthcoming or clear about which species/model organisms they are using other than humans.
General utility of the database to the scientific community
- Are there links to other databases? Which ones?
- Is it convenient to browse the data?
- Is it convenient to download the data?
- In what file formats are the data provided?
- What type of files, indicated by the file extension (e.g., .txt, .xml, etc.)?
- Are they standard or non-standard formats? (i.e., are they following an approved standard for that type of data)?
- Evaluate the “user-friendliness” of the database: can a naive user quickly navigate the website and gather useful information?
- Is the website well-organized?
- Does it have a help section or tutorial?
- Are the search options sensible?
- Run a sample query. Do the results make sense?
- Access: Is there a license agreement or any restrictions on access to the database?
Summary judgment
- Would you direct a colleague unfamiliar with the field to use it?
- Is this a professional or "hobby" database? The "hobby" analogy means that it was that person's hobby to make the database. It could mean that it is limited in scope, done by one or a few persons, or seems amateur.
Some Definitions
- Electronic curation occurs when someone writes a program to add information to a database record from another database.
- Manual curation occurs when a human reviews the information being added to a record to validate it as true.
- In-house is when the human works for the database organization.
- Community is when the database allows members of the scientific community that don't work for the database organization to add information to the record.