Monarch Initiative Week 4

From LMU BioDB 2024
Revision as of 11:28, 7 February 2024 by Kmill104 (talk | contribs) (General information about the database: fixing formatting)

To User Page: User: Asandle1, User: Kmill104

To Assignment Page: Week 4

https://academic.oup.com/nar/article/52/D1/D938/7449493


Database Evaluation

Andrew: sections 2 and 4; Katie: sections 1 and 3. For your assignment, create a new wiki page to profile your database. For this week, there will be one page per set of partners; both partners will contribute content and notes for their electronic lab notebook to the same page; you do not need to have separate individual journal entries for this week.

  • The name of your page should be "Database name Week 4".

Read the article about the database from the Nucleic Acids Research journal and then go online to the database itself. In keeping with Academic Honesty and citation practices, when you answer the questions below, provide a hyperlink to the page that you got the information from. There should be at least one hyperlink per answer.

General information about the database

    1. What is the name of the database? Monarch Initiative Monarch Initiative Front Page
    2. What type (or types) of database is it? The Monarch Initiative integrates gene, disease, and phenotype data. The database combines knowledge from across sources to reveal how they are connected. The database intends to show how these connections can tell us the causes and mechanisms of human disease. Nucleic Acids Research Article
      1. What biological information (type of data) does it contain? (sequence, structure, model organism, or specialty [what?]) Each page contains a general description of a specific gene, disease, or phenotype, as well as a section listing alternative names for it. Each page also links to associated entries of the other data types housed in the database; these links are organized in association tables. Monarch Initiative Explore Page
      2. What type of data source does it have? The Monarch Initiative sources data from 33 other databases Nucleic Acids Research Article
        • primary versus secondary ("meta")? Secondary, because the data come from other databases and are then collected and organized into Monarch's own format. Nucleic Acids Research Article

Pages have links showing where the data is sourced from. Monarch Initiative Explore Page

        • curated versus non-curated? Curated Nucleic Acids Research Article
          • if curated, is it electronic versus human curation? Electronically curated. The Monarch Initiative has several programs that work together to collect data from across sources. This data is collected in what is called the Monarch Knowledge Graph, or Monarch KG. Data that is ingested is transformed into the Monarch KG schema. When source files are first downloaded, they pass through a program called Koza, where they are transformed to Biolink and KGX format. The programs also perform entity mapping and merging, which means that if multiple sources use different identifiers for the same entity, these identifiers are merged and the entity appears only once. The KG then integrates the gene, disease, and phenotype data, which is then served to the user. Nucleic Acids Research Article
            • if human curation, is it in-house staff versus community curation? N/A
    1. What individual or organization maintains the database? The database is maintained by an "international consortium", meaning its team comes from several international organizations. To find the team, the user has to navigate to the website's About section and then click on the Team page. The Monarch Initiative credits its team as coming from 12 schools and institutes. These include University of Colorado, Lawrence Berkeley National Laboratory, The Jackson Laboratory, European Bioinformatics Institute, Queen Mary University of London, Johns Hopkins University, C-Path, and Renaissance Computing Institute. They also credit their consultants, scientific advisory board, alumni members, and alumni groups. Monarch Initiative Team The website is much clearer about who is on the team, while the article only lists team members in the Authors section and never directly mentions any of these organizations. Nucleic Acids Research Article
      • public versus private. The database is public; there is no requirement to pay or to hold any kind of membership to access its data. Nucleic Acids Research Article
      • large national or multinational entity or small lab group? The database is run by a multinational group of schools and institutes. Monarch Initiative Team
    2. What is their funding source(s)? The website credits the Office of the Director, National Institutes of Health; the NIH National Human Genome Research Institute; and the NIH National Library of Medicine. Monarch Initiative Team The article credits these but also says that several of the authors were partly supported by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy. It also states that the Critical Path Institute is supported by the Food and Drug Administration (FDA) of the U.S. Department of Health and Human Services. This suggests that the website does not fully disclose all of its funding sources. Nucleic Acids Research Article
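
The entity mapping and merging step described under electronic curation above can be illustrated with a minimal Python sketch. This is not Monarch's or Koza's actual code: the identifiers, the mapping table, and the `merge_records` function are all hypothetical placeholders chosen only to show the idea of collapsing several source identifiers into one entry.

```python
from collections import defaultdict

# Hypothetical mapping table: source identifier -> canonical identifier.
# These pairs are illustrative placeholders, not real Monarch mappings.
ID_MAPPINGS = {
    "SRC_A:001": "CANON:1",
    "SRC_B:xyz": "CANON:1",
    "SRC_A:002": "CANON:2",
}


def merge_records(records, mappings):
    """Group records by canonical ID, merging their names and sources."""
    merged = defaultdict(lambda: {"names": set(), "sources": set()})
    for rec in records:
        # Identifiers without a known mapping are kept unchanged.
        canonical = mappings.get(rec["id"], rec["id"])
        merged[canonical]["names"].add(rec["name"])
        merged[canonical]["sources"].add(rec["source"])
    return dict(merged)


# Made-up records standing in for rows ingested from two source databases.
records = [
    {"id": "SRC_A:001", "name": "example disease", "source": "Source A"},
    {"id": "SRC_B:xyz", "name": "example disorder", "source": "Source B"},
    {"id": "SRC_A:002", "name": "another disease", "source": "Source A"},
]

merged = merge_records(records, ID_MAPPINGS)
for canonical_id, entry in sorted(merged.items()):
    print(canonical_id, sorted(entry["names"]), sorted(entry["sources"]))
```

In this sketch, the two records that refer to the same entity under different source identifiers end up as a single entry that remembers both names and both sources, which is roughly what the association tables on the site expose to the user.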

Scientific quality of the database

    1. Does the content appear to completely cover its content domain?
      • How many records does the database contain?

It is not completely clear how many records the database contains. This could be better communicated. That said, opening the search bar shows 845,539 results, which may be the total number of entries in the database. Search Page

      • What claims do the database owners make about coverage in the corresponding paper?
    1. What species are covered in the database? (If it is a very long list, summarize.)

The database covers and connects "phenotypes to genotypes across species". About The exact number and types of species are not abundantly clear, but it definitely covers humans.

    1. Is the database content useful? I.e., what biological questions can it be used to answer?

It is useful, though how useful is difficult to determine without a particular use case to test it against. The database merges data from different scientific research fields onto one platform to inform research, improve data organization, and support more informed clinical decisions. About


    1. Is the database content timely?

It is unclear. The banner at the bottom of the page shows that the site itself was updated in 2024 and that the tool is a work in progress, but the site provides no information about how current the underlying data are, other than that it has an API (or, more accurately, multiple APIs) pulling data from other databases. The number of source databases given in the research paper differs from the number on the site's own list. You can find the list of databases they mention on their page here. Here is another, outdated list from the old system: Old Data Sources

      • Is there a need in the scientific community for such a database at this time? We would say there is a need in the scientific community for such a database. There is a multitude of data across scientific fields, but it can be difficult to spot associations between data types if they are housed in different databases. The goal of the database is to combine and compare data in order to evaluate what causes disease and to aid in disease diagnosis. Being able to easily identify these associations reduces the time spent determining what causes a disease and helps identify trends between certain genes and their phenotypic outcomes that may not have been previously recognized. The downside to this database is that it does not introduce any new data; it only organizes existing data. In this way, it could be considered non-essential because its information is already available elsewhere. We would still argue that it is essential because it compiles thousands of data types and shows how they are associated in a format that someone from any field could understand.
      • Is the content covered by other databases already?

Yes and no: it imports content from other databases and provides a different organizational format. Previous Website Source List

    1. How current is the database?

The site was last updated in 2024; update dates for individual entries are not shown.

      • When did the database first go online?

It is not abundantly clear. The website shows it existed in 2020, and the earliest paper related to the Monarch Initiative that they link directly is from 2016. First Journal Paper

      • How often is the database updated?

Again, it is not abundantly clear. The only real mention is that the website is constantly being worked on and improved. Because the database appears to pull its data through the API, how current the data are depends on the databases it sources from.

      • When was the last update?

Again, it is not completely clear, but the site was updated this year (2024).

They are not very forthcoming or clear about which species/model organisms they cover other than humans.

General utility of the database to the scientific community

    1. Are there links to other databases? Which ones?
    2. Is it convenient to browse the data?
    3. Is it convenient to download the data?
      • In what file formats are the data provided?
        • What type of files, indicated by the file extension (e.g., .txt, .xml., etc.)?
        • Are they standard or non-standard formats? (i.e., are they following an approved standard for that type of data)?
    4. Evaluate the “user-friendliness” of the database: can a naive user quickly navigate the website and gather useful information?
      • Is the website well-organized?
      • Does it have a help section or tutorial?
      • Are the search options sensible?
      • Run a sample query. Do the results make sense?
    5. Access: Is there a license agreement or any restrictions on access to the database?

Summary judgment

    1. Would you direct a colleague unfamiliar with the field to use it?

Yes, I would. I think it provides really good information. At worst, the data might be a couple of years old in some cases; it may all be constantly updated, but that is unclear. For clinicians, who seem to be one of the groups the developers had in mind when building this tool, I think it could be immensely helpful, provided they watch the short video on using the tool. Video Here

    1. Is this a professional or "hobby" database? The "hobby" analogy means that it was that person's hobby to make the database. It could mean that it is limited in scope, done by one or a few persons, or seems amateur.

Some Definitions

  • Electronic curation occurs when someone writes a program to add information to a database record from another database.
  • Manual curation occurs when a human reviews the information being added to a record to validate it as true.
    • In-house is when the human works for the database organization.
    • Community is when the database allows members of the scientific community that don't work for the database organization to add information to the record.