Difference between revisions of "Monarch Initiative Week 4"

From LMU BioDB 2024
(General information about the database: sectioning)
(Acknowledgements: fixing links)
 
(51 intermediate revisions by 2 users not shown)
Line 3: Line 3:
 
To Assignment Page: [[Week 4]]
 
  
https://academic.oup.com/nar/article/52/D1/D938/7449493
+
[https://academic.oup.com/nar/article/52/D1/D938/7449493 Nucleic Acids Research Article]
 +
 
 +
[https://docs.google.com/presentation/d/1C74eHcvpTo0rnbQFcUnDsmG_AIPpIqLvM0t-DV95Obk/edit#slide=id.g1f0192797b8_0_30 Slides]
  
 
[[Category:Shared]]
 
Line 9: Line 11:
  
 
=== Database Evaluation ===
 
Andrew doing 1 and 3, Katie 2 and 4
+
Andrew did sections 2 and 4, Katie did sections 1 and 3
 +
 
 
For your assignment, create a new wiki page to profile your database.  For this week, there will be one page per set of partners; both partners will contribute content and notes for their electronic lab notebook to the same page; you do not need to have separate individual journal entries for this week.
 
 
* The name of your page should be "Database name Week 4".
 
  
 
Read the article about the database from the ''Nucleic Acids Research'' journal and then go online to the database itself.  In keeping with Academic Honesty and citation practices, when you answer the questions below, provide a hyperlink to the page that you got the information from.  <u>There should be at least one hyperlink per answer.</u>
 
 +
 
=== '''General information about the database'''===
 
*# What is the name of the database? Monarch Initiative [[https://monarchinitiative.org/ Monarch Initiative]]
+
*# What is the name of the database? Monarch Initiative [https://monarchinitiative.org/ Monarch Initiative Front Page]
*# What type (or types) of database is it?
+
*# What type (or types) of database is it? The Monarch Initiative integrates gene, disease, and phenotype data. The database combines knowledge from across sources to reveal how they are connected. The database intends to show how these connections can tell us the causes and mechanisms of human disease. [https://academic.oup.com/nar/article/52/D1/D938/7449493?login=true Nucleic Acids Research Article]
*## What biological information (type of data) does it contain? (sequence, structure, model organism, or specialty [what?])
+
*## What biological information (type of data) does it contain? (sequence, structure, model organism, or specialty [what?]) Each page gives a general description of a specific gene, disease, or phenotype, along with a section listing any alternative names. Each page also links to the other types of data housed in the database through associations, which are organized in association tables. [https://monarchinitiative.org/explore Monarch Initiative Explore Page]
*## What type of data source does it have?
+
*## What type of data source does it have? The Monarch Initiative sources data from 33 other databases [https://academic.oup.com/nar/article/52/D1/D938/7449493?login=true Nucleic Acids Research Article]
*##* primary versus secondary ("meta")?
+
*##* primary versus secondary ("meta")? Secondary, because the data come from other databases and are then collected and organized into Monarch's own format. [https://academic.oup.com/nar/article/52/D1/D938/7449493?login=true Nucleic Acids Research Article]
*##* curated versus non-curated?
+
*##**Each node has a link showing where the data is sourced from. [https://monarchinitiative.org/explore Monarch Initiative Explore Page]
*##** if curated, is it electronic versus human curation?
+
*##* curated versus non-curated? Curated [https://academic.oup.com/nar/article/52/D1/D938/7449493?login=true Nucleic Acids Research Article]
*##*** if human curation, is it in-house staff versus community curation?
+
*##** if curated, is it electronic versus human curation? Electronically curated. The Monarch Initiative has several programs that work together to collect data from across sources. The data are collected in what is called the Monarch Knowledge Graph, or Monarch KG, and ingested data are transformed into the Monarch KG schema. When source files are first downloaded, they pass through a program called Koza, which transforms them into the Biolink and KGX formats. The programs also perform entity mapping and merging, which means that if multiple sources use different identifiers for the same entity, those identifiers are merged and the entity is represented once. The KG then integrates the gene, disease, and phenotype data, which are served to the user. [https://academic.oup.com/nar/article/52/D1/D938/7449493?login=true Nucleic Acids Research Article]
*# What individual or organization maintains the database?
+
*##*** if human curation, is it in-house staff versus community curation? N/A
*#* public versus private
+
*# What individual or organization maintains the database? The database is maintained by an "international consortium", meaning the team comes from several international organizations. To find the team, the user has to navigate to the website's About section and then click on the Team page. The Monarch Initiative credits its team as coming from 12 schools and institutes. These include the University of Colorado, Lawrence Berkeley National Laboratory, The Jackson Laboratory, European Bioinformatics Institute, Queen Mary University of London, Johns Hopkins University, C-Path, and Renaissance Computing Institute. They also credit their consultants, scientific advisory board, alumni members, and alumni groups. [https://monarchinitiative.org/team#scientific-advisory-board Monarch Initiative Team] The website is much clearer about who is on the team, while the article only lists team members in the Authors section and never directly mentions any of these organizations. [https://academic.oup.com/nar/article/52/D1/D938/7449493?login=true Nucleic Acids Research Article]
*#* large national or multinational entity or small lab group
+
*#* public versus private. The database is public; there is no requirement to pay or to hold any kind of membership to access the data. [https://academic.oup.com/nar/article/52/D1/D938/7449493?login=true Nucleic Acids Research Article]
*# What is their funding source(s)?
+
*#* large national or multinational entity or small lab group? The database is run by a multinational group of schools and institutes. [https://monarchinitiative.org/team#scientific-advisory-board Monarch Initiative Team]
 +
*# What is their funding source(s)? The website credits the Office of the Director of the National Institutes of Health, the NIH National Human Genome Research Institute, and the NIH National Library of Medicine. [https://monarchinitiative.org/team#scientific-advisory-board Monarch Initiative Team Page] The article credits these but also says that several of the authors were partly supported by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy. It also states that the Critical Path Institute is supported by the Food and Drug Administration (FDA) of the U.S. Department of Health and Human Services. This suggests that the website does not fully disclose where its funding comes from. [https://academic.oup.com/nar/article/52/D1/D938/7449493?login=true Nucleic Acids Research Article]
 +
 
 
===Scientific quality of the database===
 
 
*# Does the content appear to completely cover its content domain?
 
 
*#* How many records does the database contain?
 
 +
 +
It is not completely clear how many records the database contains, which is something that could be better communicated. That said, opening the search bar shows 845,539 results, which may be the total number of entries in the database. [http://monarchinitiative.org/explore Search Page]
 +
 
*#* What claims do the database owners make about coverage in the corresponding paper?   
 
*# What species are covered in the database? (If it is a ''very'' long list, summarize.)
+
 
*# Is the database content useful? I.e., what biological questions can it be used to answer?
+
Direct quotes from the paper: "The Monarch Initiative" ... "aims to harmonize data across scientific disciplines to reveal disease mechanisms and aid disease diagnosis." "The scope of Monarch is phenomic knowledge in all its forms, including human and model organisms phenotypic data..."
*# Is the database content timely?
+
[https://academic.oup.com/nar/article/52/D1/D938/7449493 Most Recent Paper]
*#* Is there a need in the scientific community for such a database at this time?
+
 
 +
*#* What species are covered in the database? (If it is a ''very'' long list, summarize.)  
 +
 
 +
The database covers and connects "phenotypes to genotypes across species". [http://monarchinitiative.org/about About] The exact number and identity of the species are not clearly stated, but humans are definitely covered.
 +
 
 +
*#* Is the database content useful? I.e., what biological questions can it be used to answer?
 +
 
 +
It is useful, although how useful is difficult to determine without a particular use case to test it on. The database merges data from different scientific research fields onto one platform to help inform research and to improve data organization so that more informed clinical decisions can be made. [http://monarchinitiative.org/about About]
 +
 
 +
 
 +
*#* Is the database content timely?
 +
 
 +
It is unclear. The banner at the bottom of the page shows that the site itself was updated in 2024 and that the tool is a work in progress, but there is no information about the timeliness of the actual data, other than that they have an API, or more accurately multiple APIs, pulling data from other databases. The number of source databases mentioned in the research paper differs from the number on the site's own list. You can find the list of databases they mention on their page [https://monarch-initiative.github.io/monarch-documentation/#standards-documentation here]. Here is another, outdated list from the old system: [https://previous.monarchinitiative.org/about/data-sources Old Data Sources]
 +
 
 +
*#* Is there a need in the scientific community for such a database at this time?  
 +
 
 +
We would say there is a need in the scientific community for such a database. There is a multitude of data across scientific fields, but it can be difficult to spot associations between data types if they are spread across different databases. The goal of the database is to combine and compare data in order to evaluate what causes disease and to aid in disease diagnosis. Being able to easily identify these associations reduces the time spent determining what exactly causes a disease, and helps identify trends in certain genes and their phenotypic outcomes that may not have been previously recognized. The downside to this database is that it doesn't really introduce any new data; it only organizes it. In this way, it could be considered non-essential because its information is already out there. We would still argue that it is essential because it compiles thousands of data types and shows how they are associated, in a format that someone from any field could understand.
 +
 
 
*#* Is the content covered by other databases already?
 
*# How ''current'' is the database?
+
 
 +
Yes and no. It imports content from other databases and provides a different organizational format. [https://previous.monarchinitiative.org/about/data-sources Previous Website Source List]
 +
 
 +
*#* How ''current'' is the database?
 +
 
 +
The site was last updated in 2024, but update dates for individual entries are not shown. The update banner is at the bottom of the linked page. [http://monarchinitiative.org/about 2024 Banner]
 +
 
 
*#* When did the database first go online?
 
 +
 +
It is not abundantly clear. The website shows it existed in 2020, and the earliest paper related to the Monarch Initiative that they link directly is from 2016. [https://pubmed.ncbi.nlm.nih.gov/27899636/ First Journal Paper]
 +
 
*#* How often is the database updated?
 
 +
 +
Again, it is not abundantly clear. The only real mention is that the website is a work in progress and is being improved. The data appear to be pulled through the API, so updates depend on the databases it sources from. I can't link this because it only showed up as a notification the first time I accessed the website; I closed it and now it is gone.
 +
 
*#* When was the last update?
 
* '''General utility of the database to the scientific community'''
+
Again it is not completely clear, but the site was updated this year (2024).
*# Are there links to other databases?  Which ones?
+
 
*# Is it convenient to browse the data?
+
===General utility of the database to the scientific community===
*# Is it convenient to download the data?
+
*# Are there links to other databases?  Which ones? The article credits 33 other databases for their sources, but in the Data Sources section it only mentions 16 of the 33. [https://academic.oup.com/nar/article/52/D1/D938/7449493?login=true Nucleic Acids Research Article] A GitHub link is provided to access a full list of their sources, but when I tried to access it the link was non-functional. From just clicking through nodes on the database, I'm able to see that for a disease node there is a link to the Ontology Search (OLS), OMIM, Disease Ontology (DOID), and NPE databases. [https://monarchinitiative.org/MONDO:0009279 MI Triple-A Syndrome Node] Gene nodes link to the HUGO Gene Nomenclature Committee (HGNC), OMIM, NCBI, Ensembl, and UniProt databases. [https://monarchinitiative.org/HGNC:13666 MI AAAS Gene Node] Phenotypic feature nodes typically only have one link to the Human Phenotype Ontology. [https://monarchinitiative.org/HP:0002571 MI Achalasia Node] The website doesn't seem to have a place where every data source is listed, so it is difficult to know where any additional data is being retrieved from.
*#* In what file formats are the data provided?
+
*# Is it convenient to browse the data? Yes, it is easy to search specific phenotypic features, diseases, and gene sequences. Having the description on each page makes it easy for the user to understand what they are viewing. Association links and tables make it easy to switch between these pages. Even though I was not able to find all of the databases that they use, the links to other databases on each node are clearly visible and explained. The Breadcrumbs feature at the bottom shows how you got to the current page from your page history, outlining the connections that the user may have found between the types of data. [https://monarchinitiative.org/explore Monarch Initiative Explore Page]
*#** What type of files, indicated by the file extension (e.g., .txt, .xml., etc.)?
+
*# Is it convenient to download the data? If working directly from the website, it seems the only data a user can download is the association table. The resulting table is huge and difficult to understand when compared to the table seen on the website. [https://monarchinitiative.org/MONDO:0009279 MI Triple-A Syndrome Node] Specific code is available to be downloaded, but this is more concerned with how the database runs rather than the data it contains. [https://academic.oup.com/nar/article/52/D1/D938/7449493?login=true Nucleic Acids Research Article]
*#** Are they standard or non-standard formats?  (i.e., are they following an approved standard for that type of data)?
+
*#* In what file formats are the data provided? The file is a large table [https://monarchinitiative.org/MONDO:0009279 MI Triple-A Syndrome Node]
*# Evaluate the “user-friendliness” of the database: can a naive user quickly navigate the website and gather useful information?
+
*#** What type of files, indicated by the file extension (e.g., .txt, .xml., etc.)? The file extension is .tsv [https://monarchinitiative.org/MONDO:0009279 MI Triple-A Syndrome Node]
*#* Is the website well-organized?
+
*#** Are they standard or non-standard formats?  (i.e., are they following an approved standard for that type of data)? ?
*#* Does it have a help section or tutorial?   
+
*# Evaluate the “user-friendliness” of the database: can a naive user quickly navigate the website and gather useful information? Yes, the introduction page clearly explains what Monarch is and how to access the type of data a user may want. [https://monarchinitiative.org/ Monarch Initiative Front Page] Every node/page is clearly titled as to what it contains, a description is then given, and any additional links are below. Depending on the data type, nodes have different additional links, which may be confusing, but each link has a title above it to explain its purpose. The associations table shows how everything is interconnected, which makes the data easy to understand and compare. The Breadcrumbs feature is especially helpful in tracking connections and how you got to your current page. [https://monarchinitiative.org/explore Monarch Initiative Explore Page]
*#* Are the search options sensible?
+
*#* Is the website well-organized? Yes, the front page tells you what you can do with the website and what kind of data it contains. The search bar is at the top of the page and can take you to specific “nodes”. [https://monarchinitiative.org/ Monarch Initiative Front Page] Each node is well-organized so that you first see its description, then helpful links, and then the associations it has with other nodes of the database. [https://monarchinitiative.org/explore Monarch Initiative Explore Page]
*#* Run a sample query.  Do the results make sense?
+
*#* Does it have a help section or tutorial?  Yes, there are several video tutorials on the first page. [https://monarchinitiative.org/ Monarch Initiative Front Page] There is also a help page, where you can provide feedback or contact the help desk through GitHub. The help desk does require a GitHub account, which is limiting to those who do not have an account and can therefore not ask specific questions. [https://monarchinitiative.org/help Monarch Initiative Help Page]
*# Access:  Is there a license agreement or any restrictions on access to the database?
+
*#* Are the search options sensible? Yes, in the search bar it says “Gene, disease, phenotype, etc.” It tells the user what they should search so they are not confused on what the database contains. [https://monarchinitiative.org/explore Monarch Initiative Explore Page]
 +
*#* Run a sample query.  Do the results make sense? Yes, I searched triple-A syndrome and the page I ended up on gave me a description along with other useful information like the type of heritability it is associated with, causal genes, other names, and additional links to its URI and a clinical synopsis from another database. It also provided the number of associations to phenotypes, causal genes, and correlated genes. The Association table showed which phenotypic features the disease has, which can be clicked on to compare to other diseases that also have that phenotypic feature. [https://monarchinitiative.org/MONDO:0009279 MI Triple-A Syndrome Node]
 +
*# Access:  Is there a license agreement or any restrictions on access to the database? No, all Monarch resources are available to everyone. You do not need to register for the website, and there are multiple ways to access and reuse their data. The article contains links to their data resources, which seem to primarily be on GitHub. Their code is openly shared on GitHub. I did click on one link associated with the Zenodo Deposit, but it is currently non-functioning. [https://academic.oup.com/nar/article/52/D1/D938/7449493?login=true Nucleic Acids Research Article]
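As a rough illustration of working with the downloaded .tsv association table mentioned above, here is a minimal Python sketch. The column names (subject, predicate, object) and the sample rows are assumptions made up for this example; they are not the real headers or contents of the Monarch export, which we did not record.

```python
import csv
import io

# Illustrative stand-in for a downloaded association table. The column
# names and rows are assumptions, not the real Monarch export headers.
sample_tsv = (
    "subject\tpredicate\tobject\n"
    "disease A\thas phenotype\tphenotype X\n"
    "disease A\thas phenotype\tphenotype Y\n"
    "disease B\thas phenotype\tphenotype X\n"
)


def load_associations(handle):
    """Read a tab-separated table into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def objects_by_subject(rows):
    """Group the values of the 'object' column by the 'subject' column."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["subject"], []).append(row["object"])
    return grouped


rows = load_associations(io.StringIO(sample_tsv))
grouped = objects_by_subject(rows)
print(grouped)
```

For a real download, the in-memory sample would be replaced by a file opened with open(path, newline="", encoding="utf-8"), and the column names would need to be adjusted to whatever headers the file actually has.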
  
 
===Summary judgment===
 
 
*# Would you direct a colleague unfamiliar with the field to use it?
 
 +
Yes, I would. I think it provides really good information; at worst, the data might be a couple of years old in some cases, although it may all be constantly updated. It is just unclear. For clinicians, who seem to be one of the groups considered when building this tool, I think it could be immensely helpful, provided they watch the short video on using the tool.
 +
[https://www.youtube.com/watch?time_continue=39&v=SuUKqG2tbx0&embeds_referring_euri=http%3A%2F%2Fmonarchinitiative.org%2F&feature=emb_title Video Here]
 +
 
*# Is this a professional or "hobby" database?  The "hobby" analogy means that it was that person's hobby to make the database. It could mean that it is limited in scope, done by one or a few persons, or seems amateur.
 
  
==== Some Definitions ====
+
This is clearly intended to be a professional database. It is a work in progress and not at the level of some of the databases we looked at for last week's assignment. That being said they are receiving funding from:
 +
 
 +
"Office of the Director National Institute of Health [5R24OD011883-12, 3R24OD011883-11S1]; NIH National Human Genome Research Institute [5RM1HG010860-04, 3RM1HG010860-03S1, 5U24HG011449-03]; NIH National Library of Medicine [T15LM009451]; J.R., S.C., S.M., H.H., N.L.H., C.J.M. and J.H.C. were partly supported by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy [DE-AC02-05CH11231]; Critical Path Institute is supported by the Food and Drug Administration (FDA) of the U.S. Department of Health and Human Services (HHS) and FDA/HHS (56.5%), totaling $16749891, and by non-government sources (43.5%), totaling $12895366. Funding for open access charge: NIH [5R24OD011883-12]."
 +
 
 +
This is a direct quote from the [https://academic.oup.com/nar/article/52/D1/D938/7449493 Funding] section of the paper.
 +
 
 +
=== Some Definitions ===
  
 
* Electronic curation occurs when someone writes a program to add information to a database record from another database.
 
Line 64: Line 113:
 
** In-house is when the human works for the database organization.
 
 
** Community is when the database allows members of the scientific community that don't work for the database organization to add information to the record.
 
 +
 +
=== Acknowledgements ===
 +
The homework partners [[User:Asandle1]] and [[User:Kmill104]] worked together to complete this week’s journal assignment and the presentation slides. We texted several times over the past few days to ask questions about our database, and met in person once to practice our presentation.
 +
 +
[[User:Kmill104|Kmill104]] ([[User talk:Kmill104|talk]]) 13:11, 8 February 2024 (PST)
 +
 +
All work was my own or Katie's except where otherwise specified [[User:Asandle1|Asandle1]] ([[User talk:Asandle1|talk]]) 13:11, 8 February 2024 (PST)
 +
 +
=== References ===
 +
*Monarch Initiative. Monarch Initiative. (n.d.). https://monarchinitiative.org/
 +
*Putman, T. E., Schaper, K., Matentzoglu, N., Rubinetti, V. P., Alquaddoomi, F. S., Cox, C., Caufield, J. H., Elsarboukh, G., Gehrke, S., Hegde, H., Reese, J. T., Braun, I., Bruskiewich, R. M., Cappelletti, L., Carbon, S., Caron, A. R., Chan, L. E., Chute, C. G., Cortes, K. G., … Munoz-Torres, M. C. (2024, January 5). The Monarch Initiative in 2024: An Analytic platform integrating phenotypes, genes and diseases across species. OUP Academic. https://academic.oup.com/nar/article/52/D1/D938/7449493?login=true
 +
*Dahlquist, K. (n.d.). Week 4. Week 4 - LMU BioDB 2024. https://xmlpipedb.cs.lmu.edu/biodb/spring2024/index.php/Week_4
 +
 +
 +
[[User:Asandle1|Asandle1]] ([[User talk:Asandle1|talk]]) 13:11, 8 February 2024 (PST)

Latest revision as of 13:12, 8 February 2024

To User Page: User: Asandle1, User: Kmill104

To Assignment Page: Week 4

Nucleic Acids Research Article

Slides


Database Evaluation

Andrew did sections 2 and 4, Katie did sections 1 and 3

For your assignment, create a new wiki page to profile your database. For this week, there will be one page per set of partners; both partners will contribute content and notes for their electronic lab notebook to the same page; you do not need to have separate individual journal entries for this week.

  • The name of your page should be "Database name Week 4".

Read the article about the database from the Nucleic Acids Research journal and then go online to the database itself. In keeping with Academic Honesty and citation practices, when you answer the questions below, provide a hyperlink to the page that you got the information from. There should be at least one hyperlink per answer.

General information about the database

    1. What is the name of the database? Monarch Initiative Monarch Initiative Front Page
    2. What type (or types) of database is it? The Monarch Initiative integrates gene, disease, and phenotype data. The database combines knowledge from across sources to reveal how they are connected. The database intends to show how these connections can tell us the causes and mechanisms of human disease. Nucleic Acids Research Article
      1. What biological information (type of data) does it contain? (sequence, structure, model organism, or specialty [what?]) Each page gives a general description of a specific gene, disease, or phenotype, along with a section listing any alternative names. Each page also links to the other types of data housed in the database through associations, which are organized in association tables. Monarch Initiative Explore Page
      2. What type of data source does it have? The Monarch Initiative sources data from 33 other databases Nucleic Acids Research Article
        • primary versus secondary ("meta")? Secondary, because the data come from other databases and are then collected and organized into Monarch's own format. Nucleic Acids Research Article
        • curated versus non-curated? Curated Nucleic Acids Research Article
          • if curated, is it electronic versus human curation? Electronically curated. The Monarch Initiative has several programs that work together to collect data from across sources. The data are collected in what is called the Monarch Knowledge Graph, or Monarch KG, and ingested data are transformed into the Monarch KG schema. When source files are first downloaded, they pass through a program called Koza, which transforms them into the Biolink and KGX formats. The programs also perform entity mapping and merging, which means that if multiple sources use different identifiers for the same entity, those identifiers are merged and the entity is represented once. The KG then integrates the gene, disease, and phenotype data, which are served to the user. Nucleic Acids Research Article
            • if human curation, is it in-house staff versus community curation? N/A
    3. What individual or organization maintains the database? The database is maintained by an "international consortium", meaning the team comes from several international organizations. To find the team, the user has to navigate to the website's About section and then click on the Team page. The Monarch Initiative credits its team as coming from 12 schools and institutes. These include the University of Colorado, Lawrence Berkeley National Laboratory, The Jackson Laboratory, European Bioinformatics Institute, Queen Mary University of London, Johns Hopkins University, C-Path, and Renaissance Computing Institute. They also credit their consultants, scientific advisory board, alumni members, and alumni groups. Monarch Initiative Team The website is much clearer about who is on the team, while the article only lists team members in the Authors section and never directly mentions any of these organizations. Nucleic Acids Research Article
      • public versus private. The database is public; there is no requirement to pay or to hold any kind of membership to access the data. Nucleic Acids Research Article
      • large national or multinational entity or small lab group? The database is run by a multinational group of schools and institutes. Monarch Initiative Team
    4. What is their funding source(s)? The website credits the Office of the Director of the National Institutes of Health, the NIH National Human Genome Research Institute, and the NIH National Library of Medicine. Monarch Initiative Team Page The article credits these but also says that several of the authors were partly supported by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy. It also states that the Critical Path Institute is supported by the Food and Drug Administration (FDA) of the U.S. Department of Health and Human Services. This suggests that the website does not fully disclose where its funding comes from. Nucleic Acids Research Article
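To make the entity mapping and merging step from the electronic curation answer more concrete, here is a minimal Python sketch of the general idea. It is not Monarch's actual pipeline code: the identifiers (SRC_A:001, CANON:1, and so on), the mapping table, and the merge_entities function are all hypothetical and only illustrate how records that use different identifiers for the same thing could be collapsed into one entry.

```python
# Hypothetical sketch of entity mapping and merging. The identifiers and
# the mapping table are made up; this is not Monarch's pipeline code.

def merge_entities(records, id_mapping):
    """Group records that refer to the same entity under one canonical ID."""
    merged = {}
    for record in records:
        source_id = record["id"]
        # Fall back to the source ID when no mapping is known.
        canonical_id = id_mapping.get(source_id, source_id)
        entry = merged.setdefault(
            canonical_id,
            {"id": canonical_id, "names": set(), "sources": set()},
        )
        entry["names"].add(record["name"])
        entry["sources"].add(record["source"])
    return merged


# Two sources describe the same disease under different identifiers.
records = [
    {"id": "SRC_A:001", "name": "triple-A syndrome", "source": "source A"},
    {"id": "SRC_B:xyz", "name": "Allgrove syndrome", "source": "source B"},
    {"id": "SRC_A:002", "name": "achalasia", "source": "source A"},
]
id_mapping = {"SRC_A:001": "CANON:1", "SRC_B:xyz": "CANON:1"}

merged = merge_entities(records, id_mapping)
for canonical_id, entry in sorted(merged.items()):
    print(canonical_id, sorted(entry["names"]), sorted(entry["sources"]))
```

In this sketch the two records for the same disease end up under a single canonical identifier with both names kept as synonyms, while the unmapped record keeps its original identifier.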

Scientific quality of the database

    1. Does the content appear to completely cover its content domain?
      • How many records does the database contain?

It is not completely clear how many records the database contains, which is something that could be better communicated. That said, opening the search bar shows 845,539 results, which may be the total number of entries in the database. Search Page

      • What claims do the database owners make about coverage in the corresponding paper?

Direct quotes from the paper: "The Monarch Initiative" ... "aims to harmonize data across scientific disciplines to reveal disease mechanisms and aid disease diagnosis." "The scope of Monarch is phenomic knowledge in all its forms, including human and model organisms phenotypic data..." Most Recent Paper

      • What species are covered in the database? (If it is a very long list, summarize.)

The database covers and connects "phenotypes to genotypes across species". About The exact number and identity of the species are not clearly stated, but humans are definitely covered.

      • Is the database content useful? I.e., what biological questions can it be used to answer?

It is useful, although how useful is difficult to determine without a particular use case to test it on. The database merges data from different scientific research fields onto one platform to help inform research and to improve data organization so that more informed clinical decisions can be made. About


      • Is the database content timely?

It is unclear. The banner at the bottom of the page shows that the site itself was updated in 2024 and that the tool is a work in progress, but they do not provide any information about how current the underlying data are, other than that they have an API (or, more accurately, multiple APIs) pulling data from other databases. The number of source databases given in their research paper differs from the number on the site page that lists them. You can find the list of databases they mention on their page here. Here is another outdated list from the old system Old Data Sources

      • Is there a need in the scientific community for such a database at this time?

We would say there is a need in the scientific community for such a database. There is a multitude of data across scientific fields, but it can be difficult to spot associations between data types if they are held in different databases. The goal of the database is to combine and compare data in order to evaluate what causes disease and aid in disease diagnosis. Being able to easily identify these associations reduces the time spent determining what exactly causes a disease, and helps identify trends between certain genes and their phenotypic outcomes that may not have been previously recognized. The downside to this database is that it doesn't really introduce any new data; it only organizes it. In this way, it could be considered non-essential because its information is already out there. We would still argue that it is essential because it compiles thousands of data types and shows how they are associated in a format that someone from any field could understand.

      • Is the content covered by other databases already?

Yes and no. It imports content from other databases and presents it in a different organizational format. Previous Website Source List

      • How current is the database?

The site was last updated in 2024; update dates for individual entries are not shown. The update banner is at the bottom of the linked page. 2024 Banner

      • When did the database first go online?

It is not abundantly clear. The website shows it existed in 2020, and the earliest paper related to the Monarch Initiative that they directly link is from 2016. First Journal Paper

      • How often is the database updated?

Again, it is not abundantly clear. The only real mention is that the website is under development and being improved. The data seem to come in through the API, so how often they are updated depends on the source databases. I can't link this because it only showed up as a notification the first time I accessed the website; I closed it and now it is gone.

      • When was the last update?

Again, it is not completely clear, but the site was updated this year (2024).

General utility of the database to the scientific community

    1. Are there links to other databases? Which ones? The article credits 33 other databases for their sources, but in the Data Sources section it only mentions 16 of the 33. Nucleic Acids Research Article A GitHub link is provided to access a full list of their sources, but when I tried to access it the link was non-functional. From just clicking through nodes on the database, I'm able to see that for a disease node there is a link to the Ontology Search (OLS), OMIM, Disease Ontology (DOID), and NPE databases. MI Triple-A Syndrome Node Gene nodes link to the HUGO Gene Nomenclature Committee (HGNC), OMIM, NCBI, Ensembl, and UniProt databases. MI AAAS Gene Node Phenotypic feature nodes typically only have one link to the Human Phenotype Ontology. MI Achalasia Node The website doesn't seem to have a place where every data source is listed, so it is difficult to know where any additional data is being retrieved from.
    2. Is it convenient to browse the data? Yes, it is easy to search specific phenotypic features, diseases, and gene sequences. Having the description on each page makes it easy for the user to understand what they are viewing. Association links and tables make it easy to switch between these pages. Even though I was not able to find all of the databases that they use, the links to other databases on each node are clearly visible and explained. The Breadcrumbs feature at the bottom shows how you got to the current page from your page history, outlining the connections that the user may have found between the types of data. Monarch Initiative Explore Page
    3. Is it convenient to download the data? If working directly from the website, it seems the only data a user can download is the association table. The resulting table is huge and difficult to understand when compared to the table seen on the website. MI Triple-A Syndrome Node Specific code is available to be downloaded, but this is more concerned with how the database runs rather than the data it contains. Nucleic Acids Research Article
      • In what file formats are the data provided? The file is a large table MI Triple-A Syndrome Node
        • What type of files, indicated by the file extension (e.g., .txt, .xml, etc.)? The file extension is .tsv MI Triple-A Syndrome Node
        • Are they standard or non-standard formats? (i.e., are they following an approved standard for that type of data)? ?
    4. Evaluate the “user-friendliness” of the database: can a naive user quickly navigate the website and gather useful information? Yes, the introduction page clearly explains what Monarch is and how to access the type of data a user may want. Monarch Initiative Front Page Every node/page is clearly titled as to what it will contain, a description is then given, and any additional links are below. Different data types have different additional links, which may be confusing, but each link has a title above it to explain its purpose. The associations table shows how everything is interconnected, which makes it easy to understand and compare. The Breadcrumbs feature is especially helpful in tracking connections and how you got to your current page. Monarch Initiative Explore Page
      • Is the website well-organized? Yes, the front page tells you what you can do with their website and what kind of data it contains. The search bar is at the top of the page, which can then take you to specific “nodes”. Monarch Initiative Front Page Each node is well-organized so that you first see its description, then helpful links, and then the associations it has with other nodes of the database. Monarch Initiative Explore Page
      • Does it have a help section or tutorial? Yes, there are several video tutorials on the first page. Monarch Initiative Front Page There is also a help page, where you can provide feedback or contact the help desk through GitHub. The help desk does require a GitHub account, which is limiting to those who do not have an account and can therefore not ask specific questions. Monarch Initiative Help Page
      • Are the search options sensible? Yes, in the search bar it says “Gene, disease, phenotype, etc.” It tells the user what they should search so they are not confused on what the database contains. Monarch Initiative Explore Page
      • Run a sample query. Do the results make sense? Yes, I searched triple-A syndrome and the page I ended up on gave me a description along with other useful information like the type of heritability it is associated with, causal genes, other names, and additional links to its URI and a clinical synopsis from another database. It also provided the number of associations to phenotypes, causal genes, and correlated genes. The Association table showed which phenotypic features the disease has, which can be clicked on to compare to other diseases that also have that phenotypic feature. MI Triple-A Syndrome Node
    5. Access: Is there a license agreement or any restrictions on access to the database? No, all Monarch resources are available to everyone. You do not need to register for the website, and there are multiple ways to access and reuse their data. The article contains links to their data resources, which seem to primarily be on GitHub. Their code is openly shared on GitHub. I did click on one link associated with the Zenodo Deposit, but it is currently non-functioning. Nucleic Acids Research Article
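
Since the downloaded association table is a large .tsv file that is hard to read by eye (item 3 above), one way to get an overview is to summarize it with a short script. The following is only a minimal sketch: the column names (subject_label, predicate, object_label) and the rows are made-up stand-ins, and the real export may use different headers and values.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for a downloaded association table.
# The real column names and values in a Monarch export may differ.
SAMPLE_TSV = (
    "subject_label\tpredicate\tobject_label\n"
    "triple-A syndrome\thas_phenotype\tAchalasia\n"
    "triple-A syndrome\thas_phenotype\tAlacrima\n"
    "AAAS\tcauses\ttriple-A syndrome\n"
)


def summarize_associations(tsv_text):
    """Count how many rows of a tab-separated table use each predicate."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row["predicate"] for row in reader)


counts = summarize_associations(SAMPLE_TSV)
for predicate, n in counts.items():
    print(predicate, n)
```

To use this on a real download, the file contents would be read from disk and passed in place of SAMPLE_TSV, with the column name adjusted to match the actual header row.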

Summary judgment

    1. Would you direct a colleague unfamiliar with the field to use it?

Yes, I would. I think it provides really good information. At worst the data might be a couple of years old in some cases; it may all be constantly updated, but that is unclear. For clinicians, who seem to be one of the groups they had in mind when building this tool, I think it could be immensely helpful, provided they watch the short video on using the tool. Video Here

    2. Is this a professional or "hobby" database? The "hobby" analogy means that it was that person's hobby to make the database. It could mean that it is limited in scope, done by one or a few persons, or seems amateur.

This is clearly intended to be a professional database. It is a work in progress and not at the level of some of the databases we looked at for last week's assignment. That being said, they are receiving funding from:

"Office of the Director National Institute of Health [5R24OD011883-12, 3R24OD011883-11S1]; NIH National Human Genome Research Institute [5RM1HG010860-04, 3RM1HG010860-03S1, 5U24HG011449-03]; NIH National Library of Medicine [T15LM009451]; J.R., S.C., S.M., H.H., N.L.H., C.J.M. and J.H.C. were partly supported by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy [DE-AC02-05CH11231]; Critical Path Institute is supported by the Food and Drug Administration (FDA) of the U.S. Department of Health and Human Services (HHS) and FDA/HHS (56.5%), totaling $16749891, and by non-government sources (43.5%), totaling $12895366. Funding for open access charge: NIH [5R24OD011883-12]."

This is a direct quote from the Funding section of the paper.
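
The percentages in the quoted funding statement can be checked against the dollar amounts it gives. This is a small arithmetic sketch; the variable names are our own, and the only inputs are the two totals from the quote.

```python
# Dollar amounts taken from the quoted Funding section.
fda_hhs = 16_749_891   # FDA/HHS share
non_gov = 12_895_366   # non-government share

total = fda_hhs + non_gov

# Percent of the total for each source, rounded to one decimal place.
fda_share = round(100 * fda_hhs / total, 1)
non_gov_share = round(100 * non_gov / total, 1)

print(total, fda_share, non_gov_share)
```

The two amounts sum to $29645257, and the shares round to 56.5% and 43.5%, matching the figures stated in the quote.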

Some Definitions

  • Electronic curation occurs when someone writes a program to add information to a database record from another database.
  • Manual curation occurs when a human reviews the information being added to a record to validate it as true.
    • In-house is when the human works for the database organization.
    • Community is when the database allows members of the scientific community that don't work for the database organization to add information to the record.

Acknowledgements

The homework partners User:Asandle1 and User:Kmill104 worked together to complete this week’s journal assignment and the presentation slides. We texted several times over the past few days to ask questions about our database, and met in person once to practice our presentation.

Kmill104 (talk) 13:11, 8 February 2024 (PST)

All work was my own or Katie's except where otherwise specified. Asandle1 (talk) 13:11, 8 February 2024 (PST)
