Nucleic Acids Res. 2024 Jan 5;52(D1):D938-D949.
doi: 10.1093/nar/gkad1082.

The Monarch Initiative in 2024: an analytic platform integrating phenotypes, genes and diseases across species

Tim E Putman et al. Nucleic Acids Res. 2024.

Abstract

Bridging the gap between genetic variations, environmental determinants, and phenotypic outcomes is critical for supporting clinical diagnosis and understanding mechanisms of diseases. It requires integrating open data at a global scale. The Monarch Initiative advances these goals by developing open ontologies, semantic data models, and knowledge graphs for translational research. The Monarch App is an integrated platform combining data about genes, phenotypes, and diseases across species. Monarch's APIs enable access to carefully curated datasets and advanced analysis tools that support the understanding and diagnosis of disease for diverse applications such as variant prioritization, deep phenotyping, and patient profile-matching. We have migrated our system into a scalable, cloud-based infrastructure; simplified Monarch's data ingestion and knowledge graph integration systems; enhanced data mapping and integration standards; and developed a new user interface with novel search and graph navigation features. Furthermore, we advanced Monarch's analytic tools by developing a customized plugin for OpenAI's ChatGPT to increase the reliability of its responses about phenotypic data, allowing us to interrogate the knowledge in the Monarch graph using state-of-the-art Large Language Models. The resources of the Monarch Initiative can be found at monarchinitiative.org and its corresponding code repository at github.com/monarch-initiative/monarch-app.

Figures

Graphical Abstract
Figure 1. Example user interface page. This interactive table of Ehlers-Danlos syndrome (EDS) phenotypes (the currently selected association type) shows association types, supporting evidence, and provenance in the Details section (which expands below the association table), plus the taxon of the association object where appropriate (e.g. genes). The page also provides summary and table views and breadcrumb navigation.
Figure 2. Overview of the Monarch data model. The Monarch subset of the Biolink data model is centered on Gene, Disease and Phenotype associations. Colors indicate which ontology each node category comes from. Gene and Pathway are ingested data types, not ontology concepts.
Figure 3. Data harmonization within the Monarch KG. The three primary data types in the Monarch KG are genes, diseases and phenotypes (A). The figure details their entity (node) and link (edge) counts and the unifying ontologies (D) by which the source data (B) and ontologies (C) are harmonized. Cross-species inference (E) is accomplished via gene orthology, homology and phenotype similarity. Content is disseminated (F) via the API, the Monarch UI and the clinical application Exomiser. Note that the figure shows only a portion of the integrated ontologies (column C); for a comprehensive list, see the PHENIO documentation. In column D, GO: Gene Ontology; BP: Biological Process; MF: Molecular Function; CC: Cellular Component.
Figure 4. Monarch KG construction workflow. Source files are downloaded, transformed by Koza into Biolink-conformant KGX format, merged and node-normalized by Cat-Merge, and finally served to the user through various access points.
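The merge and node-normalization step in this workflow can be illustrated with a minimal Python sketch. It is not Cat-Merge's API: the function name `merge_graphs`, the `id_map` argument, the "first occurrence wins" rule and the `EX:` identifiers are all assumptions made for illustration. The sketch only shows the general idea of combining per-source node and edge lists, rewriting identifiers to a canonical form, and separating out edges whose endpoints are missing.

```python
from typing import Dict, List, Optional, Tuple

Node = Dict[str, str]
Edge = Dict[str, str]


def merge_graphs(
    sources: List[Tuple[List[Node], List[Edge]]],
    id_map: Optional[Dict[str, str]] = None,
) -> Tuple[Dict[str, Node], List[Edge], List[Edge]]:
    """Merge per-source (nodes, edges) pairs into one graph (hypothetical helper).

    id_map maps source identifiers to canonical ones (assumed normalization rule).
    Returns (nodes by id, edges with both endpoints present, dangling edges).
    """
    id_map = id_map or {}

    def canonical(identifier: str) -> str:
        return id_map.get(identifier, identifier)

    nodes: Dict[str, Node] = {}
    edges: List[Edge] = []
    for src_nodes, src_edges in sources:
        for node in src_nodes:
            node_id = canonical(node["id"])
            # Assumption: the first occurrence of a node wins on conflict.
            nodes.setdefault(node_id, {**node, "id": node_id})
        for edge in src_edges:
            edges.append(
                {
                    **edge,
                    "subject": canonical(edge["subject"]),
                    "object": canonical(edge["object"]),
                }
            )

    # Classify edges only after every source has contributed its nodes.
    kept: List[Edge] = []
    dangling: List[Edge] = []
    for edge in edges:
        if edge["subject"] in nodes and edge["object"] in nodes:
            kept.append(edge)
        else:
            dangling.append(edge)
    return nodes, kept, dangling
```

Keeping dangling edges in a separate list, rather than silently dropping them, is one way to make the merge auditable; whether the actual pipeline does this is not stated in the caption.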
Figure 5. Expanded view of deploying the Monarch KG to the end user through the Monarch Python package, API, file server and web interface.
Figure 6. Extract-Transform-Load (ETL) with Koza. (A) Data flow in the Koza ingest for Human Phenotype Ontology disease-to-phenotype annotations (HPOA). Raw source files are transformed into KGX-formatted tabular data. (B) The Biolink Model association for disease-to-phenotype relationships used in the HPOA Koza ingest.
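The row-to-edge transformation described in this caption can be sketched in plain Python. This sketch does not use Koza itself. The input column names (`database_id`, `hpo_id`), the output column set, the edge `id` scheme and the `EX:` identifiers in the usage example are assumptions for illustration. The predicate `biolink:has_phenotype` and the category `biolink:DiseaseToPhenotypicFeatureAssociation` are Biolink Model terms, but their use here is assumed rather than taken from the ingest's configuration.

```python
import csv
import io
from typing import Dict

# Assumed minimal set of KGX-style edge columns for this sketch.
EDGE_COLUMNS = ["id", "category", "subject", "predicate", "object"]


def row_to_edge(row: Dict[str, str]) -> Dict[str, str]:
    """Map one disease-to-phenotype annotation row to an edge record."""
    subject = row["database_id"]  # disease identifier (assumed column name)
    obj = row["hpo_id"]           # phenotype identifier (assumed column name)
    predicate = "biolink:has_phenotype"
    return {
        # Hypothetical deterministic edge id built from the triple.
        "id": f"{subject}|{predicate}|{obj}",
        "category": "biolink:DiseaseToPhenotypicFeatureAssociation",
        "subject": subject,
        "predicate": predicate,
        "object": obj,
    }


def transform(raw_tsv: str) -> str:
    """Read tab-separated annotation rows and return a tab-separated edge table."""
    reader = csv.DictReader(io.StringIO(raw_tsv), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=EDGE_COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row_to_edge(row))
    return out.getvalue()


if __name__ == "__main__":
    # Placeholder row, not real annotation data.
    sample = (
        "database_id\tdisease_name\thpo_id\n"
        "EX:disease1\tExample disease\tEX:phenotype1\n"
    )
    print(transform(sample), end="")
```

A real ingest would carry further fields such as evidence, frequency and provenance; they are omitted here to keep the mapping readable.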
Figure 7. The Monarch ChatGPT plugin allows the Monarch KG to be queried and to return answers in natural language.
