Nucleic Acids Res. 2024 Jan 5;52(D1):D1305-D1314.
doi: 10.1093/nar/gkad1051.

The DO-KB Knowledgebase: a 20-year journey developing the disease open science ecosystem



J Allen Baron et al. Nucleic Acids Res. 2024.

Abstract

In 2003, the Human Disease Ontology (DO, https://disease-ontology.org/) was established at Northwestern University. In the intervening 20 years, the DO has expanded to become a highly utilized disease knowledge resource. Serving as the nomenclature and classification standard for human diseases, the DO provides a stable, etiology-based structure integrating mechanistic drivers of human disease. Over the past two decades, the DO has grown from a collection of clinical vocabularies into an expertly curated semantic resource of over 11,300 common and rare diseases, linking disease concepts through more than 37,000 vocabulary cross-mappings (v2023-08-08). Here, we introduce the recently launched DO Knowledgebase (DO-KB), which expands the DO's representation of the diseaseome and enhances the findability, accessibility, interoperability and reusability (FAIR) of disease data through a new SPARQL service and a new Faceted Search Interface. The DO-KB is an integrated data system, built upon the DO's semantic disease knowledge backbone, with resources that expose and connect the DO's semantic knowledge with disease-related data across Open Linked Data resources. This update describes efforts to assess the DO's global impact, as well as improvements to data quality and content, with emphasis on changes in the last two years.
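
As a rough illustration of how a client might use a SPARQL service like the one described above, the Python sketch below builds a query for the direct subclasses of a DO term, sends it using the standard SPARQL 1.1 Protocol, and reads the W3C SPARQL JSON results format. It assumes DO terms are addressed by OBO PURLs (e.g. http://purl.obolibrary.org/obo/DOID_4) and labeled with rdfs:label. The endpoint URL is deliberately left as a caller-supplied parameter, and the helper names (`subclass_query`, `run_query`, `parse_bindings`) are hypothetical, not part of any DO-KB API.

```python
import json
import urllib.parse
import urllib.request

# Assumption: DO term IRIs use the OBO PURL namespace, e.g. DOID:4 -> .../obo/DOID_4
DOID_PREFIX = "http://purl.obolibrary.org/obo/DOID_"


def subclass_query(doid: str) -> str:
    """Build a SPARQL query listing direct subclasses of a DO term such as 'DOID:4'."""
    local_id = doid.split(":", 1)[1]
    return f"""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?child ?label WHERE {{
  ?child rdfs:subClassOf <{DOID_PREFIX}{local_id}> ;
         rdfs:label ?label .
}}
ORDER BY ?label
"""


def run_query(endpoint_url: str, query: str) -> dict:
    """POST a form-encoded query (SPARQL 1.1 Protocol) and decode the JSON results.

    endpoint_url is supplied by the caller; no DO-KB URL is assumed here.
    """
    data = urllib.parse.urlencode({"query": query}).encode()
    req = urllib.request.Request(
        endpoint_url,
        data=data,  # presence of data makes this a POST
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def parse_bindings(results: dict) -> list[tuple[str, str]]:
    """Turn SPARQL JSON results into (CURIE, label) pairs."""
    rows = []
    for binding in results["results"]["bindings"]:
        iri = binding["child"]["value"]
        # Shorten DO IRIs to DOID:nnn; leave any other IRI untouched.
        curie = "DOID:" + iri.rsplit("_", 1)[1] if iri.startswith(DOID_PREFIX) else iri
        rows.append((curie, binding["label"]["value"]))
    return rows
```

For example, `parse_bindings(run_query(endpoint_url, subclass_query("DOID:4")))` would list the direct children of `DOID:4` ('disease'), the DO root, provided `endpoint_url` points at a SPARQL endpoint serving the DO.
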


Figures

Graphical Abstract
Figure 1.
The DO-KB SPARQL Sandbox and Faceted Search Interface tools.
Figure 2.
The DO-KB ‘OWL Flattener’ prepares data from the doid-merged.owl file for search with the DO-KB Faceted Search Interface. An example with the disease ‘latex allergy’ (light blue) is shown. The OWL Flattener traverses up the rdfs:subClassOf relationships (black arrows) of the disease hierarchy, extracting all imported terms from logical relationships (dashed outlines). It also traverses up the hierarchies of the imported ontologies (anatomy: black; symptom: orange; ncbitaxon: green) to the root node (multi-node traversal represented by black dashed arrows). All imported terms and ancestors identified are ‘flattened’ into a JSON key-value list representation for the disease (bottom of figure), turning hierarchically represented relationships into a flat structure. Only asserted rdfs:subClassOf relationships are shown, but inferred relationships are also traversed.
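
The traversal described above can be sketched in a few lines of Python. This is a toy model, not the DO-KB OWL Flattener itself: the hierarchy, the logical-axiom links, and the facet names (`disease_ancestors`, `anatomy`, `symptom`, `ncbitaxon`) are hypothetical stand-ins, terms are identified by label rather than ontology ID, and the `SUBCLASS_OF` table is assumed to already contain both asserted and inferred parents.

```python
import json
from collections import defaultdict

# Hypothetical toy data; labels stand in for real ontology IDs.
# Assumption: SUBCLASS_OF already holds asserted AND inferred rdfs:subClassOf parents.
SUBCLASS_OF = {
    "latex allergy": ["allergic disease"],
    "allergic disease": ["immune system disease"],
    "immune system disease": ["disease"],
    "allergic reaction": ["symptom"],
    "immune system": ["anatomical system"],
    "Hevea brasiliensis": ["Eukaryota"],  # intermediate taxonomic ranks omitted
    "Eukaryota": ["cellular organisms"],
}

# Hypothetical imported terms referenced by each disease's logical axioms.
LOGICAL_AXIOM_TERMS = {
    "latex allergy": ["allergic reaction", "Hevea brasiliensis"],
    "immune system disease": ["immune system"],
}

# Hypothetical source ontology (used as the facet name) of each imported term.
SOURCE_ONTOLOGY = {
    "allergic reaction": "symptom",
    "immune system": "anatomy",
    "Hevea brasiliensis": "ncbitaxon",
}


def ancestors(term: str) -> set[str]:
    """All transitive superclasses of `term`, walking up to the root(s)."""
    seen: set[str] = set()
    stack = list(SUBCLASS_OF.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(SUBCLASS_OF.get(parent, []))
    return seen


def flatten(disease: str) -> dict:
    """Collapse a disease's hierarchy and imported terms into one flat key-value record."""
    disease_ancestors = ancestors(disease)
    facets: dict[str, set[str]] = defaultdict(set)
    facets["disease_ancestors"].update(disease_ancestors)
    # Imported terms may come from the disease itself or from any of its ancestors.
    for term in {disease} | disease_ancestors:
        for imported in LOGICAL_AXIOM_TERMS.get(term, []):
            facet = SOURCE_ONTOLOGY[imported]
            facets[facet].add(imported)
            facets[facet] |= ancestors(imported)  # climb the imported ontology too
    return {"id": disease, **{k: sorted(v) for k, v in sorted(facets.items())}}


if __name__ == "__main__":
    print(json.dumps(flatten("latex allergy"), indent=2))
```

In the resulting record, `ncbitaxon` lists `Hevea brasiliensis` together with its ancestors `Eukaryota` and `cellular organisms`, so a later search can match on any of them without walking the hierarchy again. That precomputation is the point of flattening.
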
Figure 3.
The DO-KB Faceted Search Interface enables discovery of disease-to-disease connections. Latex allergy, caused by an allergic reaction to the tree Hevea brasiliensis, is among the search results for diseases with the ‘allergic reaction’ symptom related to organisms in the taxon ‘Eukaryota’.
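
Because flattening places every ancestor directly in each disease record, a faceted query like the one above reduces to set-membership checks. The Python sketch below assumes hypothetical flattened records with `symptom` and `ncbitaxon` fields and applies AND semantics across facets. It also shows roughly how the same filter could be written as an Elasticsearch `bool` query with `term` filters, assuming those fields are indexed for exact matching (keyword fields). It is not the DO-KB's actual query code.

```python
# Hypothetical flattened records; only "latex allergy" mirrors the example above.
DOCS = [
    {"id": "latex allergy",
     "symptom": ["allergic reaction", "symptom"],
     "ncbitaxon": ["Eukaryota", "Hevea brasiliensis", "cellular organisms"]},
    {"id": "hypothetical disease A",
     "symptom": ["allergic reaction", "symptom"],
     "ncbitaxon": ["Bacteria", "cellular organisms"]},
    {"id": "hypothetical disease B",
     "symptom": ["fever", "symptom"],
     "ncbitaxon": ["Eukaryota", "cellular organisms"]},
]


def facet_search(docs: list[dict], **facets: str) -> list[str]:
    """IDs of records whose every requested facet contains the requested value (AND)."""
    return [
        doc["id"]
        for doc in docs
        if all(value in doc.get(field, []) for field, value in facets.items())
    ]


def to_es_query(**facets: str) -> dict:
    """Roughly equivalent Elasticsearch query DSL body: one exact-match term filter per facet."""
    return {"query": {"bool": {"filter": [{"term": {field: value}} for field, value in facets.items()]}}}


print(facet_search(DOCS, symptom="allergic reaction", ncbitaxon="Eukaryota"))
# ['latex allergy']
```

Disease A shares the symptom but not the taxon, and disease B shares the taxon but not the symptom, so only latex allergy satisfies both facets.
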
Figure 4.
The updated infrastructure supporting the DO-KB. The Apache web front-end server integrates the original disease-ontology.org site (a Python-Django web app) with the new DO-KB pages powered by a Python-Flask framework, including pages for the DO-KB SPARQL Sandbox and Faceted Search Interface, and seamlessly serves these pages to users of disease-ontology.org (dark blue lines). The back-end server provides data and search capabilities (orange lines) to the front-end pages and the SPARQL endpoint through the three depicted services: a neo4j database (light blue database icon) powers the original hierarchical trees, search, and term pages of disease-ontology.org; the new Elasticsearch search engine (orange database icon) serves data to the DO-KB Faceted Search Interface; and an Apache Jena Fuseki server (purple database icon) powers the SPARQL Sandbox and Endpoint.
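
The mapping of front-end features to back-end services described above can be restated as a small lookup table. The Python sketch below encodes only what the caption states; the feature keys are paraphrased labels, and the framework serving the SPARQL Endpoint is left as `None` because the caption does not specify it.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    front_end: Optional[str]  # web framework serving the page (None = not stated)
    back_end: str             # data/search service behind it


# Feature -> serving stack, paraphrased from the infrastructure description.
ROUTES = {
    "hierarchical trees": Route("Django", "neo4j"),
    "search": Route("Django", "neo4j"),
    "term pages": Route("Django", "neo4j"),
    "Faceted Search Interface": Route("Flask", "Elasticsearch"),
    "SPARQL Sandbox": Route("Flask", "Apache Jena Fuseki"),
    "SPARQL Endpoint": Route(None, "Apache Jena Fuseki"),
}


def services_behind(framework: Optional[str]) -> list[str]:
    """Back-end services used by the pages of one front-end framework."""
    return sorted({r.back_end for r in ROUTES.values() if r.front_end == framework})


print(services_behind("Flask"))   # ['Apache Jena Fuseki', 'Elasticsearch']
print(services_behind("Django"))  # ['neo4j']
```

Reading the table this way makes the split explicit: the original Django pages depend only on neo4j, while the new Flask pages draw on Elasticsearch and Fuseki.
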
Figure 5.
The semi-automated ‘Assessing Resource Use’ workflow, used to capture resource usage more fully, consists of three major steps. The first two steps, automated with the R package DO.utils, are the identification and collection of scientific publications (section 1) from citing literature (dark blue arrow and boxes), manual identification (gray arrow and boxes) and tailored searches (yellow arrow and boxes), and the deduplication and simplification of these records into an easy-to-view tabular format (section 2). The last step (section 3) is the curation and evaluation of these publication records. Results of applying this workflow to the DO are shown as proportional colored boxes with total numbers. Section 3 shows the results of curation and evaluation: a histogram of the number of scientific publications that cited or used the DO each year from 2007 to July 2023, as identified by our prior manual approach (gray) versus the total from this workflow (dark blue), excluding publications identified by tailored searches (yellow, section 1), and the proportion of publications using the DO over the last year, binned by etiological research area (genetic, environmental, host, and epigenetic factors).
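
Step 2 (deduplication and simplification) is the most mechanical part of the workflow. DO.utils is an R package and its functions are not reproduced here; the Python sketch below only illustrates one way to merge publication records gathered from several sources, assuming each record may carry a DOI, a PubMed ID (PMID), or neither. Records are merged when they share a DOI (case-insensitive), a PMID, or a normalized title, and the merged rows are then counted per year, as in a yearly histogram. The `Record` type, its fields, and the sample DOIs are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    title: str
    year: int
    source: str                 # e.g. "citing literature", "manual", "tailored search"
    doi: Optional[str] = None
    pmid: Optional[str] = None


def normalize(title: str) -> str:
    """Lower-case and collapse punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def dedupe(records: list[Record]) -> list[dict]:
    """Merge records sharing a DOI, PMID, or normalized title into one table row."""
    rows: list[dict] = []
    index: dict[str, int] = {}  # identifier key -> row position
    for r in records:
        keys = [k for k in (
            f"doi:{r.doi.lower()}" if r.doi else None,
            f"pmid:{r.pmid}" if r.pmid else None,
            f"title:{normalize(r.title)}",
        ) if k]
        pos = next((index[k] for k in keys if k in index), None)
        if pos is None:
            pos = len(rows)
            rows.append({"title": r.title, "year": r.year, "doi": r.doi,
                         "pmid": r.pmid, "sources": set()})
        row = rows[pos]
        row["doi"] = row["doi"] or r.doi      # fill identifiers missing so far
        row["pmid"] = row["pmid"] or r.pmid
        row["sources"].add(r.source)
        for k in keys:
            # Simplification: a record linking two existing rows does not merge
            # them; a union-find structure would be needed for that.
            index.setdefault(k, pos)
    return rows


def per_year(rows: list[dict]) -> dict[int, int]:
    """Publication counts per year, sorted by year."""
    return dict(sorted(Counter(row["year"] for row in rows).items()))


records = [
    Record("Paper A: a study", 2021, "citing literature", doi="10.1000/A"),
    Record("Paper A - a study", 2021, "manual", pmid="111"),
    Record("Paper B", 2022, "tailored search", doi="10.1000/b"),
    Record("Paper B (alternate title)", 2022, "citing literature", doi="10.1000/B"),
]
rows = dedupe(records)
print(len(rows), per_year(rows))  # 2 {2021: 1, 2022: 1}
```

Keeping the set of sources on each merged row preserves which collection route found a publication, which is what allows tailored-search results to be excluded from a count afterwards.
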
Figure 6.
The global distribution of biomedical resources using the DO, grouped by subcontinent. Total subcontinents = 15; total countries = 43; total biomedical resources = 376 (July 2023).
