Bioinformatics. 2023 Jul 1;39(7):btad418.
doi: 10.1093/bioinformatics/btad418.

KG-Hub: building and exchanging biological knowledge graphs

J Harry Caufield et al. Bioinformatics.

Abstract

Motivation: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of KGs is lacking.

Results: Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of KGs. Features include a simple, modular extract-transform-load pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research. KG-Hub is equipped with tooling to easily analyze and manipulate KGs. KG-Hub is also tightly integrated with graph machine learning (ML) tools which allow automated graph ML, including node embeddings and training of models for link prediction and node classification.
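The transform step of such an extract-transform-load pattern can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not KG-Hub's own tooling: the input rows, the `EX:` identifiers, the column names, and the `transform` function are all hypothetical, and the node fields (`id`, `category`, `name`) and edge fields (`subject`, `predicate`, `object`) are an assumed Biolink-style layout.

```python
import csv
import io

# Hypothetical upstream data; a real pipeline would download and cache this file.
RAW_TSV = (
    "gene_id\tgene_symbol\tdisease_id\n"
    "EX:gene1\tGENE1\tEX:disease1\n"
    "EX:gene2\tGENE2\tEX:disease1\n"
)


def transform(raw_text):
    """Turn tabular source rows into node and edge records."""
    nodes = {}  # keyed by id so a node seen in several rows is stored once
    edges = []
    for row in csv.DictReader(io.StringIO(raw_text), delimiter="\t"):
        gene, disease = row["gene_id"], row["disease_id"]
        nodes[gene] = {
            "id": gene,
            "category": "biolink:Gene",
            "name": row["gene_symbol"],
        }
        # Keep the first record for a disease if it was already seen.
        nodes.setdefault(
            disease,
            {"id": disease, "category": "biolink:Disease", "name": ""},
        )
        edges.append(
            {
                "subject": gene,
                # Generic predicate chosen for illustration only.
                "predicate": "biolink:related_to",
                "object": disease,
            }
        )
    return list(nodes.values()), edges


nodes, edges = transform(RAW_TSV)
```

Keeping nodes and edges as separate record lists means each transformed source can be written out and reused on its own before any merging takes place.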

Availability and implementation: https://kghub.org.

Conflict of interest statement

None declared.

Figures

Figure 1.
Integration of instance data and ontologies into knowledge graphs using KG-Hub ETL (extract, transform, and load) tooling to create new, emergent knowledge that is not present in any one data source. KG-Hub tooling comprises download (kghub-downloader), transform (Koza), and merge (KGX) components. When combined, data can provide new knowledge such as indirect relationships between patient phenotypes and drugs.
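How merging sources can surface an indirect phenotype-to-drug relationship can be illustrated with a small sketch. It is a toy example under stated assumptions, not the KGX merge implementation: the two edge lists, the `EX:` identifiers, the predicate labels `has_phenotype` and `treats`, and both helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical edges from two separate sources, as (subject, predicate, object).
SOURCE_A = [
    ("EX:disease1", "has_phenotype", "EX:phenotype1"),
    ("EX:disease2", "has_phenotype", "EX:phenotype2"),
]
SOURCE_B = [
    ("EX:drug1", "treats", "EX:disease1"),
]


def merge(*edge_lists):
    """Combine edge lists into one set, dropping exact duplicates."""
    merged = set()
    for edges in edge_lists:
        merged.update(edges)
    return merged


def phenotype_drug_paths(edges):
    """Find phenotype -> disease -> drug paths in the merged graph."""
    drugs_for = defaultdict(set)
    for subject, predicate, obj in edges:
        if predicate == "treats":
            drugs_for[obj].add(subject)
    paths = set()
    for subject, predicate, obj in edges:
        if predicate == "has_phenotype":
            # subject is the disease, obj is the phenotype
            for drug in drugs_for.get(subject, ()):
                paths.add((obj, subject, drug))
    return sorted(paths)


graph = merge(SOURCE_A, SOURCE_B)
paths = phenotype_drug_paths(graph)
```

Neither source alone links `EX:phenotype1` to `EX:drug1`; the path exists only once both edge lists share the `EX:disease1` node in one graph.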
Figure 2.
KG projects currently included in KG-Hub. KG-Hub currently hosts seven KG projects, which integrate disease, drug/chemical, gene/protein, phenotype, and other data. Graph projects may contain both ontology and instance data. Many KG-Hub projects are constructed around a core set of ontologies related to biomedicine: GO (gene ontology/gene function), Mondo (human diseases), HPO (human disease phenotypes), and ChEBI (drugs/chemicals).
Figure 3.
Schematic of tooling integrated into KG-Hub. Software developers store ETL code on GitHub. Automated builds are orchestrated on a KG-Hub server using Jenkins. Optionally, graph ML tasks can be specified for each build in a NEAT YAML file and are executed using GRAPE. KGs can be loaded directly into graph databases (Neo4j, Blazegraph) using KGX, a Python library for working with graphs. Graph builds, graph ML output, provenance, and other artifacts are stored in the cloud (S3). Project summary data can be browsed on KGHub.org, and graphs and other artifacts can be browsed and downloaded on KGHub.io. A dashboard (https://kghub.org/kg-hub-dashboard/) displays detailed graph statistics for KG projects.
