Review
Comput Struct Biotechnol J. 2020 Jun 2;18:1414-1428.
doi: 10.1016/j.csbj.2020.05.017. eCollection 2020.

Constructing knowledge graphs and their biomedical applications

David N Nicholson et al. Comput Struct Biotechnol J. 2020.

Abstract

Knowledge graphs can support many biomedical applications. These graphs represent biomedical concepts and relationships in the form of nodes and edges. In this review, we discuss how these graphs are constructed and applied, with a particular focus on how machine learning approaches are changing these processes. Biomedical knowledge graphs have often been constructed by integrating databases that were populated by experts via manual curation, but we are now seeing more robust use of automated systems. A number of techniques are used to represent knowledge graphs; often, machine learning methods are used to construct a low-dimensional representation that can support many different applications. This representation is designed to preserve a knowledge graph's local and/or global structure. Additional machine learning methods can be applied to this representation to make predictions within genomic, pharmaceutical, and clinical domains. We frame our discussion first around knowledge graph construction and then around unifying representation learning techniques and unifying applications. Advances in machine learning for biomedicine are creating new opportunities across many domains, and we note potential avenues for future work with knowledge graphs that appear particularly promising.
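To make the prediction step more concrete, here is a minimal Python sketch of one simple downstream use of a low-dimensional representation: ranking candidate nodes by cosine similarity to a query node. It assumes the embeddings have already been learned; the three-dimensional vectors and the helper names `cosine` and `rank_candidates` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical, hand-written 3-dimensional embeddings for illustration only.
# A real pipeline would learn these vectors from the knowledge graph.
EMBEDDINGS = {
    "tamoxifen": [0.9, 0.1, 0.0],
    "breast cancer": [0.8, 0.2, 0.1],
    "asthma": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(query, candidates, embeddings):
    """Sort candidate nodes from most to least similar to the query node."""
    q = embeddings[query]
    scored = [(c, cosine(q, embeddings[c])) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_candidates("tamoxifen", ["asthma", "breast cancer"], EMBEDDINGS)
print([name for name, _ in ranking])  # ['breast cancer', 'asthma']
```

Cosine similarity is only one possible scoring choice; a trained classifier applied to pairs of embeddings is another common option.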

Keywords: Literature review; Machine learning; Natural language processing; Network embeddings; Text mining; Knowledge graphs.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Fig. 1
The metagraph (i.e., schema) of the knowledge graph used in the Rephetio project. The authors of this project refer to their resource as a heterogeneous network (i.e., hetnet), and this network meets our definition of a knowledge graph. This resource depicts pharmacological and biomedical information in the form of nodes and edges. The nodes (circles) represent entity types, and the edges (lines) represent the types of relationships that can connect two entities. The majority of edges in this metagraph are depicted as unidirectional, but some relationships can be considered bidirectional.
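A metagraph like the one in Fig. 1 can be derived mechanically from a typed knowledge graph by collapsing each instance-level edge into its (source type, relationship, target type) pattern. The sketch below assumes a small, hypothetical list of typed triples; the type and relation names are illustrative and are not claimed to match the Rephetio schema exactly.

```python
from collections import Counter

# Hypothetical typed triples: (head, head_type, relation, tail, tail_type).
# Type and relation names are illustrative, not the exact Rephetio schema.
TRIPLES = [
    ("tamoxifen", "Compound", "treats", "breast cancer", "Disease"),
    ("BRCA1", "Gene", "associates", "breast cancer", "Disease"),
    ("BRCA2", "Gene", "associates", "breast cancer", "Disease"),
    ("tamoxifen", "Compound", "binds", "ESR1", "Gene"),
]


def build_metagraph(triples):
    """Collapse instance-level edges into metanodes and counted metaedges."""
    metanodes = set()
    metaedges = Counter()
    for _, head_type, relation, _, tail_type in triples:
        metanodes.update((head_type, tail_type))
        metaedges[(head_type, relation, tail_type)] += 1
    return metanodes, metaedges


metanodes, metaedges = build_metagraph(TRIPLES)
print(sorted(metanodes))  # ['Compound', 'Disease', 'Gene']
for (head_type, relation, tail_type), count in sorted(metaedges.items()):
    print(f"{head_type} -{relation}-> {tail_type}: {count} edge(s)")
```

The metagraph stays small regardless of how many instance-level edges the graph contains, since it records only the distinct type patterns.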
Fig. 2
A visualization of a constituency parse tree for the sentence “BRCA1 is associated with breast cancer”. In this type of tree, the root node represents the entire sentence. Words are grouped into nested subphrases according to their part-of-speech tags. For example, the word “associated” is a past participle verb (VBN) that belongs to the verb phrase (VP) subgroup.
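A constituency tree like the one in Fig. 2 can be encoded as nested `(label, children)` tuples. This sketch hard-codes a plausible bracketing rather than calling a parser (the `NN` tag for “BRCA1” and the exact structure are assumptions), then looks up a word's part-of-speech tag and the phrases that enclose it. The helper names `leaves` and `find_word` are hypothetical.

```python
# Hand-written constituency parse of "BRCA1 is associated with breast cancer".
# Internal nodes are (phrase label, [children]); leaves are (POS tag, word).
# The bracketing and the NN tag for BRCA1 are illustrative assumptions.
TREE = ("S", [
    ("NP", [("NN", "BRCA1")]),
    ("VP", [
        ("VBZ", "is"),
        ("VP", [
            ("VBN", "associated"),
            ("PP", [
                ("IN", "with"),
                ("NP", [("NN", "breast"), ("NN", "cancer")]),
            ]),
        ]),
    ]),
])


def leaves(node):
    """Return the words of the tree in left-to-right order."""
    _, content = node
    if isinstance(content, str):
        return [content]
    return [word for child in content for word in leaves(child)]


def find_word(node, word, parents=()):
    """Return (POS tag, enclosing phrase labels from the root down) or None."""
    label, content = node
    if isinstance(content, str):
        return (label, list(parents)) if content == word else None
    for child in content:
        found = find_word(child, word, parents + (label,))
        if found is not None:
            return found
    return None


print(" ".join(leaves(TREE)))  # BRCA1 is associated with breast cancer
print(find_word(TREE, "associated"))  # ('VBN', ['S', 'VP', 'VP'])
```

The innermost label in the returned list (`VP` for “associated”) is the verb phrase subgroup that the caption describes.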
Fig. 3
A visualization of a dependency parse tree for the sentence “BRCA1 is associated with breast cancer”. In this type of tree, the root is the main verb of the sentence. Each arrow represents a dependency between two words. For example, the dependency between “BRCA1” and “associated” is nsubjpass, which stands for passive nominal subject. This means that “BRCA1” is the subject of the passive verb “associated”.
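Dependency arcs are often stored as (head, relation, dependent) triples, and the shortest path between two entity mentions is a common feature for relation extraction. The arcs below are hand-written for the Fig. 3 sentence with Stanford-style labels and are not parser output. Nodes are keyed by word for readability; a real pipeline would key them by token position so that repeated words stay distinct. The function name `shortest_dependency_path` is hypothetical.

```python
from collections import deque

# Hand-written dependency arcs (head, relation, dependent) for
# "BRCA1 is associated with breast cancer". Illustrative, not parser output.
ARCS = [
    ("ROOT", "root", "associated"),
    ("associated", "nsubjpass", "BRCA1"),
    ("associated", "auxpass", "is"),
    ("associated", "prep", "with"),
    ("with", "pobj", "cancer"),
    ("cancer", "compound", "breast"),
]


def shortest_dependency_path(arcs, source, target):
    """Breadth-first search over the arcs, treating them as undirected edges."""
    graph = {}
    for head, rel, dep in arcs:
        graph.setdefault(head, []).append((dep, rel))
        graph.setdefault(dep, []).append((head, rel))
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for neighbor, _ in graph.get(path[-1], []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no path between the two words


print(shortest_dependency_path(ARCS, "BRCA1", "cancer"))
# ['BRCA1', 'associated', 'with', 'cancer']
```

The path runs through the main verb “associated”, which is why the root verb is often the most informative word linking a gene mention to a disease mention.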
Fig. 4
Pipeline for representing knowledge graphs in a low-dimensional space. Starting with a knowledge graph, this space can be generated using one of the following options: matrix factorization (a), translational models (b), or neural network models (c). The output of this pipeline is an embedding space that clusters similar node types together.
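Option (b), a translational model, can be illustrated with a TransE-style score, which treats a relationship as a vector offset so that head + relation lies close to tail for true triples. This is a sketch with hand-picked two-dimensional vectors rather than learned ones; the names `transe_score` and `margin_loss` and the margin of 1.0 are assumptions for illustration.

```python
import math

# Hypothetical, hand-picked 2-dimensional vectors so the arithmetic is easy
# to follow. A translational model would learn these by gradient descent.
ENTITY = {
    "BRCA1": [1.0, 0.0],
    "breast cancer": [1.0, 1.0],
    "asthma": [4.0, 0.0],
}
RELATION = {
    "associates": [0.0, 1.0],
}


def transe_score(head, relation, tail):
    """Negative L2 distance between head + relation and tail (higher is better)."""
    h, r, t = ENTITY[head], RELATION[relation], ENTITY[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def margin_loss(positive, negative, margin=1.0):
    """Margin ranking loss that pushes a true triple above a corrupted one."""
    return max(0.0, margin - transe_score(*positive) + transe_score(*negative))


true_triple = ("BRCA1", "associates", "breast cancer")
corrupted = ("BRCA1", "associates", "asthma")  # tail replaced at random
print(transe_score(*true_triple) > transe_score(*corrupted))  # True
print(round(transe_score(*corrupted), 3))  # -3.162
print(margin_loss(true_triple, corrupted))  # 0.0
```

In practice, the vectors are learned by minimizing this loss over many pairs of true and corrupted triples.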
Fig. 5
Overview of various biomedical applications that make use of knowledge graphs. Categories consist of (a) multi-omic applications, (b) pharmaceutical applications, and (c) clinical applications.
