PeerJ Comput Sci. 2021 Feb 18:7:e341.
doi: 10.7717/peerj-cs.341. eCollection 2021.

Application and evaluation of knowledge graph embeddings in biomedical data

Mona Alshahrani et al. PeerJ Comput Sci.

Abstract

Linked data and bio-ontologies, which enable knowledge representation, standardization, and dissemination, are an integral part of developing biological and biomedical databases. In these databases they are used to maintain data integrity, organize data, and support search. More recently, linked data and bio-ontologies have also been used to represent information as multi-relational heterogeneous graphs, or "knowledge graphs". Entities and relations in a knowledge graph can be represented as embedding vectors in a semantic space, and these embedding vectors have been used to predict relationships between entities. Such knowledge graph embedding methods provide a practical approach to data analytics and increase the chances of building machine learning models with high prediction accuracy that can enhance decision support systems. Here, we present a comparative assessment and a standard benchmark for knowledge graph-based representation learning methods, focused on the link prediction task for biological relations. We systematically investigated and compared state-of-the-art embedding methods based on the design settings used for training and evaluation. We further tested various strategies for controlling the amount of information related to each relation in the knowledge graph and measured their effect on the final performance. We also assessed the quality of the knowledge graph features through clustering and visualization, and employed several evaluation metrics to examine their uses and differences. Based on this systematic comparison and these assessments, we identify and discuss the limitations of knowledge graph-based representation learning methods and suggest guidelines for the development of improved methods.

Keywords: Bio-ontologies; Biomedicine; Comparative evaluation; Embeddings methods; Knowledge graphs; Linked data; Performance studies.
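
As a rough illustration of how embedding vectors can be used for link prediction, the sketch below scores candidate triples with a TransE-style distance (TransE is one of the methods shown in Figure 2) and summarizes the resulting ranks with mean reciprocal rank and Hits@k. The entity names, the two-dimensional vectors, the `associated_with` relation, and the choice of these two metrics are all assumptions made for the example; the abstract does not state which metrics or data the study used, and real embeddings would be learned rather than written by hand.

```python
import math

def transe_score(head, rel, tail):
    """TransE-style plausibility: negative L2 norm of (head + rel - tail)."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, rel, tail)))

def rank_of_true_tail(entity_vecs, rel_vec, head, true_tail):
    """Rank the true tail among all other entities (1 = best)."""
    scores = {
        name: transe_score(entity_vecs[head], rel_vec, vec)
        for name, vec in entity_vecs.items()
        if name != head
    }
    true_score = scores[true_tail]
    # Count candidates that score strictly better than the true tail.
    return 1 + sum(1 for s in scores.values() if s > true_score)

def mean_reciprocal_rank(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k):
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Hypothetical toy embeddings (hand-written, not learned).
entities = {
    "gene_A": [0.0, 0.0],
    "gene_B": [1.0, 1.0],
    "disease_X": [1.0, 0.0],
    "disease_Y": [2.0, 2.0],
}
associated_with = [1.0, 0.0]  # hypothetical relation vector

# Hypothetical held-out (head, tail) pairs for the relation above.
test_pairs = [
    ("gene_A", "disease_X"),
    ("gene_B", "disease_Y"),
    ("gene_B", "disease_X"),
]

ranks = [rank_of_true_tail(entities, associated_with, h, t) for h, t in test_pairs]
mrr = mean_reciprocal_rank(ranks)
hits1 = hits_at_k(ranks, 1)
print(ranks, round(mrr, 3), round(hits1, 3))
```

The ranking here is unfiltered, meaning other true tails of the same head still count as competitors; whether to filter them out is one of the evaluation design choices that can change reported scores.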

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1. Illustration of the general workflow of our experiments.
Figure 2. The 2-D t-SNE plot of Disease Ontology top categories according to each embedding method.
(A) Walking RDF/OWL, (B) TransE embeddings, (C) Poincaré embeddings, (D) RESCAL embeddings, (E) SimplE embeddings, (F) R-GCN embeddings.
