J Biomed Semantics. 2023 Aug 14;14(1):11.
doi: 10.1186/s13326-023-00291-x.

Multi-domain knowledge graph embeddings for gene-disease association prediction

Susana Nunes et al. J Biomed Semantics. 2023.

Abstract

Background: Predicting gene-disease associations typically requires exploring diverse sources of information as well as sophisticated computational approaches. Knowledge graph embeddings can help tackle these challenges by creating representations of genes and diseases based on the scientific knowledge described in ontologies, which can then be explored by machine learning algorithms. However, state-of-the-art knowledge graph embeddings are produced over a single ontology or multiple but disconnected ones, ignoring the impact that considering multiple interconnected domains can have on complex tasks such as gene-disease association prediction.
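
Before a machine learning algorithm can use these representations, the embedding of a gene and the embedding of a disease have to be combined into a single feature vector for the pair. The sketch below illustrates this step with a few vector operators commonly used for link prediction (concatenation, average, Hadamard product, absolute and squared difference). The operator set, the function names, and the toy three-dimensional embeddings are assumptions for illustration, not the paper's own code.

```python
from typing import Callable, Dict, List

Vector = List[float]


def concatenate(g: Vector, d: Vector) -> Vector:
    # Keeps both embeddings side by side: output length is len(g) + len(d).
    return g + d


def average(g: Vector, d: Vector) -> Vector:
    # Element-wise mean of the two embeddings.
    return [(a + b) / 2 for a, b in zip(g, d)]


def hadamard(g: Vector, d: Vector) -> Vector:
    # Element-wise product.
    return [a * b for a, b in zip(g, d)]


def l1_difference(g: Vector, d: Vector) -> Vector:
    # Element-wise absolute difference.
    return [abs(a - b) for a, b in zip(g, d)]


def l2_difference(g: Vector, d: Vector) -> Vector:
    # Element-wise squared difference.
    return [(a - b) ** 2 for a, b in zip(g, d)]


# Hypothetical operator registry; the paper's exact operator set may differ.
OPERATORS: Dict[str, Callable[[Vector, Vector], Vector]] = {
    "concatenate": concatenate,
    "average": average,
    "hadamard": hadamard,
    "l1": l1_difference,
    "l2": l2_difference,
}


def pair_vector(gene_emb: Vector, disease_emb: Vector, op: str = "hadamard") -> Vector:
    """Combine one gene embedding and one disease embedding into a pair feature vector."""
    if len(gene_emb) != len(disease_emb):
        raise ValueError("gene and disease embeddings must have the same dimension")
    return OPERATORS[op](gene_emb, disease_emb)


if __name__ == "__main__":
    gene = [0.5, -1.0, 2.0]     # toy gene embedding (hypothetical values)
    disease = [1.0, 0.5, -0.5]  # toy disease embedding (hypothetical values)
    for name in OPERATORS:
        print(name, pair_vector(gene, disease, name))
```

Element-wise operators keep the pair vector at the embedding dimension, while concatenation doubles it; which one works best is an empirical question that depends on the embedding method and the classifier.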

Results: We propose a novel approach to predict gene-disease associations using rich semantic representations based on knowledge graph embeddings over multiple ontologies linked by logical definitions and compound ontology mappings. The experiments showed that considering richer knowledge graphs significantly improves gene-disease prediction and that different knowledge graph embedding methods benefit more from distinct types of semantic richness.
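
Logical definitions are what connect the ontologies: the Human Phenotype Ontology class "Hearing impairment" (HP:0000365), for instance, is defined through a nested restriction that mentions the GO class "sensory perception of sound" (GO:0007605). A minimal sketch of simplifying such a definition into direct edges between the defined class and classes from other ontologies follows. It assumes a hypothetical nested-tuple encoding of the class expression (only intersections and existential restrictions) and a hand-written label-to-ontology lookup; recording the property path on each edge is a choice of this sketch, and the paper's actual simplification may differ.

```python
from typing import Dict, Iterator, List, Set, Tuple, Union

# Hypothetical encoding of an OWL class expression:
#   str                        -> a named class (identified here by its label)
#   ("and", expr, expr, ...)   -> intersection
#   ("some", property, expr)   -> existential restriction
Expr = Union[str, tuple]


def named_targets(expr: Expr, path: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], str]]:
    """Yield (property path, named class) for every named class in the expression."""
    if isinstance(expr, str):
        yield path, expr
    elif expr[0] == "and":
        for operand in expr[1:]:
            yield from named_targets(operand, path)
    elif expr[0] == "some":
        _, prop, filler = expr
        yield from named_targets(filler, path + (prop,))
    else:
        raise ValueError(f"unsupported constructor: {expr[0]!r}")


def simplify(defined: str, definition: Expr, ontology_of: Dict[str, str],
             keep: Set[str]) -> List[Tuple[str, str, str]]:
    """Turn a logical definition into direct (defined, path, target) edges,
    keeping only targets whose ontology is in `keep`."""
    return [
        (defined, "/".join(path), target)
        for path, target in named_targets(definition)
        if ontology_of.get(target) in keep
    ]


# 'Hearing impairment' EquivalentTo 'has part' some ('decreased rate'
#   and ('inheres in' some 'sensory perception of sound')
#   and ('has modifier' some 'abnormal'))
HEARING_IMPAIRMENT = (
    "some", "has part",
    ("and",
     "decreased rate",
     ("some", "inheres in", "sensory perception of sound"),
     ("some", "has modifier", "abnormal")),
)

# Hypothetical label -> source-ontology lookup.
ONTOLOGY_OF = {
    "decreased rate": "PATO",
    "sensory perception of sound": "GO",
    "abnormal": "PATO",
}

if __name__ == "__main__":
    for edge in simplify("Hearing impairment", HEARING_IMPAIRMENT, ONTOLOGY_OF, {"GO"}):
        print(edge)
```

Keeping only GO targets yields a single edge from "Hearing impairment" to "sensory perception of sound" through the path `has part/inheres in`, which is the kind of direct HP-to-GO link that makes the two ontologies one connected graph for the embedding methods.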

Conclusions: This work demonstrated the potential for knowledge graph embeddings across multiple and interconnected biomedical ontologies to support gene-disease prediction. It also paved the way for considering other ontologies or tackling other tasks where multiple perspectives over the data can be beneficial. All software and data are freely available.

Keywords: Gene-disease association prediction; Knowledge graph; Knowledge graph embeddings; Machine learning; Ontologies.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Example of a direct relationship between hearing loss and the EPS8LA gene
Fig. 2
Overview of the methodology with four basic steps: 1) build the knowledge graph with ontologies and annotations; 2) create embeddings to represent each gene and disease; 3) produce a final vector for each gene-disease pair in the dataset; 4) predict gene-disease associations
Fig. 3
Example of a logical definition of the Human Phenotype Ontology class "Hearing impairment" (HP:0000365): 'Hearing impairment' EquivalentTo 'has part' some ('decreased rate' and ('inheres in' some 'sensory perception of sound') and ('has modifier' some 'abnormal'))
Fig. 4
Example of a logical definition simplified into a more direct relation between two classes. The HP term "Hearing impairment" (HP:0000365) is related to a restriction that involves the GO term "Sensory perception of sound" (GO:0007605)
Fig. 5
ROC curves and AUC values obtained for different vector operators with the RF classifier on the HP-simple + LD + GO knowledge graph
Fig. 6
Precision-recall diagram with F-measure values shown as isolines. The diagram covers all knowledge graphs for OPA2Vec and RDF2Vec with RF, using a 70-30 split
Fig. 7
Computation time of each embedding method on two knowledge graphs; the smaller graph is obtained by removing the main branch of the Human Phenotype Ontology (Phenotypic abnormality)
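
The evaluation shown in Figs. 5 and 6 relies on ROC AUC and on precision, recall, and F-measure. As a sketch of how these metrics are obtained from a classifier's output, the code below uses only the standard library. AUC is computed in its rank form (the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair, counting ties as one half), which equals the area under the ROC curve. The labels, scores, and 0.5 decision threshold are hypothetical; the F-measure isolines in Fig. 6 are simply curves of constant F in precision-recall space.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC in rank form: P(random positive outscores random negative), ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_recall_f(labels: Sequence[int], predictions: Sequence[int],
                       beta: float = 1.0) -> Tuple[float, float, float]:
    """Precision, recall and F-beta from binary labels and binary predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)
    return precision, recall, f


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 1, 0]                 # hypothetical gold labels for six pairs
    scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.6]    # hypothetical classifier probabilities
    preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print("AUC:", roc_auc(labels, scores))
    print("P, R, F1:", precision_recall_f(labels, preds))
```

The double loop is quadratic in the number of pairs; for larger test sets a sort-based rank computation, or a library routine such as scikit-learn's `roc_auc_score`, is the usual choice.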

