Bioinformatics. 2024 May 2;40(5):btae301.

doi: 10.1093/bioinformatics/btae301.

Prioritizing genomic variants through neuro-symbolic, knowledge-enhanced learning

Azza Althagafi^{1,2,3}, Fernando Zhapa-Camacho^{1,2}, Robert Hoehndorf^{1,2,4}

Affiliations

¹ Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia.
² Computer Science Program, Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia.
³ Computer Science Department, College of Computers and Information Technology, Taif University, Taif 26571, Saudi Arabia.
⁴ SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence, King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia.

PMID: 38696757
PMCID: PMC11132820
DOI: 10.1093/bioinformatics/btae301

Azza Althagafi et al. Bioinformatics. 2024.

Abstract

Motivation: Whole-exome and genome sequencing have become common tools for diagnosing patients with rare diseases. Despite their success, these approaches leave many patients undiagnosed. A common argument is that more disease variants still await discovery, or that novel disease phenotypes result from a combination of variants in multiple disease-related genes. Interpreting the phenotypic consequences of genomic variants relies on information about gene functions, gene expression, physiology, and other genomic features. Phenotype-based methods to identify variants involved in genetic diseases combine molecular features with prior knowledge about the phenotypic consequences of altering gene functions. While phenotype-based methods have been successfully applied to prioritizing variants, they rely on known gene-disease or gene-phenotype associations as training data and are applicable only to genes that have associated phenotypes, which limits their scope. In addition, different clinicians do not assign phenotypes uniformly, and phenotype-based methods need to account for this variability.

Results: We developed an Embedding-based Phenotype Variant Predictor (EmbedPVP), a computational method to prioritize variants involved in genetic diseases by combining genomic information and clinical phenotypes. EmbedPVP leverages a large amount of background knowledge from human and model organisms about molecular mechanisms through which abnormal phenotypes may arise. Specifically, EmbedPVP incorporates phenotypes linked to genes, functions of gene products, and the anatomical site of gene expression, and systematically relates them to their phenotypic effects through neuro-symbolic, knowledge-enhanced machine learning. We demonstrate EmbedPVP's efficacy on a large set of synthetic genomes and genomes matched with clinical information.

Availability and implementation: EmbedPVP and all evaluation experiments are freely available at https://github.com/bio-ontology-research-group/EmbedPVP.

Conflict of interest statement

The authors declare that no conflicts of interest exist.

Figures

**Figure 1.**
EmbedPVP Model Workflow. (A) Generates background knowledge from different ontologies. (B) Generates embeddings for diseases (*D_i*) and genes (*G_i*) using various embedding methods. (C) Calculates phenotype–genotype similarity using the scoring function associated with the selected embedding method, considering the maximum similarity score for multiple genes associated with the phenotype, and then averages this similarity with pathogenicity prediction. *V_i* represents variant i, *MS_vi* is max phenotype similarity for variants, *GP_vi* is genotype prediction (CADD), and *S_vi* is the final weighted score of phenotypes and genotypes for the variants.
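
The scoring step in panel (C) can be illustrated with a minimal Python sketch. It is not the EmbedPVP implementation: the names `Variant`, `max_similarity`, `final_score`, and `rank_variants` are hypothetical, the per-gene similarity scores are assumed to come from an embedding model's scoring function, the pathogenicity value (*GP_vi*) is assumed to be rescaled to [0, 1], and the `weight` of 0.5 is one reading of "averages". Taking the maximum over the genes annotated to a variant, and scoring genes without a similarity as 0.0, are likewise assumptions.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class Variant:
    """Hypothetical container for one candidate variant V_i."""
    variant_id: str
    genes: Sequence[str]   # genes the variant is annotated to
    pathogenicity: float   # GP_vi, assumed already rescaled to [0, 1]


def max_similarity(variant: Variant, gene_similarity: Mapping[str, float]) -> float:
    """MS_vi: highest phenotype similarity among the variant's genes.

    Genes without a similarity score are skipped; a variant with no
    scored gene gets 0.0 (an assumption made for this sketch).
    """
    scores = [gene_similarity[g] for g in variant.genes if g in gene_similarity]
    return max(scores, default=0.0)


def final_score(variant: Variant, gene_similarity: Mapping[str, float],
                weight: float = 0.5) -> float:
    """S_vi: weighted combination of MS_vi and GP_vi.

    weight=0.5 gives the plain average of the two scores.
    """
    ms = max_similarity(variant, gene_similarity)
    return weight * ms + (1.0 - weight) * variant.pathogenicity


def rank_variants(variants: Sequence[Variant],
                  gene_similarity: Mapping[str, float],
                  weight: float = 0.5) -> list:
    """Return the variants ordered from highest to lowest S_vi."""
    return sorted(variants,
                  key=lambda v: final_score(v, gene_similarity, weight),
                  reverse=True)


if __name__ == "__main__":
    # Toy inputs, invented for illustration only.
    similarity = {"G1": 0.1, "G2": 0.7, "G3": 0.9}
    candidates = [
        Variant("v1", ["G1", "G2"], pathogenicity=0.9),
        Variant("v2", ["G3"], pathogenicity=0.2),
    ]
    for v in rank_variants(candidates, similarity):
        print(v.variant_id, round(final_score(v, similarity), 3))
```

Keeping the phenotype and pathogenicity terms separate until the final step makes it easy to swap in a different embedding method's similarity scores without touching the ranking logic.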

Cited by

GeOKG: geometry-aware knowledge graph embedding for Gene Ontology and genes.
Jeong CU, Kim J, Kim D, Sohn KA. Bioinformatics. 2025 Mar 29;41(4):btaf160. doi: 10.1093/bioinformatics/btaf160. PMID: 40217132. Free PMC article.
Computational strategies for cross-species knowledge transfer and translational biomedicine.
Yuan H, Mancuso CA, Johnson K, Braasch I, Krishnan A. ArXiv [Preprint]. 2024 Aug 16:arXiv:2408.08503v1. PMID: 39184546. Free PMC article. Preprint.
The Unified Phenotype Ontology (uPheno): A framework for cross-species integrative phenomics.
Matentzoglu N, Bello SM, Stefancsik R, Alghamdi SM, Anagnostopoulos AV, Balhoff JP, Balk MA, Bradford YM, Bridges Y, Callahan TJ, Caufield H, Cuzick A, Carmody LC, Caron AR, de Souza V, Engel SR, Fey P, Fisher M, Gehrke S, Grove C, Hansen P, Harris NL, Harris MA, Harris L, Ibrahim A, Jacobsen JOB, Köhler S, McMurry JA, Munoz-Fuentes V, Munoz-Torres MC, Parkinson H, Pendlington ZM, Pilgrim C, Robb SM, Robinson PN, Seager J, Segerdell E, Smedley D, Sollis E, Toro S, Vasilevsky N, Wood V, Haendel MA, Mungall CJ, McLaughlin JA, Osumi-Sutherland D. bioRxiv [Preprint]. 2024 Sep 22:2024.09.18.613276. doi: 10.1101/2024.09.18.613276. Update in: Genetics. 2025 Mar 17;229(3):iyaf027. doi: 10.1093/genetics/iyaf027. PMID: 39345458. Free PMC article. Updated. Preprint.
The Unified Phenotype Ontology: a framework for cross-species integrative phenomics.
Matentzoglu N, Bello SM, Stefancsik R, Alghamdi SM, Anagnostopoulos AV, Balhoff JP, Balk MA, Bradford YM, Bridges Y, Callahan TJ, Caufield H, Cuzick A, Carmody LC, Caron AR, de Souza V, Engel SR, Fey P, Fisher M, Gehrke S, Grove C, Hansen P, Harris NL, Harris MA, Harris L, Ibrahim A, Jacobsen JOB, Köhler S, McMurry JA, Munoz-Fuentes V, Munoz-Torres MC, Parkinson H, Pendlington ZM, Pilgrim C, Robb SMC, Robinson PN, Seager J, Segerdell E, Smedley D, Sollis E, Toro S, Vasilevsky N, Wood V, Haendel MA, Mungall CJ, McLaughlin JA, Osumi-Sutherland D. Genetics. 2025 Mar 17;229(3):iyaf027. doi: 10.1093/genetics/iyaf027. PMID: 40048704. Free PMC article.

Grants and funding

REI/1/5659-01-01/King Abdullah University of Science and Technology
