Anc2vec: embedding gene ontology terms by preserving ancestors relationships
- PMID: 35136916
- DOI: 10.1093/bib/bbac003
Abstract
The gene ontology (GO) provides a hierarchical structure with a controlled vocabulary composed of terms describing functions and localization of gene products. Recent works propose vector representations, also known as embeddings, of GO terms that capture meaningful information about them. Significant performance improvements have been observed when these representations are used on diverse downstream tasks, such as the measurement of semantic similarity between GO terms and functional similarity between proteins. Despite the success of these approaches, existing embeddings of GO terms still fail to capture crucial structural features of the GO. Here, we present anc2vec, a novel protocol based on neural networks for constructing vector representations of GO terms by preserving three important ontological features of each term: its ontological uniqueness, its ancestors hierarchy and its sub-ontology membership. The advantages of using anc2vec are demonstrated by systematic experiments on diverse tasks: visualization, sub-ontology prediction, inference of structurally related terms, retrieval of terms from aggregated embeddings, and prediction of protein-protein interactions. In these tasks, experimental results show that anc2vec representations perform better than those of recent approaches. This demonstrates that embeddings can achieve higher performance on diverse tasks when the structure of the GO is better represented. Full source code and data are available at https://github.com/sinc-lab/anc2vec.
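The three features named in the abstract can be made concrete as per-term training targets. The following is a minimal Python sketch under stated assumptions, not the authors' implementation (that is in the repository linked above). It assumes a toy ontology with hypothetical term names and is_a edges only, and encodes ontological uniqueness as a one-hot vector of the term itself, the ancestors hierarchy as a multi-hot vector over all terms, and sub-ontology membership as a one-hot vector over BP/MF/CC. A neural network that maps each term to a low-dimensional embedding and is trained to predict these three targets from it would push the embedding to preserve those features. Whether anc2vec uses exactly these encodings, and whether a term counts among its own ancestors, is an assumption here.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy ontology: term -> direct parents (is_a edges only).
TOY_PARENTS: Dict[str, List[str]] = {
    "BP_root": [],
    "BP_a": ["BP_root"],
    "BP_b": ["BP_a"],
    "MF_root": [],
    "MF_a": ["MF_root"],
    "CC_root": [],
}

# Hypothetical sub-ontology label for each toy term.
TOY_NAMESPACE: Dict[str, str] = {
    "BP_root": "BP", "BP_a": "BP", "BP_b": "BP",
    "MF_root": "MF", "MF_a": "MF",
    "CC_root": "CC",
}

SUBONTOLOGIES = ["BP", "MF", "CC"]

Targets = Tuple[List[int], List[int], List[int]]


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All ancestors of `term` (transitive closure over parents), excluding the term itself."""
    seen: Set[str] = set()
    stack = list(parents[term])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents[p])
    return seen


def build_targets(
    parents: Dict[str, List[str]], namespace: Dict[str, str]
) -> Tuple[List[str], Dict[str, Targets]]:
    """Build (identity, ancestors, sub-ontology) target vectors for every term."""
    terms = sorted(parents)
    index = {t: i for i, t in enumerate(terms)}
    n = len(terms)
    targets: Dict[str, Targets] = {}
    for t in terms:
        identity = [0] * n            # uniqueness: one-hot of the term itself
        identity[index[t]] = 1
        anc = [0] * n                 # hierarchy: multi-hot of all ancestors
        for a in ancestors(t, parents):
            anc[index[a]] = 1
        sub = [0] * len(SUBONTOLOGIES)  # membership: one-hot over BP/MF/CC
        sub[SUBONTOLOGIES.index(namespace[t])] = 1
        targets[t] = (identity, anc, sub)
    return terms, targets


if __name__ == "__main__":
    terms, targets = build_targets(TOY_PARENTS, TOY_NAMESPACE)
    print(terms)            # ['BP_a', 'BP_b', 'BP_root', 'CC_root', 'MF_a', 'MF_root']
    print(targets["BP_b"])  # ([0, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 0, 0])
```

In a full GO-scale setting these dense lists would be replaced by sparse or array representations, but the target definitions stay the same.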
Keywords: gene ontology; neural networks; protein–protein interactions; semantic similarity.
© The Author(s) 2022. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Similar articles
- HiG2Vec: hierarchical representations of Gene Ontology and genes in the Poincaré ball. Bioinformatics. 2021 Sep 29;37(18):2971-2980. doi: 10.1093/bioinformatics/btab193. PMID: 33760022. Free PMC article.
- A relation based measure of semantic similarity for Gene Ontology annotations. BMC Bioinformatics. 2008 Nov 4;9:468. doi: 10.1186/1471-2105-9-468. PMID: 18983678. Free PMC article.
- TANGO: A GO-Term Embedding Based Method for Protein Semantic Similarity Prediction. IEEE/ACM Trans Comput Biol Bioinform. 2023 Jan-Feb;20(1):694-706. doi: 10.1109/TCBB.2022.3143480. Epub 2023 Feb 3. PMID: 35030084.
- From ontology to semantic similarity: calculation of ontology-based semantic similarity. ScientificWorldJournal. 2013;2013:793091. doi: 10.1155/2013/793091. Epub 2013 Feb 28. PMID: 23533360. Free PMC article. Review.
- A Literature Review of Gene Function Prediction by Modeling Gene Ontology. Front Genet. 2020 Apr 24;11:400. doi: 10.3389/fgene.2020.00400. eCollection 2020. PMID: 32391061. Free PMC article. Review.
Cited by
- Functional kinome profiling reveals brain protein kinase signaling pathways and gene networks altered by acute voluntary exercise in rats. PLoS One. 2025 Apr 15;20(4):e0321596. doi: 10.1371/journal.pone.0321596. eCollection 2025. PMID: 40233052. Free PMC article.
- GeOKG: geometry-aware knowledge graph embedding for Gene Ontology and genes. Bioinformatics. 2025 Mar 29;41(4):btaf160. doi: 10.1093/bioinformatics/btaf160. PMID: 40217132. Free PMC article.
- Bridging Large Language Models and Single-Cell Transcriptomics in Dissecting Selective Motor Neuron Vulnerability. ArXiv [Preprint]. 2025 May 12:arXiv:2505.07896v1. PMID: 40463696. Free PMC article. Preprint.
- Developmental pyrethroid exposure disrupts molecular pathways for MAP kinase and circadian rhythms in mouse brain. bioRxiv [Preprint]. 2024 Mar 11:2023.08.28.555113. doi: 10.1101/2023.08.28.555113. Update in: Physiol Genomics. 2025 Apr 01;57(4):240-253. doi: 10.1152/physiolgenomics.00033.2024. PMID: 37745438. Free PMC article. Updated. Preprint.
- GOPhage: protein function annotation for bacteriophages by integrating the genomic context. Brief Bioinform. 2024 Nov 22;26(1):bbaf014. doi: 10.1093/bib/bbaf014. PMID: 39838963. Free PMC article.