Bioinformatics. 2018 Jul 1;34(13):i52-i60.
doi: 10.1093/bioinformatics/bty259.

Onto2Vec: joint vector-based representation of biological entities and their ontology-based annotations

Fatima Zohra Smaili et al. Bioinformatics. 2018.

Abstract

Motivation: Biological knowledge is widely represented in the form of ontology-based annotations: ontologies describe the phenomena assumed to exist within a domain, and the annotations associate a (kind of) biological entity with a set of phenomena within the domain. The structure and information contained in ontologies and their annotations make them valuable for developing machine learning, data analysis and knowledge extraction algorithms; notably, semantic similarity is widely used to identify relations between biological entities, and ontology-based annotations are frequently used as features in machine learning applications.
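Semantic similarity measures of the kind mentioned above, such as Resnik's measure used as a baseline in Fig. 3, score two ontology classes by the information content (IC) of their most informative common ancestor. The following is a minimal sketch under stated assumptions: the toy hierarchy, class names and annotations are hypothetical; IC is estimated as IC(c) = -log p(c), with p(c) the fraction of annotated entities that are annotated to c or one of its descendants (one of several conventions); and the best-match average used to compare two proteins is one common aggregation choice, not necessarily the one used in the paper.

```python
import math
from typing import Dict, Set

# Hypothetical toy is-a hierarchy (child -> direct parents). Real GO is far larger
# and also uses part_of and other relations, which this sketch ignores.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "binding": {"root"},
    "catalytic_activity": {"root"},
    "protein_binding": {"binding"},
    "dna_binding": {"binding"},
    "kinase_activity": {"catalytic_activity"},
}

# Hypothetical protein annotations (protein -> directly annotated classes).
ANNOTATIONS: Dict[str, Set[str]] = {
    "P1": {"protein_binding"},
    "P2": {"protein_binding", "kinase_activity"},
    "P3": {"dna_binding"},
    "P4": {"kinase_activity"},
}


def ancestors(cls: str) -> Set[str]:
    """Return cls together with all of its ancestors in PARENTS."""
    seen = {cls}
    stack = [cls]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(annotations: Dict[str, Set[str]]) -> Dict[str, float]:
    """IC(c) = log(N / n_c), where n_c counts entities annotated to c or a descendant."""
    counts = {c: 0 for c in PARENTS}
    for classes in annotations.values():
        # Propagate annotations upward so each ancestor is counted once per entity.
        for c in set().union(*(ancestors(x) for x in classes)):
            counts[c] += 1
    total = len(annotations)
    return {c: math.log(total / n) for c, n in counts.items() if n > 0}


def resnik(c1: str, c2: str, ic: Dict[str, float]) -> float:
    """Resnik similarity: IC of the most informative common ancestor."""
    return max(ic.get(c, 0.0) for c in ancestors(c1) & ancestors(c2))


def best_match_average(a: Set[str], b: Set[str], ic: Dict[str, float]) -> float:
    """Combine class-level Resnik scores into one entity-level score."""
    best_a = [max(resnik(x, y, ic) for y in b) for x in a]
    best_b = [max(resnik(x, y, ic) for x in a) for y in b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


ic = information_content(ANNOTATIONS)
print(round(resnik("protein_binding", "dna_binding", ic), 3))  # 0.288
print(round(best_match_average(ANNOTATIONS["P1"], ANNOTATIONS["P2"], ic), 3))  # 0.462
```

Because such a measure is fixed by hand, it cannot be tuned to a particular relation between proteins; the trainable, embedding-based similarity described in the Results is meant to address exactly that limitation.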

Results: We propose the Onto2Vec method, an approach to learn feature vectors for biological entities based on their annotations to biomedical ontologies. Our method can be applied to a wide range of bioinformatics research problems such as similarity-based prediction of interactions between proteins, classification of interaction types using supervised learning, or clustering. To evaluate Onto2Vec, we use the Gene Ontology (GO) and jointly produce dense vector representations of proteins, the GO classes to which they are annotated, and the axioms in GO that constrain these classes. First, we demonstrate that Onto2Vec-generated feature vectors can significantly improve prediction of protein-protein interactions in human and yeast. We then illustrate how Onto2Vec representations provide the means for constructing data-driven, trainable semantic similarity measures that can be used to identify particular relations between proteins. Finally, we use an unsupervised clustering approach to identify protein families based on their Enzyme Commission numbers. Our results demonstrate that Onto2Vec can generate high-quality feature vectors from biological entities and ontologies. Onto2Vec has the potential to significantly outperform the state of the art in several predictive applications in which ontologies are involved.
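Onto2Vec builds its joint representation from the ontology's axioms together with annotation axioms that link entities to classes, treating each axiom like a sentence, so that proteins and GO classes end up in one shared vector space. The sketch below is an illustration only, not the code in the repository: it shows the two ends of such a pipeline that do not depend on an embedding library, namely encoding axioms as token sentences and ranking candidate protein pairs by the cosine similarity of their vectors. The `SubClassOf`/`hasFunction` token encoding, the class names `GO_A`/`GO_B` and the hand-made vectors are hypothetical stand-ins; in practice the vectors would come from training a Word2Vec-style model on the corpus.

```python
import math
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def axiom_sentences(
    subclass_axioms: Iterable[Tuple[str, str]],
    annotations: Dict[str, Set[str]],
) -> List[List[str]]:
    """Encode ontology axioms and entity annotations as token 'sentences'.

    Hypothetical encoding: "A SubClassOf B" for ontology axioms and
    "P hasFunction C" for annotations, so proteins and classes share one vocabulary.
    """
    sentences = [[sub, "SubClassOf", sup] for sub, sup in subclass_axioms]
    for entity in sorted(annotations):
        for cls in sorted(annotations[entity]):
            sentences.append([entity, "hasFunction", cls])
    return sentences


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_pairs(
    vectors: Dict[str, Sequence[float]], pairs: Iterable[Tuple[str, str]]
) -> List[Tuple[str, str, float]]:
    """Score candidate entity pairs by cosine similarity, most similar first."""
    scored = [(a, b, cosine(vectors[a], vectors[b])) for a, b in pairs]
    return sorted(scored, key=lambda t: t[2], reverse=True)


# Hypothetical toy ontology axiom and annotations.
corpus = axiom_sentences([("GO_A", "GO_B")], {"P1": {"GO_A"}, "P2": {"GO_A"}})

# Hand-made vectors standing in for trained embeddings (hypothetical values).
vectors = {"P1": [1.0, 0.0], "P2": [0.9, 0.1], "P3": [0.0, 1.0]}
print(rank_pairs(vectors, [("P1", "P3"), ("P1", "P2")])[0][:2])  # ('P1', 'P2')
```

With the gensim library, for example, `gensim.models.Word2Vec(sentences=corpus, sg=1, min_count=1)` would train skip-gram vectors on such a corpus (`min_count=1` keeps rare tokens in a toy corpus); whether those settings match the published configuration is not established here.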

Availability and implementation: https://github.com/bio-ontology-research-group/onto2vec.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Onto2Vec workflow. The blue-shaded part illustrates the steps to obtain vector representations of classes from the ontology. The purple-shaded part shows the steps to obtain vector representations of ontology classes and the entities annotated to these classes.
Fig. 2.
ROC curves for PPI prediction for the unsupervised learning methods.
Fig. 3.
ROC curves for PPI prediction for the supervised learning methods, in addition to Resnik's semantic similarity measure for comparison.
Fig. 4.
t-SNE visualization of 10,000 enzyme vectors, color-coded by their first-level EC category (1, 2, 3, 4, 5 or 6).
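The ROC curves in Figs 2 and 3 come from ranking candidate protein pairs by a score (for instance a vector similarity or a classifier output) and sweeping a decision threshold over it. A minimal, dependency-free sketch of computing the curve points and the area under the curve (AUC) follows; it is not the authors' evaluation code, and the scores and labels in the example are made up.

```python
from typing import List, Sequence, Tuple


def roc_curve(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points, sweeping the threshold from the highest score down.

    Examples with tied scores are consumed together, so the curve does not
    depend on the input order of tied pairs.
    """
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative examples")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        while i < len(order) and scores[order[i]] == threshold:
            if labels[order[i]]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up similarity scores for four candidate pairs; 1 = known interaction.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
print(auc(roc_curve(scores, labels)))  # 0.75
```

The trapezoidal AUC equals the probability that a randomly chosen interacting pair is scored above a randomly chosen non-interacting one (ties counted as one half), which is why it is a convenient single number for comparing the methods plotted in the two figures.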
