Bioinformatics. 2022 May 26;38(11):3051-3061.
doi: 10.1093/bioinformatics/btac304.

deepSimDEF: deep neural embeddings of gene products and gene ontology terms for functional analysis of genes

Ahmad Pesaranghader et al. Bioinformatics. 2022.

Abstract

Motivation: There is a plethora of measures for evaluating the functional similarity (FS) of genes based on their co-expression, protein–protein interactions and sequence similarity. These measures are typically hand-engineered, application-specific metrics that quantify the degree of information two genes share through their Gene Ontology (GO) annotations.
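To make "hand-engineered metric over GO annotations" concrete, here is a minimal sketch of one classical style of measure: the Jaccard (union-intersection) index between two genes' sets of GO terms. This is an illustrative assumption, not one of the specific baselines evaluated in the paper; real measures of this family usually first propagate each annotation to its ancestor terms in the ontology, which this sketch omits. The gene names and annotation sets are hypothetical placeholders.

```python
# Illustrative hand-engineered FS measure: Jaccard index over GO annotation sets.
# Gene names and annotations are hypothetical; ancestor propagation is omitted.


def jaccard_fs(terms_a: set[str], terms_b: set[str]) -> float:
    """Return |A ∩ B| / |A ∪ B| for two sets of GO term IDs (0.0 if both are empty)."""
    union = terms_a | terms_b
    if not union:
        return 0.0
    return len(terms_a & terms_b) / len(union)


# Hypothetical annotations for two genes.
annotations = {
    "GENE_A": {"GO:0006412", "GO:0005840", "GO:0003735"},
    "GENE_B": {"GO:0006412", "GO:0005840", "GO:0016020"},
}

# Two shared terms out of four distinct terms in total.
print(jaccard_fs(annotations["GENE_A"], annotations["GENE_B"]))  # prints 0.5
```

Every design choice here (set overlap, no term weighting, no ontology structure) is fixed by hand, which is the kind of rigidity the learned approach below aims to avoid.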

Results: We introduce deepSimDEF, a deep learning method that automatically learns to estimate the FS of gene pairs from a set of genes and their GO annotations. Its key novelty is the ability to learn low-dimensional embedding vectors for GO terms and gene products and then calculate FS from these learned vectors. We show that deepSimDEF can predict the FS of new genes from their annotations: on yeast and human reference datasets, it outperformed all other FS measures by >5–10% on protein–protein interaction, gene co-expression and sequence homology tasks. deepSimDEF thus offers a powerful and adaptable deep neural architecture that can benefit a wide range of problems in genomics and proteomics and is flexible enough to be extended to any organism.
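The core idea of computing FS from embedding vectors can be sketched as follows. This is a deliberately simplified stand-in, not deepSimDEF itself: the paper learns its embeddings and feeds them through a trained paired network, whereas this sketch assumes fixed, made-up GO term vectors, mean-pools them into one vector per gene product, and scores a pair with cosine similarity. `GO_EMBEDDINGS`, `gene_vector`, `cosine` and `embedding_fs` are hypothetical names introduced for illustration.

```python
import math

# Hypothetical 4-dimensional GO term embeddings. In deepSimDEF such vectors are
# learned from data; these numbers are invented purely for illustration.
GO_EMBEDDINGS = {
    "GO:0006412": [0.9, 0.1, 0.0, 0.2],
    "GO:0005840": [0.8, 0.2, 0.1, 0.1],
    "GO:0016020": [0.0, 0.9, 0.3, 0.1],
}


def gene_vector(go_terms: list[str], dim: int = 4) -> list[float]:
    """Mean-pool the embeddings of a gene's GO terms into one gene-product vector."""
    known = [GO_EMBEDDINGS[t] for t in go_terms if t in GO_EMBEDDINGS]
    if not known:
        return [0.0] * dim  # no known terms: fall back to a zero vector
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embedding_fs(terms_a: list[str], terms_b: list[str]) -> float:
    """Score the FS of a gene pair by comparing their pooled embedding vectors."""
    return cosine(gene_vector(terms_a), gene_vector(terms_b))


# Genes annotated with related terms score higher than genes with unrelated ones.
print(embedding_fs(["GO:0006412"], ["GO:0005840"]))
print(embedding_fs(["GO:0006412"], ["GO:0016020"]))
```

Mean pooling and cosine similarity are chosen here only because they need no training; in a learned model the pooling and comparison steps would themselves be trainable layers.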

Availability and implementation: Source code and data are available at https://github.com/ahmadpgh/deepSimDEF.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1. Pearson’s correlation results for the prediction of gene–gene co-expression in yeast data
Fig. 2. Pearson’s correlation results for the prediction of gene–gene co-expression in human data
Fig. 3. Definition-based embedding model of the Gene Ontology terms
Fig. 4. Paired single-channel deepSimDEF network architecture for the biological process (BP) ontology
Fig. 5. Paired multi-channel deepSimDEF network architecture
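Figs. 1 and 2 report Pearson’s correlation between predicted FS and measured gene–gene co-expression. As a minimal sketch of how such an evaluation score is computed, the function below implements Pearson’s r directly; the predicted scores and co-expression values are hypothetical and are not data from the paper.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance and variances, all unnormalized (the 1/n factors cancel in r).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical predicted FS scores for five gene pairs and the corresponding
# measured co-expression values (illustrative numbers only).
predicted_fs = [0.91, 0.15, 0.62, 0.40, 0.78]
co_expression = [0.85, 0.05, 0.55, 0.48, 0.70]

print(round(pearson_r(predicted_fs, co_expression), 3))
```

On Python 3.10 or later, `statistics.correlation` computes the same coefficient and can replace the hand-written function.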