BMC Bioinformatics. 2019 Dec 2;20(Suppl 16):506.
doi: 10.1186/s12859-019-3076-y.

DeepEP: a deep learning framework for identifying essential proteins

Min Zeng et al. BMC Bioinformatics.

Abstract

Background: Essential proteins are crucial for cellular life, so identifying them is an important and challenging problem. Many computational approaches have recently been proposed to address it. However, traditional centrality methods cannot fully represent the topological features of biological networks. In addition, identifying essential proteins is an imbalanced learning problem, yet few current shallow machine learning-based methods are designed to handle the class imbalance.

Results: We develop DeepEP, a deep learning framework that uses the node2vec technique, multi-scale convolutional neural networks and a sampling technique to identify essential proteins. In DeepEP, the node2vec technique is applied to automatically learn topological and semantic features for each protein in the protein-protein interaction (PPI) network. Gene expression profiles are treated as images, and multi-scale convolutional neural networks are applied to extract their patterns. In addition, DeepEP uses a sampling method to alleviate the class imbalance. The sampling method draws the same number of majority and minority samples in each training epoch, so that training is not biased toward either class. The experimental results show that DeepEP outperforms traditional centrality methods. Moreover, DeepEP performs better than shallow machine learning-based methods. Detailed analyses show that the dense vectors generated by the node2vec technique contribute substantially to the improved performance, indicating that node2vec effectively captures the topological and semantic properties of the PPI network. The sampling method also improves the performance of identifying essential proteins.
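The per-epoch sampling idea described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `balanced_epoch`, the toy protein identifiers, and the choice to keep every minority-class sample while drawing an equally sized majority subset without replacement are all assumptions made for the example.

```python
import random


def balanced_epoch(essential, non_essential, rng):
    """Build one epoch's training list with equal class counts.

    Assumption (not stated in the abstract): the smaller class is used in
    full and the larger class is subsampled without replacement.
    Labels: 1 = essential, 0 = non-essential.
    """
    n = min(len(essential), len(non_essential))
    pos = rng.sample(essential, n)       # n essential proteins
    neg = rng.sample(non_essential, n)   # n non-essential proteins
    batch = [(p, 1) for p in pos] + [(q, 0) for q in neg]
    rng.shuffle(batch)                   # mix the two classes
    return batch


# Hypothetical toy data: 3 essential and 10 non-essential proteins.
rng = random.Random(0)
essential = [f"E{i}" for i in range(3)]
non_essential = [f"N{i}" for i in range(10)]

# A fresh majority-class subset is drawn in every epoch, so over many
# epochs the model can still see most of the non-essential proteins.
for epoch in range(2):
    batch = balanced_epoch(essential, non_essential, rng)
    print(epoch, sorted(name for name, _ in batch))
```

Resampling at every epoch, rather than undersampling once before training, is what distinguishes this scheme from a single fixed random undersampling of the majority class.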

Conclusion: We demonstrate that DeepEP improves prediction performance by integrating multiple deep learning techniques and a sampling method. DeepEP is more effective than the centrality-based and shallow machine learning-based methods it was compared against.

Keywords: Deep learning; Identifying essential proteins; Imbalanced learning; Multi-scale convolutional neural networks; Protein-protein interaction network; node2vec.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1. The architecture of our deep learning framework for identifying essential proteins
Fig. 2. Illustration of the sampling method used
Fig. 3. Performance of DeepEP, DC, BC, CC, EC, NC, LAC, PeC, and WDC
Fig. 4. ROC and PR curves of DeepEP and of models that use gene expression data combined with different centrality indexes (DC, CC, EC, BC, NC and LAC)
Fig. 5. ROC and PR curves of DeepEP, of our deep learning framework using different ratios of essential to non-essential proteins (1:1, 1:1.5, 1:2, 1:2.5 and 1:3), and of the framework using the raw dataset. Note: RU refers to random undersampling
