BMC Bioinformatics. 2022 Aug 4;23(1):318.
doi: 10.1186/s12859-022-04868-8.

A deep learning framework for identifying essential proteins based on multiple biological information

Yi Yue et al. BMC Bioinformatics.

Abstract

Background: Essential proteins perform vital functions in cellular processes and are indispensable for the survival and reproduction of an organism. Traditional centrality methods perform poorly on complex protein-protein interaction (PPI) networks, and machine learning approaches based on high-throughput data do not fully exploit the temporal and spatial dimensions of biological information.

Results: We propose a deep learning framework that predicts essential proteins by integrating features obtained from the PPI network, subcellular localization, and gene expression profiles. The node2vec method learns continuous feature representations for proteins in the PPI network, capturing the diversity of connectivity patterns in the network. Depthwise separable convolution is applied to gene expression profiles to extract features and capture trends in gene expression over time under different experimental conditions. Subcellular localization information is mapped into a long one-dimensional vector to capture its characteristics. Additionally, we use a sampling method to mitigate the impact of class imbalance when training the model. In experiments on Saccharomyces cerevisiae data, our model outperforms traditional centrality methods and machine learning methods. Comparative experiments likewise show that our way of processing each type of biological information is preferable to the alternatives tested.
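To illustrate the depthwise separable convolution mentioned above, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a gene expression profile is arranged as one channel per experimental condition with time points along the sequence axis; that layout, the function names, and the toy numbers are assumptions made for illustration. A depthwise step filters each channel with its own kernel, and a pointwise (1x1) step then mixes the channels.

```python
def depthwise_conv1d(x, kernels):
    """Filter each channel with its own kernel (valid cross-correlation).

    x: list of C channels, each a list of T values.
    kernels: list of C kernels, one per channel.
    """
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        out.append([
            sum(channel[t + i] * kernel[i] for i in range(k))
            for t in range(len(channel) - k + 1)
        ])
    return out


def pointwise_conv1d(x, weights):
    """Mix channels position by position (a 1x1 convolution).

    weights: list of C_out rows, each holding C_in mixing coefficients.
    """
    length = len(x[0])
    return [
        [sum(w * x[c][t] for c, w in enumerate(row)) for t in range(length)]
        for row in weights
    ]


def depthwise_separable_conv1d(x, kernels, weights):
    """Depthwise filtering followed by pointwise channel mixing."""
    return pointwise_conv1d(depthwise_conv1d(x, kernels), weights)


# Toy profile (hypothetical values): 2 conditions x 4 time points.
profile = [
    [1.0, 2.0, 3.0, 4.0],
    [0.0, 1.0, 0.0, 1.0],
]
kernels = [[1.0, -1.0], [0.5, 0.5]]              # one temporal kernel per condition
weights = [[1.0, 1.0], [1.0, -1.0], [0.0, 2.0]]  # 3 output channels from 2 inputs
features = depthwise_separable_conv1d(profile, kernels, weights)
```

Separating the temporal filtering from the channel mixing keeps the parameter count low: here 2 kernels of length 2 plus a 3x2 mixing matrix, rather than 3 full kernels spanning both channels.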

Conclusions: Our proposed deep learning framework effectively identifies essential proteins by integrating multiple types of biological data. The results show that a broader selection of subcellular localization information significantly improves prediction and that depthwise separable convolution applied to gene expression profiles enhances performance.
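One plausible way to map subcellular localization annotations into a single one-dimensional vector is a binary multi-hot encoding over a fixed vocabulary of locations. The abstract does not specify the encoding, so the sketch below is an assumption; the protein identifiers, location names, and function names are hypothetical.

```python
def build_vocabulary(annotations):
    """Collect every location seen in the annotations, in sorted order."""
    return sorted({loc for locs in annotations.values() for loc in locs})


def encode(locations, vocabulary):
    """Return a 0/1 vector with one position per location in the vocabulary."""
    present = set(locations)
    return [1 if loc in present else 0 for loc in vocabulary]


# Hypothetical annotations: protein identifier -> annotated locations.
annotations = {
    "protein_A": ["nucleus", "cytoplasm"],
    "protein_B": ["mitochondrion"],
    "protein_C": ["nucleus"],
}
vocabulary = build_vocabulary(annotations)
vectors = {name: encode(locs, vocabulary) for name, locs in annotations.items()}
```

Under this encoding, a broader selection of localization terms simply lengthens the vocabulary and therefore the vector.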

Keywords: Deep learning; Essential protein; Gene expression; Protein–protein interaction network; Subcellular localization.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1
General structure of our framework. Conv1D 1-dimensional convolution, BN batch normalization, MP max pooling, PW pointwise convolution, GMP global max pooling
Fig. 2
The PCC between replicate samples and different conditions in the gene expression profiles. C control group, O observation group, R1 replicate sample 1, R2 replicate sample 2, R3 replicate sample 3
Fig. 3
Visualization of gene expression profile processing. Conv1D 1-dimensional convolution, BN batch normalization, MP max pooling, PW pointwise convolution, GMP global max pooling
Fig. 4
Processing of subcellular localization data for protein YLR308W. ABmEE complex Apolipoprotein B mRNA editing enzyme complex, CPKAK complex cyclin-dependent protein kinase activating kinase holoenzyme complex, 6PFK complex 6-phospho fructose kinase complex
Fig. 5
ROC and PR curves of different processing methods for gene expression profiles
Fig. 6
ROC and PR curves of different selections of subcellular localization
Fig. 7
ROC and PR curves of the node2vec technique and centrality methods
Fig. 8
ROC and PR curves of different feature combinations. S subcellular localization features, N network embedding features, G gene expression profile features, N + G network embedding features plus gene expression profile features, N + S network embedding features plus subcellular localization features, S + G subcellular localization features plus gene expression profile features, S + N + G subcellular localization features plus network embedding features and gene expression profile features
Fig. 9
ROC and PR curves of different approaches to the unbalanced dataset. CW class weight
Fig. 10
Performance of our method and centrality methods
Fig. 11
ROC and PR curves of our method and other machine learning methods. SVM support vector machine, LR logistic regression, NB naive Bayes, RF random forest, DT decision tree
Fig. 12
Processing of GSE41828
Fig. 13
Comparison with centrality methods on Homo sapiens data
Fig. 14
ROC and PR curves of our method and other machine learning methods. SVM support vector machine, AB AdaBoost, LR logistic regression, NB naive Bayes, RF random forest, DT decision tree
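
The abstract mentions a sampling method for the imbalanced training data, and Fig. 9 compares approaches to the unbalanced dataset, including class weights (CW). The exact procedures are not given here, so the following is only a sketch under assumptions: random undersampling of the larger class stands in for the unspecified sampling method, and the class weights follow the common "balanced" heuristic n_samples / (n_classes * class_count). The function names and the toy labels are hypothetical.

```python
import random
from collections import Counter


def undersample(labels, seed=0):
    """Return sorted indices of a subset with equal counts per class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    smallest = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        # Draw without replacement so no sample is duplicated.
        kept.extend(rng.sample(indices, smallest))
    return sorted(kept)


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    total = len(labels)
    return {label: total / (len(counts) * count) for label, count in counts.items()}


# Toy labels (hypothetical): 1 = essential, 0 = non-essential.
labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
kept = undersample(labels)
class_weights = balanced_class_weights(labels)
```

Undersampling discards majority-class examples to balance the training set, whereas class weighting keeps every example and rescales its contribution to the loss.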
