PScL-DDCFPred: an ensemble deep learning-based approach for characterizing multiclass subcellular localization of human proteins from bioimage data
- PMID: 35771606
- PMCID: PMC9890309
- DOI: 10.1093/bioinformatics/btac432
Abstract
Motivation: Characterizing protein subcellular localization is an important and long-standing task in bioinformatics and computational biology; it provides valuable information for elucidating the cellular functions of proteins and for guiding drug design.
Results: Here, we develop a novel bioimage-based computational approach, termed PScL-DDCFPred, to accurately predict protein subcellular localizations in human tissues. PScL-DDCFPred first extracts multiview image features, including global and local features, as base (pure) features. Next, it applies a new integrative feature selection method, based on stepwise discriminant analysis and generalized discriminant analysis, to identify the optimal feature sets from the extracted pure features. Finally, it builds a classifier that combines a deep neural network (DNN) with a deep-cascade forest (DCF). Stringent 10-fold cross-validation tests on a new protein subcellular localization training dataset, constructed from the Human Protein Atlas databank, show that PScL-DDCFPred outperforms several existing state-of-the-art methods. Results on the independent test set further demonstrate the generalization capability of PScL-DDCFPred and its superiority over existing predictors. In-depth analysis attributes this performance to three critical factors: the effective combination of the DNN and DCF models, the complementarity of global and local features, and the use of the optimal feature sets selected by the integrative feature selection algorithm.
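The 10-fold cross-validation protocol mentioned in the Results can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not say whether the folds were stratified by class, so the stratification here is an assumption, and the function name `stratified_k_fold`, the class names, and the toy labels are hypothetical stand-ins for the real dataset.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs, keeping class proportions similar in each fold.

    Assumption: stratified splitting; the abstract only states "10-fold cross-validation".
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle each class and deal its indices round-robin into the k folds.
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    # Each fold serves once as the test set; the other k - 1 folds form the training set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Toy labels: three hypothetical location classes with 20 samples each.
toy_labels = [c for c in ("class_a", "class_b", "class_c") for _ in range(20)]
splits = list(stratified_k_fold(toy_labels, k=10))
```

In such a protocol, a model would be trained on each `train_idx` and scored on the matching `test_idx`, with the ten scores averaged to give the cross-validation estimate.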
Availability and implementation: https://github.com/csbio-njust-edu/PScL-DDCFPred.
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author(s) 2022. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.