PScL-HDeep: image-based prediction of protein subcellular location in human tissue using ensemble learning of handcrafted and deep learned features with two-layer feature selection
- PMID: 34337652
- PMCID: PMC8574991
- DOI: 10.1093/bib/bbab278
Abstract
Protein subcellular localization plays a crucial role in characterizing protein function and understanding various cellular processes, so accurate identification of protein subcellular location is an important yet challenging task. Numerous computational methods have been proposed to predict the subcellular location of proteins, but most are limited in overall accuracy, computation time and generalization power. To address these problems, we developed PScL-HDeep, a novel computational approach based on Human Protein Atlas (HPA) data for accurate and efficient image-based prediction of protein subcellular location in human tissues. We extracted handcrafted features and deep learned features (the latter using a pretrained deep learning model) from different viewpoints of the image. The step-wise discriminant analysis (SDA) algorithm was applied to each original raw feature set to generate an optimal feature subset. To obtain a more informative feature subset, the support vector machine-based recursive feature elimination with correlation bias reduction (SVM-RFE + CBR) feature selection algorithm was then applied to the integrated feature set. Finally, two classification models, a support vector machine with radial basis function kernel (SVM-RBF) and a support vector machine with linear kernel (SVM-LNR), were trained on the final selected feature set. To evaluate the proposed method, a new gold standard benchmark training dataset was constructed from the HPA databank. PScL-HDeep achieved the best performance in a 10-fold cross-validation test on this dataset and showed better efficacy than existing predictors. We also demonstrated the generalization ability of the proposed method through a stringent independent validation test.
Keywords: bioimage analysis; deep learned features; feature selection; handcrafted features; protein subcellular location.
© The Author(s) 2021. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
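The two-layer selection and two-kernel SVM setup described in the abstract can be illustrated with a minimal scikit-learn sketch. It is not the authors' implementation, and several parts are assumptions: the data are synthetic (`make_classification`), the two column blocks merely stand in for handcrafted and deep learned feature sets, an ANOVA F-test (`SelectKBest`) replaces SDA, plain SVM-RFE (`RFE`) is used without correlation bias reduction, and the two SVMs are combined by averaging class probabilities, a rule the abstract does not specify. The function names `select_per_view`, `fit_predict_fold` and `run_demo` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def select_per_view(train_views, test_views, y_train, k=10):
    """Layer 1: keep k features per feature set (F-test as a stand-in for SDA)."""
    train_parts, test_parts = [], []
    for x_tr, x_te in zip(train_views, test_views):
        selector = SelectKBest(f_classif, k=k).fit(x_tr, y_train)
        train_parts.append(selector.transform(x_tr))
        test_parts.append(selector.transform(x_te))
    # Integrate the reduced feature sets by concatenation.
    return np.hstack(train_parts), np.hstack(test_parts)


def fit_predict_fold(train_views, test_views, y_train, n_final=12):
    """Run both selection layers and the two-SVM ensemble for one fold."""
    x_tr, x_te = select_per_view(train_views, test_views, y_train)
    scaler = StandardScaler().fit(x_tr)
    x_tr, x_te = scaler.transform(x_tr), scaler.transform(x_te)

    # Layer 2: plain SVM-RFE on the integrated set (assumption: no CBR step).
    rfe = RFE(SVC(kernel="linear"), n_features_to_select=n_final).fit(x_tr, y_train)
    x_tr, x_te = rfe.transform(x_tr), rfe.transform(x_te)

    # Train SVM-RBF and SVM-LNR, then average their class probabilities
    # (assumption: the abstract does not state the combination rule).
    classes = np.unique(y_train)
    probs = []
    for kernel in ("rbf", "linear"):
        clf = SVC(kernel=kernel, probability=True, random_state=0)
        probs.append(clf.fit(x_tr, y_train).predict_proba(x_te))
    mean_prob = np.mean(probs, axis=0)
    return classes[mean_prob.argmax(axis=1)]


def run_demo():
    """10-fold cross-validation on synthetic data; returns (accuracy, predictions)."""
    x, y = make_classification(
        n_samples=200, n_features=60, n_informative=10, n_redundant=10,
        n_classes=3, n_clusters_per_class=1, class_sep=2.0, random_state=0,
    )
    # Two hypothetical "views" standing in for handcrafted and deep features.
    views = [x[:, :30], x[:, 30:]]
    preds = np.empty_like(y)
    folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    for tr, te in folds.split(x, y):
        preds[te] = fit_predict_fold(
            [v[tr] for v in views], [v[te] for v in views], y[tr]
        )
    return float((preds == y).mean()), preds


if __name__ == "__main__":
    accuracy, _ = run_demo()
    print(f"10-fold CV accuracy on synthetic data: {accuracy:.3f}")
```

Both selection layers are fitted inside each fold on the training portion only, so the held-out samples never influence which features are kept. The printed accuracy describes the synthetic data alone and says nothing about performance on HPA images.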