PScL-SDNNMAE: Protein Subcellular Localization Prediction Using Classical and Masked Autoencoder-Based Multi-View Features With Ensemble Feature Selection
- PMID: 40811330
- DOI: 10.1109/TCBBIO.2025.3562809
Abstract
Accurate prediction of protein subcellular localization is critical for understanding cellular functions and guiding drug design. However, current computational methods offer limited performance, and few efficient vision learners based on self-supervised learning are available for extracting deep, informative features. To address this, we propose a novel bioimage-based method, termed PScL-SDNNMAE, to predict the subcellular localizations of proteins in human cells. PScL-SDNNMAE first extracts classical features using traditional image descriptors. Next, a masked autoencoder (MAE) is trained on the training image data and then used to extract MAE-based deep features. In the feature selection phase, PScL-SDNNMAE applies analysis of variance (ANOVA), mutual information (MI), and stepwise discriminant analysis (SDA) to select the optimal features from the classical feature sets. Finally, PScL-SDNNMAE trains a deep neural network (DNN) classifier on the super feature set generated by integrating all the optimal classical features and the MAE-based deep features. Extensive benchmark experiments, including 10-fold cross-validation on the training dataset and an independent test on the independent dataset, show that PScL-SDNNMAE achieves better performance and generalization capability than existing state-of-the-art predictors. Moreover, the experiments demonstrate the effectiveness of self-supervised learning methods in learning representations of immunohistochemistry (IHC) images, as well as the significant potential of pre-training on massive unlabeled datasets in the future.
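The feature selection and fusion steps described in the abstract can be illustrated with a minimal sketch. It covers only the ANOVA ranking and the concatenation into a super feature set; the MI and SDA selectors, the way the three selectors are combined, and the DNN classifier are not specified in the abstract and are omitted here. All function names (`anova_f_score`, `select_top_k`, `build_super_features`) are hypothetical, and the "deep" features are random placeholders standing in for MAE encoder outputs, not the authors' implementation.

```python
import random
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic for a single feature column."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand = mean(values)
    k, n = len(groups), len(values)
    # Between-class and within-class sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_k(features, labels, k):
    """Return the (sorted) indices of the k columns with the highest F score."""
    n_cols = len(features[0])
    scores = [
        anova_f_score([row[j] for row in features], labels) for j in range(n_cols)
    ]
    ranked = sorted(range(n_cols), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def build_super_features(classical, deep, selected):
    """Concatenate the selected classical columns with all deep features."""
    return [[row[j] for j in selected] + list(d) for row, d in zip(classical, deep)]


# Toy data (assumption): column 0 depends on the class label, columns 1-2 are noise.
rng = random.Random(0)
labels = [i % 2 for i in range(40)]
classical = [
    [y * 3.0 + rng.gauss(0, 0.5), rng.gauss(0, 1), rng.gauss(0, 1)] for y in labels
]
# Placeholder for MAE-based deep features (random values, 4 per sample).
deep = [[rng.gauss(0, 1) for _ in range(4)] for _ in labels]

selected = select_top_k(classical, labels, k=1)
super_features = build_super_features(classical, deep, selected)
```

In this toy setting the ranking keeps the one class-dependent column, and each sample ends up with one selected classical feature followed by four placeholder deep features. A real pipeline would replace the placeholders with encoder outputs and feed the fused vectors to a classifier.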