An Ensemble Classifiers for Improved Prediction of Native-Non-Native Protein-Protein Interaction
- PMID: 38892144
- PMCID: PMC11172808
- DOI: 10.3390/ijms25115957
Abstract
In this study, we present an approach to improving the prediction of protein-protein interactions (PPIs) with an ensemble classifier, focusing specifically on distinguishing native from non-native interactions. The ensemble combines the predictions of several base models (random forest, gradient boosting, extreme gradient boosting, and light gradient boosting) through a logistic regression meta-classifier. The model was evaluated on a comprehensive dataset generated from molecular dynamics simulations. Although the gains in AUC and other metrics may seem modest, they contribute to a model that is more robust, consistent, and adaptable. To assess the effectiveness of the various approaches, we compared the performance of logistic regression with that of four baseline models. Logistic regression consistently underperformed across all evaluated metrics, which suggests that it may not be well suited to capturing the complex relationships within this dataset. Tree-based models, on the other hand, appear to be more effective for problems involving molecular dynamics simulations. Extreme gradient boosting (XGBoost) and light gradient boosting (LightGBM) are optimized for performance and speed, handle datasets effectively, and incorporate regularization to avoid overfitting. Our findings indicate that the ensemble method improves the prediction of PPIs, offering a promising tool for computational biology and drug discovery by accurately identifying potential interaction sites and facilitating the understanding of complex protein functions within biological systems.
Keywords: computational biology; drug discovery; ensemble classifiers; machine learning; protein–protein interaction.
Conflict of interest statement
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.