An Ensemble Classifiers for Improved Prediction of Native-Non-Native Protein-Protein Interaction

Nor Kumalasari Caecar Pratiwi¹,², Hilal Tayara³, Kil To Chong¹,⁴

Affiliations

¹ Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea.
² Department of Electrical Engineering, Telkom University, Bandung 40257, West Java, Indonesia.
³ School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, Republic of Korea.
⁴ Advances Electronics and Information Research Centre, Jeonbuk National University, Jeonju 54896, Republic of Korea.

PMID: 38892144
PMCID: PMC11172808
DOI: 10.3390/ijms25115957

Int J Mol Sci. 2024 May 29;25(11):5957. doi: 10.3390/ijms25115957.

Abstract

In this study, we present an ensemble classifier for improving the prediction of protein-protein interactions (PPIs), specifically for distinguishing native from non-native interactions. The ensemble draws on the strengths of several base models (random forest, gradient boosting, extreme gradient boosting, and light gradient boosting) and integrates their predictions with a logistic regression meta-classifier. The model was evaluated on a comprehensive dataset generated from molecular dynamics simulations. Although the gains in AUC and other metrics may seem modest, they contribute to a model that is more robust, consistent, and adaptable. To assess the effectiveness of the individual approaches, we compared the performance of logistic regression with that of four baseline models. Logistic regression consistently underperformed across all evaluated metrics, which suggests that it may not be well suited to capturing the complex relationships within this dataset. Tree-based models, by contrast, appear to be more effective for problems involving molecular dynamics simulations. Extreme gradient boosting (XGBoost) and light gradient boosting (LightGBM) are optimized for performance and speed, handle datasets effectively, and incorporate regularization to avoid overfitting. Our findings indicate that the ensemble method improves PPI prediction, offering a promising tool for computational biology and drug discovery by accurately identifying potential interaction sites and facilitating the understanding of complex protein functions within biological systems.

Keywords: computational biology; drug discovery; ensemble classifiers; machine learning; protein–protein interaction.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

**Figure 1**
Schematic representation of a two-tiered machine learning framework for classifying protein–protein interactions as native or non-native. The training data are used to build several base learners (random forest, gradient boosting, XGBoost, and LightGBM), each optimized through grid search. A logistic regression meta-learner takes the base learners' predictions as input and generates the final classification.

**Figure 2**
Comparative performance of machine learning models for protein–protein interaction prediction across different trajectory intervals.

**Figure 3**
Performance of the ensemble classifier for each trajectory interval.

**Figure 4**
Distinguishing native from non-native protein–protein interactions: two input proteins (shown in different colors) are first docked using HADDOCK, generating candidate protein complex models (illustrated as overlapping structures). These complexes are then subjected to molecular dynamics (MD) simulations with GROMACS. The resulting MD trajectory data are used to rank the poses, identifying native and non-native PPIs.
