Int J Mol Sci. 2024 May 29;25(11):5957.
doi: 10.3390/ijms25115957.

An Ensemble Classifiers for Improved Prediction of Native-Non-Native Protein-Protein Interaction


Nor Kumalasari Caecar Pratiwi et al. Int J Mol Sci.

Abstract

In this study, we present an ensemble classifier that improves the prediction of protein–protein interactions (PPIs), focusing specifically on distinguishing native from non-native interactions. The ensemble combines the predictions of several base models (random forest, gradient boosting, extreme gradient boosting, and light gradient boosting) using a logistic regression meta-classifier. Our model was evaluated using a comprehensive dataset generated from molecular dynamics simulations. While the gains in AUC and other metrics might seem modest, they contribute to a model that is more robust, consistent, and adaptable. To assess the effectiveness of the various approaches, we compared the performance of logistic regression with that of the four baseline models. Our results indicate that logistic regression consistently underperforms across all evaluated metrics, which suggests that it may not be well suited to capturing the complex relationships within this dataset. Tree-based models, on the other hand, appear to be more effective for problems involving molecular dynamics simulations. Extreme gradient boosting (XGBoost) and light gradient boosting (LightGBM) are optimized for performance and speed, handle datasets effectively, and incorporate regularization to avoid overfitting. Our findings indicate that the ensemble method enhances PPI prediction, offering a promising tool for computational biology and drug discovery by accurately identifying potential interaction sites and facilitating the understanding of complex protein functions within biological systems.
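The stacking arrangement described in the abstract (tree-based base learners whose predictions feed a logistic regression meta-classifier) can be sketched with scikit-learn's `StackingClassifier`. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic stand-ins produced by `make_classification` rather than molecular dynamics features, the hyperparameters are arbitrary, and only the two base learners that ship with scikit-learn are included.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Assumption: synthetic binary data standing in for native (1) vs
# non-native (0) labels; the study's MD-derived features are not used here.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners (tier one). XGBoost and LightGBM are left out to keep the
# sketch dependency-free; their scikit-learn-compatible classifiers could be
# appended to this list in the same (name, estimator) form.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
]

# Meta-learner (tier two): logistic regression trained on the base learners'
# out-of-fold class probabilities, produced by 5-fold cross-validation.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
    cv=5,
)
stack.fit(X_train, y_train)

# Score the held-out split by the probability of the positive class.
proba = stack.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
print(f"stacked AUC on held-out split: {auc:.3f}")
```

To mirror the grid search step mentioned in the Figure 1 caption, each base learner could be wrapped in `GridSearchCV` before being passed to `estimators`; the parameter grids would be an additional assumption, since the abstract does not list them.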

Keywords: computational biology; drug discovery; ensemble classifiers; machine learning; protein–protein interaction.


Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Figure 1
Schematic representation of a two-tiered machine learning framework for classifying protein–protein interactions as native or non-native. The training data are used to build and optimize several base learners, including random forest, gradient boosting, XGBoost, and LightGBM, through grid search optimization. A meta-learner, Logistic Regression, takes these models’ predictions to generate the final classification results.
Figure 2
Comparative performance of machine learning models for protein–protein interaction prediction across different trajectory intervals.
Figure 3
Performance of the ensemble classifier for each trajectory interval.
Figure 4
Distinguishing native from non-native protein–protein interactions: two different input proteins (shown in two different colors) are first processed through protein docking using HADDOCK, generating potential protein complex models (illustrated as overlapping content). These complexes are then subjected to molecular dynamics (MD) simulations with GROMACS. The resulting MD trajectory data are used to rank the poses, identifying native and non-native PPIs.
