Quantification of biases in predictions of protein-protein binding affinity changes upon mutations

Brief Bioinform. 2023 Nov 22;25(1):bbad491. doi: 10.1093/bib/bbad491.

Matsvei Tsishyn¹,², Fabrizio Pucci¹,², Marianne Rooman¹,²

Affiliations

¹ Computational Biology and Bioinformatics, Université Libre de Bruxelles, Roosevelt Ave, 1050, Brussels, Belgium.
² Interuniversity Institute of Bioinformatics in Brussels, Brussels, Belgium.

PMID: 38197311
PMCID: PMC10777193
DOI: 10.1093/bib/bbad491

Matsvei Tsishyn et al. Brief Bioinform. 2023.

Abstract

Understanding the impact of mutations on protein-protein binding affinity is a key objective for a wide range of biotechnological applications and for shedding light on disease-causing mutations, which are often located at protein-protein interfaces. Over the past decade, many computational methods using physics-based and/or machine learning approaches have been developed to predict how protein binding affinity changes upon mutations. They all claim to achieve astonishing accuracy on both training and test sets, with performances on standard benchmarks such as SKEMPI 2.0 that seem overly optimistic. Here we benchmarked eight well-known and widely used predictors and identified their biases and dataset dependencies, using as test sets not only SKEMPI 2.0 but also deep mutagenesis data on the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike protein in complex with the human angiotensin-converting enzyme 2 (ACE2). We showed that, even though most of the tested methods reach a significant degree of robustness and accuracy, they have limited generalizability and struggle to predict unseen mutations. Interestingly, the generalizability problems are more severe for pure machine learning approaches, while physics-based methods are less affected. Moreover, we observed undesirable prediction biases toward specific mutation properties, the most marked being toward destabilizing mutations; method developers should consider these biases carefully. We conclude from our analyses that there is room for improvement in the prediction models, and we suggest ways to check, assess and improve their generalizability and robustness.

Keywords: machine learning; prediction biases; protein complex structure; protein–protein binding affinity; protein–protein interactions; symmetry principle.
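The symmetry principle listed among the keywords is usually stated as follows: the reverse mutation (mutant back to wild type) has a binding affinity change equal in magnitude and opposite in sign to the direct one, ΔΔG(reverse) = −ΔΔG(direct). A predictor that respects it should therefore return direct and reverse predictions that sum to approximately zero. The minimal Python sketch below measures this per mutation; the helper name `symmetry_shift`, the definition of the shift as the sum of the two predictions, and the toy values are assumptions for illustration, not necessarily the exact quantity the authors report.

```python
from statistics import mean


def symmetry_shift(pred_direct, pred_reverse):
    """Hypothetical helper: per-mutation deviation from antisymmetry.

    A perfectly antisymmetric predictor gives pred_direct[i] + pred_reverse[i] == 0
    for every mutation i. Returns the list of sums and their mean.
    """
    if len(pred_direct) != len(pred_reverse):
        raise ValueError("direct and reverse predictions must be paired")
    shifts = [d + r for d, r in zip(pred_direct, pred_reverse)]
    return shifts, mean(shifts)


# Toy predicted ΔΔG values in kcal/mol (illustrative only, not data from the paper).
direct = [1.25, 0.5, 2.0, -0.25]
reverse = [-0.5, 0.0, -1.0, 0.5]
shifts, avg = symmetry_shift(direct, reverse)
print(shifts, avg)  # → [0.75, 0.5, 1.0, 0.25] 0.625
```

With these toy inputs every shift is positive. Under the assumed sign convention in which positive ΔΔG means weakened binding, such a systematic positive offset would indicate a bias toward predicting destabilization for both the direct and the reverse mutations.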

Figures

**Figure 1**
Characteristics of the S dataset. (A) Number of occurrences of mutation types; (B) Distribution of the experimental values (in kcal/mol).

**Figure 2**
Pearson correlations between experimental and predicted values on direct (in blue) and reverse (in orange) mutations of S (left) and C (right).

**Figure 3**
Predicted values as a function of experimental values (in kcal/mol) for the datasets S-D (blue dots) and S-R (orange dots). Predictions are obtained with mCSM-PPI2, MutaBind2, BeAtMuSiC, SSIPe, SAAMBE-3D, NetTree, BindProfX and FoldX.

**Figure 4**
Relation between the covering ratio and the Pearson correlation between predicted and experimental values on the S-D set for six benchmarked predictors. The linear regression line (dashed) and the coefficient of determination (R²) are indicated.

**Figure 5**
Distribution of the shift (in kcal/mol) for the eight benchmarked predictors calculated for mutations from C. The vertical blue dashed lines indicate and the vertical red dashed lines, the value of .

**Figure 6**
Normalized RMSE of the eight predictors on subsets of S-D. Subsets were defined by (a) mutation type: mutations toward Ala (A) versus other mutations (nA); (b) mutation location: mutations at the interface (I) versus other mutations (nI); (c) complex type: mutations in dimeric complexes (D) versus mutations in multi-n-meric complexes (nD).

Cited by

AlphaFold2-Enabled Atomistic Modeling of Structure, Conformational Ensembles, and Binding Energetics of the SARS-CoV-2 Omicron BA.2.86 Spike Protein with ACE2 Host Receptor and Antibodies: Compensatory Functional Effects of Binding Hotspots in Modulating Mechanisms of Receptor Binding and Immune Escape.
Raisinghani N, Alshahrani M, Gupta G, Xiao S, Tao P, Verkhivker G. J Chem Inf Model. 2024 Mar 11;64(5):1657-1681. doi: 10.1021/acs.jcim.3c01857. Epub 2024 Feb 19. PMID: 38373700. Free PMC article.
Graph masked self-distillation learning for prediction of mutation impact on protein-protein interactions.
Zhang Y, Dong M, Deng J, Wu J, Zhao Q, Gao X, Xiong D. Commun Biol. 2024 Oct 26;7(1):1400. doi: 10.1038/s42003-024-07066-9. PMID: 39462102. Free PMC article.
Deciphering GB1's Single Mutational Landscape: Insights from MuMi Analysis.
Guclu TF, Atilgan AR, Atilgan C. J Phys Chem B. 2024 Aug 22;128(33):7987-7996. doi: 10.1021/acs.jpcb.4c04916. Epub 2024 Aug 8. PMID: 39115184. Free PMC article.
MPA-MutPred: a novel strategy for accurately predicting the binding affinity change upon mutation in membrane protein complexes.
Ridha F, Gromiha MM. Brief Bioinform. 2024 Sep 23;25(6):bbae598. doi: 10.1093/bib/bbae598. PMID: 39550225. Free PMC article.
Guidelines for releasing a variant effect predictor.
Livesey BJ, Badonyi M, Dias M, Frazer J, Kumar S, Lindorff-Larsen K, McCandlish DM, Orenbuch R, Shearer CA, Muffley L, Foreman J, Glazer AM, Lehner B, Marks DS, Roth FP, Rubin AF, Starita LM, Marsh JA. ArXiv [Preprint]. 2024 Apr 16:arXiv:2404.10807v1. Update in: Genome Biol. 2025 Apr 15;26(1):97. doi: 10.1186/s13059-025-03572-z. PMID: 38699161. Free PMC article.
