Brief Bioinform. 2023 Sep 20;24(5):bbad310.

doi: 10.1093/bib/bbad310.

MpbPPI: a multi-task pre-training-based equivariant approach for the prediction of the effect of amino acid mutations on protein-protein interactions

Yang Yue¹, Shu Li², Lingling Wang², Huanxiang Liu², Henry H Y Tong², Shan He³

Affiliations

¹ School of Computer Science, University of Birmingham, UK.
² Centre for Artificial Intelligence Driven Drug Discovery, Macao Polytechnic University.
³ School of Computer Science, the University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK.

PMID: 37651610
PMCID: PMC10516393
DOI: 10.1093/bib/bbad310

Yang Yue et al. Brief Bioinform. 2023.


Abstract

The accurate prediction of the effect of amino acid mutations on protein-protein interactions (PPI $\Delta \Delta G$) is a crucial task in protein engineering, as it provides insight into the relevant biological processes underpinning protein binding and a basis for further drug discovery. In this study, we propose MpbPPI, a novel multi-task pre-training-based geometric equivariance-preserving framework to predict PPI $\Delta \Delta G$. Pre-training on a strictly screened pre-training dataset is employed to address the scarcity of protein-protein complex structures annotated with PPI $\Delta \Delta G$ values. MpbPPI employs a multi-task pre-training technique, forcing the framework to learn comprehensive backbone and side chain geometric regulations of protein-protein complexes at different scales. After pre-training, MpbPPI can generate high-quality representations capturing the effective geometric characteristics of labeled protein-protein complexes for downstream $\Delta \Delta G$ predictions. MpbPPI serves as a scalable framework supporting different sources of mutant-type (MT) protein-protein complexes for flexible application. Experimental results on four benchmark datasets demonstrate that MpbPPI is a state-of-the-art framework for PPI $\Delta \Delta G$ predictions. The data and source code are available at https://github.com/arantir123/MpbPPI.
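
For orientation, the predicted quantity can be written as the difference in binding free energy between the mutant-type (MT) and wild-type (WT) complexes. The sign convention below (mutant minus wild type, so that a positive value means weaker binding) is the commonly used one and is assumed here; the abstract itself does not state it.

$$
\Delta \Delta G = \Delta G_{\mathrm{MT}} - \Delta G_{\mathrm{WT}}
$$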

Keywords: equivariant neural network; multi-task pre-training; protein binding affinity change prediction; protein engineering.


Figures

**Figure 1**
The flowchart of the MpbPPI framework. For each pre-training and downstream sample point, MpbPPI generates the residue-level KNN and radius contact graphs, which contain different-scale residue backbone and side chain geometric information of the corresponding protein–protein complex structure (see Methods section for details). In the pre-training phase (A), the proposed GEE encoder learns the geometric regulations of protein–protein complexes through four geometric property-related denoising/recovery tasks that we define. After that, MpbPPI uses a GBT-based decoder to predict PPI $\Delta \Delta G$ for a WT–MT complex pair based on their encoded representations (B).
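
A minimal sketch of how the two residue-level graphs named in this caption could be built from one representative coordinate per residue. The functions `knn_edges` and `radius_edges`, the value of `k`, the 4 Å cutoff and the toy coordinates are illustrative assumptions, not code or settings from MpbPPI; the actual construction is given in the paper's Methods section.

```python
import numpy as np

def pairwise_distances(coords):
    """Return the (N, N) matrix of Euclidean distances between residue positions."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def knn_edges(coords, k):
    """Directed edges (i, j) where j is one of the k nearest residues to i."""
    dist = pairwise_distances(coords)
    np.fill_diagonal(dist, np.inf)           # exclude self-loops
    k = min(k, len(coords) - 1)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return [(i, int(j)) for i in range(len(coords)) for j in nearest[i]]

def radius_edges(coords, cutoff):
    """Directed edges (i, j) for every residue pair closer than `cutoff` (in Å)."""
    dist = pairwise_distances(coords)
    np.fill_diagonal(dist, np.inf)           # exclude self-loops
    src, dst = np.nonzero(dist < cutoff)
    return list(zip(src.tolist(), dst.tolist()))

# Toy input (made up): four pseudo-residues on a line, 3.8 Å apart.
ca = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0], [11.4, 0.0, 0.0]])
knn = knn_edges(ca, k=2)             # fixed number of neighbours per residue
rad = radius_edges(ca, cutoff=4.0)   # only spatially adjacent residues
```

The KNN graph gives every residue the same number of neighbours regardless of local density, while the radius graph keeps only pairs within a fixed distance, which is one way the two graphs can carry information at different scales.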


**Figure 2**
Panel (A) illustrates the MpbPPI data flow in the pre-training phase. In this phase, the pre-training protein–protein complex (residue number: N), represented by the KNN (edge number: EK) and radius (edge number: ER) contact graphs, is sent to a five-layer GEE encoder. Through message propagation, the encoder outputs updated embeddings for every residue node in the current complex, which are sent simultaneously to four multi-layer perceptrons (MLPs), each specific to one pre-training task, to guide model optimization. The input/output dimensions of each intermediate layer are shown next to that layer. In the downstream prediction phase (B), WT and mutant PPI structures, represented by the same types of contact graphs as above, are sent to the trained GEE to produce separate residue node embedding sets, which are then sent to the GBT-based decoder to predict the final $\Delta \Delta G$ for the current sample point (see Methods section). Panel (C) illustrates the basic message propagation scheme in each GEE layer, in which similar operations are performed on each (central) residue node in the protein–protein complex.

**Figure 3**
MpbPPI outperformed the other methods compared for PPI $\Delta \Delta G$ prediction under the five-time WT protein–protein complex-based cross-validations. We report the experimental results on each dataset based on the main evaluation metrics. For the machine learning-based methods, the results are expressed as mean ± SD, while for the empirical energy-based methods, the results are expressed as the mean value. MpbPPI (Backb+Sidec+SASA+AA) and MpbPPI (Backb+SASA+AA) are abbreviated as MpbPPI_BSSA and MpbPPI_BSA.
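
One way to read "WT protein–protein complex-based" cross-validation is that samples are split at the level of the WT complex, so mutations of the same complex never appear in both training and test data. The sketch below illustrates that reading only. The helper names, the five-fold layout, the use of the sample standard deviation and the complex labels are all assumptions; the caption does not say whether "five-time" means five folds or five repetitions.

```python
import random
import statistics

def complex_level_folds(complex_ids, n_folds=5, seed=0):
    """Assign sample indices to folds so each WT complex lands in exactly one fold."""
    unique = sorted(set(complex_ids))
    random.Random(seed).shuffle(unique)           # reproducible shuffle of complexes
    fold_of = {cid: i % n_folds for i, cid in enumerate(unique)}
    folds = [[] for _ in range(n_folds)]
    for idx, cid in enumerate(complex_ids):
        folds[fold_of[cid]].append(idx)
    return folds

def mean_sd(scores):
    """Summarise per-fold scores as (mean, sample standard deviation)."""
    return statistics.mean(scores), statistics.stdev(scores)

# Hypothetical labels: one WT complex identifier per mutation sample.
ids = ["A", "A", "B", "C", "B", "D", "E", "F", "A"]
folds = complex_level_folds(ids, n_folds=5)
```

Each fold can then serve once as the test set, and a per-fold metric can be summarised with `mean_sd`.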

**Figure 4**
Comparison of mutant PPI structures from various mutant generation tools. An example of the structural differences between WT and mutant structures (PDB ID: 1AK4). The WT structure and mutant structures generated by FoldX, MODELLER and AlphaFold2 are shown in different colours for better identification. The mutant amino acid and its neighboring amino acids’ backbone and side chain are represented as sticks, with the mutant amino acid highlighted in surface style. The Cα RMSD values between the WT and mutant structures were 0 Å for FoldX, 0.3 Å for MODELLER and 3.8 Å for AlphaFold2.
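
A Cα RMSD like the values quoted in this caption can be computed from matched Cα coordinates of the two structures. The sketch below uses the Kabsch algorithm to superimpose the structures first. Whether the reported values were obtained with or without superposition is not stated here, so that choice, the function name `kabsch_rmsd` and the toy coordinates are assumptions.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two matched (N, 3) coordinate sets after optimal rigid superposition."""
    P = P - P.mean(axis=0)                   # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                              # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # rotation that maps P onto Q
    P_rot = P @ R.T
    return float(np.sqrt(((P_rot - Q) ** 2).sum(axis=1).mean()))

# Toy check (made-up coordinates): a rotated and shifted copy should give an RMSD of about 0.
wt = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0], [0.0, 3.8, 1.0]])
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
mt = wt @ rot_z.T + np.array([1.0, 2.0, 3.0])
rmsd_same = kabsch_rmsd(wt, mt)
```

The two inputs must list the same residues in the same order; for real structures the Cα atoms would first be extracted and matched by chain and residue number.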


