Proteins. 2019 Feb;87(2):110-119.
doi: 10.1002/prot.25630. Epub 2018 Dec 3.

iSEE: Interface structure, evolution, and energy-based machine learning predictor of binding affinity changes upon mutations

Cunliang Geng et al. Proteins. 2019 Feb.

Abstract

Quantitative evaluation of binding affinity changes upon mutations is crucial for protein engineering and drug design. Machine learning-based methods are gaining increasing momentum in this field. Because experimental data are limited, using a small number of sensitive predictive features is vital to the generalization and robustness of such machine learning methods. Here we introduce a fast and reliable predictor of binding affinity changes upon single point mutation, based on a random forest approach. Our method, iSEE, uses a limited number of interface Structure, Evolution, and Energy-based features for the prediction. Using only 31 features, iSEE achieves a high prediction performance, with a Pearson correlation coefficient (PCC) of 0.80 and a root mean square error of 1.41 kcal/mol on a diverse training dataset consisting of 1102 mutations in 57 protein-protein complexes. It competes with existing state-of-the-art methods on two blind test datasets. Predictions for a new dataset of 487 mutations in 56 protein complexes from the recently published SKEMPI 2.0 database reveal that none of the current methods performs well (PCC < 0.42), although their combination does improve the predictions. Feature analysis for iSEE underlines the significance of evolutionary conservation for quantitative prediction of mutation effects. As an application example, we perform a full mutation scanning of the interface residues in the MDM2-p53 complex.
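The two performance measures quoted in the abstract, PCC and root mean square error, can be computed as in the minimal sketch below. The function names and the ΔΔG values are hypothetical and chosen only for illustration; they are not taken from the paper or its datasets.

```python
import math


def pearson_cc(predicted, experimental):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(experimental) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    if var_p == 0 or var_e == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_e)


def rmse(predicted, experimental):
    """Root mean square error, in the same units as the inputs."""
    n = len(predicted)
    if n != len(experimental) or n == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, experimental)) / n)


# Hypothetical ddG values in kcal/mol, invented for this example.
experimental = [0.5, 1.2, -0.3, 2.4, 0.0]
predicted = [0.7, 0.9, 0.1, 1.8, 0.2]
print(f"PCC  = {pearson_cc(predicted, experimental):.2f}")
print(f"RMSE = {rmse(predicted, experimental):.2f} kcal/mol")
```

PCC measures how well predictions follow the experimental trend, while RMSE measures the absolute size of the errors, so the two are usually reported together.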

Keywords: binding affinity; full mutation scanning; machine learning; protein-protein interactions; single point mutation.

Figures

Figure 1
The workflow of the iSEE predictor. Only the 3D structure of the wild-type complex and the mutation information are required as input for iSEE. We first model the mutated structure using HADDOCK (the water refinement web service). Then we extract features related to the evolutionary conservation and to changes in structure and energetics caused by the mutation. A random forest algorithm is then optimized and cross-validated on a training dataset, resulting in our final ΔΔG predictor, iSEE. Finally, iSEE is evaluated on two blind test datasets and compared with other current leading ΔΔG predictors.
Figure 2
Correlations between predicted and experimental ΔΔG values for the training dataset consisting of 1102 single point mutations from the SKEMPI (ref. 14)/DACUM (ref. 28) database. Ten times 10‐fold cross‐validation (CV) was applied during training, and the averages of the CV‐predicted ΔΔG values are shown here for all mutations (A) and for mutations classified by loop or non‐loop location (B), type of mutated amino acid (C), and change in amino acid size (D). The diagonal indicates an ideal prediction. PCC is the Pearson's correlation coefficient and RMSE represents root mean squared error.
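The "ten times 10‐fold cross‐validation" scheme, with predictions averaged per mutation, can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: `repeated_kfold_predictions` and `fit_mean_baseline` are hypothetical names, the fold assignment is a simple shuffled partition, and the placeholder model predicts the training-set mean instead of a random forest.

```python
import random


def repeated_kfold_predictions(features, targets, fit, n_splits=10, n_repeats=10, seed=0):
    """Average out-of-fold predictions over repeated k-fold cross-validation.

    `fit(train_x, train_y)` must return a callable that maps one feature row
    to a prediction. Each sample is held out exactly once per repeat.
    """
    n = len(targets)
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(n_repeats):
        order = list(range(n))
        rng.shuffle(order)
        for k in range(n_splits):
            test_idx = order[k::n_splits]  # every n_splits-th sample forms fold k
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            model = fit([features[i] for i in train_idx],
                        [targets[i] for i in train_idx])
            for i in test_idx:
                totals[i] += model(features[i])
    return [total / n_repeats for total in totals]


def fit_mean_baseline(train_x, train_y):
    """Placeholder model (an assumption): always predicts the training mean."""
    mean_y = sum(train_y) / len(train_y)
    return lambda row: mean_y


# Toy data, invented for illustration: 20 samples with one feature each.
features = [[float(i)] for i in range(20)]
targets = [0.1 * i for i in range(20)]
cv_predictions = repeated_kfold_predictions(features, targets, fit_mean_baseline)
print(len(cv_predictions), "averaged out-of-fold predictions")
```

Because every sample is predicted only by models that never saw it during fitting, the averaged values give a less optimistic picture than predictions on the training data itself.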
Figure 3
Predicted versus experimental ΔΔG for various ΔΔG predictors tested on a subset of the Benedix et al. dataset (ref. 8) consisting of 19 mutations for one complex, non‐overlapping with our training set. This subset was not used in any of the predictors, except for CC/PBSA. PCC is the Pearson's correlation coefficient, P is the two‐tailed P value of PCC, and RMSE represents root mean squared error.
Figure 4
Correlations between predicted and experimental ΔΔG for various ΔΔG predictors tested on 487 mutations of SKEMPI 2.0. PCC is the Pearson's correlation coefficient, P is the two‐tailed P value of PCC, and RMSE represents root mean squared error.
Figure 5
iSEE feature importance analysis. The importance value is measured as the decrease of mean squared prediction error when splitting on a given feature, averaged over all trees. The higher the value, the more important the corresponding feature. The PSSM profile scores for the 20 amino acids are presented as a group in “PSSM_AA”.
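The idea behind this importance measure can be illustrated for a single split: the mean squared error of a node is compared with the size-weighted mean squared error of its two children. The sketch below is an assumption about the general technique, not the exact computation in the authors' random forest implementation; in a full forest, such decreases would be accumulated per feature and averaged over all trees. The helper names and toy values are hypothetical.

```python
def mse(values):
    """Mean squared error of values around their own mean (node impurity)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def split_mse_decrease(feature_values, targets, threshold):
    """Decrease in MSE obtained by splitting one node at `threshold`."""
    left = [t for x, t in zip(feature_values, targets) if x <= threshold]
    right = [t for x, t in zip(feature_values, targets) if x > threshold]
    if not left or not right:
        return 0.0  # the split separates nothing, so nothing is gained
    n = len(targets)
    weighted_children = (len(left) * mse(left) + len(right) * mse(right)) / n
    return mse(targets) - weighted_children


# Toy node with four samples, invented for illustration.
targets = [0.0, 0.0, 2.0, 2.0]
informative = [0, 0, 1, 1]    # separates low from high targets
uninformative = [0, 1, 0, 1]  # unrelated to the targets
print(split_mse_decrease(informative, targets, threshold=0.5))
print(split_mse_decrease(uninformative, targets, threshold=0.5))
```

A feature that separates low from high ΔΔG values removes most of the node's error, whereas a feature unrelated to the target removes none.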
Figure 6
Full computational mutation scanning of the MDM2–p53 interface using iSEE. A, Heat map of ΔΔG values for the mutation of each residue in the MDM2–p53 interface to all other amino acid types. The sites with at least one experimental mutation are indicated in bold. Mutations from one amino acid to the same amino acid were assigned a value of zero. The right panel shows the distribution of ΔΔG values for each site, with the vertical solid line and dashed lines showing the average and standard deviation of all predicted ΔΔG values, respectively. Three residues have a median above the average plus one standard deviation, indicating greater sensitivity to mutations. Two of those are experimentally validated hot‐spots (W23 and F19). B, The three predicted key binding sites are shown as sticks and all 38 interface sites as balls in the 3D structure of the MDM2–p53 complex (PDB ID: 1YCR). MDM2 is represented in cartoon and surface and p53 in cartoon. Each interface site is colored by the median of its full mutational predictions.
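The selection rule described in panel A, flagging sites whose median predicted ΔΔG exceeds the overall average plus one standard deviation, might be sketched as below. The site names and values are invented for illustration, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the caption does not say which estimator was used.

```python
import statistics


def sensitive_sites(scan):
    """Return sites whose median ddG exceeds the global mean + 1 std.

    `scan` maps a site label to the list of predicted ddG values for all
    substitutions at that site.
    """
    all_values = [v for values in scan.values() for v in values]
    cutoff = statistics.mean(all_values) + statistics.pstdev(all_values)
    return [site for site, values in scan.items()
            if statistics.median(values) > cutoff]


# Hypothetical scan results in kcal/mol (not data from the paper).
scan = {
    "site_A": [3.0, 3.2, 2.8],
    "site_B": [0.1, 0.0, 0.2],
    "site_C": [0.2, 0.1, 0.3],
    "site_D": [0.0, 0.1, 0.2],
}
print(sensitive_sites(scan))
```

Using the median per site makes the rule robust to a single extreme substitution, so a site is flagged only when most mutations at that position are predicted to be destabilizing.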

References

    1. Wang X, Wei X, Thijssen B, Das J, Lipkin SM, Yu H. Three‐dimensional reconstruction of protein networks provides insight into human genetic disease. Nat Biotechnol. 2012;30(2):159‐164.
    2. Zhou M, Li Q, Wang R. Current experimental methods for characterizing protein–protein interactions. ChemMedChem. 2016;11(8):738‐756.
    3. Kastritis PL, Bonvin AMJJ. On the binding affinity of macromolecular interactions: daring to ask why proteins interact. J R Soc Interface. 2012;10(79):20120835.
    4. Steinbrecher T, Abel R, Clark A, Friesner R. Free energy perturbation calculations of the thermodynamics of protein side‐chain mutations. J Mol Biol. 2017;429(7):923‐929.
    5. Perthold JW, Oostenbrink C. Simulation of reversible protein‐protein binding and calculation of binding free energies using perturbed distance restraints. J Chem Theory Comput. 2017;13(11):5697‐5708.