Sci Rep. 2017 Apr 18;7:46622.
doi: 10.1038/srep46622.

Predicting Protein-protein Association Rates using Coarse-grained Simulation and Machine Learning

Zhong-Ru Xie et al. Sci Rep. 2017.

Abstract

Protein-protein interactions dominate all major biological processes in living cells. We have developed a new Monte Carlo-based simulation algorithm to study the kinetic process of protein association. We tested our method on a previously used large benchmark set of 49 protein complexes. The predicted rate was overestimated in the benchmark test compared to the experimental results for a group of protein complexes. We hypothesized that this resulted from molecular flexibility at the interface regions of the interacting proteins. After applying a machine learning algorithm with input variables that accounted for both the conformational flexibility and the energetic factor of binding, we successfully identified most of the protein complexes with overestimated association rates and improved our final prediction by using a cross-validation test. This method was then applied to a new independent test set and resulted in a similar prediction accuracy to that obtained using the training set. It has been thought that diffusion-limited protein association is dominated by long-range interactions. Our results provide strong evidence that the conformational flexibility also plays an important role in regulating protein association. Our studies provide new insights into the mechanism of protein association and offer a computationally efficient tool for predicting its rate.

Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1. The association of the proteins barnase and barstar was first used as a test system.
The complex was separated into two monomers and randomly placed in a 10 × 10 × 10 nm cubic simulation box. In total, 10⁴ simulation trajectories with a maximal duration of 1000 ns were generated, and each trajectory was terminated upon the formation of an encounter complex. Three representative trajectories are plotted to illustrate how the distance between the centers of mass for the two monomers (a) and the RMSD from the native complex (b) changed with the simulation time.
Figure 2
(a) Effect of changing the maximal duration of each simulation trajectory on the success rate (ρ). Simulations were performed in a 10 × 10 × 10 nm cubic box. (b) Effect of changing the size of the simulation box on the success rate. The maximal duration of the simulation time for each trajectory was 1000 ns.
Figure 3. Testing of the effect of the ionic strength on the association of the barnase/barstar complex by changing the Coulomb Debye length in the simulations.
The derived kon values (striped bars with standard deviations) are plotted against different values of ionic strength. Experimental measurements under different values of ionic strength are shown as gray bars. To calculate the standard deviations, 10⁴ KMC simulation trajectories were generated for each value of the ionic strength. We randomly divided these trajectories into 10 groups, each containing 10³ trajectories. We estimated kon from the 10³ trajectories of each group, giving 10 individual kon values. The standard deviation was calculated from these 10 kon values.
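The grouping procedure in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the per-trajectory outcomes are synthetic, the function names are hypothetical, and `success_fraction` is only a stand-in estimator, because this record does not give the formula that converts trajectories into kon. Using the sample standard deviation (rather than the population form) is also an assumption.

```python
import random
import statistics


def grouped_std(outcomes, estimator, n_groups=10, seed=0):
    """Shuffle per-trajectory outcomes, split them into equal groups,
    apply `estimator` to each group, and return (estimates, sample std)."""
    if len(outcomes) % n_groups != 0:
        raise ValueError("number of trajectories must divide evenly into groups")
    shuffled = list(outcomes)
    random.Random(seed).shuffle(shuffled)  # random assignment to groups
    size = len(shuffled) // n_groups
    estimates = [
        estimator(shuffled[i * size:(i + 1) * size]) for i in range(n_groups)
    ]
    return estimates, statistics.stdev(estimates)


def success_fraction(group):
    """Stand-in estimator (assumption): fraction of trajectories that
    ended in an encounter complex. Not the paper's kon formula."""
    return sum(group) / len(group)


# Synthetic data: 10^4 trajectories, each marked True if it "succeeded".
rng = random.Random(1)
outcomes = [rng.random() < 0.3 for _ in range(10_000)]

estimates, sd = grouped_std(outcomes, success_fraction)
print(f"{len(estimates)} group estimates, standard deviation = {sd:.4f}")
```

Any function that maps a group of trajectories to a rate estimate can be passed in place of `success_fraction`.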
Figure 4. Testing the effect of mutations on the protein association rate (kon).
The test set consisted of the wild type of the barnase/barstar protein complex and 11 mutants, in which the indicated residue in barnase (single mutants or the first indicated residue in the double mutants) or barstar (second indicated residue in the double mutants) was mutated to alanine; the mutants are shown below the figure. The experimental measurements are shown as gray bars, and the calculated values as striped bars (with standard deviations).
Figure 5. Testing of the KMC simulations on a large benchmark set of 49 protein complexes by comparing the calculated and observed log10 kon values (white circles), giving a Pearson’s correlation coefficient of 0.66.
However, the calculated association rates for a large percentage of the protein complexes were significantly overestimated, so a machine learning algorithm was used to recognize these overestimated cases and correct the corresponding kon values by an adjustment factor. In a leave-one-out cross-validation test, the Pearson’s correlation coefficient between the log10 values of the adjusted kon and the experimental values (black circles) was 0.79. The dashed red line is the linear regression fit between the simulated and observed log10 kon values, with a slope of 0.52 and an intercept of 3.39. The solid red line is the linear regression fit between the adjusted and observed log10 kon values, with a slope of 0.8 and an intercept of 1.32.
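The correlation coefficient and regression line reported in this caption are standard quantities computed on log10-transformed rates. The sketch below shows one way to obtain them. The rate constants are invented for illustration and are not data from the paper, the function name is hypothetical, and placing the observed values on the x axis is an assumption, since the caption does not say which variable was regressed on which.

```python
import math


def pearson_and_fit(x, y):
    """Return (Pearson r, slope, intercept) for the least-squares
    line y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Synthetic rate constants (M^-1 s^-1), invented for illustration only.
observed_kon = [1.0e5, 4.0e5, 2.0e6, 9.0e6, 5.0e7]
calculated_kon = [8.0e5, 1.5e6, 3.0e6, 2.0e7, 6.0e7]

# Work in log10 space, as the caption does.
log_obs = [math.log10(k) for k in observed_kon]
log_calc = [math.log10(k) for k in calculated_kon]

r, slope, intercept = pearson_and_fit(log_obs, log_calc)
print(f"r = {r:.2f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
```

A slope below 1 combined with a positive intercept, as in the unadjusted fit reported above, means the calculated rates span a narrower range than the observed ones.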
Figure 6. Application of the computational framework to an independent test set.
(a) The calculated log10 kon values from the KMC simulations correlate well with the experimental data, with a Pearson’s correlation coefficient of 0.8. The red line is the linear regression fit between the simulated and observed log10 kon values, with a slope of 0.68 and an intercept of 2.18. (b) The machine learning process was applied to identify potential overestimation in the simulations and adjust the calculated kon values, giving a Pearson’s correlation coefficient of 0.85. The red line is the linear regression fit between the adjusted and observed log10 kon values, with a slope of 1.25 and an intercept of −1.98.
Figure 7
(a) Representation of our coarse-grained model. Each residue is represented by two sites, C and S. The positions of the Cα atoms (C) show the pseudo-backbone of the protein (green). The side chain of each residue is simplified as a representative center (S) (cyan) selected based on the specific properties of a particular amino acid. (b,c) A KMC simulation trajectory is initiated starting from a conformation in which a pair of proteins is randomly placed in a 3D cubic box (b), and the simulation is terminated if an encounter complex is formed between these two molecules (c).
Figure 8
(a) Flowchart of the overall prediction framework, in which multiple trajectories of the KMC simulation are used to calculate kon. In parallel, three indicators are calculated based on the structural and energetic features at the binding interface of the query protein complex. These indicators are input into a trained “complex decision tree” to identify potential overestimation, and then the kon calculated from the KMC simulations is adjusted based on the machine learning output. (b) Procedures involved in the KMC simulation. The detailed simulation algorithm is described in the Methods.
