Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2020 May;34(5):601-633.
doi: 10.1007/s10822-020-00290-5. Epub 2020 Jan 27.

The SAMPL6 SAMPLing challenge: assessing the reliability and efficiency of binding free energy calculations

Affiliations

The SAMPL6 SAMPLing challenge: assessing the reliability and efficiency of binding free energy calculations

Andrea Rizzi et al. J Comput Aided Mol Des. 2020 May.

Abstract

Approaches for computing small molecule binding free energies based on molecular simulations are now regularly being employed by academic and industry practitioners to study receptor-ligand systems and prioritize the synthesis of small molecules for ligand design. Given the variety of methods and implementations available, it is natural to ask how the convergence rates and final predictions of these methods compare. In this study, we describe the concept and results for the SAMPL6 SAMPLing challenge, the first challenge from the SAMPL series focusing on the assessment of convergence properties and reproducibility of binding free energy methodologies. We provided parameter files, partial charges, and multiple initial geometries for two octa-acid (OA) and one cucurbit[8]uril (CB8) host-guest systems. Participants submitted binding free energy predictions as a function of the number of force and energy evaluations for seven different alchemical and physical-pathway (i.e., potential of mean force and weighted ensemble of trajectories) methodologies implemented with the GROMACS, AMBER, NAMD, or OpenMM simulation engines. To rank the methods, we developed an efficiency statistic based on bias and variance of the free energy estimates. For the two small OA binders, the free energy estimates computed with alchemical and potential of mean force approaches show relatively similar variance and bias as a function of the number of energy/force evaluations, with the attach-pull-release (APR), GROMACS expanded ensemble, and NAMD double decoupling submissions obtaining the greatest efficiency. The differences between the methods increase when analyzing the CB8-quinine system, where both the guest size and correlation times for system dynamics are greater. For this system, nonequilibrium switching (GROMACS/NS-DS/SB) obtained the overall highest efficiency. Surprisingly, the results suggest that specifying force field parameters and partial charges is insufficient to generally ensure reproducibility, and we observe differences between seemingly converged predictions ranging approximately from 0.3 to 1.0 kcal/mol, even with almost identical simulations parameters and system setup (e.g., Lennard-Jones cutoff, ionic composition). Further work will be required to completely identify the exact source of these discrepancies. Among the conclusions emerging from the data, we found that Hamiltonian replica exchange-while displaying very small variance-can be affected by a slowly-decaying bias that depends on the initial population of the replicas, that bidirectional estimators are significantly more efficient than unidirectional estimators for nonequilibrium free energy calculations for systems considered, and that the Berendsen barostat introduces non-negligible artifacts in expanded ensemble simulations.

Keywords: Binding affinity; Cucurbit[8]uril; Free energy calculations; Host–guest; Octa-acid; SAMPL6; Sampling.

PubMed Disclaimer

Conflict of interest statement

Disclosures

JDC was a member of the Scientific Advisory Board for Schrödinger, LLC during part of this study. JDC and DLM are current members of the Scientific Advisory Board of OpenEye Scientific Software. The Chodera laboratory receives or has received funding from multiple sources, including the National Institutes of Health, the National Science Foundation, the Parker Institute for Cancer Immunotherapy, Relay Therapeutics, Entasis Therapeutics, Silicon Therapeutics, EMD Serono (Merck KGaA), AstraZeneca, Vir Biotechnology, XtalPi, the Molecular Sciences Software Institute, the Starr Cancer Consortium, the Open Force Field Consortium, Cycle for Survival, a Louis V. Gerstner Young Investigator Award, and the Sloan Kettering Institute. A complete funding history for the Chodera lab can be found at http://choderalab.org/funding. MKG has an equity interest in, and is a cofounder and scientific advisor of VeraChem LLC.

Figures

Figure 1.
Figure 1.. Challenge overview and initial conformations of the host-guest systems featured in the SAMPLing challenge.
The three-dimensional structures of the two hosts (i.e. CB8 and OA) are shown with carbon atoms represented in black, oxygens in red, nitrogens in blue, and hydrogens in white. Both the two-dimensional chemical structures of the guest molecules and the three-dimensional structures of the hosts entering the SAMPLing challenge are shown in the protonation state used for the molecular simulations. We generated five different initial conformations for each of the three host-guest pairs through docking, followed by a short equilibration with Langevin dynamics. The three-dimensional structure overlays of the five conformations for CB8-G3, OA-G3, and OA-G6 are shown from left to right in the figure with the guests’ carbon atoms colored by conformation. Participants used the resulting input files to run their methods in five replicates and submitted the free energy trajectories as a function of the computational cost. We analyzed the submissions in terms of uncertainty of the mean binding free energy ΔG¯ estimate and its bias with respect to the asymptotic free energy ΔGθ.
Figure 2.
Figure 2.. Mean free energy, standard deviation, and bias as a function of computational cost.
The trajectories and shaded areas in the top row represent the mean binding free energies and 95% t-based confidence intervals computed from the 5 replicate predictions for CB8-G3 (left column), OA-G3 (center), and OA-G6 (right) for all submissions, excluding OpenMM/REVO. The same plot including OpenMM/REVO can be found in SI Figure 7. The second and third rows show the standard deviation and bias, respectively, as a function of the computational effort. Given the differences in the simulation parameters between different methods, the finite-time bias is estimated assuming the theoretical binding free energy of the calculation to be the final value of its mean free energy. This means that the bias eventually goes to zero, but also that the bias can be underestimated if the simulation is not converged.
Figure 3.
Figure 3.. Comparison of bidirectional and unidirectional free energy estimators of the same nonequilibrium work switching data.
Average free energy estimates obtained by different estimators from the same nonequilibrium work data collected for CB8-G3 (left), OA-G3 (center), and OA-G6 (right) as a function of the number of energy/force evaluations. The average and the 95% t-based confidence interval (shaded areas) are computed from the 5 replicate calculations. BAR and BAR-long correspond to the GROMACS/NS-DS/SB and GROMACS/NS-DS/SB-long submissions in Figure 2, and utilize the bidirectional Bennett acceptance ratio estimator based on the Crooks fluctuation theorem [101]. Jarzynski-Forward/Reverse are the free energy estimates computed through unidirectional estimators derived from the Jarzynski equality using only the nonequilibrium work values accumulated in the forward/reverse direction respectively. The Gaussian-Forward/Reverse trajectories are based on the Crooks fluctuation theorem and the assumption of normality of the forward/reverse nonequilibrium work distribution, as described in [102]. Unidirectional estimators can introduce significant instabilities and bias in the estimates.
Figure 4.
Figure 4.. OA-G3 volume distribution, restraint radius distributions, and binding free energy dependency on the binding site definition.
Box volume empirical distributions obtained by NPT simulations using the Monte Carlo barostat implemented in OpenMM (right) and the Berendsen barostat implemented in GROMACS (left) at 298 K. The continuous blue (ρMD(V|1atm)) and orange (ρMD(V|100atm)) lines represent Gaussian kernel density estimates of volume distributions sampled with simple molecular dynamics at a constant pressure of 1 atm and 100 atm respectively. The green distribution is obtained by reweighting ρMD(V|1 atm) to 100 atm. The red densities ρMD(V|1 atm)) represent the volume distribution sampled in the bound state by the enhanced sampling algorithm (i.e., expanded ensemble for the Berendsen barostat and HREX for the Monte Carlo barostat). The expected distribution is predicted correctly only from the volumes sampled using the Monte Carlo barostat, while the Berendsen barostat samples distributions of similar mean but much smaller fluctuations. Moreover, the expanded ensemble algorithm introduce artifacts in the volumes sampled by the Berendsen barostat.
Figure 5.
Figure 5.. Initiating the HREX calculation from a single conformation introduces significant bias that slowly relaxes as the system reaches equilibrium.
Mean (top row) and standard deviation (bottom row) of the five replicate free energy trajectories as a function of the simulation length computed after discarding an increasing number of initial iterations going from 1000 (purple) to 24000 (light green) for the three host-guest systems. The trajectories are plotted starting from the last discarded iteration. The initial bias is consistently negative, and it decays faster in OA-G3/G6 than in CB8-G3, in which correlation times are longer. Ignoring the beginning of the trajectory removes the bias.
Figure 6.
Figure 6.. Bound water molecules induce metastability in HREX replicas with CB8-G3.
(A) Histograms of the number of bound water by thermodynamic state. The color maps the progression of the alchemical protocol from the bound state (purple) to the discharged state (blue), where all the charges are turned off but Lennard-Jones interactions are still active, and decoupled state (yellow). The number of bound waters has a peaked distribution around 0–2 for most of the alchemical protocol, and it rapidly shifts to the right in the near-decoupled state. (B) Superposition of the trajectories of the number of bound waters and the state index for replica 1 and 5 of the OpenMM/HREX calculation for CB8-G3–0 (top) and autocorrelation function computed from the time series of the number of bound waters (dark colors) and replica state indices (light colors) for CB8-G3–0 (blue), OA-G3–0 (green), and OA-G6–0 (red) (bottom). Each autocorrelation function was computed as the average of the correlation functions estimated for each replica trajectory [39, 111]. Replicas remain stuck in the near-decoupled states for several nanoseconds. CB8-G3 exhibits much longer correlation times for both time series than the two OA systems.

References

    1. Shirts MR, Mobley DL, Brown SP. Free-energy calculations in structure-based drug design. Drug design: structure- and ligand-based approaches. 2010; p. 61–86.
    1. Kuhn B, Tichý M, Wang L, Robinson S, Martin RE, Kuglstatter A, Benz J. Prospective evaluation of free energy calculations for the prioritization of cathepsin L inhibitors. Journal of medicinal chemistry. 2017; 60(6):2485–2497. - PubMed
    1. Ciordia M, Pérez-Benito L, Delgado F, Trabanco AA, Tresadern G. Application of free energy perturbation for the design of BACE1 inhibitors. Journal of Chemical information and modeling. 2016; 56(9):1856–1871. - PubMed
    1. Schindler C, Rippmann F, Kuhn D. Relative binding affinity prediction of farnesoid X receptor in the D3R Grand Challenge 2 using FEP+. Journal of computer-aided molecular design. 2018; 32(1):265–272. - PubMed
    1. Wang L, Wu Y, Deng Y, Kim B, Pierce L, Krilov G, Lupyan D, Robinson S, Dahlgren MK, Greenwood J, et al. Accurate and reliable prediction of relative ligand binding potency in prospective drug discovery by way of a modern free-energy calculation protocol and force field. Journal of the American Chemical Society. 2015; 137(7):2695–2703. - PubMed

Publication types