J Chem Theory Comput. 2024 Jan 9;20(1):239-252.
doi: 10.1021/acs.jctc.3c01050. Epub 2023 Dec 26.

Tuning Potential Functions to Host-Guest Binding Data

Jeffry Setiadi et al. J Chem Theory Comput.

Abstract

Software to more rapidly and accurately predict protein-ligand binding affinities is of high interest for early-stage drug discovery, and physics-based methods are among the most widely used technologies for this purpose. The accuracy of these methods depends critically on the accuracy of the potential functions that they use. Potential functions are typically trained against a combination of quantum chemical and experimental data. However, although binding affinities are among the most important quantities to predict, experimental binding affinities have not to date been integrated into the experimental data set used to train potential functions. In recent years, the use of host-guest complexes as simple and tractable models of binding thermodynamics has gained popularity due to their small size and simplicity, relative to protein-ligand systems. Host-guest complexes can also avoid ambiguities that arise in protein-ligand systems such as uncertain protonation states. Thus, experimental host-guest binding data are an appealing additional data type to integrate into the experimental data set used to optimize potential functions. Here, we report the extension of the Open Force Field Evaluator framework to enable the systematic calculation of host-guest binding free energies and their gradients with respect to force field parameters, coupled with the curation of 126 host-guest complexes with available experimental binding free energies. As an initial application of this novel infrastructure, we optimized generalized Born (GB) cavity radii for the OBC2 GB implicit solvent model against experimental data for 36 host-guest systems. This refitting led to a dramatic improvement in accuracy for both the training set and a separate test set with 90 additional host-guest systems. The optimized radii also showed encouraging transferability from host-guest systems to 59 protein-ligand systems. 
However, the new radii are significantly smaller than the baseline radii and lead to excessively favorable hydration free energies (HFEs). Thus, users of the OBC2 GB model currently may choose between GB cavity radii that yield more accurate binding affinities and GB cavity radii that yield more accurate HFEs. We suspect that achieving good accuracy on both will require more far-reaching adjustments to the GB model. We note that binding free-energy calculations using the OBC2 model in OpenMM gain about a 10× speedup relative to corresponding explicit solvent calculations, suggesting a future role for implicit solvent absolute binding free-energy (ABFE) calculations in virtual compound screening. This study proves the principle of using host-guest systems to train potential functions that are transferable to protein-ligand systems and provides an infrastructure that enables a range of applications.
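
To illustrate the kind of gradient-based fitting the abstract describes, the following minimal Python sketch minimizes a least-squares objective over a set of binding free energies, using the chain rule with supplied gradients ∂ΔG_b/∂θ. It is not the ForceBalance or OpenFF-Evaluator implementation: the function names (`objective_and_gradient`, `fit`), the plain gradient-descent update, the step size, and the linear toy model standing in for the free-energy calculation are all assumptions made for illustration.

```python
import numpy as np


def objective_and_gradient(theta, compute_dg, compute_dg_grad, dg_exp):
    """Mean-squared deviation from experiment and its gradient w.r.t. theta.

    compute_dg(theta)      -> array (N,)   calculated binding free energies
    compute_dg_grad(theta) -> array (N, P) d(dG_n)/d(theta_p) for each system
    """
    residual = compute_dg(theta) - dg_exp
    jac = compute_dg_grad(theta)
    chi2 = float(np.mean(residual ** 2))
    # Chain rule: d(chi2)/d(theta_p) = (2/N) * sum_n residual_n * d(dG_n)/d(theta_p)
    grad = 2.0 * jac.T @ residual / residual.size
    return chi2, grad


def fit(theta0, compute_dg, compute_dg_grad, dg_exp, step=0.1, n_iter=200):
    """Plain gradient descent (an assumed, simplified optimizer)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        _, grad = objective_and_gradient(theta, compute_dg, compute_dg_grad, dg_exp)
        theta = theta - step * grad
    return theta


# Toy stand-in: 36 hypothetical systems whose dG depends linearly on two
# hypothetical parameters. Real free energies come from simulation instead.
rng = np.random.default_rng(0)
A = rng.normal(size=(36, 2))
theta_true = np.array([1.2, 1.5])
dg_exp = A @ theta_true

theta_fit = fit([1.7, 1.7], lambda t: A @ t, lambda t: A, dg_exp)
print(theta_fit)
```

In a real workflow, each evaluation of `compute_dg` and `compute_dg_grad` would be a full set of binding free-energy simulations, so the optimizer would be chosen to need as few iterations as possible.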

Figures

Figure 1:
Workflow of the ForceBalance-Evaluator framework for iterative optimization of force field parameters, θ, against physical properties. The baseline OpenFF-Evaluator can estimate liquid-state properties and HFEs and supports optimizing force fields against the corresponding experimental values. ForceBalance can also interface with quantum mechanical software for the inclusion of quantum chemical reference data in the optimization (omitted from the diagram for simplicity). We have extended OpenFF-Evaluator to include host–guest binding affinities, ΔG_b, as part of the workflow. The host–guest system definitions and experimental binding data are stored in Taproom, and our pAPRika binding free energy tool computes ΔG_b and its gradient with respect to a set of force field parameters θ_i (i.e., ∂ΔG_b/∂θ_i). OpenFF-Evaluator sends these quantities to ForceBalance, which returns updated parameters θ_{i+1} for a new iteration. The diagram shows OpenFF-Evaluator passing only ΔG_b and ∂ΔG_b/∂θ_i to ForceBalance, though the same approach generalizes to the other properties.
Figure 2:
Hosts and guests used in the training set. The Taproom names and experimental values are summarized in Table S1.
Figure 3:
Calculated versus experimental absolute binding free energies of the host–guest training set. A: Calculations with the original mBondi2 radii. B: Calculations with the HG-optimized radii. The dark and light gray shaded areas represent 1 kcal/mol and 2 kcal/mol deviations from the unity line, respectively. Values in square brackets are 95% confidence intervals from bootstrapping over the whole data set. The full data and error statistics are available in Tables S1 and S4.
Figure 4:
Computed versus experimental binding free energies of the host–guest test set. A: Calculations with the original mBondi2 radii. B: Calculations with the HG-optimized radii. The dark and light gray shaded areas represent 1 kcal/mol and 2 kcal/mol deviations from the unity line, respectively. Values in square brackets are 95% confidence intervals from bootstrapping over the whole data set. The full data and error statistics are available in Tables S3 and S5.
Figure 5:
Protein–ligand absolute binding free energy benchmark with the ff14SB protein force field. Top row: Sage small molecule force field with the (A) mBondi2 and (B) HG-optimized radii set. Bottom row: GAFF2 small molecule force field with the (C) mBondi2 and (D) HG-optimized radii set. The statistics on the top left of each graph are for all four proteins. The dark and light gray shaded areas represent 1 kcal/mol and 2 kcal/mol deviations from the unity line, respectively. Values in square brackets give the 95% confidence intervals from bootstrapping over the whole data set. The full data are available in Tables S6 and S7, and the error statistics are summarized in Table S10.
Figure 6:
RMSD of the four proteins (PWWP1, HSP90, MCL-1, and Cyclophilin D) over 100 ns of unrestrained MD simulation without any ligand bound. The proteins are simulated in TIP3P (blue), and OBC2 implicit solvent with the mBondi2 (orange) and HG-optimized (green) radii sets.
Figure 7:
Sample structures of each of the four proteins from our simulations with TIP3P water (cyan) and OBC2 implicit solvent with our HG-optimized radii (yellow). Each structure was aligned to the respective protein’s initial structure from Alibay et al. These 100 ns simulations were run without bound ligands, and the sample structure in each case is the last frame. We added the ligand molecule from the Alibay et al. reference conformation to show the binding pocket.
Figure 8:
Small molecule hydration free energy benchmark with the Sage force field. A: Calculations with the original mBondi2 radii. B: Calculations with the HG-optimized radii. The dark and light gray shaded areas represent 1 kcal/mol and 2 kcal/mol deviations from the unity line, respectively. Values in square brackets give the 95% confidence intervals from bootstrapping over the whole data set. The full data and error statistics are summarized in Tables S12 and S13.
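
Several captions above report 95% confidence intervals from bootstrapping over the whole data set. The following minimal Python sketch shows one common way to obtain such an interval for the RMSE between calculated and experimental free energies. The percentile method, the number of resamples, and the names `rmse` and `bootstrap_ci` are assumptions for illustration; the source does not specify which bootstrap scheme was used.

```python
import numpy as np


def rmse(calc, expt):
    """Root-mean-square error between calculated and experimental values."""
    return float(np.sqrt(np.mean((calc - expt) ** 2)))


def bootstrap_ci(calc, expt, stat=rmse, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap (1 - alpha) interval for `stat`.

    Systems are resampled with replacement as (calc, expt) pairs.
    """
    calc = np.asarray(calc, dtype=float)
    expt = np.asarray(expt, dtype=float)
    rng = np.random.default_rng(seed)
    n = calc.size
    samples = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample system indices
        samples[b] = stat(calc[idx], expt[idx])
    lo, hi = np.percentile(samples, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return stat(calc, expt), float(lo), float(hi)


# Synthetic example data (not from the paper), in kcal/mol.
rng = np.random.default_rng(1)
expt_demo = rng.uniform(-8.0, -2.0, size=36)
calc_demo = expt_demo + rng.normal(0.0, 1.0, size=36)
point, lower, upper = bootstrap_ci(calc_demo, expt_demo)
print(f"RMSE = {point:.2f} [{lower:.2f}, {upper:.2f}] kcal/mol")
```

Passing a different `stat` function (for example a mean signed error or a correlation coefficient) gives the interval for that statistic with the same resampling.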
