J Comput Aided Mol Des. 2021 Jan;35(1):1-35.
doi: 10.1007/s10822-020-00363-5. Epub 2021 Jan 4.

SAMPL7 Host-Guest Challenge Overview: assessing the reliability of polarizable and non-polarizable methods for binding free energy calculations

Martin Amezcua et al. J Comput Aided Mol Des. 2021 Jan.

Abstract

The SAMPL challenges focus on testing and driving progress of computational methods to help guide pharmaceutical drug discovery. However, assessment of methods for predicting binding affinities is often hampered by computational challenges such as conformational sampling, protonation state uncertainties, variation in test sets selected, and even lack of high-quality experimental data. SAMPL blind challenges have thus frequently included a component focusing on host-guest binding, which removes some of these challenges while still focusing on molecular recognition. Here, we report on the results of the SAMPL7 blind prediction challenge for host-guest affinity prediction. In this study, we focused on three different host-guest categories: a familiar deep cavity cavitand series which has been featured in several prior challenges (where we examine binding of a series of guests to two hosts), a new series of cyclodextrin derivatives which are monofunctionalized around the rim to add amino acid-like functionality (where we examine binding of two guests to a series of hosts), and binding of a series of guests to a new acyclic TrimerTrip host which is related to previous cucurbituril hosts. Many predictions used methods based on molecular simulations, and overall success was mixed, though several methods stood out. As in SAMPL6, we find that one strategy for achieving reasonable accuracy here was to make empirical corrections to binding predictions based on previous data for host categories which have been studied well before, though this can be of limited value when new systems are included. Additionally, we found that alchemical free energy methods using the AMOEBA polarizable force field had considerable success for the two host categories in which they participated. The new TrimerTrip system was also found to introduce some sampling problems, because multiple conformations may be relevant to binding and interconvert only slowly. Overall, results in this challenge tentatively suggest that further investigation of polarizable force fields for these challenges may be warranted.

Keywords: Binding affinity; Blind challenge; Cucurbituril; Cyclodextrin; Free energy; Host–guest binding; OctaAcid; SAMPL.

Figures

Figure 1. Structures of the TrimerTrip host and guest molecules for the SAMPL7 Host-Guest Blind Challenge.
The acyclic CB[n]-type receptor, TrimerTrip, is shown on the top. It is composed of a glycoluril trimer with aromatic triptycene sidewalls at both ends, and four sulfonate groups to increase its solubility. The host can take on a C-shape (though other conformers can be possible) and binds guests inside the cavity. The guests for the SAMPL7 challenge have the characteristics of typical CB[n] binders. The guests are named g1 through g19 (g4, g13, g14 were not included in the challenge).
Figure 2. Structures of the GDCC host and guest molecules for the SAMPL7 Host-Guest Blind Challenge.
(top left) OctaAcid, (top right) exo-OctaAcid; (bottom) guests. The difference between the hosts is the placement of the carboxylate groups near the cavity opening. While the carboxylates protrude outward away from the cavity in OA, in exoOA they are at the rim of the cavity opening. The guests for SAMPL7 are named g1 - g8. Four guests have a carboxylate group, and four a quaternary ammonium group. For the OA host, guests g1 - g6 have binding free energies which were previously reported and thus calculation of values was made optional for participants.
Figure 3. Structures of the cyclodextrin host derivatives and guests for the SAMPL7 Host-Guest Blind Challenge.
The cyclodextrin derivatives are a series of macrocycles composed of seven glucose subunits linked by 1,4 glycosidic bonds. The native β-cyclodextrin (bCD) contains the primary (2’OH) and secondary glucose subunit hydroxyls, while all of the cyclodextrin derivatives (MGLab#) differ by a substituent at either of these positions. MGLab8, MGLab9, MGLab19, MGLab23, MGLab24, and MGLab36 have substituents out from the top or primary face (wide opening), while MGLab34 and MGLab35 have the substituents out from the bottom or secondary face (narrow opening). The two guests are trans-4-methylcyclohexanol (g1) and cationic R-Rimantadine (g2).
Figure 4. βCD host structures.
Shown are two views of βCD. It and its derivatives are known to bind guests in two orientations, primary and secondary. The primary binding orientation is when an asymmetric guest’s polar head group projects out towards the glucose primary alcohols or the smaller opening (down). The secondary binding orientation is when a guest’s polar head group projects towards the secondary alcohol or the larger opening (up).
Figure 5. SAMPL7 submission breakdown.
The SAMPL7 challenge saw 7 TrimerTrip submissions, of which 3 were ranked (blue) and 4 were non-ranked (orange). There were 16 GDCC submissions, with 4 ranked (green) and 12 non-ranked (red), and 7 CD submissions, with 3 ranked (purple) and 4 non-ranked (brown).
Figure 6. TrimerTrip Error Metrics for Ranked Methods.
Shown is the distribution of performance for TrimerTrip submissions, ordered based on the median for each metric. The median is indicated by the white circle in the violin plots. The violin plots were generated by bootstrapping samples with replacement (including experimental uncertainties), and the plots describe the shape of the sampling distribution for each prediction. The black horizontal bar represents the first and third quartiles. From top to bottom the error metrics are RMSE, ME, R², τ, and slope (m).
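The five metrics and the bootstrapping procedure named in this caption can be illustrated with a minimal Python sketch. It is not the challenge's analysis code. Several choices here are assumptions made only for illustration: the error sign convention (calculated minus experimental), the slope taken from a least-squares fit of calculated on experimental values, Kendall's τ in its simple tau-a form, and experimental uncertainty modeled as Gaussian noise with a per-system standard error. The function and parameter names (error_metrics, bootstrap_metrics, expt_sem, n_boot) are hypothetical.

```python
import math
import random


def error_metrics(calc, expt):
    """Return RMSE, ME, R^2, Kendall tau-a and slope for calc vs. expt values."""
    n = len(calc)
    # Assumed sign convention: error = calculated - experimental.
    errors = [c - e for c, e in zip(calc, expt)]
    rmse = math.sqrt(sum(d * d for d in errors) / n)
    me = sum(errors) / n

    mean_c = sum(calc) / n
    mean_e = sum(expt) / n
    sxx = sum((e - mean_e) ** 2 for e in expt)
    syy = sum((c - mean_c) ** 2 for c in calc)
    sxy = sum((e - mean_e) * (c - mean_c) for c, e in zip(calc, expt))
    nan = float("nan")
    # Least-squares slope of calc regressed on expt (assumed convention).
    slope = sxy / sxx if sxx > 0 else nan
    # Squared Pearson correlation coefficient.
    r2 = (sxy * sxy) / (sxx * syy) if sxx > 0 and syy > 0 else nan

    # Kendall tau-a: (concordant - discordant pairs) / total pairs.
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (calc[i] - calc[j]) * (expt[i] - expt[j])
            score += (s > 0) - (s < 0)
    tau = score / (n * (n - 1) / 2) if n > 1 else nan

    return {"RMSE": rmse, "ME": me, "R2": r2, "tau": tau, "slope": slope}


def bootstrap_metrics(calc, expt, expt_sem, n_boot=1000, seed=0):
    """Collect bootstrap samples of each metric.

    Each replicate resamples host-guest systems with replacement and redraws
    every experimental value from a Gaussian centered on the reported value
    (an assumed noise model for the experimental uncertainty).
    """
    rng = random.Random(seed)
    n = len(calc)
    samples = {"RMSE": [], "ME": [], "R2": [], "tau": [], "slope": []}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        c = [calc[i] for i in idx]
        e = [rng.gauss(expt[i], expt_sem[i]) for i in idx]
        for name, value in error_metrics(c, e).items():
            samples[name].append(value)
    return samples
```

The lists returned by bootstrap_metrics are the kind of sampling distributions a violin plot would display, and their median and quartiles correspond to the white circle and black bar described above.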
Figure 7. Correlation plots for TrimerTrip ranked submissions.
Shown are correlation plots comparing calculated versus experimental values for (Left to Right) AMOEBA/DDM/BAR, FSDAM/GAFF2/OPC3, and MD/DOCKING/GAFF/xtb-GNF ranked predictions for the TrimerTrip dataset. The R² and slope for each ranked prediction were 0.50 and 1.25, 0.12 and 0.60, and 0.00 and −0.10, respectively.
Figure 8. RMSE and ME statistics by host-guest system for ranked methods.
Shown are free energy error statistics by host-guest system, across methods/participants. The ΔG root mean square error (RMSE) and mean signed error (ME) were computed via bootstrapping with replacement (including experimental uncertainties) for all host-guest systems (except optional systems OA-g1, OA-g2, OA-g3, OA-g4, OA-g5, OA-g6, bCD-g1, and bCD-g2) and include all ranked methods submitted (except the AM1-BCC/MD/GAFF/TIP4PEW/QMMM method for the cyclodextrin dataset, which is omitted from this analysis because its errors were so large). The black error bars represent the 95-percentile bootstrap confidence intervals. The host-guest datasets for the SAMPL7 challenge were TrimerTrip (blue), GDCC (separated into OA (yellow) and exo-OA (red) sub-datasets to analyze each host-guest system), and cyclodextrin derivatives (green).
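A per-system version of this analysis pools the predictions from all ranked methods for a single host-guest system and bootstraps over them. The following Python sketch is only illustrative and makes several assumptions: the confidence interval is taken as the 2.5th to 97.5th percentile of the bootstrap distribution, the experimental value is redrawn from a Gaussian with the reported standard error, and errors are calculated minus experimental. All names (percentile, system_error_ci, expt_sem, n_boot) are hypothetical.

```python
import math
import random


def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 100]) of a pre-sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def system_error_ci(predictions, expt, expt_sem, n_boot=2000, seed=0):
    """Bootstrap RMSE and ME for one host-guest system across methods.

    predictions: binding free energies for this system, one per method.
    expt, expt_sem: experimental value and its standard error (assumed Gaussian).
    Returns the assumed 95% percentile interval (2.5th, 97.5th) per metric.
    """
    rng = random.Random(seed)
    n = len(predictions)
    rmses, mes = [], []
    for _ in range(n_boot):
        # Redraw the experimental value to propagate its uncertainty.
        e = rng.gauss(expt, expt_sem)
        # Resample methods with replacement; error = calculated - experimental.
        errs = [rng.choice(predictions) - e for _ in range(n)]
        rmses.append(math.sqrt(sum(x * x for x in errs) / n))
        mes.append(sum(errs) / n)
    rmses.sort()
    mes.sort()
    return {
        "RMSE": (percentile(rmses, 2.5), percentile(rmses, 97.5)),
        "ME": (percentile(mes, 2.5), percentile(mes, 97.5)),
    }
```

Calling system_error_ci once per host-guest system would yield one pair of intervals per system, which is the quantity the black error bars are described as showing.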
Figure 9. GDCC Error Metrics for Ranked Methods.
Shown is accuracy of GDCC submissions, with the median value for each metric indicated by the white circle in the violin plots. The violin plots were generated by bootstrapping samples with replacement, and the plots describe the shape of the sampling distribution for each prediction. The black horizontal bar represents the first and third quartiles. From top to bottom the error metrics are RMSE, ME, R², τ, and slope (m).
Figure 10. Correlation plots for GDCC (combined OA and exo-OA) and exo-OA ranked submissions.
Shown are correlation plots comparing calculated and experimental values for (Left to Right) AMOEBA/DDM/BAR, RESP/GAFF/MMPBSA-Cor, B2PLYPD3/SMD_QZ-R, and xtb-GNF/Machine Learning/CORINA MD ranked predictions for GDCC (top row) and exo-OA (bottom row). The AMOEBA/DDM/BAR approach performed particularly well by a variety of metrics, as did RESP/GAFF/MMPBSA-Cor. The former had the slope closest to 1 and its RMS error was among the lowest, whereas the latter performed better on error and correlation metrics but had a slope which was systematically incorrect. (See Table 3)
Figure 11. exo-OA Error Metrics for Ranked Methods.
Shown are exo-OA methods, with the median indicated by the white circle in the violin plots. The violin plots for RMSE, ME, R², τ, and slope describe the shape of the sampling distribution after bootstrapping for each method. The black horizontal bar represents the first and third quartiles. From top to bottom the error metrics are RMSE, ME, R², τ, and slope (m).
Figure 12. Cyclodextrin derivatives error metrics for ranked methods.
Shown are CD submissions, ordered based on the median, which is indicated by the white circle in the violin plots. The violin plots were generated by bootstrapping samples with replacement, and the plots describe the shape of the sampling distribution for each prediction. The black horizontal bar represents the first and third quartiles. From top to bottom the error metrics are RMSE, ME, R², τ, and slope (m). The AM1-BCC/GAFF/TIP4PEW/QMMM method was not included in these plots. In addition, the optional bCD-g1 and bCD-g2 host-guest systems are not included in this analysis.
Figure 13. Correlation plots for CD ranked submissions.
Shown are correlation plots comparing calculated versus experimental values for (Left to Right) FSDAM/GAFF2/OPC3, Noneq/Alchemy/consensus, and AM1-BCC/MD/GAFF/TIP4PEW ranked predictions for the CD dataset. The R² and slope for each ranked prediction were 0.04 and 0.17, 0.03 and 0.18, and 0.04 and 7.62, respectively. Note: the optional bCD-g1 and bCD-g2 host-guest systems were not included in the analysis.
