J Chem Inf Model. 2011 Sep 26;51(9):2115-31.
doi: 10.1021/ci200269q. Epub 2011 Aug 29.

CSAR benchmark exercise of 2010: combined evaluation across all submitted scoring functions

Free PMC article

Richard D Smith et al. J Chem Inf Model.

Abstract

As part of the Community Structure-Activity Resource (CSAR) center, a set of 343 high-quality protein-ligand crystal structures was assembled with experimentally determined Kd or Ki values from the literature. We encouraged the community to score the crystallographic poses of the complexes by any method of their choice. The goals of the exercise were to (1) evaluate the current ability of the field to predict activity from structure and (2) investigate the properties of the complexes and methods that appear to hinder scoring. A total of 19 different methods were submitted with numerous parameter variations, giving 64 sets of scores from 16 participating groups. Linear regression and nonparametric tests were used to correlate scores to the experimental values. Across the methods, correlation to experiment ranged from R² = 0.58 to 0.12, Spearman ρ = 0.74 to 0.37, Kendall τ = 0.55 to 0.25, and median unsigned error = 1.00 to 1.68 pKd units. All types of scoring functions (force field based, knowledge based, and empirical) had examples with high and low correlation, showing no bias or advantage for any particular approach. The data across all the participants were combined to identify 63 complexes that were poorly scored across the majority of the scoring methods and 123 complexes that were scored well across the majority. The two sets were compared using a Wilcoxon rank-sum test to assess any significant difference in the distributions of >400 physicochemical properties of the ligands and the proteins. Poorly scored complexes were found to have ligands that were the same size as those in well-scored complexes, but hydrogen bonding and torsional strain were significantly different. These comparisons point to a need for CSAR to develop data sets of congeneric series with a range of hydrogen-bonding and hydrophobic characteristics and a range of rotatable bonds.
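The agreement measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names are hypothetical, the six affinity values are invented toy data, and Kendall's τ is computed in its simplest form (tau-a, assuming no tied values). For a one-variable least-squares fit, R² equals the squared Pearson correlation, which is what the sketch uses.

```python
from statistics import mean, median


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman rho: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def median_unsigned_error(predicted, measured):
    """Median of |predicted - measured|, in the units of the inputs."""
    return median(abs(p - m) for p, m in zip(predicted, measured))


# Toy data (invented for illustration): pKd values for six complexes.
experimental = [4.0, 5.5, 6.0, 7.2, 8.1, 9.0]
calculated = [4.5, 5.0, 6.8, 6.5, 8.0, 9.9]

r_squared = pearson_r(experimental, calculated) ** 2
rho = spearman_rho(experimental, calculated)
tau = kendall_tau(experimental, calculated)
med_err = median_unsigned_error(calculated, experimental)
print(r_squared, rho, tau, med_err)
```

The rank-based measures only need a score that orders the complexes correctly, whereas the median unsigned error is meaningful only for methods that predict affinity on an absolute pKd scale.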


Figures

Figure 1
Example of comparing a set of scores, pKd (calculated), to their corresponding experimentally determined affinities. (Top) When fitting a line (black) using least-squares linear regression, the distance in the y direction between each data point and the line is its residual. (Bottom) The residuals for all the data points have a normal distribution around zero. The characteristics are well-defined, including the definition of standard deviation (σ in red, which happens to be 1.4 pKd in this example) and the number of data points with residuals outside ±σ (15.8% in each tail). Higher correlations lead to larger R² and smaller σ; weaker correlations lead to lower R² and larger σ, but the distributions remain Gaussian in shape.
Figure 2
Crystal structure of FXa bound with a 5 pM ligand (PDB id 2p3t). The ligand is very exposed with few hydrogen bonds to the protein.
Figure 3
Least-squares linear regression of the 17 core scoring functions. Black lines are the linear regression fit. Red lines indicate +σ and −σ, the standard deviation of the residuals. Blue points are UNDER complexes which were underscored in ≥12 of the 17 functions. The red points are OVER complexes which were overscored in ≥12 of the 17 functions.
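The consensus idea in this caption can be sketched as follows. This is an illustrative reading, not the authors' procedure: it assumes that "underscored" and "overscored" mean a residual below −σ or above +σ from each function's own least-squares fit, it uses the population standard deviation of the residuals, and all names and numbers are hypothetical toy values. The paper's threshold is 12 of 17 functions; the toy example uses 2 of 3.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def residual_flags(experimental, scores):
    """Flag each complex as -1 (below -sigma), +1 (above +sigma), or 0."""
    slope, intercept = fit_line(experimental, scores)
    residuals = [s - (slope * e + intercept)
                 for e, s in zip(experimental, scores)]
    # Residuals of a fit with an intercept average to zero, so the root
    # mean square is their (population) standard deviation.
    sigma = (sum(r * r for r in residuals) / len(residuals)) ** 0.5
    return [(r > sigma) - (r < -sigma) for r in residuals]


def consensus(flag_table, min_votes):
    """Indices flagged under (-1) or over (+1) by at least min_votes functions."""
    n = len(flag_table[0])
    under = [i for i in range(n)
             if sum(flags[i] == -1 for flags in flag_table) >= min_votes]
    over = [i for i in range(n)
            if sum(flags[i] == 1 for flags in flag_table) >= min_votes]
    return under, over


# Toy data (invented): five complexes scored by three hypothetical functions.
experimental = [1.0, 2.0, 3.0, 4.0, 5.0]
score_sets = [
    [1.0, 2.0, 0.0, 4.0, 5.0],   # complex 2 scored far too low
    [2.0, 4.0, 0.0, 8.0, 10.0],  # different scale, same outlier
    [1.0, 2.0, 6.0, 4.0, 5.0],   # complex 2 scored too high here
]
flag_table = [residual_flags(experimental, s) for s in score_sets]
under, over = consensus(flag_table, min_votes=2)
print(under, over)
```

Because each function is judged against its own regression line and its own σ, functions reporting scores on different scales can vote on the same complex.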
Figure 4
Comparison of experimental and calculated values from the nine functions which predicted absolute binding affinity, listed roughly in order of increasing Med |Err| and RMSE. Black lines represent perfect agreement. The red lines indicate +Med |Err| and −Med |Err| from the black line. The blue circles denote complexes for which ≥7 of the 9 methods have consistently underestimated the affinity by at least Med |Err|, while the red circles are those where the affinity was overestimated.
Figure 5
The distributions of binding affinities in the GOOD and BAD complexes (left) are compared to those of the NULL case (right). The NULL case is generated by the sets of all complexes with affinities ≤50 nM (high), 50 nM–50 μM (middle), and ≥50 μM (low). This midrange of affinities is highlighted with a wide, gray bar in both panels.
Figure 6
The distributions of amino acids in the binding sites of the GOOD and BAD complexes meeting the ≥12 of 17 definition (left) are compared to those of the NULL case (right). The graph in the lower left provides the distribution of all amino acids in the full protein sequences to show that the important trends do not result from inherent differences in composition of the proteins (the same is true of the NULLs, data not shown). Metals and modified residues are denoted as other, "OTH". Averages and error bars for the amino acid content were determined by bootstrapping.
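The bootstrapping mentioned in this caption can be sketched as below. The details are assumptions, since the caption does not give them: the sketch resamples whole complexes with replacement, pools the residues of each resample, and reports the mean and standard deviation of each residue's fraction. The function names, the number of resamples, and the four tiny binding sites are all hypothetical.

```python
import random
from collections import Counter
from statistics import mean, stdev


def residue_fractions(sites):
    """Fraction of each residue type, pooled over a list of binding sites."""
    counts = Counter(res for site in sites for res in site)
    total = sum(counts.values())
    return {res: n / total for res, n in counts.items()}


def bootstrap_fractions(sites, n_resamples=1000, seed=0):
    """Return {residue: (mean fraction, std dev)} over bootstrap resamples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    residues = sorted({res for site in sites for res in site})
    samples = {res: [] for res in residues}
    for _ in range(n_resamples):
        # Resample complexes (not individual residues) with replacement.
        resample = rng.choices(sites, k=len(sites))
        fractions = residue_fractions(resample)
        for res in residues:
            samples[res].append(fractions.get(res, 0.0))
    return {res: (mean(v), stdev(v)) for res, v in samples.items()}


# Toy data (invented): residues lining four hypothetical binding sites.
sites = [
    ["ASP", "LEU", "PHE"],
    ["LEU", "LEU", "SER"],
    ["ASP", "GLY", "OTH"],
    ["PHE", "LEU", "TRP"],
]
estimates = bootstrap_fractions(sites, n_resamples=200, seed=1)
for res, (avg, err) in estimates.items():
    print(f"{res}: {avg:.3f} +/- {err:.3f}")
```

Resampling at the level of complexes keeps the residues of one binding site together, so the error bars reflect variation between complexes rather than between individual residues.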

