J Chem Inf Model. 2011 Sep 26;51(9):2036-46.
doi: 10.1021/ci200082t. Epub 2011 Jul 22.

CSAR benchmark exercise of 2010: selection of the protein-ligand complexes

Free PMC article

James B Dunbar Jr et al. J Chem Inf Model.

Erratum in

  • J Chem Inf Model. 2011 Sep 26;51(9):2146

Abstract

A major goal in drug design is the improvement of computational methods for docking and scoring. The Community Structure Activity Resource (CSAR) aims to collect available data from industry and academia that may be used for this purpose (www.csardock.org). CSAR is also charged with organizing community-wide exercises based on the collected data. The first of these exercises aimed to gauge the overall state of docking and scoring, using a large and diverse data set of protein-ligand complexes. Participants were asked to calculate the affinity of the complexes as provided and then to recalculate with changes that might improve their specific method. This first data set was selected from existing PDB entries that had binding data (Kd or Ki) in Binding MOAD, augmented with entries from PDBbind. The final data set contains 343 diverse protein-ligand complexes and spans 14 pKd units. Sixteen proteins have three or more complexes in the data set, from which a user could start an inspection of congeneric series. Inherent experimental error limits the possible correlation between scores and measured affinity: Pearson R is limited to ~0.91 (Pearson R² ~0.83) when fitting to the data set without overparameterizing, and to ~0.83 (Pearson R² ~0.70) when scoring the data set with a method trained on outside data [corrected]. The details of how the data set was initially selected, and the process by which it matured to better fit the needs of the community, are presented. Many groups generously participated in improving the data set, which underscores the value of a supportive, collaborative effort in moving our field forward.

Figures

Figure 1
Flowchart of how the data set was curated.
Figure 2
Distribution analysis of the calculated physical properties of the data sets.
Figure 3
The addition of random error with standard deviations of 0.5 log K (top) or 1.0 log K (bottom) does not significantly degrade the “signal to noise” in the CSAR-NRC data set. Only 10 of the 100 randomly generated sets are shown for clarity, and a line with a slope of 1.0 is given as a guideline in all the graphs. (Left) Correlations based on the model that the reported affinities are ideal (y-axis) and that random, normally distributed error can be added to generate possible measurements found in another lab (x-axis). (Right) Correlations based on the model that both the reported value and another measured value could have the same random error. These plots also approximate the variation between scores and measured affinity values.
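
The two noise models described for Figure 3 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' procedure or code: the affinities below are synthetic placeholders (343 values drawn from an assumed normal distribution), not the CSAR-NRC data, and the function names `pearson_r`, `add_noise`, and `mean_noisy_r` are hypothetical. The resulting correlations therefore will not reproduce the limits quoted in the abstract; the sketch only shows how added measurement error caps Pearson R.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def add_noise(values, sd, rng):
    """Add normally distributed error (standard deviation sd, in log K units)."""
    return [v + rng.gauss(0.0, sd) for v in values]


def mean_noisy_r(affinities, sd, n_sets=100, both_noisy=False, seed=0):
    """Average Pearson R over n_sets randomly perturbed copies of the data.

    both_noisy=False: reported values are treated as ideal (left panels).
    both_noisy=True:  both axes carry independent error (right panels).
    """
    rng = random.Random(seed)
    rs = []
    for _ in range(n_sets):
        x = add_noise(affinities, sd, rng)
        y = add_noise(affinities, sd, rng) if both_noisy else affinities
        rs.append(pearson_r(x, y))
    return sum(rs) / len(rs)


# Placeholder affinities: an ASSUMED distribution, not the real data set.
placeholder_rng = random.Random(42)
placeholder = [placeholder_rng.gauss(6.0, 2.0) for _ in range(343)]

for sd in (0.5, 1.0):
    one_sided = mean_noisy_r(placeholder, sd)
    two_sided = mean_noisy_r(placeholder, sd, both_noisy=True)
    print(f"sd={sd}: ideal-vs-noisy R={one_sided:.2f}, noisy-vs-noisy R={two_sided:.2f}")
```

When both axes carry error, the attainable correlation is lower than when one axis is treated as ideal, which is the qualitative point of the right-hand panels.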

References

    1. Berman H. M.; Westbrook J.; Feng Z.; Gilliland G.; Bhat T. N.; Weissig H.; Shindyalov I. N.; Bourne P. E. The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235–242.
    2. Hu L.; Benson M. L.; Smith R. D.; Lerner M. G.; Carlson H. A. Binding MOAD (Mother of All Databases). Proteins: Struct., Funct., Bioinf. 2005, 60, 333–340.
    3. Wang R.; Fang X.; Lu Y.; Wang S. The PDBbind Database: Collection of Binding Affinities for Protein-Ligand Complexes with Known Three-Dimensional Structures. J. Med. Chem. 2004, 47, 2977–2980.
    4. Kuntz I. D.; Blaney J. M.; Oatley S. J.; Langridge R.; Ferrin T. E. A Geometric Approach to Macromolecule-Ligand Interactions. J. Mol. Biol. 1982, 161, 269–288.
    5. Meng E. C.; Shoichet B. K.; Kuntz I. D. Automated docking with grid-based energy evaluation. J. Comput. Chem. 1992, 13, 505–524.
