CSAR 2014: A Benchmark Exercise Using Unpublished Data from Pharma

Affiliations

¹ Department of Medicinal Chemistry, College of Pharmacy, University of Michigan, 428 Church St., Ann Arbor, Michigan 48109-1065, United States.
² Center for Structural Biology, University of Michigan, 3358E Life Sciences Institute, 210 Washtenaw Ave., Ann Arbor, Michigan 48109-2216, United States.
³ Computational and Structural Sciences, Medicines Research Centre, GlaxoSmithKline Research & Development, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, United Kingdom.
⁴ Computational and Structural Sciences, GlaxoSmithKline Research & Development, 1250 South Collegeville Road, Collegeville, Pennsylvania 19426, United States.

PMID: 27149958
PMCID: PMC5228621
DOI: 10.1021/acs.jcim.5b00523

Heather A Carlson et al. J Chem Inf Model. 2016 Jun 27;56(6):1063-77. doi: 10.1021/acs.jcim.5b00523. Epub 2016 May 17.

Abstract

The 2014 CSAR Benchmark Exercise was the last community-wide exercise that was conducted by the group at the University of Michigan, Ann Arbor. For this event, GlaxoSmithKline (GSK) donated unpublished crystal structures and affinity data from in-house projects. Three targets were used: tRNA (m1G37) methyltransferase (TrmD), Spleen Tyrosine Kinase (SYK), and Factor Xa (FXa). A particularly strong feature of the GSK data is its large size, which lends greater statistical significance to comparisons between different methods. In Phase 1 of the CSAR 2014 Exercise, participants were given several protein-ligand complexes and asked to identify the one near-native pose from among 200 poses (199 decoys plus the near-native pose) provided by CSAR. Though decoys were requested by the community, we found that they complicated our analysis. We could not discern whether poor predictions were failures of the chosen method or an incompatibility between the participant's method and the setup protocol we used. This problem is inherent to decoys, and we strongly advise against their use. In Phase 2, participants had to dock and rank/score a set of small molecules given only the SMILES strings of the ligands and a protein structure with a different ligand bound. Overall, docking was a success for most participants, much better in Phase 2 than in Phase 1. However, scoring was a greater challenge. No particular approach to docking and scoring had an edge, and successful methods included empirical, knowledge-based, machine-learning, shape-fitting, and even those with solvation and entropy terms. Several groups were successful in ranking TrmD and/or SYK, but ranking FXa ligands was intractable for all participants.
Methods that were able to dock well across all submitted systems include MDock [1], Glide-XP [2], PLANTS [3], Wilma [4], Gold [5], SMINA [6], Glide-XP [2]/PELE [7], FlexX [8], and MedusaDock [9]. In fact, the submission based on Glide-XP [2]/PELE [7] cross-docked all ligands to many crystal structures, and it was particularly impressive to see success across an ensemble of protein structures for multiple targets. For scoring/ranking, submissions that showed statistically significant achievement include MDock [1] using ITScore [1,10] with a flexible-ligand term [11], SMINA [6] using Autodock-Vina [12,13], FlexX [8] using HYDE [14], and Glide-XP [2] using XP DockScore [2] with and without ROCS [15] shape similarity [16]. Of course, these results are for only three protein targets, and many more systems need to be investigated to truly identify which approaches are more successful than others. Furthermore, our exercise is not a competition.

Figures

**Figure 1**
Examples are given for TrmD, SYK, and FXa, showing the near-native poses (thick sticks with green carbons) among each set of 199 decoys (black lines). Protein surfaces are shown in white and are partially transparent. Ligands are labeled with a short-hand notation above; the complexes are TrmD-gtc000451, SYK-gtc000233, and FXa-gtc000101. These three ligands have the most favorable binding affinity, out of the ligands that have an available crystal structure.

**Figure 2**
Histograms of the results of Phase 1 of the 2014 CSAR Exercise. A total of 22 crystal structures were used, and 52 methods were submitted. Participants were given 199 decoys and one near-native pose for each structure. The histograms show how many methods predicted the near-native pose with their top score and within the top-3 scores across all the structures.
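
The top-1 and top-3 tallies described in this caption can be sketched as follows. This is only an illustration, not the CSAR analysis code: the function names, the assumption that a lower score is better, and the pessimistic tie handling are all hypothetical choices made for the example.

```python
from typing import Sequence, Tuple


def near_native_rank(scores: Sequence[float], native_index: int,
                     lower_is_better: bool = True) -> int:
    """Return the 1-based rank of the near-native pose among all poses.

    Assumption: ties are resolved pessimistically, so a decoy with the
    same score as the near-native pose counts as ranking ahead of it.
    """
    native = scores[native_index]
    if lower_is_better:
        ahead = sum(1 for i, s in enumerate(scores)
                    if i != native_index and s <= native)
    else:
        ahead = sum(1 for i, s in enumerate(scores)
                    if i != native_index and s >= native)
    return ahead + 1


def success_counts(ranks: Sequence[int], k: int = 3) -> Tuple[int, int]:
    """Count structures where the near-native pose is top-1 and within top-k."""
    top1 = sum(1 for r in ranks if r == 1)
    topk = sum(1 for r in ranks if r <= k)
    return top1, topk


# Toy example with 5 poses instead of 200; index 2 is the near-native pose.
toy_scores = [-7.1, -6.4, -8.3, -8.0, -5.2]
toy_rank = near_native_rank(toy_scores, native_index=2)
```

For one method, `near_native_rank` would be called once per crystal structure, and `success_counts` would then summarize the resulting ranks across all structures.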

**Figure 3**
The poses that comprise a second, local minimum for gtc000445 are shown. The decoys (colored purple) are 5 Å RMSD from the crystal pose, but they have significant overlap with the correct, near-native pose (colored green). The decoys are flipped over backwards with many favorable hydrogen bonds that lead to good scores.
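
For readers unfamiliar with the RMSD quoted here, a minimal sketch of the calculation is given below. It assumes both poses are already in the same protein frame (no superposition) and that atoms are listed in matching order; it does not handle symmetry-equivalent atoms, and `pose_rmsd` is a hypothetical helper, not a function from the exercise.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def pose_rmsd(pose: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """In-place RMSD (no superposition) between matched atom lists, in Å."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and have the same atom count")
    # Sum of squared coordinate differences over every matched atom pair.
    squared = sum((a - b) ** 2
                  for p, r in zip(pose, reference)
                  for a, b in zip(p, r))
    return math.sqrt(squared / len(pose))
```

A flipped pose like the one in this figure can occupy much of the same volume as the crystal pose and still give a large value, because each atom is compared with its own counterpart rather than with the nearest atom.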

**Figure 4**
Comparison of docking and ranking performance for each method submitted for Phase 2. The region in the upper left is the area where the most successful submissions are found. The value in blue is the number of methods with median RMSD ≤ 2 Å and ρ ≥ 0.5. Median ρ values are calculated using all the unique ligands for each system. Median RMSD values are calculated over the set of all Phase-2 ligands that have crystal structures.
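
The success criterion in this caption (median RMSD ≤ 2 Å and ρ ≥ 0.5) can be illustrated with a short sketch. It assumes ρ is Spearman's rank correlation, computed here as the Pearson correlation of average ranks, and that scores are oriented so that a higher value means stronger predicted binding. The helper names are hypothetical, and how ρ is summarized per system is left to the caller.

```python
from statistics import median
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for j in range(start, end + 1):
            ranks[order[j]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rho as the Pearson correlation of the average ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def in_success_region(rmsds: Sequence[float], rho: float,
                      rmsd_cutoff: float = 2.0,
                      rho_cutoff: float = 0.5) -> bool:
    """True if median RMSD <= cutoff (Å) and rho >= cutoff."""
    return median(rmsds) <= rmsd_cutoff and rho >= rho_cutoff
```

If a method reports energies where lower is better, the scores would need to be negated before calling `spearman_rho` so that the sign of ρ matches the convention assumed here.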

**Figure 5**
There is very tight agreement in the IC₅₀ data from different assays for FXa and different salt forms of SYK. **(A)** Across all the duplicate measurements for FXa, the average unsigned difference is 0.15 pIC₅₀ and the standard deviation is only 0.24 pIC₅₀. The slope is 1. **(B)** For SYK, the average unsigned difference is 0.27 pIC₅₀ and the standard deviation is 0.21 pIC₅₀. The slope deviates from 1.0, but the smaller range of data makes this less relevant. **(C)** For calculations of repeat FXa inhibitors, 9 methods produced the exact same scores/ranks for all the inhibitors. Another 20 methods had average differences less than 2σ_expt (red line). **(D)** For calculations of SYK repeats, 11 methods gave the same scores/ranks, and 23 other methods had average differences less than 2σ_expt (red line).
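
The agreement statistics quoted for duplicate measurements can be sketched as below. The caption does not say exactly how the standard deviation was computed, so this example assumes it is the sample standard deviation of the signed pairwise differences; that choice, the function names, and the use of IC₅₀ values in molar units are assumptions made for illustration only.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def pic50(ic50_molar: float) -> float:
    """Convert an IC50 in molar units to pIC50 = -log10(IC50)."""
    return -math.log10(ic50_molar)


def duplicate_agreement(first: Sequence[float],
                        second: Sequence[float]) -> Tuple[float, float]:
    """Return (average unsigned difference, std. dev. of signed differences).

    Inputs are paired pIC50 values for the same compounds measured twice.
    Assumption: the spread is the sample standard deviation of a - b.
    """
    diffs = [a - b for a, b in zip(first, second)]
    avg_unsigned = mean([abs(d) for d in diffs])
    return avg_unsigned, stdev(diffs)
```

At least two paired measurements are needed, since `statistics.stdev` raises an error for fewer than two values.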

References

1. Huang S-Y, Zou X. Ensemble Docking of Multiple Protein Structures: Considering Protein Structural Variations in Molecular Docking. Proteins Struct. Funct. Bioinforma. 2007;66:399–421.
2. Friesner RA, Murphy RB, Repasky MP, Frye LL, Greenwood JR, Halgren TA, Sanschagrin PC, Mainz DT. Extra Precision Glide: Docking and Scoring Incorporating a Model of Hydrophobic Enclosure for Protein-Ligand Complexes. J. Med. Chem. 2006;49:6177–6196.
3. Korb O, Stützle T, Exner TE. Empirical Scoring Functions for Advanced Protein−Ligand Docking with PLANTS. J. Chem. Inf. Model. 2009;49:84–96.
4. Sulea T, Hogues H, Purisima EO. Exhaustive Search and Solvated Interaction Energy (SIE) for Virtual Screening and Affinity Prediction. J. Comput. Aided Mol. Des. 2011;26:617–633.
5. Jones G, Willett P, Glen RC, Leach AR, Taylor R. Development and Validation of a Genetic Algorithm for Flexible Docking. J. Mol. Biol. 1997;267:727–748.
