J Comput Aided Mol Des. 2015 Jun;29(6):485-509.
doi: 10.1007/s10822-015-9846-3. Epub 2015 May 5.

Knowledge-guided docking: accurate prospective prediction of bound configurations of novel ligands using Surflex-Dock

Ann E Cleves et al. J Comput Aided Mol Des. 2015 Jun.

Abstract

Prediction of the bound configuration of small-molecule ligands that differ substantially from the cognate ligand of a protein co-crystal structure is much more challenging than re-docking the cognate ligand. Success rates for cross-docking in the range of 20-30% are common. We present an approach that uses structural information known prior to a particular cutoff date to make predictions on ligands whose bound structures were determined later. The knowledge-guided docking protocol was tested on a set of ten protein targets using a total of 949 ligands. The benchmark data set, called PINC ("PINC Is Not Cognate"), is publicly available. Protein pocket similarity was used to choose representative structures for ensemble docking. The docking protocol made use of ligand poses known prior to the cutoff date, both to help guide the configurational search and to adjust the rank of predicted poses. Overall, the top-scoring pose family was correct over 60% of the time, with the top-two pose families approaching a 75% success rate. Correct poses among all those predicted were identified nearly 90% of the time. The largest improvements came from the use of molecular similarity to improve ligand pose rankings and from the strategy for identifying representative protein structures. With the exception of a single outlier target, the knowledge-guided docking protocol produced results matching the quality of cognate-ligand re-docking, but it did so on a very challenging, temporally segregated cross-docking benchmark.
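
The temporal segregation and the top-k success rates described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code or the Surflex-Dock interface. The `Complex` record, the function names, the placeholder identifiers, the dates, the RMSD values, and the 2.0 Å correctness threshold are all assumptions made for illustration; the abstract does not state how a "correct" pose is defined.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Complex:
    """A protein-ligand structure with its public release date (hypothetical record)."""
    pdb_id: str
    release_date: date


def temporal_split(complexes, cutoff):
    """Split structures into those known before the cutoff and those released on or after it."""
    known = [c for c in complexes if c.release_date < cutoff]
    test = [c for c in complexes if c.release_date >= cutoff]
    return known, test


def top_k_success_rate(rmsds_by_ligand, k, threshold=2.0):
    """Fraction of ligands for which any of the k best-ranked pose families
    lies within `threshold` angstroms RMSD of the experimental pose.

    `rmsds_by_ligand` maps a ligand id to RMSD values ordered by docking rank.
    The 2.0 angstrom default is an assumed convention, not taken from the abstract.
    """
    if not rmsds_by_ligand:
        raise ValueError("no ligands to evaluate")
    hits = sum(
        1
        for rmsds in rmsds_by_ligand.values()
        if any(r <= threshold for r in rmsds[:k])
    )
    return hits / len(rmsds_by_ligand)


# Illustrative data only: identifiers, dates, and RMSD values are made up.
complexes = [
    Complex("hypo_A", date(2001, 5, 1)),
    Complex("hypo_B", date(2003, 6, 26)),
    Complex("hypo_C", date(2003, 6, 27)),
    Complex("hypo_D", date(2010, 1, 1)),
]
known, test = temporal_split(complexes, cutoff=date(2003, 6, 27))

ranked_rmsds = {"hypo_C": [3.1, 1.2, 5.0], "hypo_D": [0.8, 4.0]}
top1 = top_k_success_rate(ranked_rmsds, k=1)
top2 = top_k_success_rate(ranked_rmsds, k=2)
```

In this sketch only the `known` structures would be allowed to inform protein selection, search guidance, or pose re-ranking, while the `test` ligands are used solely for scoring. Setting `k` to the total number of predicted pose families gives the "correct pose among all those predicted" rate.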

Figures

Fig. 1
The structural information available about CDK2 inhibitors prior to July 2003: all 42 protein structures shown with five small ligands in the active site (top); all 42 ligands oriented with their hinge-binding moieties upward (bottom left); and all 42 ligands with the viewpoint from the kinase hinge (bottom right)
Fig. 2
Temporal partitioning for cross-docking prediction: the information above the dotted line became publicly available prior to 6-27-2003 and is to be used to make predictions on ligands whose bound configurations were determined later
Fig. 3
The four binding sites with the largest volumes
Fig. 4
The protein binding sites with volume sizes ranked 5–8
Fig. 5
Dynamic substructure matching of a ligand to be docked to ligands whose bound poses are known is used to provide additional search focus on known binding motifs. Note that the ligand conformations shown are from the respective crystal structures to illustrate the relative alignments of known fragments to those present in the subject ligand
Fig. 6
Molecular similarity to bound ligands of different scaffold types can help in pose-ranking
Fig. 7
The top-scoring pose families for the 2XNB ligand for knowledge-guided docking (a, marked “G-”) and unguided docking (b, marked “U-”). The crystallographic poses (two alternates) are shown in thick tan sticks, with the docked pose families shown in contrasting colors
Fig. 8
Overall performance of Surflex-Dock under different docking protocols (left) and considering the effect of using a protein ensemble (right) compared with a single protein variant for each target (aggregated over five different selections each)
Fig. 9
Overall docking performance under different docking protocols for nine targets. The key curves are blue (top scoring pose family using the knowledge-guided protocol), magenta (top two pose families), and red (top pose family in the unguided protocol)
Fig. 10
The effects of using single protein exemplars versus an ensemble of five for nine targets. The key comparisons are between the red curve (protein ensemble with no substructural guidance during docking), blue curve (adding substructural guidance), and the remaining curves (each from a single protein variant from the ensemble, using no substructural guidance)
Fig. 11
The experimental electron density (red mesh) for the 2XNB ligand is shown along with that computed for the entire top-scoring pose family ensemble (thin cyan sticks with blue transparent surface, a) and for the two alternate poses modeled in the crystallographic experiment (thick tan sticks with gray transparent surface, b)
Fig. 12
For thrombin, comparison between docking without knowledge-based guidance (top right, pink) and with guidance (bottom, cyan) for the ligand of 1ZGV, a triazolo-pyrimidine with a non-basic S1 binding element (top left in 2D and thick tan sticks in its experimental pose)
Fig. 13
Comparison between docking without knowledge-based guidance (top) and with guidance (bottom) for the thrombin ligand of 3RML, showing the top two pose families in the guided case
Fig. 14
The HIV-RT pocket volume (top) is shown along with docking results for the ligand of 3LAL: tan sticks for the experimental pose, cyan for the top-scoring pose family with knowledge-based guidance, and pink for the second-ranked family
Fig. 15
HIV-PR (magenta), shown in top view (left) and side view (right), with the top-scoring predicted pose family for the ligand of 1ZSR (cyan with experimental pose in tan)
Fig. 16
PTP1b (pink) is shown with the ligand of 1Q6S (tan sticks), the relevant known molecules sharing the difluoromethylphosphate (green), and the top two predicted pose families (cyan and light magenta)
Fig. 17
Canonical early bound ligands of PPARγ with a shared binding mode (tan), along with the nine worst test cases (cyan), all exhibiting a completely different binding mode for organic acids
