J Chem Inf Model. 2011 Dec 27;51(12):3078-92.
doi: 10.1021/ci200377u. Epub 2011 Nov 21.

Statistical potential for modeling and ranking of protein-ligand interactions



Hao Fan et al. J Chem Inf Model. 2011.

Abstract

Applications in structural biology and medicinal chemistry require protein-ligand scoring functions for two distinct tasks: (i) ranking different poses of a small molecule in a protein binding site and (ii) ranking different small molecules by their complementarity to a protein site. Using probability theory, we developed two atomic distance-dependent statistical scoring functions: PoseScore, optimized for distinguishing native ligand binding geometries from other poses, and RankScore, optimized for distinguishing ligands from nonbinding molecules. Both scores were derived from a set of 8,885 crystallographic structures of protein-ligand complexes but differ in the values of three key parameters. We investigated factors influencing scoring accuracy, including the maximal atomic distance and the non-native ligand geometries used for scoring, as well as the use of protein models instead of crystallographic structures for training and testing the scoring function. For the test set of 19 targets, RankScore improved the ligand enrichment (logAUC) and early enrichment (EF1) scores computed by DOCK 3.6 for 13 and 14 targets, respectively. In addition, RankScore performed better at rescoring than each of seven other scoring functions tested. When both the crystal structure and decoy geometries with all-atom root-mean-square deviations (RMSD) of up to 2 Å from the crystal structure were accepted as correct binding poses, PoseScore gave the best score to a correct binding pose among 100 decoys in 88% of cases in a benchmark set of 100 protein-ligand complexes. PoseScore accuracy is comparable to that of DrugScoreCSD and ITScore/SE and superior to that of 12 other tested scoring functions. Therefore, RankScore can facilitate ligand discovery by ranking complexes of the target with different small molecules, and PoseScore can be used for protein-ligand complex structure prediction by ranking different conformations of a given protein-ligand pair.
The statistical potentials are available through the Integrative Modeling Platform (IMP) software package (http://salilab.org/imp) and the LigScore Web server (http://salilab.org/ligscore/).
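Both scores are atomic distance-dependent pair potentials. The Python sketch below illustrates only the generic scoring step that such potentials share: summing a tabulated energy over protein-ligand atom pairs closer than a cutoff (6 Å here, the rmax value reported in Figure 1). The table layout, the atom-type labels, the 0.5 Å bin width, and the name `score_complex` are illustrative assumptions; this is not the PoseScore/RankScore implementation distributed in IMP.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
Atom = Tuple[str, Coord]  # (atom-type label, Cartesian coordinates in Å)


def score_complex(
    protein: Sequence[Atom],
    ligand: Sequence[Atom],
    potential: Dict[Tuple[str, str], List[float]],
    r_max: float = 6.0,
    bin_width: float = 0.5,
) -> float:
    """Sum a tabulated distance-dependent pair energy over protein-ligand atom pairs.

    Hypothetical table layout: potential[(protein_type, ligand_type)][k] is the
    energy for distances in [k * bin_width, (k + 1) * bin_width); every table
    must cover distances up to r_max.
    """
    total = 0.0
    for p_type, p_xyz in protein:
        for l_type, l_xyz in ligand:
            r = math.dist(p_xyz, l_xyz)
            if r >= r_max:
                continue  # pairs at or beyond the cutoff contribute nothing
            table = potential.get((p_type, l_type))
            if table is None:
                continue  # untabulated type pairs are scored as zero (an assumption)
            total += table[int(r / bin_width)]
    return total


if __name__ == "__main__":
    # Toy single-type table: 0 below 2 Å, -1 from 2 Å up to the 6 Å cutoff.
    toy = {("C", "C"): [0.0] * 4 + [-1.0] * 8}
    protein = [("C", (0.0, 0.0, 0.0))]
    ligand = [("C", (3.0, 0.0, 0.0)), ("C", (7.0, 0.0, 0.0))]
    print(score_complex(protein, ligand, toy))  # prints -1.0 (7 Å pair is ignored)
```

Under the sign convention assumed here (energies as negative log-probability ratios), lower totals mean more favorable complexes, so ranking poses or ligands reduces to sorting by this total.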


Figures

Figure 1. The performance of the statistical potential as a function of the distance cutoff, shown on the training sets
(a) With two parameters of the potential fixed (wref = 0.4, wuni = 0), the potential was most accurate at ligand pose detection when the third parameter, rmax, was set to 6 Å, selecting the correct binding mode for 64 (91%) of the 70 proteins in the training set. (b) With the same two parameters fixed (wref = 0.4, wuni = 0), the potential was most accurate at rescoring when rmax was set to 6 Å, improving enrichment (logAUC) for 14 targets in the DUD-1 training set.
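Figure 1b and the abstract report ligand enrichment as logAUC and early enrichment as EF1. A minimal sketch of both metrics follows, assuming lower docking scores are better, EF1 computed over the top 1% of the ranked list, and logAUC taken as the area under the ROC curve plotted against log10 of the false-positive rate from 0.001 to 1, normalized so a perfect ranking scores 1.0. Ties, interpolation between ROC points, and any random-baseline correction may be handled differently in DOCK 3.6 or in the paper; the function names are hypothetical.

```python
import math
from typing import Sequence


def enrichment_factor(scores: Sequence[float], is_active: Sequence[bool],
                      fraction: float = 0.01) -> float:
    """EF at a fraction of the ranked list (EF1 for fraction=0.01); lower score = better."""
    n = len(scores)
    n_actives = sum(is_active)
    if n == 0 or n_actives == 0:
        raise ValueError("need at least one molecule and one active")
    n_top = max(1, math.ceil(n * fraction))
    ranked = sorted(range(n), key=lambda i: scores[i])
    hits = sum(1 for i in ranked[:n_top] if is_active[i])
    # Hit rate in the top fraction divided by the hit rate of random selection.
    return (hits / n_top) / (n_actives / n)


def log_auc(scores: Sequence[float], is_active: Sequence[bool],
            lam: float = 0.001) -> float:
    """ROC area on a log10 false-positive-rate axis from lam to 1, scaled to [0, 1].

    The ROC curve is treated here as a step function of the false-positive rate.
    """
    n_actives = sum(is_active)
    n_decoys = len(scores) - n_actives
    if n_actives == 0 or n_decoys == 0:
        raise ValueError("need both actives and decoys")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    # actives_before[k]: actives ranked ahead of the (k+1)-th decoy, i.e. the
    # true-positive count while the false-positive rate equals k / n_decoys.
    actives_before = []
    seen = 0
    for i in ranked:
        if is_active[i]:
            seen += 1
        else:
            actives_before.append(seen)
    area = 0.0
    for k, tp in enumerate(actives_before):
        lo = max(k / n_decoys, lam)
        hi = min((k + 1) / n_decoys, 1.0)
        if hi > lo:
            area += (tp / n_actives) * (math.log10(hi) - math.log10(lo))
    return area / math.log10(1.0 / lam)
```

The log-scaled axis weights the first 0.1% to 1% of decoys as heavily as the remaining 99%, which is why logAUC rewards early recognition of ligands more than the ordinary ROC area does.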
Figure 2. Four examples of accurate ligand pose prediction from the PoseScore test set
For each target, the crystal structure of the protein binding site and the co-crystallized ligand (solid stick, green) as well as the best-ranked ligand geometric decoy (solid stick, yellow) are shown. (a) Thrombin (1a46). The crystal structure of the ligand was ranked 1; a geometric decoy with an RMSD of 1.39 Å was ranked 2. (b) Carbonic anhydrase I (1bzm). The crystal structure of the ligand was ranked 3; a geometric decoy with an RMSD of 1.65 Å was ranked 1. (c) Elastase (1ela). The crystal structure of the ligand was ranked 1; a geometric decoy with an RMSD of 1.37 Å was ranked 2. (d) Streptavidin (1sre). The crystal structure of the ligand was ranked 5; a geometric decoy with an RMSD of 1.39 Å was ranked 1.
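The pose-prediction test counts a target as a success when the best-scored pose lies within 2 Å all-atom RMSD of the crystal ligand, as in panels (b) and (d) where a near-native decoy outranks the crystal pose. The sketch below expresses that criterion, assuming every pose is already in the protein's coordinate frame (so no superposition is applied), atoms are listed in the same order in every pose, and ligand symmetry is ignored. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(pose: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """All-atom RMSD without superposition (poses share the protein's frame)."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must list the same, non-zero number of atoms")
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(pose, reference))
    return math.sqrt(sq / len(pose))


def top_pose_is_correct(poses: Sequence[Sequence[Coord]], scores: Sequence[float],
                        reference: Sequence[Coord], cutoff: float = 2.0) -> bool:
    """True if the best-scored (lowest) pose is within `cutoff` Å RMSD of the reference."""
    best = min(range(len(poses)), key=lambda i: scores[i])
    return rmsd(poses[best], reference) <= cutoff
```

A benchmark success rate, such as the 88% quoted in the abstract, would then be the fraction of complexes for which `top_pose_is_correct` returns `True`, with the crystal pose included among the candidates.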
Figure 3. Four examples of inaccurate ligand pose prediction from the PoseScore test set
For each target, the crystal structure of the protein binding site, the co-crystallized ligand, and the highest-ranking geometric decoy of the ligand are shown as in Figure 2. See Results for more detail.
Figure 4. Ligand poses for AmpC β-lactamase from the RankScore test set
(a) 2D structures of the AmpC ligands HTC and CTC. (b) Docking poses of HTC (yellow sticks) and CTC (blue sticks) generated by screening against chain B of the AmpC structure (PDB code: 1xgj).
Figure 5. The effect of the parameter α on the performance of the statistical potential derived using the DFIRE formula, shown on the training set
The potential was calculated independently with α set to 1, 2, 3, 4, 5, and 6. For each α value, five values of the maximal distance rmax were used: 6 Å (black solid line), 8 Å (black dotted line), 10 Å (red solid line), 12 Å (red dotted line), and 14 Å (blue solid line). The resulting potentials were tested on the training set of 70 proteins. The potential was most accurate with α = 3 and rmax = 6 Å.
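Figure 5 scans the exponent α of the DFIRE reference state. The caption does not restate the formula, so the sketch below assumes the reference state of the original DFIRE potential (Zhou and Zhou): the expected pair count in a distance bin scales as (r/r_cut)^α times the count observed in the cutoff bin, and the energy is the negative log ratio of observed to expected counts, in units of RT. Equal bin widths, bin-centre distances, the pseudocount, and the name `dfire_potential` are assumptions, and the paper's implementation may differ in detail.

```python
import math
from typing import List, Sequence


def dfire_potential(counts: Sequence[float], bin_width: float, alpha: float,
                    pseudocount: float = 1e-6) -> List[float]:
    """Energies (units of RT) for one atom-type pair from binned observed counts.

    counts[k] is the number of observed pairs in the bin centred at
    (k + 0.5) * bin_width; the last bin is taken as the cutoff bin r_cut.
    With equal bin widths, the DFIRE bin-width ratio (dr / dr_cut) equals 1.
    """
    r_cut = (len(counts) - 0.5) * bin_width
    n_cut = counts[-1]
    energies = []
    for k, n_obs in enumerate(counts):
        r = (k + 0.5) * bin_width
        expected = (r / r_cut) ** alpha * n_cut  # reference count grows as r**alpha
        # The pseudocount (an assumption) avoids log(0) for empty bins.
        energies.append(-math.log((n_obs + pseudocount) / (expected + pseudocount)))
    return energies
```

Reproducing a scan like Figure 5 would mean rebuilding the tables for each α and rmax pair and rerunning the pose benchmark; α = 3 makes the reference count grow as r³, which is steeper than the r² growth of a uniform gas.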
Figure 6. Schematic presentation of a protein-ligand complex
The protein is approximated as the outer sphere (solid line), and the ligand, completely embedded inside the protein, as the inner sphere with radius r. For a ligand atom at distance d from the ligand center, the number of protein-ligand atom pairs within a distance R (R ≤ Rcutoff) is calculated by equation 12.
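Equation 12 itself is not reproduced in this excerpt. As a hedged illustration of the geometry in Figure 6 only, the sketch below estimates how many protein atoms lie within distance R of a ligand atom at distance d from the ligand center. It assumes protein atoms fill space uniformly (density ρ, `density`) everywhere outside the ligand sphere of radius r, and that the protein sphere is large enough to contain the whole probe sphere of radius R. The count is then ρ times the probe-sphere volume minus its overlap with the ligand sphere (a sphere-sphere lens). The function names are hypothetical, and the result may differ from the paper's equation 12.

```python
import math


def lens_volume(R: float, r: float, d: float) -> float:
    """Intersection volume of spheres with radii R and r whose centres are d apart."""
    if d >= R + r:
        return 0.0  # spheres do not overlap
    if d <= abs(R - r):
        return 4.0 / 3.0 * math.pi * min(R, r) ** 3  # smaller sphere fully inside
    return (math.pi * (R + r - d) ** 2
            * (d * d + 2 * d * r - 3 * r * r + 2 * d * R + 6 * r * R - 3 * R * R)
            / (12 * d))


def protein_pairs_within(R: float, r: float, d: float, density: float = 1.0) -> float:
    """Expected protein atoms within R of a ligand atom d from the ligand centre.

    Assumes uniform protein density outside the ligand sphere of radius r and
    a protein sphere large enough to contain the whole probe sphere.
    """
    probe_volume = 4.0 / 3.0 * math.pi * R ** 3
    return density * (probe_volume - lens_volume(R, r, d))
```

Because part of the probe sphere is occupied by the ligand itself, the expected number of pairs grows more slowly with R than the (4/3)πR³ volume alone would suggest, especially for atoms near the ligand center.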
Figure 7. The probability distribution of protein-ligand atom pairs, assuming no difference between atom types
Five distributions are plotted: (1) the distribution derived using eqn. 8 from the sample of X-ray structures of protein-ligand complexes (black solid line); (2) the distribution derived using eqn. 8 from the sample of docking poses with RMSD errors larger than 2 Å relative to the X-ray structures (black dashed line); and (3) to (5) the distributions derived using eqn. 11 with α set to 2 (red solid line), 3 (blue solid line), and 4 (brown solid line).
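The first two curves in Figure 7 are type-independent distance distributions taken from samples of complexes. Equation 8 is not shown here, so the sketch below assumes it amounts to a normalized histogram of all protein-ligand atom-pair distances below rmax, pooled over the sample of complexes. The 0.5 Å bin width, the coordinate-tuple input format, and the function name are illustrative choices.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Complex = Tuple[Sequence[Coord], Sequence[Coord]]  # (protein atoms, ligand atoms)


def pair_distance_distribution(complexes: Sequence[Complex], r_max: float = 6.0,
                               bin_width: float = 0.5) -> List[float]:
    """Probability of a protein-ligand atom pair falling in each distance bin below r_max."""
    n_bins = round(r_max / bin_width)
    counts = [0] * n_bins
    for protein, ligand in complexes:
        for p in protein:
            for lig_atom in ligand:
                r = math.dist(p, lig_atom)
                if r < r_max:
                    # min() guards against float rounding at the top edge.
                    counts[min(int(r / bin_width), n_bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]
```

Running the same function over crystal complexes and over decoy poses with RMSD above 2 Å would give the two sampled curves; comparing either with an r^α reference shows how well a given α matches the observed shape.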
