J Chem Inf Model. 2011 Sep 26;51(9):2047-65.
doi: 10.1021/ci1003009. Epub 2011 Jun 6.

Evaluation of several two-step scoring functions based on linear interaction energy, effective ligand size, and empirical pair potentials for prediction of protein-ligand binding geometry and free energy

Obaidur Rahaman et al. J Chem Inf Model. 2011.

Abstract

The performance of several two-step scoring approaches for molecular docking was assessed for the ability to predict binding geometries and free energies. Two new scoring functions designed for "step 2 discrimination" were proposed and compared to our CHARMM implementation of the linear interaction energy (LIE) approach using the Generalized-Born with Molecular Volume (GBMV) implicit solvation model. A scoring function S1 was proposed by considering only "interacting" ligand atoms as the "effective size" of the ligand and extended to an empirical regression-based pair potential S2. The S1 and S2 scoring schemes were trained and 5-fold cross-validated on a diverse set of 259 protein-ligand complexes from the Ligand Protein Database (LPDB). The regression-based parameters for S1 and S2 also demonstrated reasonable transferability in the CSARdock 2010 benchmark using a new data set (NRC HiQ) of diverse protein-ligand complexes. The ability of the scoring functions to accurately predict ligand geometry was evaluated by calculating the discriminative power (DP) of the scoring functions to identify native poses. The parameters for the LIE scoring function with the optimal DP for geometry (step 1 discrimination) were found to be very similar to the best-fit parameters for binding free energy over a large number of protein-ligand complexes (step 2 discrimination). Reasonable performance of the scoring functions in enrichment of active compounds in four different protein target classes established that the parameters for S1 and S2 provided reasonable accuracy and transferability. Additional analysis was performed to definitively separate scoring function performance from molecular weight effects. This analysis included the prediction of ligand binding efficiencies for a subset of the CSARdock NRC HiQ data set where the number of ligand heavy atoms ranged from 17 to 35. This range of ligand heavy atoms is where improved accuracy of predicted ligand efficiencies is most relevant to real-world drug design efforts.
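
The two quantities at the center of the abstract, an LIE-style binding free energy and a ligand efficiency, can be illustrated with a minimal Python sketch. It assumes the commonly used linear form ΔG ≈ α·ΔE_vdW + β·ΔE_elec + γ and the common definition of ligand efficiency as −ΔG divided by the heavy-atom count; the authors' CHARMM/GBMV implementation may include further terms. The function names, the constant γ, the default β = 0.5, and the example energies are hypothetical; only α = 0.20 appears in the source (figure captions).

```python
def lie_binding_free_energy(d_vdw, d_elec, alpha=0.20, beta=0.5, gamma=0.0):
    """Generic LIE-style estimate of the binding free energy (kcal/mol).

    d_vdw, d_elec: differences in van der Waals and electrostatic
    interaction energies (hypothetical inputs for illustration).
    alpha = 0.20 follows the figure captions; beta and gamma are
    assumed placeholders, not the fitted values from the paper.
    """
    return alpha * d_vdw + beta * d_elec + gamma


def ligand_efficiency(dg_bind, n_heavy_atoms):
    """Ligand efficiency as -dG per heavy atom (a common convention)."""
    if n_heavy_atoms <= 0:
        raise ValueError("n_heavy_atoms must be positive")
    return -dg_bind / n_heavy_atoms


if __name__ == "__main__":
    # Made-up energies for a ligand with 26 heavy atoms (inside 17-35).
    dg = lie_binding_free_energy(d_vdw=-40.0, d_elec=-10.0)
    print(dg, ligand_efficiency(dg, 26))
```

Dividing by the heavy-atom count is what lets predictions be compared across ligands of different size, which is the motivation the abstract gives for the 17 to 35 heavy-atom subset.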

Figures

Figure 1
Overview of CHARMM-based molecular docking protocol and scoring functions used for step 1 (geometry) and step 2 (binding free energy) discrimination.
Figure 2
Comparison of predicted and experimental ΔGbind (kcal/mol) for six scoring functions optimized for LPDB259: (A) LIE(GBMV), (B) S1, (C) S2, (D) S2h, (E) S3, (F) S3w.
Figure 3
Comparison of predicted and experimental ΔGbind (kcal/mol) for the S2 scoring function on the NRC HiQ data sets: (A) S2 parameters optimized on NRC HiQ set 1; (B) S2 parameters optimized on NRC HiQ set 2; (C) S2 rescoring of set 1 (with parameters optimized on set 2); (D) S2 rescoring of set 2 (with parameters optimized on set 1).
Figure 4
Average discriminative power (DP) of various scoring functions as a function of molecular weight. DP has been averaged over LPDB entries using a 50 dalton window of ligand size, and interpolated for windows with fewer than three LPDB entries to produce a smooth plot. (A) DP of several CHARMM-based scoring functions compared to LIE(Rdie) and LIE(GBMV) fit to LPDB259. (B) DP of LIE(GBMV) with three different values of the electrostatic parameter β, while the van der Waals parameter is fixed at α=0.20. (C) DP of LIE(GBMV), S1, S2, S2h, S3, and S3w, all fit to LPDB259.
Figure 5
Discriminative power (DP) of LIE scoring functions as a function of the LIE electrostatic parameter β, keeping the van der Waals parameter α fixed at 0.20. (A) DP for LIE(GBMV) and LIE(Rdie) over the LPDB160 dataset. (B) DP for LIE(GBMV) calculated across 5 cross validation subgroups of LPDB160.
Figure 6
Comparison of predicted and experimental ligand efficiencies for subsets of the NRC HiQ data sets containing 17–35 heavy atoms: (A) S1 rescoring of NRC HiQ set 1 (parameters optimized on set 2); (B) S1 rescoring of NRC HiQ set 2 (parameters optimized on set 1); (C) S2 rescoring of set 1 (with parameters optimized on set 2); (D) S2 rescoring of set 2 (with parameters optimized on set 1). For these data subsets there is a very low correlation between binding affinity and molecular weight.
Figure 7
Distribution of the molecular weight (MW) for p38α MAP kinase active ligands and decoy ligands taken from the DUD. (A) Distribution over the entire MW range of actives (200–700) using 10 dalton bins. DUD decoys cover only the MW range 320–450. (B) Distribution over the MW range 200–450 using 5 dalton bins. The MW range 320–450 is shown broken into two groups for enrichment factor and ROC curve analysis: low MW range (320–375) and high MW range (375–450).
Figure 8
Early enrichment for the top 10 percent of the database. Enrichment factors (EF) are shown for the six scoring functions over three molecular weight (MW) ranges: (A) MW 320–450, which reflects the true total enrichment over the database; (B) low MW range, 320–375; (C) high MW range, 375–450.
Figure 9
Receiver Operating Characteristic (ROC) curve for database. ROC curves are shown for the six scoring functions over three molecular weight (MW) ranges: (A) MW: 320–450 which reflects the true total enrichment over the database (B) low MW range from 320–375 (C) high MW range from 375–450. ROC of random is shown as a black line on the diagonal.
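
The enrichment factor and ROC analyses named in the captions for Figures 8 and 9 can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' evaluation code: it assumes that a lower (more negative) score ranks a compound as a better binder, uses the common definition of EF as the active fraction in the top slice divided by the active fraction in the whole database, and computes the area under the ROC curve by pairwise comparison. The function names and the toy data are hypothetical.

```python
def enrichment_factor(scores, is_active, fraction=0.10):
    """EF for the top `fraction` of a database ranked by score.

    Assumes lower scores are better (e.g. more negative predicted dG).
    """
    n = len(scores)
    total_actives = sum(1 for a in is_active if a)
    if n == 0 or total_actives == 0:
        raise ValueError("need a non-empty database with at least one active")
    n_top = max(1, int(round(n * fraction)))
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0])
    hits_top = sum(1 for _, active in ranked[:n_top] if active)
    return (hits_top / n_top) / (total_actives / n)


def roc_auc(scores, is_active):
    """Area under the ROC curve via pairwise active/decoy comparison.

    Equals the probability that a random active scores lower (better)
    than a random decoy; ties count as one half. 0.5 matches random.
    """
    actives = [s for s, a in zip(scores, is_active) if a]
    decoys = [s for s, a in zip(scores, is_active) if not a]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = sum(
        1.0 if a < d else 0.5 if a == d else 0.0
        for a in actives
        for d in decoys
    )
    return wins / (len(actives) * len(decoys))


if __name__ == "__main__":
    # Toy database: 10 compounds, 2 actives (made-up values).
    scores = [-10, -9, -8, -7, -6, -5, -4, -3, -2, -1]
    labels = [True, False, False, False, False,
              True, False, False, False, False]
    print(enrichment_factor(scores, labels), roc_auc(scores, labels))
```

Restricting the inputs to compounds within one molecular weight window (for example 320–375 or 375–450) before calling these functions would mirror the per-range panels described in the captions.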
