J Chem Inf Model. 2022 Nov 28;62(22):5550-5567.
doi: 10.1021/acs.jcim.2c00926. Epub 2022 Nov 3.

DyScore: A Boosting Scoring Method with Dynamic Properties for Identifying True Binders and Nonbinders in Structure-Based Drug Discovery

Yanjun Li et al. J Chem Inf Model.

Abstract

The accurate prediction of protein-ligand binding affinity is critical for the success of computer-aided drug discovery. However, the accuracy of current scoring functions is usually unsatisfactory because they roughly approximate, or sometimes omit entirely, many factors involved in protein-ligand binding. For instance, the intrinsic dynamics of the protein-ligand binding state are usually disregarded in scoring functions because these rapid binding affinity prediction approaches are based only on a single representative structure of the protein-ligand complex in the binding state. That is, the dynamic ensemble of protein-ligand binding complexes is simplified to a static snapshot in the calculation. In this study, two novel features are proposed for characterizing the dynamic properties of protein-ligand binding based on the static structure of the complex, which are expected to be a valuable complement to current scoring functions. The two features describe the geometry-shape matching between a protein and a ligand and the dynamic stability of protein-ligand binding. We further combined these two novel features with several classical scoring functions to develop a binary classification model called DyScore, which uses the Extreme Gradient Boosting algorithm to classify compound poses as binders or nonbinders. We found that DyScore achieves state-of-the-art performance in distinguishing active from decoy ligands on both the enhanced DUD data set and external test sets, with both proposed features contributing significantly to the improved performance. In particular, DyScore exhibits superior performance in early recognition, a crucial requirement for success in virtual screening and de novo drug design. The standalone versions of DyScore and DyScore-MF are freely available at https://github.com/YanjunLi-CS/dyscore.
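To make the classification setup in the abstract concrete, the following is a minimal sketch of gradient boosting on per-pose features. It is not the DyScore implementation and it does not use XGBoost: it swaps in a much simpler first-order boosting of decision stumps on the logistic loss, written with the standard library only. The feature columns (matching score, stability score, one classical docking score), all numeric values, and the function names are assumptions made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lv, rv)
    return best[1:]

def fit(X, y, n_rounds=30, learning_rate=0.3):
    """Boost stumps on the negative gradient of the logistic loss (label minus probability)."""
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, thr, lv, rv = fit_stump(X, residuals)
        stump = (j, thr, learning_rate * lv, learning_rate * rv)
        stumps.append(stump)
        scores = [s + (stump[2] if row[j] <= thr else stump[3]) for s, row in zip(scores, X)]
    return stumps

def predict_proba(stumps, row):
    """Probability that a pose is a binder under the boosted model."""
    score = sum(lv if row[j] <= thr else rv for j, thr, lv, rv in stumps)
    return sigmoid(score)

# Made-up toy poses: [matching score, stability score, docking score]; 1 = binder, 0 = nonbinder.
X = [[0.9, 0.8, -9.1], [0.8, 0.9, -8.5], [0.7, 0.7, -7.9],
     [0.2, 0.3, -8.8], [0.3, 0.1, -6.0], [0.1, 0.2, -7.2]]
y = [1, 1, 1, 0, 0, 0]
model = fit(X, y)
new_pose_probability = predict_proba(model, [0.85, 0.75, -8.0])
```

In this toy set the docking score alone does not separate the classes, so the stumps end up splitting on the added feature columns, which is the kind of complementary signal the abstract describes.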

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Fig. 1:
Illustration of the data split on the DUD-E data set. Note that different targets may contain different numbers of complexes, although horizontal bars of equal length are used to represent them.
Fig. 2:
Definition of the lock-key matching score. The green region indicates the space occupied by protein atoms. The white region indicates the space occupied by neither protein nor ligand atoms. The gray region inside the blue circle indicates the space occupied by ligand atoms. The matching score is defined as the difference between the green and white regions. (Only grid points within the observation region are counted.)
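The grid-counting idea in the Fig. 2 caption can be sketched as follows. This is only an illustration under stated assumptions, not the published definition: the spherical observation region, the single shared atom radius, the grid spacing, and the function name `matching_score` are all hypothetical choices, and the score is left as a raw count difference without any normalization.

```python
import math
from itertools import product

def matching_score(protein_atoms, ligand_atoms, center, obs_radius,
                   spacing=1.0, atom_radius=1.0):
    """Count protein-occupied minus empty grid points inside the observation sphere."""
    def occupied(point, atoms):
        return any(math.dist(point, atom) <= atom_radius for atom in atoms)

    steps = int(obs_radius // spacing)
    offsets = [i * spacing for i in range(-steps, steps + 1)]
    n_protein = n_empty = 0
    for dx, dy, dz in product(offsets, repeat=3):
        point = (center[0] + dx, center[1] + dy, center[2] + dz)
        if math.dist(point, center) > obs_radius:
            continue  # outside the observation region
        if occupied(point, ligand_atoms):
            continue  # "gray" region: ligand atoms, not counted
        if occupied(point, protein_atoms):
            n_protein += 1  # "green" region
        else:
            n_empty += 1  # "white" region
    return n_protein - n_empty

# Toy geometry (assumed coordinates): one ligand atom at the origin.
ligand = [(0.0, 0.0, 0.0)]
score_without_protein = matching_score([], ligand, (0.0, 0.0, 0.0), obs_radius=2.0)
score_with_protein = matching_score([(2.0, 0.0, 0.0)], ligand, (0.0, 0.0, 0.0), obs_radius=2.0)
```

Under these assumptions, a ligand surrounded by more protein-occupied grid points and fewer empty ones receives a higher score, which matches the lock-and-key intuition in the caption.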
Fig. 3:
(a) Illustration of the two types of potential force between proteins and ligands. One is the potential repulsion force (arising from van der Waals interactions), which provides a counterforce if the ligand is closer to the protein atom than the equilibrium distance. The other is the potential attraction force (arising from hydrogen bonds, metal coordination, or electrostatic interactions), which also provides a counterforce that prevents the ligand from leaving. The potential repulsion and attraction forces are represented by blue and red arrows, respectively. A potential repulsion force is always coupled with a possible attraction force. (b) Rigid fragments of the ligand, separated by rotatable bonds, can be treated as several separate rigid bodies. (c) The combined effect of all potential repulsion and/or attraction forces on each rigid body can be considered as a resultant force. Forces between rigid bodies are treated as bidirectional repulsion and attraction forces. (d) Potential repulsion and/or attraction forces between the protein and ligand restrain the motion of the ligand, which is easiest to illustrate in a two-dimensional model. By simplifying the rigid body to a hexagon in the 2D model, the motion restraint can be evaluated by counting the restrained vertices (colored red) and free vertices (colored black). By representing the rigid body as an icosahedron (12 vertices), the motion restraint arising from the protein–ligand interaction can be evaluated in the same manner as in the two-dimensional model.
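The vertex-counting step in panel (d) can be illustrated with a short sketch. The caption does not state the criterion that makes a vertex restrained, so the rule used here is purely an assumption: a vertex direction counts as restrained when some protein atom lies within a distance cutoff of the rigid-body center and roughly along that direction, which models only a blocking (repulsive) contact and ignores attraction. The cutoff, the angular threshold, and the function names are hypothetical.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio, used for icosahedron coordinates

def icosahedron_directions():
    """Unit vectors pointing to the 12 vertices of a regular icosahedron."""
    raw = []
    for a in (-1, 1):
        for b in (-PHI, PHI):
            raw += [(0, a, b), (a, b, 0), (b, 0, a)]
    return [tuple(c / math.hypot(*v) for c in v) for v in raw]

def count_vertices(center, protein_atoms, cutoff=4.0, min_cos=0.5):
    """Return (restrained, free) vertex counts for one rigid body (assumed criterion)."""
    restrained = 0
    for direction in icosahedron_directions():
        for atom in protein_atoms:
            offset = [p - c for p, c in zip(atom, center)]
            dist = math.hypot(*offset)
            if dist == 0 or dist > cutoff:
                continue
            cos_angle = sum(o * d for o, d in zip(offset, direction)) / dist
            if cos_angle >= min_cos:
                restrained += 1  # motion toward this vertex is blocked
                break
    return restrained, 12 - restrained

origin = (0.0, 0.0, 0.0)
unbound = count_vertices(origin, [])
one_contact = count_vertices(origin, [(0.0, 0.0, 3.0)])
enclosed = count_vertices(origin, [(3, 0, 0), (-3, 0, 0), (0, 3, 0),
                                   (0, -3, 0), (0, 0, 3), (0, 0, -3)])
```

A rigid body with no nearby protein atoms has 12 free vertices, while one enclosed on all sides has 12 restrained vertices; a stability-style feature could then be built from these counts.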
Fig. 4:
Comparison of different scoring methods on the DUD-E test set generated by the target-aware split. Each gray dot represents a target (102 in total), and each value is averaged over three randomly split test sets.
Fig. 5:
Comparison of different scoring methods on the DUD-E test set generated by the target-unaware split. Each gray dot represents a target.
Fig. 6:
Comparison of different scoring methods on the DEKOIS 2.0 data set. Each gray dot represents a target from the DEKOIS data set.
Fig. 7:
Estimated density functions of DyScore predictions on the DUD-E test set with the target-aware split (left) and the DEKOIS 2.0 benchmark set (right).
Fig. 8:
Permutation feature importance results for the stability and matching scores.
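For readers unfamiliar with the technique named in Fig. 8, permutation feature importance can be sketched as below: shuffle one feature column, re-score the model, and record the drop in the metric. The sketch uses accuracy as the metric and a trivial threshold rule as a stand-in model; the metric actually used in the paper is not stated here, and the data, the stand-in model, and the function names are assumptions for illustration.

```python
import random

def accuracy(predict, X, y):
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, feature, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)
        X_perm = [row[:feature] + [value] + row[feature + 1:]
                  for row, value in zip(X, column)]
        drops.append(baseline - accuracy(predict, X_perm, y))
    return sum(drops) / n_repeats

# Stand-in model (assumption): uses only column 0 and ignores column 1.
def toy_predict(row):
    return int(row[0] > 0.5)

X = [[0.9, 0.1], [0.8, 0.7], [0.7, 0.3], [0.6, 0.9],
     [0.4, 0.2], [0.3, 0.8], [0.2, 0.4], [0.1, 0.6]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
importance_used = permutation_importance(toy_predict, X, y, feature=0)
importance_ignored = permutation_importance(toy_predict, X, y, feature=1)
```

A feature the model never reads has an importance of exactly zero, while shuffling a feature the model depends on lowers the score.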
Fig. 9:
Ligand fingerprint visualization for three targets (i.e., WEE1, SAHH, and PA2GA) in a two-dimensional space. Starting from the original 1024-dimensional ligand fingerprint, we first performed PCA to reduce the dimension to 50 and then ran the t-SNE algorithm to further reduce it to two dimensions. Crosses and dots represent actives and decoys, respectively.
Fig. 10:
Comparison of the interaction-based DyScore and similarity-based DyScore-MF methods on the DUD-E test set generated by the target-aware split.
Fig. 11:
Enrichment factor and BEDROC to evaluate the performance of different scoring methods with respect to the target number. The metrics are measured on the DUD-E test set generated by the target-aware split.
Fig. 12:
Enrichment factor and BEDROC to evaluate the performance of different scoring methods with respect to the target number. The metrics are measured on the DUD-E test set generated by the target-unaware split.
Fig. 13:
Enrichment factor and BEDROC to evaluate the performance of different scoring methods on the DEKOIS 2.0 data set with respect to the target number.
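The two early-recognition metrics named in Figs. 11 to 13 can be computed from a ranked list of active/decoy labels. The sketch below follows the commonly used definitions: the enrichment factor is the hit rate in the top fraction divided by the overall hit rate, and BEDROC is the robust initial enhancement (RIE) rescaled between its minimum and maximum. The default `alpha=20.0`, the 10% cutoff in the example, and the toy rankings are assumptions, not values taken from the paper.

```python
import math

def enrichment_factor(ranked_labels, fraction):
    """Hit rate in the top fraction of the ranking divided by the overall hit rate."""
    n_total = len(ranked_labels)
    n_top = max(1, int(round(n_total * fraction)))
    hits_top = sum(ranked_labels[:n_top])
    hits_all = sum(ranked_labels)
    return (hits_top / n_top) / (hits_all / n_total)

def bedroc(ranked_labels, alpha=20.0):
    """BEDROC for labels sorted from best to worst score (1 = active, 0 = decoy)."""
    n_total = len(ranked_labels)
    n_actives = sum(ranked_labels)
    if n_actives == 0 or n_actives == n_total:
        raise ValueError("need at least one active and one decoy")
    ratio = n_actives / n_total
    # Exponentially weighted sum over the ranks of the actives.
    sum_exp = sum(math.exp(-alpha * rank / n_total)
                  for rank, label in enumerate(ranked_labels, start=1) if label)
    # Expected value of one term for a uniformly random rank.
    random_term = (1 / n_total) * (1 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1)
    rie = sum_exp / (n_actives * random_term)
    rie_max = (1 - math.exp(-alpha * ratio)) / (ratio * (1 - math.exp(-alpha)))
    rie_min = (1 - math.exp(alpha * ratio)) / (ratio * (1 - math.exp(alpha)))
    return (rie - rie_min) / (rie_max - rie_min)

# Toy rankings: 10 actives among 100 compounds.
perfect = [1] * 10 + [0] * 90   # all actives ranked first
worst = [0] * 90 + [1] * 10     # all actives ranked last
ef_perfect = enrichment_factor(perfect, 0.10)
bedroc_perfect = bedroc(perfect)
bedroc_worst = bedroc(worst)
```

Both metrics reward placing actives near the top of the ranking, which is why they are used to assess early recognition in virtual screening.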
