Sci Rep. 2017 Apr 25;7:46710. doi: 10.1038/srep46710.

Performance of machine-learning scoring functions in structure-based virtual screening


Maciej Wójcikowski et al. Sci Rep. 2017.

Abstract

Classical scoring functions have reached a plateau in their performance in virtual screening and binding affinity prediction. Recently, machine-learning scoring functions trained on protein-ligand complexes have shown great promise in small tailored studies. They have also raised controversy, specifically concerning model overfitting and applicability to novel targets. Here we provide a new ready-to-use scoring function (RF-Score-VS) trained on 15,426 active and 893,897 inactive molecules docked to a set of 102 targets. We use the full DUD-E data sets along with three docking tools, five classical and three machine-learning scoring functions for model building and performance assessment. Our results show that RF-Score-VS can substantially improve virtual screening performance: in the top 1% of ranked compounds, RF-Score-VS achieves a 55.6% hit rate, whereas Vina achieves only 16.2% (at smaller fractions the difference is even larger: in the top 0.1%, RF-Score-VS achieves an 88.6% hit rate versus 27.5% for Vina). In addition, RF-Score-VS predicts measured binding affinity much better than Vina (Pearson correlations of 0.56 and -0.18, respectively). Lastly, we tested RF-Score-VS on an independent test set from the DEKOIS benchmark and observed comparable results. We provide the full data sets to facilitate further research in this area (http://github.com/oddt/rfscorevs) as well as a ready-to-use RF-Score-VS (http://github.com/oddt/rfscorevs_binary).
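To make the headline numbers concrete, the sketch below shows how a top-X% hit rate and the corresponding enrichment factor are typically computed from a ranked screening list. This is an illustrative example, not code from the paper; the function name and input arrays are hypothetical.

    import numpy as np

    def top_fraction_metrics(scores, labels, top_frac=0.01):
        """Hit rate and enrichment factor within the best-scored top_frac of a library.

        scores: model scores, higher = predicted more likely active (hypothetical input)
        labels: 1 for actives, 0 for decoys/inactives
        """
        scores = np.asarray(scores, dtype=float)
        labels = np.asarray(labels, dtype=float)
        n_top = max(1, int(round(len(scores) * top_frac)))
        top = labels[np.argsort(scores)[::-1][:n_top]]  # labels of the top-ranked compounds
        hit_rate = top.mean()                           # fraction of actives among the top picks
        ef = hit_rate / labels.mean()                   # enrichment over random selection
        return hit_rate, ef

Under this definition, a 55.6% hit rate in the top 1% against a library with roughly 5% actives corresponds to an enrichment factor of about 11 over random picking.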


Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1. Per-target, horizontal and vertical splits of DUD-E targets.
Each barrel represents all the protein-ligand complexes (actives and decoys) associated with a different target. Training sets are coloured red, test sets green.
Figure 2. Comparison of EF1% results obtained from classical SFs (D_score, Chemscore, G_score, PMF_score and the native score, i.e. the score used by the docking software) with results from three versions of RF-Score-VS.
Unlike RF-Score-VS, RF-Score v3 does not train on any negative data (this binding-affinity SF was trained exclusively on X-ray crystal structures [12]). Each boxplot shows five EF1% values for a given SF resulting from the five 80:20 data partitions (i.e. five non-overlapping test sets collectively comprising all data). All train-test splitting scenarios are present, namely vertical, horizontal and per-target. RF-Score-VS shows a dramatic increase in screening performance (measured as EF1%) compared to classical SFs.
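The three splitting scenarios can be sketched as follows. This is a hypothetical illustration using scikit-learn, not the paper's actual pipeline; the feature matrix, labels and target assignments are placeholders.

    import numpy as np
    from sklearn.model_selection import GroupKFold, StratifiedKFold

    # Placeholder data: one row per docked protein-ligand complex.
    rng = np.random.default_rng(0)
    X = rng.random((1000, 36))            # descriptor matrix (placeholder features)
    y = rng.integers(0, 2, 1000)          # 1 = active, 0 = decoy (placeholder labels)
    target = rng.integers(0, 102, 1000)   # which of the 102 DUD-E targets each row belongs to

    # Horizontal split: every target contributes ligands to both train and test sets.
    for train_idx, test_idx in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
        pass  # fit on X[train_idx], compute EF1% on X[test_idx]

    # Vertical split: whole targets are held out, testing transfer to unseen proteins.
    for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=target):
        pass

    # Per-target split: a separate 80:20 evaluation within each individual target.
    for t in np.unique(target):
        rows = np.flatnonzero(target == t)  # split these rows 80:20, one model per target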
Figure 3. Comparison of EF1% results from per-target and horizontal-split models.
Each data point corresponds to the performance of both models on a particular DUD-E target. The darker the colour of a target, the more active ligands it has. Docking conformations were obtained with AutoDock Vina. The dashed red line denotes equal performance; the dotted green lines mark 5-unit intervals. For most targets, and contrary to common assumption, there is little advantage in training machine-learning SFs per target over using a more generic approach (in this case the horizontal split), especially for targets with a greater number of active molecules.
Figure 4. Boxplots of EF1% for AutoDock Vina, RF-Score v2 and the novel RF-Score-VS v2 and v3 (trained on negative data) on the part of the DEKOIS 2.0 benchmark that does not overlap with the DUD-E benchmark (i.e. different targets, ligands and decoys).
Figure 5. Predicted vs measured activity.
Top 1% of compounds predicted to be active for each target in DUD-E by (A) AutoDock Vina and its native SF (Rp = −0.18); (B) RF-Score-VS v2 trained on the horizontally split dataset (Rp = 0.56); and (C) RF-Score-VS v2 trained on the vertically split dataset (Rp = 0.2). Red points represent decoys (putative inactive compounds); green points represent compounds with measured activity. Predicted values for machine-learning SFs are taken from the relevant cross-validation split.
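The Rp values quoted above are ordinary Pearson correlation coefficients between predicted and measured activities. A minimal sketch, with placeholder arrays rather than the paper's data:

    from scipy.stats import pearsonr

    measured = [7.2, 6.5, 8.1, 5.9, 6.8]    # placeholder measured activities (e.g. pKi)
    predicted = [6.8, 6.9, 7.7, 6.1, 6.6]   # placeholder predictions for the same compounds
    rp, p_value = pearsonr(measured, predicted)
    print(f"Rp = {rp:.2f}")                 # cf. 0.56 for RF-Score-VS v2 vs -0.18 for Vina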

References

    1. Schneider G. Virtual screening: an endless staircase? Nat. Rev. Drug Discov. 9, 273–276 (2010).
    2. Scior T. et al. Recognizing Pitfalls in Virtual Screening: A Critical Review. J. Chem. Inf. Model. 52, 867–881 (2012).
    3. Bauer M. R., Ibrahim T. M., Vogel S. M. & Boeckler F. M. Evaluation and Optimization of Virtual Screening Workflows with DEKOIS 2.0 – A Public Library of Challenging Docking Benchmark Sets. J. Chem. Inf. Model. 53, 1447–1462 (2013).
    4. Boström J., Hogner A. & Schmitt S. Do Structurally Similar Ligands Bind in a Similar Fashion? J. Med. Chem. 49, 6716–6725 (2006).
    5. Mysinger M. M., Carchia M., Irwin J. J. & Shoichet B. K. Directory of Useful Decoys, Enhanced (DUD-E): Better Ligands and Decoys for Better Benchmarking. J. Med. Chem. 55, 6582–6594 (2012).
