Proc Natl Acad Sci U S A. 2020 Aug 4;117(31):18477-18488.
doi: 10.1073/pnas.2000585117. Epub 2020 Jul 15.

Machine learning classification can reduce false positives in structure-based virtual screening

Yusuf O Adeshina et al. Proc Natl Acad Sci U S A. 2020.

Abstract

With the recent explosion in the size of libraries available for screening, virtual screening is positioned to assume a more prominent role in early drug discovery's search for active chemical matter. In typical virtual screens, however, only about 12% of the top-scoring compounds actually show activity when tested in biochemical assays. We argue that most scoring functions used for this task have been developed with insufficient attention to the datasets on which they are trained and tested, leading to overly simplistic models and/or overtraining. These problems are compounded in the literature because studies reporting new scoring methods have not validated their models prospectively within the same study. Here, we report a strategy for building a training dataset (D-COID) that aims to generate highly compelling decoy complexes that are individually matched to available active complexes. Using this dataset, we train a general-purpose classifier for virtual screening (vScreenML) that is built on the XGBoost framework. In retrospective benchmarks, our classifier shows outstanding performance relative to other scoring functions. In a prospective context, nearly all candidate inhibitors from a screen against acetylcholinesterase show detectable activity; beyond this, 10 of 23 compounds have an IC50 better than 50 μM. Without any medicinal chemistry optimization, the most potent hit has an IC50 of 280 nM, corresponding to a Ki of 173 nM. These results support using the D-COID strategy for training classifiers in other computational biology tasks, and using vScreenML in virtual screening campaigns against other protein targets. Both D-COID and vScreenML are freely distributed to facilitate such efforts.
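The abstract reports both an IC50 and a Ki for the most potent hit. One common way to relate the two for a competitive inhibitor is the Cheng-Prusoff equation, Ki = IC50 / (1 + [S]/Km). The abstract does not say which conversion or assay conditions were used, so the competitive mechanism, the substrate concentration [S], the Michaelis constant Km, and the helper names below are all assumptions. The sketch also back-calculates the [S]/Km ratio that the two reported values would imply if Cheng-Prusoff applied.

```python
def cheng_prusoff_ki(ic50: float, substrate_conc: float, km: float) -> float:
    """Ki for a competitive inhibitor via Cheng-Prusoff: Ki = IC50 / (1 + [S]/Km).

    Assumption: competitive inhibition; [S] and Km are in the same units.
    The returned Ki has the same units as ic50.
    """
    if km <= 0:
        raise ValueError("km must be positive")
    return ic50 / (1.0 + substrate_conc / km)


def implied_s_over_km(ic50: float, ki: float) -> float:
    """Invert Cheng-Prusoff to get the [S]/Km ratio implied by an IC50/Ki pair."""
    return ic50 / ki - 1.0


if __name__ == "__main__":
    # IC50 = 280 nM and Ki = 173 nM are the values reported in the abstract.
    ratio = implied_s_over_km(280.0, 173.0)
    print(f"implied [S]/Km = {ratio:.3f}")
    print(f"Ki = {cheng_prusoff_ki(280.0, ratio, 1.0):.1f} nM")
```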

Keywords: machine learning classifier; protein–ligand complex; structure-based drug design; virtual screening.

Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
Developing a challenging training set (D-COID). (A) Active complexes were assembled from the PDB by filtering for ligands that match those reflected in a screening library. For each active complex, 50 physicochemically matched compounds were selected and overlaid onto the active compound; the three most similar compounds on the basis of overall shape and electrostatic similarity were aligned into the protein active site and used as decoy complexes. This strategy mimics the selection of candidate (active) compounds in a realistic pharmacophore-based screening pipeline, and thus generates highly compelling decoy complexes for training/testing. (B) Modern scoring functions cannot distinguish active complexes from decoys in this set. Overlaid histograms are presented for scores obtained using various scoring functions when applied to active complexes (blue) and decoy complexes (red) in D-COID. For all eight methods tested, the distribution of scores assigned to active complexes strongly overlaps with the distribution of scores assigned to decoy complexes. From each model’s continuous scores, 10-fold cross-validation was used to obtain the classification cutoff that maximizes Matthews correlation coefficient (MCC) on each subset of the data, and these cutoffs were used in calculating precision/recall/MCC. Performance measures are presented as the average of 100 bootstrapped models, and uncertainty is presented as 95% confidence intervals.
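The cutoff procedure in this caption can be sketched as follows. This is one reading of it, not the authors' code: for each of 10 folds, the score cutoff that maximizes MCC is chosen on the other nine folds and applied to the held-out fold, and confusion counts are pooled across folds. The function names, the placement of candidate cutoffs midway between distinct scores, and the convention that a higher score means "more likely active" are assumptions (negate energy-like scores where lower is better); the averaging over 100 bootstrapped models is omitted.

```python
import math
import random
from typing import List, Sequence, Tuple

Counts = Tuple[int, int, int, int]  # (tp, fp, tn, fn)


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when a marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def confusion(scores: Sequence[float], labels: Sequence[int], cutoff: float) -> Counts:
    """Confusion counts, predicting 'active' when score >= cutoff."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        if s >= cutoff:
            if y:
                tp += 1
            else:
                fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def best_cutoff(scores: Sequence[float], labels: Sequence[int]) -> float:
    """MCC-maximizing cutoff; candidates lie below the minimum and midway between distinct scores."""
    u = sorted(set(scores))
    candidates = [u[0] - 1.0] + [(a + b) / 2 for a, b in zip(u, u[1:])]
    return max(candidates, key=lambda c: mcc(*confusion(scores, labels, c)))


def cross_validated_mcc(scores: Sequence[float], labels: Sequence[int],
                        k: int = 10, seed: int = 0) -> float:
    """Choose the cutoff on k-1 folds, apply it to the held-out fold, pool the counts."""
    idx = list(range(len(scores)))
    random.Random(seed).shuffle(idx)
    folds: List[List[int]] = [idx[i::k] for i in range(k)]
    totals = [0, 0, 0, 0]
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        cut = best_cutoff([scores[i] for i in train], [labels[i] for i in train])
        counts = confusion([scores[i] for i in held_out],
                           [labels[i] for i in held_out], cut)
        totals = [t + c for t, c in zip(totals, counts)]
    return mcc(*totals)
```

Pooling counts across folds means every complex is scored exactly once with a cutoff it did not help choose.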
Fig. 2.
Development of vScreenML. Overlaid histograms are presented for scores obtained when scoring active complexes (blue) and decoy complexes (red) from D-COID. Scoring functions used were: (A) default Rosetta energy function, (B) linearly reweighted Rosetta energy terms, (C) Rosetta energy terms combined via XGBoost, (D) Rosetta energy terms plus structural assessments, (E) Rosetta terms plus additional diverse descriptors (nonoptimized vScreenML), and (F) vScreenML after hyperparameter tuning. Over the course of this sequence, the overlap between the active and decoy complexes is progressively reduced and MCC systematically increases. For the first two panels, 10-fold cross-validation was used to obtain the classification cutoff that maximizes Matthews correlation coefficient (MCC) on each subset of the data, and these cutoffs were used in calculating precision/recall/MCC. Because the remaining panels each report results from classification models, their thresholds are fixed at 0.5. Performance measures are presented as the average of 100 trained models, each of which was derived from 10-fold cross-validation (Methods). Uncertainty is presented as 95% confidence intervals. In all cases, performance measures were calculated for a subset of the data that was held out from the training step.
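For the classifier panels (C to F), the caption fixes the threshold at 0.5 and summarizes 100 trained models with a mean and a 95% confidence interval. A minimal sketch of that bookkeeping follows. Computing the interval as the 2.5th to 97.5th percentiles of the per-model values is an assumption, since the caption does not say how the interval was obtained, and the helper names are hypothetical.

```python
import math
import statistics
from typing import Dict, Sequence, Tuple


def metrics_at_threshold(probs: Sequence[float], labels: Sequence[int],
                         threshold: float = 0.5) -> Dict[str, float]:
    """Precision, recall and MCC when 'active' means predicted probability >= threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and not y)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y)
    tn = len(labels) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "mcc": mcc}


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Linearly interpolated percentile of pre-sorted values, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def summarize(per_model: Sequence[float], level: float = 0.95) -> Tuple[float, float, float]:
    """Mean and central percentile interval across repeated models."""
    ordered = sorted(per_model)
    tail = (1 - level) / 2 * 100
    return (statistics.mean(ordered),
            percentile(ordered, tail),
            percentile(ordered, 100 - tail))
```

Here `probs` would be the held-out predicted probabilities from one trained model, and `summarize` would be applied to, for example, the 100 per-model MCC values.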
Fig. 3.
Comparing vScreenML to other scoring functions using two independent virtual screening benchmarks. Each benchmark is composed of multiple protein targets, corresponding to points on these plots. (A) DEKOIS benchmark, composed of 23 protein targets. For each target (individual dots), 30 to 40 active complexes and 800 to 1,200 decoy complexes are provided. For a given target, each scoring function is used to rank the set of complexes. For a given scoring function, the number of active complexes in the top 1% of all complexes is used to calculate the enrichment of actives relative to randomly selecting complexes; thus, higher numbers indicate better performance. When comparing vScreenML against another method, a point below the diagonal indicates superior performance by vScreenML for this particular target. Targets seen by rfscore_VS during training of this method are marked with black triangles. (B) PPI benchmark, composed of 10 protein targets. For each target, a single active complex is hidden among 2,000 decoy complexes. Instead of using enrichment, the rank of the active compound (relative to the decoys) is calculated; thus, lower numbers indicate better performance. When comparing vScreenML against another method, a point above the diagonal indicates superior performance by vScreenML for this particular target. P values in both cases were computed using the two-tailed Wilcoxon signed-rank test.
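The two per-target metrics in this caption can be sketched as follows, assuming a higher score indicates a more likely active (negate energy-like scores otherwise). Rounding the top-1% set size down with a minimum of one complex, counting ties between the active and a decoy against the active, and the function names are all assumptions. For the paired per-target comparison between two scoring functions, `scipy.stats.wilcoxon` performs the Wilcoxon signed-rank test and is two-sided by default.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float], is_active: Sequence[bool],
                      top_percent: int = 1) -> float:
    """Fraction of actives in the top `top_percent`% divided by the overall fraction of actives.

    A value of 1.0 corresponds to random selection; larger is better.
    """
    n_total = len(scores)
    n_actives = sum(1 for a in is_active if a)
    if n_actives == 0:
        raise ValueError("need at least one active complex")
    n_top = max(1, n_total * top_percent // 100)  # integer floor, at least one complex
    ranked = sorted(range(n_total), key=lambda i: scores[i], reverse=True)
    hits = sum(1 for i in ranked[:n_top] if is_active[i])
    # (hits / n_top) / (n_actives / n_total), rearranged so integer inputs stay exact
    return (hits * n_total) / (n_top * n_actives)


def rank_of_active(active_score: float, decoy_scores: Sequence[float]) -> int:
    """1-based rank of the single active among its decoys (lower is better).

    Decoys that tie the active are ranked ahead of it (a pessimistic convention).
    """
    return 1 + sum(1 for d in decoy_scores if d >= active_score)
```

In the DEKOIS setting each target contributes one `enrichment_factor` value; in the PPI setting each target contributes one `rank_of_active` value computed over its 2,000 decoys.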
Fig. 4.
Prospective evaluation of vScreenML in a virtual screen against human acetylcholinesterase (AChE). (A) Of the 23 compounds prioritized by vScreenML for testing, nearly all inhibit AChE at 50 μM. Data are presented as mean ± SEM; n = 3. (B) Chemical structures of the most potent hit compounds. (C) Dose–response curve for the most potent hit compound, AC6. Data are presented as mean ± SEM; n = 3. (D) Model of AC6 (orange sticks) in the active site of AChE (light gray). (E) Predicted activity of AC6 from three target identification tools: none of these identifies AChE as a potential target of this compound, suggesting that this is a new scaffold for AChE inhibition. (F) Similarity searching against all compounds in ChEMBL designated as AChE inhibitors (either by fingerprint similarity or by shared substructure) finds no hits with discernible similarity, confirming that this is a new scaffold for AChE inhibition.
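Panels A, C, and F rest on two small computations: triplicate measurements summarized as mean ± SEM, and a fingerprint comparison against known inhibitors. The sketch below shows both. The caption does not name the similarity metric, so using the Tanimoto coefficient on sets of "on" bit positions is an assumption (it is a common choice for binary fingerprints), as is treating two empty fingerprints as having similarity 0.0; the function names are hypothetical.

```python
import math
import statistics
from typing import Iterable, Sequence, Tuple


def mean_sem(replicates: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (sample SD / sqrt(n)); needs n >= 2."""
    m = statistics.mean(replicates)
    sem = statistics.stdev(replicates) / math.sqrt(len(replicates))
    return m, sem


def tanimoto(on_bits_a: Iterable[int], on_bits_b: Iterable[int]) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints given as on-bit indices."""
    a, b = set(on_bits_a), set(on_bits_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```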
