Machine learning classification can reduce false positives in structure-based virtual screening

Proc Natl Acad Sci U S A. 2020 Aug 4;117(31):18477-18488. doi: 10.1073/pnas.2000585117. Epub 2020 Jul 15.

Yusuf O Adeshina¹,², Eric J Deeds²,³, John Karanicolas⁴

Affiliations

¹ Program in Molecular Therapeutics, Fox Chase Cancer Center, Philadelphia, PA 19111.
² Center for Computational Biology, University of Kansas, Lawrence, KS 66045.
³ Department of Molecular Biosciences, University of Kansas, Lawrence, KS 66045.
⁴ Program in Molecular Therapeutics, Fox Chase Cancer Center, Philadelphia, PA 19111; john.karanicolas@fccc.edu.

PMID: 32669436
PMCID: PMC7414157
DOI: 10.1073/pnas.2000585117

Yusuf O Adeshina et al. Proc Natl Acad Sci U S A. 2020.


Abstract

With the recent explosion in the size of libraries available for screening, virtual screening is positioned to assume a more prominent role in early drug discovery's search for active chemical matter. In typical virtual screens, however, only about 12% of the top-scoring compounds actually show activity when tested in biochemical assays. We argue that most scoring functions used for this task have been developed with insufficient thoughtfulness into the datasets on which they are trained and tested, leading to overly simplistic models and/or overtraining. These problems are compounded in the literature because studies reporting new scoring methods have not validated their models prospectively within the same study. Here, we report a strategy for building a training dataset (D-COID) that aims to generate highly compelling decoy complexes that are individually matched to available active complexes. Using this dataset, we train a general-purpose classifier for virtual screening (vScreenML) that is built on the XGBoost framework. In retrospective benchmarks, our classifier shows outstanding performance relative to other scoring functions. In a prospective context, nearly all candidate inhibitors from a screen against acetylcholinesterase show detectable activity; beyond this, 10 of 23 compounds have IC₅₀ better than 50 μM. Without any medicinal chemistry optimization, the most potent hit has IC₅₀ 280 nM, corresponding to K_i of 173 nM. These results support using the D-COID strategy for training classifiers in other computational biology tasks, and for vScreenML in virtual screening campaigns against other protein targets. Both D-COID and vScreenML are freely distributed to facilitate such efforts.
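The arithmetic behind two of the abstract's numbers can be checked with a short sketch. It assumes, without confirmation from the text above, that the reported K_i follows from the IC₅₀ through the Cheng-Prusoff relation for a competitive inhibitor, K_i = IC₅₀ / (1 + [S]/K_m). Under that assumption, the two reported values imply a particular substrate-to-K_m ratio in the assay. The function names are hypothetical.

```python
def ki_from_ic50(ic50_nm: float, substrate_over_km: float) -> float:
    """Cheng-Prusoff for a competitive inhibitor: Ki = IC50 / (1 + [S]/Km).

    Assumption: the abstract does not state how Ki was derived.
    """
    if ic50_nm <= 0 or substrate_over_km < 0:
        raise ValueError("IC50 must be positive and [S]/Km non-negative")
    return ic50_nm / (1.0 + substrate_over_km)


def implied_substrate_over_km(ic50_nm: float, ki_nm: float) -> float:
    """Invert the same relation to get the [S]/Km ratio implied by IC50 and Ki."""
    if ic50_nm <= 0 or ki_nm <= 0:
        raise ValueError("IC50 and Ki must be positive")
    return ic50_nm / ki_nm - 1.0


# Values reported in the abstract for the most potent hit.
ratio = implied_substrate_over_km(280.0, 173.0)
print(f"implied [S]/Km = {ratio:.3f}")  # implied [S]/Km = 0.618

# Fraction of tested compounds with IC50 better than 50 uM (10 of 23).
hit_fraction = 10 / 23
print(f"hit fraction = {hit_fraction:.1%}")  # hit fraction = 43.5%
```

If the assay used a different inhibition model, the implied ratio above has no meaning; only the 10-of-23 fraction is independent of that assumption.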

Keywords: machine learning classifier; protein–ligand complex; structure-based drug design; virtual screening.

Conflict of interest statement

The authors declare no competing interest.

Figures

**Fig. 1.**
Developing a challenging training set (D-COID). (A) Active complexes were assembled from the PDB by filtering for ligands that match those reflected in a screening library. For each active complex, 50 physicochemically matched compounds were selected and overlaid onto the active compounds; the three most similar compounds on the basis of overall shape and electrostatic similarity were aligned into the protein active site, and used as decoy complexes. This strategy mimics the selection of candidate (active) compounds in a realistic pharmacophore-based screening pipeline, and thus generates highly compelling decoy complexes for training/testing. (B) Modern scoring functions cannot distinguish active complexes from decoys in this set. Overlaid histograms are presented for scores obtained using various scoring functions when applied to active complexes (blue) and decoy complexes (red) in D-COID. For all eight methods tested, the distribution of scores assigned to active complexes strongly overlaps with the distribution of scores assigned to decoy complexes. From each model’s continuous scores, 10-fold cross-validation was used to obtain the classification cutoff that maximizes Matthews correlation coefficient (MCC) on each subset of the data, and these cutoffs were used in calculating precision/recall/MCC. Performance measures are presented as the average of 100 bootstrapped models, and uncertainty is presented as 95% confidence intervals.
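The cutoff selection described in panel B can be illustrated with a minimal sketch: given continuous scores and active/decoy labels, scan candidate cutoffs and keep the one with the highest MCC. This covers only the selection step on a single subset, not the 10-fold cross-validation or bootstrapping mentioned in the caption. It assumes higher scores indicate more active-like complexes, and the toy data are invented for illustration.

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def confusion(scores, labels, cutoff):
    """Count TP/FP/TN/FN when 'score >= cutoff' is called active (an assumption)."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_active = score >= cutoff
        if predicted_active and label == 1:
            tp += 1
        elif predicted_active:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def best_cutoff(scores, labels):
    """Try every observed score as a cutoff and return (cutoff, mcc) with maximal MCC."""
    best_value, best_cut = -2.0, None
    for cut in sorted(set(scores)):
        value = mcc(*confusion(scores, labels, cut))
        if value > best_value:
            best_value, best_cut = value, cut
    return best_cut, best_value


# Invented toy data: 1 = active complex, 0 = decoy complex.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]
cutoff, value = best_cutoff(toy_scores, toy_labels)
print(cutoff, round(value, 3))  # 0.4 0.775
```

For a scoring function where lower (more negative) energies are better, the scores would need to be negated before applying this sketch.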

**Fig. 2.**
Development of vScreenML. Overlaid histograms are presented for scores obtained when scoring active complexes (blue) and decoy complexes (red) from D-COID. Scoring functions used were: (A) default Rosetta energy function, (B) linearly reweighted Rosetta energy terms, (C) Rosetta energy terms combined via XGBoost, (D) Rosetta energy terms plus structural assessments, (E) Rosetta terms plus additional diverse descriptors (nonoptimized vScreenML), and (F) vScreenML after hyperparameter tuning. Over the course of this sequence, the overlap between the active and decoy complexes is progressively reduced and MCC systematically increases. For the first two panels, 10-fold cross-validation was used to obtain the classification cutoff that maximizes Matthews correlation coefficient (MCC) on each subset of the data, and these cutoffs were used in calculating precision/recall/MCC. Because the remaining panels each report results from classification models, their thresholds are fixed at 0.5. Performance measures are presented as the average of 100 trained models, each of which derived from 10-fold cross-validation (*Methods*). Uncertainty is presented as 95% confidence intervals. In all cases, performance measures were calculated for a subset of the data that was held out from the training step.
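For the classification models in panels C to F, the caption states that the threshold is fixed at 0.5. A minimal sketch of how precision and recall follow from predicted probabilities at that threshold is shown here. Treating a probability of exactly 0.5 as "active" is an assumption, and the probabilities and labels are invented for illustration.

```python
def precision_recall(probabilities, labels, threshold=0.5):
    """Precision and recall when 'probability >= threshold' is called active."""
    tp = fp = fn = 0
    for prob, label in zip(probabilities, labels):
        predicted_active = prob >= threshold
        if predicted_active and label == 1:
            tp += 1
        elif predicted_active:
            fp += 1
        elif label == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented held-out predictions: 1 = active complex, 0 = decoy complex.
toy_probs = [0.95, 0.80, 0.55, 0.45, 0.30, 0.10]
toy_truth = [1, 1, 0, 1, 0, 0]
precision, recall = precision_recall(toy_probs, toy_truth)
print(round(precision, 3), round(recall, 3))  # 0.667 0.667
```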

**Fig. 3.**
Comparing vScreenML to other scoring functions using two independent virtual screening benchmarks. Each benchmark is composed of multiple protein targets, corresponding to points on these plots. (A) DEKOIS benchmark, composed of 23 protein targets. For each target (individual dots), 30 to 40 active complexes and 800 to 1,200 decoy complexes are provided. For a given target, each scoring function is used to rank the set of complexes. For a given scoring function, the number of active complexes in the top 1% of all complexes is used to calculate the enrichment of actives relative to randomly selecting complexes; thus, higher numbers indicate better performance. When comparing vScreenML against another method, a point below the diagonal indicates superior performance by vScreenML for this particular target. Targets seen by rfscore_VS during training of this method are marked with black triangles. (B) PPI benchmark, composed of 10 protein targets. For each target, a single active complex is hidden among 2,000 decoy complexes. Instead of using enrichment, the rank of the active compound (relative to the decoys) is calculated; thus, lower numbers indicate better performance. When comparing vScreenML against another method, a point above the diagonal indicates superior performance by vScreenML for this particular target. P values in both cases were computed using the two-tailed Wilcoxon signed-rank test.
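The two benchmark measures in this caption can be sketched as follows: the enrichment of actives in the top 1% of ranked complexes (panel A) and the rank of a single active among decoys (panel B). This is an illustrative sketch, not the authors' evaluation code. It assumes higher scores rank first, rounds the size of the top fraction to the nearest integer, and uses invented toy data.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Actives found in the top fraction, relative to random selection."""
    n_total = len(scores)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one complex and one active")
    # Assumption: size of the top slice is rounded, with a minimum of one complex.
    n_top = max(1, round(n_total * fraction))
    order = sorted(range(n_total), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in order[:n_top])
    # (hits / n_top) / (n_actives / n_total), written to avoid float round-off.
    return hits * n_total / (n_top * n_actives)


def rank_of_active(scores, labels):
    """1-based rank of the single active complex when sorted best-first."""
    if sum(labels) != 1:
        raise ValueError("expected exactly one active")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    for rank, index in enumerate(order, start=1):
        if labels[index] == 1:
            return rank


# Invented panel-A-style target: 1,000 complexes, 40 actives, 4 of them in the top 10.
ef_scores = [1000 - i for i in range(1000)]
ef_labels = [0] * 1000
for i in (0, 2, 4, 6):
    ef_labels[i] = 1
for i in range(500, 536):
    ef_labels[i] = 1
print(enrichment_factor(ef_scores, ef_labels))  # 10.0

# Invented panel-B-style target: one active hidden among decoys.
print(rank_of_active([0.2, 0.9, 0.5, 0.7], [0, 0, 1, 0]))  # 3
```

With a 1% cutoff, the maximum attainable enrichment is 100 when every top-ranked complex is active, which is why higher values indicate better performance in panel A.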

**Fig. 4.**
Prospective evaluation of vScreenML in a virtual screen against human acetylcholinesterase (AChE). (A) Nearly all of the 23 compounds prioritized by vScreenML for testing inhibit AChE at 50 μM. Data are presented as mean ± SEM; n = 3. (B) Chemical structures of the most potent hit compounds. (C) Dose–response curve for the most potent hit compound, AC6. Data are presented as mean ± SEM; n = 3. (D) Model of AC6 (orange sticks) in the active site of AChE (light gray). (E) Predicted activity of AC6 from three target identification tools: None of these identifies AChE as a potential target of this compound, suggesting that this is a new scaffold for AChE inhibition. (F) Similarity searching against all compounds in ChEMBL designated as AChE inhibitors (either by fingerprint similarity or by shared substructure) finds no hits with discernible similarity, confirming that this is a new scaffold for AChE inhibition.

Cited by

A vending machine for drug-like molecules - automated synthesis of virtual screening hits.
McMillan AE, Wu WWX, Nichols PL, Wanner BM, Bode JW. Chem Sci. 2022 Oct 28;13(48):14292-14299. doi: 10.1039/d2sc05182f. eCollection 2022 Dec 14. PMID: 36545137. Free PMC article.

G Protein-Coupled Receptor-Ligand Pose and Functional Class Prediction.
Szwabowski GL, Griffing M, Mugabe EJ, O'Malley D, Baker LN, Baker DL, Parrill AL. Int J Mol Sci. 2024 Jun 22;25(13):6876. doi: 10.3390/ijms25136876. PMID: 38999982. Free PMC article.

Machine learning-aided scoring of synthesis difficulties for designer chromosomes.
Zheng Y, Song K, Xie ZX, Han MZ, Guo F, Yuan YJ. Sci China Life Sci. 2023 Jul;66(7):1615-1625. doi: 10.1007/s11427-023-2306-x. Epub 2023 Mar 3. PMID: 36881317.

Essential Dynamics Ensemble Docking for Structure-Based GPCR Drug Discovery.
McKay K, Hamilton NB, Remington JM, Schneebeli ST, Li J. Front Mol Biosci. 2022 Jun 29;9:879212. doi: 10.3389/fmolb.2022.879212. eCollection 2022. PMID: 35847975. Free PMC article.

Energy-entropy method using multiscale cell correlation to calculate binding free energies in the SAMPL8 host-guest challenge.
Ali HS, Chakravorty A, Kalayan J, de Visser SP, Henchman RH. J Comput Aided Mol Des. 2021 Aug;35(8):911-921. doi: 10.1007/s10822-021-00406-5. Epub 2021 Jul 15. PMID: 34264476. Free PMC article.
