MEMES: Machine learning framework for Enhanced MolEcular Screening

Sarvesh Mehta¹, Siddhartha Laghuvarapu¹, Yashaswi Pathak¹, Aaftaab Sethi², Mallika Alvala³, U Deva Priyakumar¹

Affiliations

¹ Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India. deva@iiit.ac.in; +91 40 6653 1413; +91 40 6653 1161.
² Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research, Hyderabad 500 037, India.
³ School of Pharmacy and Technology Management, Narsee Monjee Institute of Management Sciences, Hyderabad, India.

PMID: 34659706
PMCID: PMC8442698
DOI: 10.1039/d1sc02783b

Chem Sci. 2021 Jul 26;12(35):11710-11721. doi: 10.1039/d1sc02783b. eCollection 2021 Sep 15.

Abstract

In drug discovery applications, high-throughput virtual screening exercises are routinely performed to determine an initial set of candidate molecules referred to as "hits". In such an experiment, each molecule from a large small-molecule drug library is evaluated in terms of physical properties such as the docking score against a target receptor. In real-life drug discovery experiments, drug libraries are extremely large yet still represent only a small part of the essentially infinite chemical space, and evaluating physical properties for every molecule in the library is not computationally feasible. In the current study, a novel Machine learning framework for Enhanced MolEcular Screening (MEMES) based on Bayesian optimization is proposed for efficient sampling of the chemical space. The proposed framework is demonstrated to identify 90% of the top-1000 molecules from a molecular library of about 100 million molecules while calculating the docking score for only about 6% of the complete library. We believe that such a framework would substantially reduce the computational effort not only in drug discovery but also in other areas that require such high-throughput experiments.
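
The sampling idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a toy library of random 2-D feature vectors, a synthetic stand-in for docking (`toy_docking_score`), an exact Gaussian process surrogate with an RBF kernel, and expected improvement as the acquisition function. All function names, kernel settings, and batch sizes are hypothetical choices made for illustration.

```python
import math
import random

def rbf(a, b, length=0.3):
    """Squared-exponential kernel between two feature vectors."""
    return math.exp(-0.5 * sum((x - y) ** 2 for x, y in zip(a, b)) / length ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i, row in enumerate(L):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x

def backward_sub(L, b):
    """Solve L^T x = b for lower-triangular L."""
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(X_train, y_train, X_query, noise=1e-3):
    """Posterior mean and standard deviation of a GP at each query point."""
    y_mean = sum(y_train) / len(y_train)  # constant prior mean
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X_train)]
         for i, a in enumerate(X_train)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, [y - y_mean for y in y_train]))
    posterior = []
    for q in X_query:
        k = [rbf(q, a) for a in X_train]
        mu = y_mean + sum(ki * ai for ki, ai in zip(k, alpha))
        v = forward_sub(L, k)
        # Predictive variance (kernel variance 1 plus noise), floored for stability.
        var = max(1.0 + noise - sum(vi * vi for vi in v), 1e-12)
        posterior.append((mu, math.sqrt(var)))
    return posterior

def expected_improvement(mu, sigma, best):
    """Expected improvement for minimization (lower docking score is better)."""
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf

def screen(features, score_fn, n_init=10, batch=5, n_iter=6, seed=0):
    """Score a random seed set, then repeatedly score the highest-EI molecules."""
    rng = random.Random(seed)
    scored = {i: score_fn(features[i])
              for i in rng.sample(range(len(features)), n_init)}
    for _ in range(n_iter):
        idx = list(scored)
        pool = [i for i in range(len(features)) if i not in scored]
        post = gp_posterior([features[i] for i in idx],
                            [scored[i] for i in idx],
                            [features[i] for i in pool])
        best = min(scored.values())
        ranked = sorted(zip(pool, post),
                        key=lambda t: -expected_improvement(t[1][0], t[1][1], best))
        for i, _ in ranked[:batch]:
            scored[i] = score_fn(features[i])  # the "expensive" evaluation
    return scored

def toy_docking_score(x):
    """Synthetic stand-in for a docking calculation (lower is better)."""
    return -math.exp(-((x[0] - 0.7) ** 2 + (x[1] - 0.3) ** 2) / 0.05)

lib_rng = random.Random(1)
library = [(lib_rng.random(), lib_rng.random()) for _ in range(300)]
scored = screen(library, toy_docking_score)
print(f"scored {len(scored)} of {len(library)} molecules; "
      f"best score found: {min(scored.values()):.3f}")
```

In this sketch only 40 of the 300 toy molecules are ever passed to the scoring function. A real screening run would replace the toy features with a molecular featurization and the toy score with an actual docking program, and at library scale would likely need a more scalable surrogate than this exact Gaussian process.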

This journal is © The Royal Society of Chemistry.

Conflict of interest statement

International Institute of Information Technology, Hyderabad has filed a provisional patent application for the use of the MEMES framework in high-throughput screening exercises, with U. D. P., S. M., S. L., and Y. P. listed as inventors. Provisional patent application No.: 202041050608. Application status: awaiting complete specification (provisional patent filed). The funders did not have any role in the design, idea, data collection, analysis, interpretation, writing of the manuscript or decision to submit it for publication.

Figures

Fig. 1. Overview of the proposed method, MEMES.

Fig. 2. Performance on Zinc-250K using ExactMEMES against both target receptors. (a) and (b) compare the mean docking score of top hits sampled by MEMES and random sampling against the mean docking score of actual top hits in the library. (c) and (d) show the fraction of the top 500 sampled molecules that are actual top hits against the percentage of the dataset sampled. The reported results are an average of 3 runs and the shaded region represents standard deviation across these runs.

Fig. 3. Venn diagram showing the intersection of the top 500 molecules identified by the MEMES framework and the actual top 500 hits from the Zinc-250K docking library (the statistics shown are for one of the three runs).

Fig. 4. To compare the performance of ExactMEMES and DeepMEMES, the fraction of the top 500 sampled molecules that are actual top hits from the Zinc-250K dataset is plotted against the percentage of the dataset sampled (see ESI Fig. S7 for a similar analysis for the top 100 molecules). Mol2Vec was used as the featurization technique for this comparison. The reported results are an average of 3 runs and the shaded region represents standard deviation across these runs.

Fig. 5. (a) and (b) show the performance of DeepMEMES on the Enamine dataset against target protein TTBK1. (a) shows the fraction of the top 500 sampled molecules that are actual top hits in the library. The reported results are an average of 3 runs and the shaded region represents standard deviation across these runs. The Venn diagram (b) demonstrates the overlap of the top 500 hits identified by DeepMEMES (Mol2Vec), random sampling and the whole dataset (the statistics shown are for one of the three runs).

Fig. 6. (a)–(c) show the performance of DeepMEMES on an Ultra Large Docking Library against target protein AmpC. (a) shows the fraction of the top 1000 sampled molecules that are actual top hits in the library. The result shown is an average over three runs. The Venn diagram (b) demonstrates the overlap of the top 1000 hits identified by DeepMEMES (Mol2Vec), random sampling and the whole dataset. (c) shows the distribution of the docking scores for top 10 000 molecules sampled by DeepMEMES, random sampling and the whole dataset. The vertical red line denotes the cutoff docking score for the top 1000 hits (the distribution plot and Venn diagram are made from one of the three runs).

Fig. 7. Fraction of top molecules sampled by DeepMEMES (with Mol2Vec as the featurization technique) that match the actual top hits from the corresponding subsets, plotted against the percentage of the dataset sampled.
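
The quantity plotted in several of the figures above, the fraction of the top-k sampled molecules that are actual top-k hits in the library, can be computed as in the following minimal sketch. The function name `top_k_recall`, the molecule identifiers, and the scores are hypothetical; lower scores are assumed to be better, as is usual for docking scores.

```python
def top_k_recall(sampled_scores, library_scores, k):
    """Fraction of the k best sampled molecules that are among the library's k best.

    Both arguments map a molecule identifier to its score (lower is better).
    """
    true_top = set(sorted(library_scores, key=library_scores.get)[:k])
    sampled_top = sorted(sampled_scores, key=sampled_scores.get)[:k]
    return len(true_top.intersection(sampled_top)) / k

# Hypothetical six-molecule library with made-up scores.
library_scores = {"m1": -9.1, "m2": -8.7, "m3": -8.2,
                  "m4": -7.5, "m5": -6.0, "m6": -5.1}
# Suppose only four of the six molecules were actually scored.
sampled_scores = {m: library_scores[m] for m in ("m1", "m3", "m4", "m6")}

recall = top_k_recall(sampled_scores, library_scores, k=3)
```

Here the true top 3 are m1, m2 and m3, the sampled top 3 are m1, m3 and m4, so two of the three true hits are recovered.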
