J Cheminform. 2024 May 30;16(1):64. doi: 10.1186/s13321-024-00861-w.

MolScore: a scoring, evaluation and benchmarking framework for generative models in de novo drug design

Morgan Thomas¹, Noel M O'Boyle², Andreas Bender³, Chris De Graaf²

Affiliations

¹ Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK. morganthomas263@gmail.com.
² Computational Chemistry, Nxera Pharma, Steinmetz Building, Granta Park, Great Abington, Cambridge, CB21 6DG, UK.
³ Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK.

PMID: 38816825
PMCID: PMC11141043
DOI: 10.1186/s13321-024-00861-w

Morgan Thomas et al. J Cheminform. 2024.

Abstract

Generative models are undergoing rapid research and application to de novo drug design. To facilitate their application and evaluation, we present MolScore. MolScore already contains many drug-design-relevant scoring functions commonly used in benchmarks, such as molecular similarity, molecular docking, predictive models, synthesizability, and more. In addition, it provides performance metrics to evaluate generative model performance based on the chemistry generated. With this unification of functionality, MolScore re-implements commonly used benchmarks in the field (such as GuacaMol, MOSES, and MolOpt). Moreover, new benchmarks can be created trivially. We demonstrate this by testing a chemical language model with reinforcement learning on three new tasks of increasing complexity related to the design of 5-HT_2A ligands that utilise either molecular descriptors, 266 pre-trained QSAR models, or dual molecular docking. Lastly, MolScore can be integrated into an existing Python script with just three lines of code. This framework is a step towards unifying generative model application and evaluation as applied to drug design for both practitioners and researchers. The framework can be found on GitHub and downloaded directly from the Python Package Index.

Scientific Contribution

MolScore is an open-source platform to facilitate generative molecular design and evaluation thereof for application in drug design. This platform takes important steps towards unifying existing benchmarks, providing a platform to share new benchmarks, and improving customisation, flexibility and usability for practitioners over existing solutions.

Keywords: Benchmarking; De novo molecule generation; Drug design; Generative model; Scoring functions.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Fig. 1**
Design of the molscore and moleval sub-packages. The main elements of molscore include the manager.py module that interacts with a generative model and manages scoring of the molecules according to the objective. The gui folder contains the scripts to write configuration files or monitor de novo molecules. The scoring_functions folder contains modules for individual scoring functions, the scaffold_memory folder contains code that defines the diversity filters [25], and the utils folder contains code for the transformation and aggregation functions. The main elements of the moleval package are the metrics.py module that computes evaluation metrics and the statistics_by_n.py script that computes the evaluation metrics on a molscore output file every n-steps or n-samples

**Fig. 2**
Integration of MolScore into a Python module, including initialisation with a model name and path to a configuration file, followed by scoring of an arbitrary list of SMILES (which would be repeated for generative model optimisation). An explicit step number can be provided during scoring; if not, the step count increments automatically, starting from one
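
The calling pattern this caption describes can be sketched without MolScore installed. The stand-in below is not the MolScore API: `ToyScorer`, its constructor arguments, and the character-counting score are hypothetical, chosen only to illustrate initialisation with a model name and configuration path, repeated scoring of SMILES lists, and a step counter that counts up from one when no explicit step is given.

```python
from typing import List, Optional


class ToyScorer:
    """Hypothetical stand-in for a scoring manager (not the MolScore API)."""

    def __init__(self, model_name: str, task_config: str) -> None:
        self.model_name = model_name
        self.task_config = task_config  # stored only; this sketch never reads the file
        self.step = 0
        self.history: List[dict] = []

    def score(self, smiles: List[str], step: Optional[int] = None) -> List[float]:
        # Use the explicit step if one is given, otherwise count up from one.
        self.step = step if step is not None else self.step + 1
        # Placeholder objective: fraction of 'C'/'c' characters in each string.
        scores = [s.upper().count("C") / len(s) if s else 0.0 for s in smiles]
        for smi, sc in zip(smiles, scores):
            self.history.append({"step": self.step, "smiles": smi, "score": sc})
        return scores


# Initialise once, then score repeatedly inside the optimisation loop.
scorer = ToyScorer(model_name="toy_model", task_config="path/to/config.json")
first = scorer.score(["CCO", "c1ccccc1"])   # recorded as step 1
second = scorer.score(["CCN"])              # recorded as step 2
third = scorer.score(["CC"], step=10)       # explicit step overrides the counter
```

In a real run the placeholder objective would be replaced by whatever the configuration file specifies; the point here is only the shape of the loop.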

**Fig. 3**
Integration of MolScore benchmark mode into a Python module, including initialisation with a specific pre-existing benchmark and budget. Existing benchmarks are stored in MolScoreBenchmark.presets. The budget specifies the number of molecules to be evaluated before task.finished is set to True. Upon exit, benchmark metrics will be automatically calculated and written to CSV in the output directories
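
The budget mechanism can be illustrated with a small stand-in. This is a sketch under stated assumptions, not MolScore's implementation: `ToyBenchmarkTask`, the fixed SMILES pool, and the length-based score are hypothetical, and only the `finished` flag and the idea of a molecule budget come from the caption.

```python
import random
from typing import List


class ToyBenchmarkTask:
    """Hypothetical task with an evaluation budget (not the MolScore API)."""

    def __init__(self, name: str, budget: int) -> None:
        self.name = name
        self.budget = budget
        self.n_evaluated = 0
        self.finished = False

    def score(self, smiles: List[str]) -> List[float]:
        # Count every molecule against the budget; this sketch does not
        # truncate a batch that overshoots it.
        self.n_evaluated += len(smiles)
        if self.n_evaluated >= self.budget:
            self.finished = True
        # Placeholder objective: string length capped at 1.0.
        return [min(len(s) / 10.0, 1.0) for s in smiles]


rng = random.Random(0)
pool = ["CCO", "CCN", "c1ccccc1", "CC(=O)O"]  # stands in for a generative model
task = ToyBenchmarkTask(name="toy_task", budget=25)

n_calls = 0
while not task.finished:
    batch = rng.choices(pool, k=10)  # "sample" ten molecules per step
    task.score(batch)
    n_calls += 1
```

With a budget of 25 and batches of ten, the loop stops after the third batch, once 30 molecules have been scored.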

**Fig. 4**
a Example configuration file reimplementing the Albuterol Similarity GuacaMol task. b Streamlit app to aid the creation of new configuration files and avoid manual writing of JSON files. The app annotates the options available to the user and automatically parses the selections into the required JSON format

**Fig. 5**
Streamlit app that can be run during or after goal-directed generative model optimisation (here showing optimisation of 5-HT_2A predicted probability of activity). This is the main page used to plot training progress and select, visualise, and export molecules. Further pages are shown in Figures S1–S3

**Fig. 6**
De novo optimisation of the first set of objectives designed by molscore by number of optimisation steps (left) with the equivalent score distribution for 3771 real 5-HT_2A ligands (right). The dashed line represents the mean of the real ligand distribution and the solid lines mark plus/minus one standard deviation from the mean. a The predicted probability of 5-HT_2A activity at a concentration of 1 µM. b The first objective a combined with predicted synthesizability by RAscore. c The first objective a combined with property ranges increasing the probability of blood-brain barrier (BBB) permeability. d All three objectives a–c combined

**Fig. 7**
De novo optimisation of the second set of objectives designed by molscore by number of optimisation steps (left) with the equivalent score distribution for 3771 real 5-HT_2A ligands (right). The dashed line represents the mean of the real ligand distribution and the solid lines mark plus/minus one standard deviation from the mean. a The predicted probability of 5-HT_2A activity at a concentration of 1 µM. b The first objective a combined with predicted selectivity *versus* membrane receptors. c The first objective a combined with predicted selectivity *versus* D₂. d The first objective a combined with predicted selectivity *versus* dopamine receptors. e The first objective a combined with predicted selectivity *versus* other serotonin sub-types. f The first objective a combined with selectivity *versus* other serotonin sub-types and dopamine receptors

**Fig. 8**
Example nearest neighbour de novo molecules to real 5-HT_2A selective ligands (w.r.t D₂ binding) a The five most 5-HT_2A selective ligands with respect to D₂ binding identified in ChEMBL31 that contain a D₂ pChEMBL value above 4, respective pChEMBL values are shown. b Nearest neighbour de novo molecules to each molecule in a, identified during the 5-HT_2A vs D₂ task with respective Tanimoto similarity (Tc) and objective score. c Predicted probabilities of class A GPCR off-targets for real and de novo ligand counterparts using PIDGINv5. d Predicted class A GPCR targets mapped onto a GPCRome tree [62], shared predicted targets are shown in red, predicted only for the real ligand in blue, and predicted only for the de novo ligand in orange
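
Panel b relies on a nearest-neighbour search by Tanimoto similarity (Tc). A minimal sketch of that computation follows, assuming fingerprints are represented as sets of on-bit indices. The fingerprint type used by the authors is not stated in this caption, and the function names and toy bit sets below are hypothetical.

```python
from typing import Dict, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_neighbour(query: Set[int], candidates: Dict[str, Set[int]]) -> Tuple[str, float]:
    """Return the name and similarity of the candidate most similar to the query."""
    best = max(candidates, key=lambda name: tanimoto(query, candidates[name]))
    return best, tanimoto(query, candidates[best])


# Toy fingerprints: one "real ligand" and three "de novo molecules".
real_ligand = {1, 2, 3, 4}
de_novo = {
    "mol_a": {1, 2, 3, 5},      # 3 shared bits out of 5 distinct
    "mol_b": {1, 2, 3, 4, 6},   # 4 shared bits out of 5 distinct
    "mol_c": {7, 8},            # no shared bits
}

name, tc = nearest_neighbour(real_ligand, de_novo)
```

In practice the bit sets would come from a cheminformatics toolkit; only the similarity and ranking logic is shown here.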

**Fig. 9**
De novo optimisation of the third set of objectives designed by molscore by number of optimisation steps (left) with the equivalent score distribution for 3771 real 5-HT_2A ligands (right). The dashed line represents the mean of the real ligand distribution and the solid lines mark plus/minus one standard deviation from the mean. a The optimisation of the MPO score for 5-HT_2A docking. b The optimisation of the MPO score for 5-HT_2A vs D₂. c, d The docking scores obtained during optimisation seen in (a) and (b) respectively. Note that due to the ‘moving goal post’ nature of max min normalisation, the ‘Score’ is not representative of underlying parameter optimisation and so docking score is also shown
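
The ‘moving goal post’ effect can be reproduced with a few numbers. The sketch below assumes that normalisation uses the minimum and maximum of all docking scores observed so far and that lower (more negative) docking scores are better; both are assumptions for illustration, and `max_min_normalise` is a hypothetical helper rather than a MolScore function.

```python
from typing import List


def max_min_normalise(values: List[float], lower_is_better: bool = True) -> List[float]:
    """Scale values to [0, 1] using the min and max of the values supplied."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]  # arbitrary choice when there is no spread
    scaled = [(v - lo) / (hi - lo) for v in values]
    # Flip so that the best (lowest) docking score maps to 1.0.
    return [1.0 - s for s in scaled] if lower_is_better else scaled


# Docking scores seen early in optimisation, then after a better molecule appears.
early = [-6.0, -7.0, -8.0]
late = early + [-12.0]

early_scores = max_min_normalise(early)
late_scores = max_min_normalise(late)

# The molecule docked at -8.0 has not changed, but its normalised score has.
score_before = early_scores[2]
score_after = late_scores[2]
```

Here the same docking score of -8.0 maps to 1.0 before the new best molecule is found and to roughly 0.33 afterwards, which is why the raw docking score is a more reliable indicator of progress than the normalised ‘Score’.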

**Fig. 10**
Analysis of molecules generated during the ‘5-HT2A vs D2’ task via the molscore GUI. a (left) The multi-parameter page of the GUI enabling the identification of top k compounds according to user-specified parameters with the ability to redefine how scores are aggregated. b An example molecule exported to PyMol via the ‘Send2PyMol’ button. c The reference co-crystal ligand Risperidone bound to 5-HT2A

**Fig. 11**
Analysis of differences in protein–ligand interactions in 5-HT_2A between the top 10 de novo molecules optimised for 5-HT_2A docking score and the top 10 molecules optimised for 5-HT_2A vs D₂ docking scores. a Protein–ligand interaction fingerprints of the reference co-crystallised ligand Risperidone, 5-HT_2A docking objective, and 5-HT_2A vs D₂. b, c Example docked pose of one of the top 10 molecules from the above objectives respectively

**Fig. 12**
Moleval metrics computed on different fine-tuning epochs. Epoch-0 represents the generative model before fine-tuning. Intrinsic properties a and extrinsic properties in reference to a test set (sample of the training set) b and the set of A2A ligands used for fine-tuning c are shown

References

1. Chen H, Engkvist O, Wang Y, Olivecrona M, Blaschke T. The rise of deep learning in drug discovery. Drug Discov Today. 2018;23:1241–1250. doi: 10.1016/j.drudis.2018.01.039.
2. Wang M, Wang Z, Sun H, Wang J, Shen C, Weng G, Chai X, Li H, Cao D, Hou T. Deep learning approaches for de novo drug design: an overview. Curr Opin Struct Biol. 2022;72:135–144. doi: 10.1016/j.sbi.2021.10.001.
3. Gao W, Fu T, Sun J, Coley CW. Sample efficiency matters: a benchmark for practical molecular optimization. arXiv. 2022. doi: 10.48550/arXiv.2206.12411.
4. Chen H. Can generative-model-based drug design become a new normal in drug discovery? J Med Chem. 2021;65:100–102. doi: 10.1021/acs.jmedchem.1c02042.
5. Grisoni F, Huisman BJH, Button AL, Moret M, Atz K, Merk D, Schneider G. Combining generative artificial intelligence and on-chip synthesis for de novo drug design. Sci Adv. 2021;7:eabg3338. doi: 10.1126/sciadv.abg3338.
