J Cheminform. 2024 May 30;16(1):64.
doi: 10.1186/s13321-024-00861-w

MolScore: a scoring, evaluation and benchmarking framework for generative models in de novo drug design

Morgan Thomas et al. J Cheminform. 2024.

Abstract

Generative models are undergoing rapid research and application to de novo drug design. To facilitate their application and evaluation, we present MolScore. MolScore already contains many drug-design-relevant scoring functions commonly used in benchmarks, such as molecular similarity, molecular docking, predictive models, synthesizability, and more. In addition, it provides performance metrics to evaluate generative models based on the chemistry generated. With this unification of functionality, MolScore re-implements commonly used benchmarks in the field (such as GuacaMol, MOSES, and MolOpt). Moreover, new benchmarks can be created trivially. We demonstrate this by testing a chemical language model with reinforcement learning on three new tasks of increasing complexity related to the design of 5-HT2A ligands that utilise either molecular descriptors, 266 pre-trained QSAR models, or dual molecular docking. Lastly, MolScore can be integrated into an existing Python script with just three lines of code. This framework is a step towards unifying generative model application and evaluation as applied to drug design for both practitioners and researchers. The framework can be found on GitHub and downloaded directly from the Python Package Index.

Scientific Contribution: MolScore is an open-source platform to facilitate generative molecular design and evaluation thereof for application in drug design. This platform takes important steps towards unifying existing benchmarks, providing a platform to share new benchmarks, and improving customisation, flexibility and usability for practitioners over existing solutions.

Keywords: Benchmarking; De novo molecule generation; Drug design; Generative model; Scoring functions.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Design of the molscore and moleval sub-packages. The main elements of molscore include the manager.py module, which interacts with a generative model and manages scoring of the molecules according to the objective. The gui folder contains the scripts to write configuration files or monitor de novo molecules. The scoring_functions folder contains modules for individual scoring functions, the scaffold_memory folder contains code that defines the diversity filters [25], and the utils folder contains code for the transformation and aggregation functions. The main elements of the moleval package are the metrics.py module, which computes evaluation metrics, and the statistics_by_n.py script, which computes the evaluation metrics for a molscore output file every n steps or n samples.
Fig. 2
Integration of MolScore into a Python module, including initialisation with a model name and a path to a configuration file, followed by scoring of an arbitrary list of SMILES (a step that would be repeated during generative model optimisation). An explicit step number can be provided during scoring; if not, the step count increments from one.
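The calling pattern described in this caption can be sketched without MolScore installed. The `ToyScorer` class below is a hypothetical stand-in, not MolScore's API: it only mimics the described behaviour of taking a model name and a configuration path at initialisation, scoring a list of SMILES, and counting steps up from one unless an explicit step is supplied. The length-based score and the path are placeholders.

```python
from typing import List, Optional, Tuple


class ToyScorer:
    """Hypothetical stand-in for a scoring manager (not MolScore's API)."""

    def __init__(self, model_name: str, task_config: str) -> None:
        self.model_name = model_name
        self.task_config = task_config  # path is stored, never read here
        self.step = 0                   # no batch scored yet
        self.history: List[Tuple[int, str, float]] = []

    def score(self, smiles: List[str], step: Optional[int] = None) -> List[float]:
        # Use the explicit step if given, otherwise count up from one.
        self.step = step if step is not None else self.step + 1
        # Placeholder objective: longer strings score higher, capped at 1.0.
        scores = [min(len(s) / 10.0, 1.0) for s in smiles]
        self.history.extend((self.step, s, x) for s, x in zip(smiles, scores))
        return scores


scorer = ToyScorer(model_name="toy_model", task_config="path/to/config.json")
first = scorer.score(["CCO", "c1ccccc1"])   # implicit step 1
second = scorer.score(["CC(=O)O"])          # implicit step 2
third = scorer.score(["CCN"], step=10)      # explicit step 10
```

In a real optimisation loop the `score` call would sit inside the generative model's training loop, with the returned scores used as the reward.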
Fig. 3
Integration of MolScore benchmark mode into a python module, including initialisation with a specific pre-existing benchmark and budget. Existing benchmarks are stored in MolScoreBenchmark.presets. The budget specifies a number of molecules to be evaluated before task.finished is set to True. Upon exit, benchmark metrics will be automatically calculated and written to CSV in the output directories
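The budget semantics described in this caption can be illustrated with a small, self-contained sketch. `ToyBudgetedTask` and `toy_generator` are hypothetical names, not part of MolScore; the sketch only assumes that a task counts the molecules it has scored and sets `finished` to True once the budget is reached. Truncating the final batch to the remaining budget is a choice made here for illustration.

```python
from typing import Dict, List


class ToyBudgetedTask:
    """Hypothetical task that stops accepting work once a budget is spent."""

    def __init__(self, name: str, budget: int) -> None:
        self.name = name
        self.budget = budget
        self.n_scored = 0
        self.finished = False

    def score(self, smiles: List[str]) -> List[float]:
        # Only score what the remaining budget allows.
        remaining = self.budget - self.n_scored
        batch = smiles[:remaining]
        self.n_scored += len(batch)
        self.finished = self.n_scored >= self.budget
        # Placeholder objective: fraction of carbon characters in the string.
        return [(s.count("C") + s.count("c")) / len(s) for s in batch]


def toy_generator(batch_size: int) -> List[str]:
    """Stand-in for a generative model; cycles through a fixed pool."""
    pool = ["CCO", "CCN", "c1ccccc1", "CC(=O)O"]
    return [pool[i % len(pool)] for i in range(batch_size)]


tasks = [ToyBudgetedTask("task_a", budget=10), ToyBudgetedTask("task_b", budget=7)]
calls: Dict[str, int] = {}
for task in tasks:
    calls[task.name] = 0
    while not task.finished:
        task.score(toy_generator(batch_size=4))
        calls[task.name] += 1
```

Looping on `task.finished` keeps the stopping rule inside the task rather than in the generative model's code, so every model is compared on the same number of scored molecules.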
Fig. 4
a Example configuration file reimplementing the Albuterol Similarity GuacaMol task. b Streamlit app to aid the creation of new configuration files and avoid manual writing of JSON files. The app annotates options available to the user and automatically parses it into the required JSON format
Fig. 5
Streamlit app that can be run during or after goal-directed generative model optimisation (here showing optimisation of 5-HT2A predicted probability of activity). This is the main page used to plot training progress and select, visualise, and export molecules. Further pages are shown in Figures S1–S3
Fig. 6
De novo optimisation of the first set of objectives designed by molscore by number of optimisation steps (left) with the equivalent score distribution for 3771 real 5-HT2A ligands (right). The dashed line represents the mean of the real ligand distribution and the solid lines mark plus/minus one standard deviation from the mean. a The predicted probability of 5-HT2A activity at a concentration of 1 µM. b The first objective a combined with predicted synthesizability by RAscore. c The first objective a combined with property ranges increasing the probability of BBB permeability. d All three objectives a–c combined
Fig. 7
De novo optimisation of the second set of objectives designed by molscore by number of optimisation steps (left) with the equivalent score distribution for 3771 real 5-HT2A ligands (right). The dashed line represents the mean of the real ligand distribution and the solid lines mark plus/minus one standard deviation from the mean. a The predicted probability of 5-HT2A activity at a concentration of 1 µM. b The first objective a combined with predicted selectivity versus membrane receptors. c The first objective a combined with predicted selectivity versus D2. d The first objective a combined with predicted selectivity versus dopamine receptors. e The first objective a combined with predicted selectivity versus other serotonin sub-types. f The first objective a combined with selectivity versus other serotonin sub-types and dopamine receptors
Fig. 8
Example nearest neighbour de novo molecules to real 5-HT2A selective ligands (w.r.t. D2 binding). a The five most 5-HT2A selective ligands with respect to D2 binding identified in ChEMBL31 that contain a D2 pChEMBL value above 4; respective pChEMBL values are shown. b Nearest neighbour de novo molecules to each molecule in a, identified during the 5-HT2A vs D2 task, with respective Tanimoto similarity (Tc) and objective score. c Predicted probabilities of class A GPCR off-targets for real and de novo ligand counterparts using PIDGINv5. d Predicted class A GPCR targets mapped onto a GPCRome tree [62]; shared predicted targets are shown in red, targets predicted only for the real ligand in blue, and targets predicted only for the de novo ligand in orange
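The nearest-neighbour search by Tanimoto similarity (Tc) mentioned in this caption can be sketched as follows. This is a minimal illustration, not the authors' procedure: fingerprints are represented as plain sets of on-bit indices, the bit sets and molecule names are made up, and the fingerprint type used in the paper is not stated in this caption.

```python
from typing import Dict, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient |a & b| / |a | b| of two on-bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_neighbour(
    query: Set[int], candidates: Dict[str, Set[int]]
) -> Tuple[str, float]:
    """Return the name and Tc of the candidate most similar to the query."""
    name = max(candidates, key=lambda k: tanimoto(query, candidates[k]))
    return name, tanimoto(query, candidates[name])


# Made-up on-bit sets standing in for real molecular fingerprints.
real_ligand = {1, 4, 7, 9, 12}
de_novo = {
    "mol_1": {1, 4, 7, 20},      # shares 3 bits, union of 6
    "mol_2": {1, 4, 7, 9, 30},   # shares 4 bits, union of 6
    "mol_3": {2, 3, 5},          # shares no bits
}
best_name, best_tc = nearest_neighbour(real_ligand, de_novo)
```

With real molecules the bit sets would come from a cheminformatics toolkit's fingerprint function; the comparison logic stays the same.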
Fig. 9
De novo optimisation of the third set of objectives designed by molscore by number of optimisation steps (left) with the equivalent score distribution for 3771 real 5-HT2A ligands (right). The dashed line represents the mean of the real ligand distribution and the solid lines mark plus/minus one standard deviation from the mean. a The optimisation of the MPO score for 5-HT2A docking. b The optimisation of the MPO score for 5-HT2A vs D2. c, d The docking scores obtained during the optimisations shown in a and b, respectively. Note that due to the ‘moving goal post’ nature of max-min normalisation, the ‘Score’ is not representative of underlying parameter optimisation, so the docking score is also shown
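The ‘moving goal post’ effect noted in this caption can be reproduced with a few lines of arithmetic. The sketch below assumes that the minimum and maximum are taken over all values observed so far and that lower (more negative) docking scores are better; the function name and the docking values are made up for illustration and are not taken from MolScore.

```python
from typing import List


def max_min_normalise(values: List[float], lower_is_better: bool = True) -> List[float]:
    """Scale values to [0, 1] using the min and max of the values given."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    if lower_is_better:  # e.g. docking scores, where more negative is better
        return [(hi - v) / (hi - lo) for v in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up docking scores seen during the first steps.
seen = [-6.0, -7.0, -8.0]
early = max_min_normalise(seen)[1]   # normalised score of the -7.0 molecule

# A later step finds a much better molecule, which widens the range.
seen.append(-12.0)
late = max_min_normalise(seen)[1]    # same molecule, rescored
```

The same molecule drops from a normalised score of 1/2 to 1/6 even though its docking score is unchanged, which is why the raw docking score is the more informative quantity to plot.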
Fig. 10
Analysis of molecules generated during the ‘5-HT2A vs D2’ task via the molscore GUI. a (left) The multi-parameter page of the GUI enabling the identification of top k compounds according to user-specified parameters with the ability to redefine how scores are aggregated. b An example molecule exported to PyMol via the ‘Send2PyMol’ button. c The reference co-crystal ligand Risperidone bound to 5-HT2A
Fig. 11
Analysis of differences in protein–ligand interactions in 5-HT2A between the top 10 de novo molecules optimised for 5-HT2A docking score and the top 10 molecules optimised for 5-HT2A vs D2 docking scores. a Protein–ligand interaction fingerprints of the reference co-crystallised ligand Risperidone, the 5-HT2A docking objective, and the 5-HT2A vs D2 objective. b, c Example docked pose of one of the top 10 molecules from each of the above objectives, respectively
Fig. 12
Moleval metrics computed on different fine-tuning epochs. Epoch-0 represents the generative model before fine-tuning. Intrinsic properties (a) and extrinsic properties in reference to a test set (a sample of the training set) (b) and to the set of A2A ligands used for fine-tuning (c) are shown
