J Comput Aided Mol Des. 2020 Jul;34(7):783-803.

doi: 10.1007/s10822-020-00300-6. Epub 2020 Feb 28.

Enhancing reaction-based de novo design using a multi-label reaction class recommender

Gian Marco Ghiandoni¹, Michael J Bodkin², Beining Chen³, Dimitar Hristozov², James E A Wallace², James Webster¹, Valerie J Gillet⁴

Affiliations

¹ Information School, University of Sheffield, Regent Court, 211 Portobello, Sheffield, S1 4DP, UK.
² Evotec (U.K.) Ltd, 114 Innovation Drive, Milton Park, Abingdon, OX14 4RZ, UK.
³ Chemistry Department, University of Sheffield, Dainton Building, Brook Hill, Sheffield, S3 7HF, UK.
⁴ Information School, University of Sheffield, Regent Court, 211 Portobello, Sheffield, S1 4DP, UK. v.gillet@sheffield.ac.uk.

PMID: 32112286
PMCID: PMC7293200
DOI: 10.1007/s10822-020-00300-6

Gian Marco Ghiandoni et al. J Comput Aided Mol Des. 2020 Jul.


Abstract

Reaction-based de novo design refers to the in silico generation of novel chemical structures by combining reagents using structural transformations derived from known reactions. The driver for using reaction-based transformations is to increase the likelihood that the designed molecules are synthetically accessible. We have previously described a reaction-based de novo design method based on reaction vectors, which are transformation rules encoded automatically from reaction databases. A limitation of reaction vectors is that they account only for the structural changes that occur at the core of a reaction and do not consider competing functionalities that can compromise the reaction outcome. Here, we present the development of a Reaction Class Recommender to enhance the reaction vector framework. The recommender is intended to be used as a filter on the reaction vectors applied during de novo design, reducing the combinatorial explosion of in silico molecules while limiting the generated structures to those most likely to be synthesisable. The recommender has been validated using an external data set extracted from the recent medicinal chemistry literature and in two simulated de novo design experiments. Results suggest that the recommender drastically reduces the number of solutions explored by the algorithm while preserving the chance of finding relevant solutions and increasing the overall synthetic accessibility of the designed molecules.

Keywords: De novo design; Multi-label classification; Reaction class recommender; Reaction vector.

Figures

**Fig. 1**
An overview of the use of the Reaction Class Recommender in de novo design. (a) One iteration of the de novo design workflow, consisting of one or more starting materials, a set of reagents and a set of reaction vectors. The reaction vectors are scanned for each starting material. Applicable reaction vectors are those for which the negative atom pairs are wholly present in the starting material, or are present in the starting material and a reagent. (b) The Reaction Class Recommender is used to obtain a list of recommended reaction classes based on the characteristics of the starting material, and only those reaction vectors in the recommended classes are considered further.
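
The filtering step in panel b can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `(name, reaction_class)` tuple layout, the function name `filter_by_recommendation`, and the class names are all hypothetical, and the recommender itself is replaced by a hard-coded list of recommended classes.

```python
def filter_by_recommendation(vectors, recommended_classes):
    """Keep only the reaction vectors whose class was recommended.

    `vectors` is a list of (name, reaction_class) tuples. This layout is an
    assumption made for illustration, not the data model used in the paper.
    """
    allowed = set(recommended_classes)
    return [v for v in vectors if v[1] in allowed]


# Hypothetical applicable reaction vectors for one starting material.
vectors = [
    ("rv1", "N-acylation"),
    ("rv2", "O-alkylation"),
    ("rv3", "N-acylation"),
    ("rv4", "reduction"),
]

# Stand-in for the recommender output for that starting material.
recommended = ["N-acylation", "reduction"]

kept = filter_by_recommendation(vectors, recommended)
print([name for name, _ in kept])  # ['rv1', 'rv3', 'rv4']
```

Only vectors in the recommended classes reach the product-generation step, which is how a filter of this kind would reduce the number of structures enumerated per iteration.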

**Fig. 2**
Examples of two molecules which are represented by identical functional groups consisting of the amine (NH) and hydroxyl (OH) groups only. They would be merged to a single entry in the training data which is associated with the two reaction classes shown

**Fig. 3**
Starting materials are extracted from a set of classified reactions along with their associated reaction class label. The starting materials are characterised by whole molecule descriptors (represented as vectors) and duplicate descriptors are merged into a single entry by appending the appropriate reaction class labels. Thus, each entry in the training set represents one or more starting materials and a multi-label classification represented as a binary vector
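
The merging step described in this caption can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: descriptors are toy three-bit tuples (for example, presence of NH, OH and C=O groups), the class names are invented, and `build_training_set` is a hypothetical helper rather than code from the paper.

```python
from collections import defaultdict


def build_training_set(examples, classes):
    """Merge identical descriptors and collect their labels as binary vectors.

    `examples` is an iterable of (descriptor, reaction_class) pairs and
    `classes` fixes the order of positions in the label vector.
    Both layouts are assumptions made for this sketch.
    """
    index = {c: i for i, c in enumerate(classes)}
    merged = defaultdict(lambda: [0] * len(classes))
    for descriptor, reaction_class in examples:
        # Identical descriptors share one entry; each new class sets one bit.
        merged[tuple(descriptor)][index[reaction_class]] = 1
    return dict(merged)


classes = ["N-acylation", "O-alkylation", "reduction"]  # hypothetical labels
examples = [
    ((1, 1, 0), "N-acylation"),   # molecule A: NH and OH present
    ((1, 1, 0), "O-alkylation"),  # molecule B: same descriptor, other class
    ((0, 1, 1), "reduction"),     # molecule C: different descriptor
]

training = build_training_set(examples, classes)
print(training)  # {(1, 1, 0): [1, 1, 0], (0, 1, 1): [0, 0, 1]}
```

Molecules A and B collapse into one training entry carrying two labels, mirroring the situation shown in Fig. 2.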

**Fig. 4**
The composition of reaction classes in the clean USPD Grants subset using the highest level of the classification (level-1)

**Fig. 5**
Property distributions of the starting materials extracted from the clean USPD Grants subset. Calculations were performed using the RDKit descriptor calculation node in KNIME

**Fig. 6**
The distribution of examples for each of the reaction classes. The level-3 set consists of 319 classes and the level-2 set consists of 259 classes

**Fig. 7**
Model creation tree diagram: the bolded nodes with directed edges represent example combinations of label-type, descriptor-type, fingerprint size, multi-label approach and classifier type

**Fig. 8**
Comparison of the models built using the level-2 and level-3 classification systems. A total of 68 models are shown which vary by descriptor, multi-label approach, modelling method and classification level. The models built using the level-2 classification are in dark blue; those built using the level-3 classification are in light blue

**Fig. 9**
Comparison of the BR, CC, RAkELo and RAkELd multi-label approaches using the level-3 classification scheme

**Fig. 10**
Comparison of RF and SVM, across the nine descriptors, using CC and the level-3 classification scheme

**Fig. 11**
Level-1 reaction classification of the JMC data set

**Fig. 12**
Property distribution of the starting materials extracted from the classified JMC 2018 test set

**Fig. 13**
Property distributions for the starting materials coloured by correct (green), wrong (red), and no-recommendation (blue) following application of the Reaction Class Recommenders

**Fig. 14**
Analysis of a *wrong* recommendation using the CC-RF MACCS model. The recommender did not suggest the *correct* class associated with the top molecule. However, application of the suggested transformation produces a new product for which the correct class is predicted

**Fig. 15**
RSynth and SAscore distributions per library

**Fig. 16**
Level-1 reaction class distributions across libraries

**Fig. 17**
Target hits distributions across libraries

**Fig. 18**
The small-molecule drugs selected for the retrospective validation

Grants and funding

BB/R505821/1/Biotechnology and Biological Sciences Research Council/United Kingdom
