J Comput Aided Mol Des. 2020 Jul;34(7):783-803.
doi: 10.1007/s10822-020-00300-6. Epub 2020 Feb 28.

Enhancing reaction-based de novo design using a multi-label reaction class recommender

Gian Marco Ghiandoni et al. J Comput Aided Mol Des. 2020 Jul.

Abstract

Reaction-based de novo design refers to the in-silico generation of novel chemical structures by combining reagents using structural transformations derived from known reactions. The driver for using reaction-based transformations is to increase the likelihood of the designed molecules being synthetically accessible. We have previously described a reaction-based de novo design method based on reaction vectors, which are transformation rules that are encoded automatically from reaction databases. A limitation of reaction vectors is that they account only for structural changes that occur at the core of a reaction, and they do not consider the presence of competing functionalities that can compromise the reaction outcome. Here, we present the development of a Reaction Class Recommender to enhance the reaction vector framework. The recommender is intended to be used as a filter on the reaction vectors that are applied during de novo design, reducing the combinatorial explosion of in-silico molecules produced while limiting the generated structures to those that are most likely to be synthesisable. The recommender has been validated using an external data set extracted from the recent medicinal chemistry literature and in two simulated de novo design experiments. Results suggest that the use of the recommender drastically reduces the number of solutions explored by the algorithm while preserving the chance of finding relevant solutions and increasing the global synthetic accessibility of the designed molecules.

Keywords: De novo design; Multi-label classification; Reaction class recommender; Reaction vector.

Figures

Fig. 1
An overview of the use of the Reaction Class Recommender in de novo design. (a) One iteration of the de novo design workflow, consisting of one or more starting materials, a set of reagents and a set of reaction vectors. The reaction vectors are scanned for each starting material. Applicable reaction vectors are those whose negative atom pairs are wholly present in the starting material, or in the starting material and a reagent taken together. (b) The Reaction Class Recommender is used to obtain a list of recommended reaction classes based on the characteristics of the starting material, and only the reaction vectors in the recommended classes are considered further
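The filtering step described in the Fig. 1 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: `ReactionVector`, `is_applicable` and `candidate_vectors` are hypothetical names, atom pairs are represented by toy string labels rather than real atom-pair descriptors, and the applicability test is simplified to a multiset containment check.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ReactionVector:
    """Hypothetical stand-in for a reaction vector (not the authors' data model)."""
    name: str
    reaction_class: str
    negative_pairs: Counter  # atom pairs that the transformation removes


def is_applicable(vector, start_pairs, reagent_pairs=None):
    """True if every negative atom pair is present in the starting material,
    or in the starting material and one reagent taken together."""
    available = Counter(start_pairs)
    if reagent_pairs is not None:
        available += Counter(reagent_pairs)
    return all(available[p] >= n for p, n in vector.negative_pairs.items())


def candidate_vectors(vectors, start_pairs, recommended_classes, reagent_pairs=None):
    """Keep only vectors whose class was recommended, then test applicability."""
    return [
        v for v in vectors
        if v.reaction_class in recommended_classes
        and is_applicable(v, start_pairs, reagent_pairs)
    ]


# Toy data: the atom-pair labels and class names are made up for illustration.
vectors = [
    ReactionVector("amide_coupling", "acylation", Counter({"C-O": 1, "N-H": 1})),
    ReactionVector("o_alkylation", "alkylation", Counter({"O-H": 1})),
]
start = ["N-H", "O-H"]   # starting material
acid = ["C-O"]           # reagent
hits = candidate_vectors(vectors, start, {"acylation"}, reagent_pairs=acid)
print([v.name for v in hits])  # ['amide_coupling']
```

Checking the class label before the applicability test means the more expensive structural check runs only on vectors the recommender has already kept, which is where the reduction in explored solutions would come from in this sketch.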
Fig. 2
Two example molecules that are represented by identical functional groups, consisting of the amine (NH) and hydroxyl (OH) groups only. They would be merged into a single entry in the training data, which is associated with the two reaction classes shown
Fig. 3
Starting materials are extracted from a set of classified reactions along with their associated reaction class label. The starting materials are characterised by whole molecule descriptors (represented as vectors) and duplicate descriptors are merged into a single entry by appending the appropriate reaction class labels. Thus, each entry in the training set represents one or more starting materials and a multi-label classification represented as a binary vector
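The merging of duplicate descriptors into multi-label entries, as described in the Fig. 3 caption, might look roughly like the following. It is a minimal sketch assuming descriptors are already available as fixed-length tuples; `build_training_set`, the class names and the three-bit descriptors are hypothetical and chosen only to mirror the amine/hydroxyl example of Fig. 2.

```python
def build_training_set(examples, classes):
    """Merge starting materials with identical descriptors into one entry
    carrying a binary multi-label vector over `classes`."""
    index = {c: i for i, c in enumerate(classes)}
    merged = {}
    for descriptor, reaction_class in examples:
        key = tuple(descriptor)  # identical descriptors share one key
        labels = merged.setdefault(key, [0] * len(classes))
        labels[index[reaction_class]] = 1  # append the class label
    return merged


# Toy data: descriptor bits and class names are illustrative assumptions.
classes = ["N-acylation", "O-alkylation", "reduction"]
examples = [
    ((1, 1, 0), "N-acylation"),   # molecule A: amine + hydroxyl
    ((1, 1, 0), "O-alkylation"),  # molecule B: same descriptor, different class
    ((0, 0, 1), "reduction"),
]
training = build_training_set(examples, classes)
for descriptor, labels in training.items():
    print(descriptor, labels)
# (1, 1, 0) [1, 1, 0]
# (0, 0, 1) [0, 0, 1]
```

Each resulting entry pairs one descriptor with a binary label vector, which is the input shape a multi-label classifier expects.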
Fig. 4
The composition of reaction classes in the clean USPD Grants subset using the highest level of the classification (level-1)
Fig. 5
Property distributions of the starting materials extracted from the clean USPD Grants subset. Calculations were performed using the RDKit descriptor calculation node in KNIME
Fig. 6
The distribution of examples for each of the reaction classes. The level-3 set consists of 319 classes and the level-2 set consists of 259 classes
Fig. 7
Model creation tree diagram: the bolded nodes with directed edges represent example combinations of label-type, descriptor-type, fingerprint size, multi-label approach and classifier type
Fig. 8
Comparison of the models built using the level-2 and level-3 classification systems. A total of 68 models are shown which vary by descriptor, multi-label approach, modelling method and classification level. The models built using the level-2 classification are in dark blue; those built using the level-3 classification are in light blue
Fig. 9
Comparison of the BR, CC, RAkELo, and RAkELd approaches using the level-3 classification scheme
Fig. 10
Comparison of RF and SVM, across the nine descriptors, using CC and the level-3 classification scheme
Fig. 11
Level-1 reaction classification of the JMC data set
Fig. 12
Property distribution of the starting materials extracted from the classified JMC 2018 test set
Fig. 13
Property distributions for the starting materials coloured by correct (green), wrong (red), and no-recommendation (blue) following application of the Reaction Class Recommenders
Fig. 14
Analysis of a wrong recommendation using the CC-RF MACCS model. The recommender did not suggest the correct class associated with the top molecule. However, application of the suggested transformation produces a new product for which the correct class is predicted
Fig. 15
RSynth and SAscore distributions per library
Fig. 16
Level-1 reaction class distributions across libraries
Fig. 17
Target hit distributions across libraries
Fig. 18
The small-molecule drugs selected for the retrospective validation
