J Cheminform. 2020 Apr 22;12(1):28.
doi: 10.1186/s13321-020-00431-w.

CReM: chemically reasonable mutations framework for structure generation

Pavel Polishchuk. J Cheminform.

Abstract

Structure generators are widely used in de novo design studies, and their performance substantially influences the outcome. Approaches based on deep learning models and conventional atom-based approaches may produce invalid structures and do not address synthetic feasibility. Conventional reaction-based approaches, on the other hand, yield synthetically feasible compounds, but the novelty and diversity of those compounds may be limited. Fragment-based approaches can provide better novelty and diversity, but the synthetic complexity of the generated structures has not been explicitly addressed before. Here we developed a new framework for fragment-based structure generation that, by design, produces chemically valid structures and provides flexible control over the diversity, novelty, synthetic complexity and chemotypes of generated compounds. The framework was implemented as an open-source Python module and can be used to create custom workflows for the exploration of chemical space.
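
The core idea of exchanging fragments that share a context can be illustrated with a minimal sketch. It is not the module's actual code or API: the function names `build_fragment_db` and `replacements`, the `min_freq` parameter and the example records are all hypothetical, and contexts and fragments are treated as plain string labels rather than parsed chemical structures. The sketch assumes that two fragments are interchangeable when they were observed in the same context, and that a replacement can be restricted by how often it occurs in the database.

```python
from collections import Counter, defaultdict


def build_fragment_db(records):
    """Map each context to a Counter of the fragments observed in it."""
    db = defaultdict(Counter)
    for context, fragment in records:
        db[context][fragment] += 1
    return db


def replacements(db, context, fragment, min_freq=1):
    """Return fragments seen in the same context, excluding the original.

    min_freq is a hypothetical knob: only fragments observed at least
    that many times in the given context are returned.
    """
    counts = db.get(context, Counter())
    return sorted(f for f, n in counts.items() if f != fragment and n >= min_freq)


# Hypothetical (context, fragment) pairs. They are string labels only;
# no chemistry toolkit is involved and no validity check is performed.
records = [
    ("c1ccccc1[*:1]", "[*:1]C"),
    ("c1ccccc1[*:1]", "[*:1]Cl"),
    ("c1ccccc1[*:1]", "[*:1]Cl"),
    ("c1ccccc1[*:1]", "[*:1]OC"),
    ("CC(=O)[*:1]", "[*:1]N"),
]

db = build_fragment_db(records)
print(replacements(db, "c1ccccc1[*:1]", "[*:1]C"))
print(replacements(db, "c1ccccc1[*:1]", "[*:1]C", min_freq=2))
```

In this toy setting, raising `min_freq` keeps only the more frequently observed replacements, which loosely mirrors the idea of steering generation through restrictions on the fragment database.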

Keywords: De novo design; De novo structure generation; Matched molecular pairs.

Conflict of interest statement

The author declares no competing interests.

Figures

Fig. 1. Generation of a database of interchangeable fragments and new molecules
Fig. 2. Canonicalization of attachment point numbers in contexts and fragments
Fig. 3. Structure generation modes
Fig. 4. Distributions of the number of generated compounds and of average novelty, diversity and synthetic complexity, based on 500 compound sets generated from DrugBank compounds using the whole ChEMBL fragment database at different context radii
Fig. 5. Distributions of differences in the number of compounds and in average novelty, diversity and synthetic complexity between data sets generated using a context radius of 3 (reference) and other radii. Positive values indicate that a parameter value is greater than for the data sets generated at radius 3, and negative values indicate the opposite
Fig. 6. Changes in average synthetic complexity of data sets generated from 500 DrugBank compounds using different restricted strategies, relative to unrestricted generation using the whole ChEMBL fragment database. The occurrence of a replacing fragment in the database is given after the ampersand symbol
Fig. 7. The structure of PAINS anil_di_alk_C(246). The image was produced with SMARTSviewer [43]
Fig. 8. Bemis–Murcko scaffold analysis for compounds generated in the course of stochastic exploration of chemical space. a Data for each iteration separately. b Cumulative statistics over iterations
Fig. 9. Distributions of physicochemical parameters of compounds generated during stochastic exploration of chemical space, compared with the same parameters for compounds of the initial ChEMBL data set used to generate the fragment database
Fig. 10. Examples of compounds generated at the twentieth iteration at different context radii

References

    1. Polishchuk PG, Madzhidov TI, Varnek A. Estimation of the size of drug-like chemical space based on GDB-17 data. J Comput Aided Mol Des. 2013;27:675–679. doi: 10.1007/s10822-013-9672-4.
    2. Schneider P, Schneider G. De novo design at the edge of chaos. J Med Chem. 2016;59:4077–4086. doi: 10.1021/acs.jmedchem.5b01849.
    3. Schneider G. Automating drug discovery. Nat Rev Drug Discovery. 2017;17:97. doi: 10.1038/nrd.2017.232.
    4. Böhm H-J. The computer program LUDI: a new method for the de novo design of enzyme inhibitors. J Comput Aided Mol Des. 1992;6:61–78. doi: 10.1007/bf00124387.
    5. Wang R, Gao Y, Lai L. LigBuilder: a multi-purpose program for structure-based drug design. Mol Model Annu. 2000;6:498–516. doi: 10.1007/s0089400060498.
