J Cheminform. 2023 Sep 19;15(1):83.
doi: 10.1186/s13321-023-00742-8.

Integrating synthetic accessibility with AI-based generative drug design

Maud Parrot et al. J Cheminform. 2023.

Abstract

Generative models are frequently used for de novo design in drug discovery projects to propose new molecules. However, whether the generated molecules can be synthesized is not systematically taken into account during generation, even though synthesizability is a fundamental requirement for such methods to be useful in practice. Methods have been developed to estimate molecule "synthesizability", but so far there is no consensus on whether or not a given molecule is synthesizable. In this paper we introduce the Retro-Score (RScore), which computes a synthetic accessibility score for molecules by performing a full retrosynthetic analysis with our data-driven synthesis planning software Spaya and its dedicated API, Spaya-API (https://spaya.ai). We first compare several synthetic accessibility scores to a binary "chemist score" assigned by chemists to a benchmark set of generated molecules, as a first experimental validation that the RScore is a reliable synthetic accessibility score. We then describe a pipeline to generate molecules that meet a list of targets while remaining easy to synthesize. We extend this idea with experiments comparing molecular generator outputs across a range of constraints and conditions. We show that the RScore can be learned by a neural network, which yields a new score: RSPred. We demonstrate that using the RScore or RSPred as a constraint during molecular generation enables our molecular generators to produce more synthesizable solutions with higher diversity. The open-source Python code containing all the scores and experiments is available at https://github.com/iktos/generation-under-synthetic-constraint.
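
The abstract does not state which metric is used to compare the continuous synthetic accessibility scores with the binary chemist score. One common choice is the ROC AUC: the probability that a molecule the chemists judged synthesizable receives a higher score than one they rejected. The sketch below is an illustration under that assumption, not the paper's evaluation code. It assumes higher scores mean "more synthesizable", so scores on a "lower is easier" scale, such as the 1 to 10 SA score, are negated first. The function name `roc_auc` and the toy values are hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC of a synthetic accessibility score against binary chemist labels.

    labels: 1 = judged synthesizable by chemists, 0 = not.
    scores: higher must mean "more synthesizable"; negate scores such as
    the SA score (1 = easy, 10 = hard) before calling.
    Returns the fraction of (positive, negative) pairs in which the
    positive molecule scores higher, counting ties as half a pair.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy values, not data from the paper.
    print(roc_auc([0.8, 0.75, 0.6, 0.2], [1, 0, 1, 0]))  # 0.75
    # SA-score style values (lower = easier), negated before scoring.
    sa = [2.1, 6.3, 3.0, 5.5]
    print(roc_auc([-v for v in sa], [1, 0, 1, 0]))  # 1.0
```

In the first call, three of the four (positive, negative) pairs are ordered correctly, giving 0.75. The pairwise loop costs O(n·m), which is acceptable for a small annotated set. For larger sets, a rank-based formulation or scikit-learn's `roc_auc_score` would be the usual choice.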

Keywords: In silico molecular generation; In silico synthesizability; Retrosynthesis artificial intelligence; Machine learning.

Conflict of interest statement

The authors are employees of IKTOS. The authors declare no competing interests in relation to this manuscript.

Figures

Fig. 1. Imposed structure for PI3K/mTOR generation
Fig. 2. Normalized histogram of the RScore1min on molecules from the ChEMBL dataset
Fig. 3. Correlation between the RScore1min and the number of synthetic steps given by Spaya-API on a sample from the ChEMBL dataset
Fig. 4. Histogram of the RA score on the ChEMBL dataset
Fig. 5. Correlation between the RA score and the RScore1min on the ChEMBL dataset
Fig. 6. Correlation between the SA score and the RScore1min on the ChEMBL dataset
Fig. 7. Correlation between the SC score and the RScore1min on the ChEMBL dataset
Fig. 8. Correlation between the RScore1min and the values predicted by the neural network on a test set
Fig. 9. Reward and synthetic accessibility of the top 100 molecules for each task under different synthetic constraints. The red line is the average reward (excluding the synthetic score) of the top 100 molecules of the generation. The green line is the percentage of the top 100 molecules with an RScore3min of at least 0.5
Fig. 10. Number of molecules in the blueprint for each generation, with their RScore3min range indicated
Fig. 11. Example of a synthesis route obtained by Spaya
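
The two curves described for Fig. 9 reduce to simple statistics over the top 100 molecules of a run. The sketch below makes several assumptions. Each generated molecule is assumed to carry a reward (the objective without the synthetic term) and an RScore3min value. "Top" is assumed to mean ranked by reward unless another ranking key is passed. The `GeneratedMolecule` record and the `top_k_summary` function are hypothetical names that are not part of the released code.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Optional, Tuple


@dataclass
class GeneratedMolecule:
    smiles: str
    reward: float  # task reward without the synthetic score (hypothetical field)
    rscore: float  # RScore3min value in [0, 1] (hypothetical field)


def top_k_summary(
    mols: List[GeneratedMolecule],
    k: int = 100,
    threshold: float = 0.5,
    rank_key: Optional[Callable[[GeneratedMolecule], float]] = None,
) -> Tuple[float, float]:
    """Return (average reward, % with rscore >= threshold) over the top-k molecules."""
    if not mols:
        raise ValueError("no molecules to summarize")
    key = rank_key if rank_key is not None else (lambda m: m.reward)
    # Keep the k best molecules according to the ranking key (highest first).
    top = sorted(mols, key=key, reverse=True)[:k]
    avg_reward = mean(m.reward for m in top)  # corresponds to the red line
    n_accessible = sum(1 for m in top if m.rscore >= threshold)
    pct_accessible = 100.0 * n_accessible / len(top)  # corresponds to the green line
    return avg_reward, pct_accessible


if __name__ == "__main__":
    # Toy molecules, not results from the paper.
    run = [
        GeneratedMolecule("CCO", reward=0.75, rscore=0.8),
        GeneratedMolecule("c1ccccc1", reward=0.25, rscore=0.9),
        GeneratedMolecule("CCN", reward=0.5, rscore=0.3),
    ]
    print(top_k_summary(run, k=2))  # (0.625, 50.0)
```

Passing, for instance, `rank_key=lambda m: m.reward + m.rscore` would rank by a combined objective instead. The caption does not say which ranking defines the top 100.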
