Brief Bioinform. 2025 Aug 31;26(5):bbaf482.

doi: 10.1093/bib/bbaf482.

Growing and linking optimizers: synthesis-driven molecule design

Clarisse Descamps¹, Vincent Bouttier¹, Juan Sanz García¹, Maoussi Lhuillier-Akakpo¹, Quentin Perron¹, Hamza Tajmouati¹

PMID: 40991327
PMCID: PMC12459256
DOI: 10.1093/bib/bbaf482

Clarisse Descamps et al. Brief Bioinform. 2025.

Affiliation

¹ Iktos, 65 Rue de Prony, 75017 Paris, Île-de-France, France.

Abstract

In the present work, two reaction-based generative models for molecular design are presented: growing optimizer and linking optimizer. These models are designed to emulate real-life chemical synthesis by sequentially selecting building blocks and simulating the reactions between them to form new compounds. By focusing on the feasibility of the generated molecules, growing optimizer and linking optimizer offer several advantages, including the ability to restrict chemistry to specific building blocks, reaction types, and synthesis pathways, a crucial requirement in drug design. Unlike text-based models, which construct molecules by iteratively forming a textual representation of the molecular structure, and graph-based models, which assemble molecules atom by atom or fragment by fragment, our approach incorporates a more comprehensive understanding of chemical knowledge, making it relevant for drug discovery projects. Comparative analysis with REINVENT 4, a state-of-the-art molecular generative model, shows that growing optimizer and linking optimizer are more likely to produce synthetically accessible molecules while reaching molecules of interest with the desired properties.

Keywords: deep learning; drug design; generative AI; hit discovery; lead optimization; reinforcement fine tuning.

Figures

**Figure 1**
Molecular design strategies. Unconstrained design involves the generation of a molecule without any structural input. Fragment growing generates a molecule from an input fragment that remains part of the final compound. Fragment linking generates a molecule by binding two input fragments with a linker. GO handles unconstrained design and fragment growing, including macrocyclization, while LO handles the fragment linking strategy.

**Figure 2**
Example of a molecular tree generated by GO. The process starts with an initial fragment (a), provided by the user, featuring two exit vectors. GO chooses to add a reaction to the tree and decides that this reaction will be an [formula image] reaction. GO chooses building block (b) from the CABB dataset. The reaction predictor applies reaction 1, respecting the exit vector constraints, and the intermediate molecule (c) is obtained. GO selects another reaction, chooses building block (d), and applies reaction 2 using the template predictor (see Template predictor). This results in the intermediate product (e). Then GO adds an [formula image] reaction to the intermediate molecule (e), producing the intermediate (f). Finally, GO decides not to add another reaction to the tree, and (f) becomes the final molecule of the molecular tree.
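
The construction sequence in this caption can be pictured as a small data structure. The sketch below is a toy illustration only: the class names (`Step`, `MolecularTree`), the reaction labels, and the label "reaction 3" are hypothetical, and no chemistry is simulated. In the paper the choices are made by GO and the products come from the reaction and template predictors.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    """One reaction in the tree (hypothetical structure, not the authors' code)."""
    reaction: str                    # label for the reaction applied
    building_block: Optional[str]    # None when the step uses no building block
    product: str                     # label of the resulting intermediate


@dataclass
class MolecularTree:
    """Initial fragment plus the ordered reactions applied to it."""
    fragment: str
    steps: List[Step] = field(default_factory=list)

    def add_step(self, reaction: str, product: str,
                 building_block: Optional[str] = None) -> None:
        self.steps.append(Step(reaction, building_block, product))

    @property
    def final_molecule(self) -> str:
        # With no steps, the fragment itself is the final molecule.
        return self.steps[-1].product if self.steps else self.fragment


# Mirror the labels (a) to (f) used in the Figure 2 caption.
tree = MolecularTree(fragment="a")
tree.add_step("reaction 1", product="c", building_block="b")
tree.add_step("reaction 2", product="e", building_block="d")
# The caption names no building block for the last step; the label is assumed.
tree.add_step("reaction 3", product="f")
print(tree.final_molecule)
```

Storing the steps as an ordered list keeps the synthesis route readable alongside the final product, which is the property the caption emphasizes.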

**Figure 3**
Example of a molecular tree generated by LO. The process starts with two initial fragments, (a) and (e), along with their respective exit vectors provided by the user. LO selects building block (b) from the CABB dataset to serve as the linker between the two fragments. The SRNN network determines that a reaction is not necessary to transform the linker before its reaction with fragment (a). The template predictor applies reaction 1, respecting the exit vector constraints, resulting in the intermediate molecule (c). The SRNN network then decides to apply an [formula image] reaction to transform the remaining portion of the linker in (c), and the reaction predictor carries out the deprotection 2. Finally, the reaction predictor applies reaction 3, also adhering to the exit vector constraints, to link the second fragment (e) to the deprotected linker (d), resulting in the final molecule (f).

**Figure 4**
Target molecules in (a) and maximum Tanimoto similarity achieved by a molecule with an RScore above 0.5 in unconstrained design (b), fragment growing (c), and fragment linking (d) strategies. Across these use cases, GO and LO generate molecules with higher rewards (i.e. higher similarity scores to the target molecules) than REINVENT 4.
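
For reference, the Tanimoto similarity between two fingerprints represented as sets of on-bits is the size of their intersection divided by the size of their union. The sketch below is an illustration under stated assumptions: the fingerprints are made-up integer sets (the fingerprint type used in the paper is not given here), and `best_accessible_similarity` is a hypothetical helper that applies the RScore above 0.5 filter from the caption before taking the maximum.

```python
from typing import Iterable, Optional, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two sets of on-bits."""
    union = len(a | b)
    # Two empty fingerprints share nothing to compare; return 0.0 by convention here.
    return len(a & b) / union if union else 0.0


def best_accessible_similarity(
    target: Set[int],
    candidates: Iterable[Tuple[Set[int], float]],
    rscore_threshold: float = 0.5,
) -> Optional[float]:
    """Maximum similarity to the target among candidates passing the RScore filter.

    Each candidate is a (fingerprint, rscore) pair; this layout is an assumption.
    Returns None when no candidate passes the filter.
    """
    sims = [
        tanimoto(target, fingerprint)
        for fingerprint, rscore in candidates
        if rscore > rscore_threshold
    ]
    return max(sims) if sims else None


target = {1, 2, 3, 4}
candidates = [
    ({1, 2, 3, 4}, 0.2),  # identical to the target but filtered out (RScore <= 0.5)
    ({1, 2, 3}, 0.9),     # similarity 3/4
    ({1, 5}, 0.8),        # similarity 1/5
]
print(best_accessible_similarity(target, candidates))  # 0.75
```

The first candidate shows why the filter matters: a perfect match does not count toward the reported maximum if it is not considered synthetically accessible.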

**Figure 5**
Analysis of RScore and the number of molecules within the TPP for the top 500 molecules generated. (a) In unconstrained design, GO produced molecules with higher synthetic accessibility compared to REINVENT 4. (b) In fragment growing, REINVENT 4 generated a substantial number of molecules within the TPP, though only half exhibited an RScore above 0.5.

**Figure 6**
Synthetic accessibility and score distribution for hit discovery experiments. Synthetic access is measured by the number of molecules with an RScore above 0.5 among the top 500 generated molecules. The molecule score plots represent the score distribution for synthetically accessible molecules within each generation. In (a), GO consistently generates molecules with superior scores and higher synthetic accessibility compared to REINVENT 4. In boxplot (b), while REINVENT 4 achieves high reward scores for ERK2 and TRMD, it struggles to generate synthetically accessible molecules, in contrast to GO, which generates synthetically accessible molecules with high rewards in all three use cases.
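
The synthetic access metric defined in this caption can be computed in a few lines. This is a minimal sketch, assuming each molecule is a `(score, rscore)` pair and that "top 500" means the 500 highest-scoring molecules; the function name and data layout are hypothetical, and the values below are invented for illustration.

```python
from typing import List, Tuple


def synthetic_access_count(
    molecules: List[Tuple[float, float]],
    top_n: int = 500,
    rscore_threshold: float = 0.5,
) -> int:
    """Count molecules with RScore above the threshold among the top_n by score."""
    # Rank by score (first element), highest first, and keep the top_n.
    top = sorted(molecules, key=lambda m: m[0], reverse=True)[:top_n]
    return sum(1 for _, rscore in top if rscore > rscore_threshold)


# Invented (score, rscore) pairs; top_n is reduced to 3 to keep the example small.
molecules = [(0.9, 0.8), (0.8, 0.3), (0.7, 0.6), (0.1, 0.9)]
print(synthetic_access_count(molecules, top_n=3))  # 2
```

The fourth molecule has a high RScore but falls outside the top three by score, so it does not contribute to the count.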

**Figure 7**
(a) Molecular tree of the top-scoring molecule generated by GO, adhering to the generation constraints (1 Suzuki reaction). (b) Suzuki reaction template used in reaction 1 of the molecular tree. (c) Top-scoring molecule generated by REINVENT 4, adhering to the generation constraints.

**Figure 8**
(a–c) The three best molecules generated by REINVENT 4 in the hit discovery use case for the PIM1 target with a spiro compound constraint. (d) Molecular tree of the best molecule generated by GO, where the spiro constraint is satisfied by sampling a spiro compound (a) from the CABB dataset. No constraint is applied to the CABB dataset for sampling building block (d).

Grants and funding

Iktos
