Brief Bioinform. 2025 Aug 31;26(5):bbaf482.
doi: 10.1093/bib/bbaf482.

Growing and linking optimizers: synthesis-driven molecule design

Clarisse Descamps et al. Brief Bioinform.

Abstract

This work presents two reaction-based generative models for molecular design: growing optimizer (GO) and linking optimizer (LO). These models are designed to emulate real-life chemical synthesis by sequentially selecting building blocks and simulating the reactions between them to form new compounds. By focusing on the feasibility of the generated molecules, GO and LO offer several advantages, including the ability to restrict chemistry to specific building blocks, reaction types, and synthesis pathways, a crucial requirement in drug design. Unlike text-based models, which construct molecules by iteratively forming a textual representation of the molecular structure, and graph-based models, which assemble molecules atom by atom or fragment by fragment, our approach incorporates a more comprehensive understanding of chemical knowledge, making it relevant for drug discovery projects. Comparative analysis with REINVENT 4, a state-of-the-art molecular generative model, shows that GO and LO are more likely to produce synthetically accessible molecules while reaching molecules of interest with the desired properties.
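The sequential procedure described in the abstract (decide whether to add a reaction, pick a building block, apply the reaction, repeat) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `Step`, `MolecularTree`, `toy_react`, and `grow` are hypothetical, the random policy stands in for the learned model, and `toy_react` only records which pieces were combined rather than simulating any chemistry. The one property it tries to show is that restricting the allowed building blocks and reaction types restricts what the generator can produce.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One reaction step in a molecular tree (illustrative record only)."""
    reaction: str
    building_block: str
    product: str


@dataclass
class MolecularTree:
    """Starting fragment plus the ordered reaction steps applied to it."""
    start: str
    steps: List[Step] = field(default_factory=list)

    @property
    def final_molecule(self) -> str:
        # The last product, or the starting fragment if no step was added.
        return self.steps[-1].product if self.steps else self.start


def toy_react(current: str, reaction: str, block: str) -> str:
    """Placeholder for a reaction simulator: it only labels the combination."""
    return f"{reaction}({current},{block})"


def grow(start: str,
         building_blocks: List[str],
         reactions: List[str],
         max_steps: int,
         rng: random.Random,
         stop_probability: float = 0.3) -> MolecularTree:
    """Grow a tree step by step using a random (not learned) policy."""
    tree = MolecularTree(start=start)
    for _ in range(max_steps):
        # Decision 1: add another reaction or stop (hypothetical policy).
        if rng.random() < stop_probability:
            break
        # Decision 2: choose a reaction type and a building block,
        # both restricted to the lists supplied by the caller.
        reaction = rng.choice(reactions)
        block = rng.choice(building_blocks)
        product = toy_react(tree.final_molecule, reaction, block)
        tree.steps.append(Step(reaction, block, product))
    return tree


if __name__ == "__main__":
    tree = grow("fragment_a", ["bb_1", "bb_2"], ["amide", "suzuki"],
                max_steps=3, rng=random.Random(0))
    print(tree.final_molecule)
```

Because every intermediate is stored as a step, the synthesis route of the final molecule can be read directly off the tree, which is the practical appeal of a reaction-based generator over one that emits only a final structure.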

Keywords: deep learning; drug design; generative AI; hit discovery; lead optimization; reinforcement fine tuning.

Figures

Figure 1
Molecular design strategies. Unconstrained design involves the generation of a molecule without any structural input. Fragment growing generates a molecule from an input fragment that remains part of the final compound. Fragment linking generates a molecule by binding two input fragments with a linker. GO handles unconstrained design and fragment growing, including macrocyclization, while LO handles the fragment linking strategy.
Figure 2
Example of a molecular tree generated by GO. The process starts with an initial fragment (a), provided by the user, featuring two exit vectors. GO chooses to add a reaction to the tree and selects an [formula image] reaction. GO then chooses building block (b) from the CABB dataset. The reaction predictor applies reaction 1, respecting the exit vector constraints, and the intermediate molecule (c) is obtained. GO selects another [formula image] reaction, chooses building block (d), and applies reaction 2 using the template predictor (see Template predictor). This results in the intermediate product (e). GO then adds an [formula image] reaction to the intermediate molecule (e), producing the intermediate (f). Finally, GO decides not to add another reaction to the tree, and (f) becomes the final molecule of the molecular tree.
Figure 3
Example of a molecular tree generated by LO. The process starts with two initial fragments, (a) and (e), along with their respective exit vectors provided by the user. LO selects building block (b) from the CABB dataset to serve as the linker between the two fragments. The SRNN network determines that a [formula image] reaction is not necessary to transform the linker before its reaction with fragment (a). The template predictor applies reaction 1, respecting the exit vector constraints, resulting in the intermediate molecule (c). The SRNN network then decides to apply an [formula image] reaction to transform the remaining portion of the linker in (c), and the reaction predictor carries out deprotection 2. Finally, the reaction predictor applies reaction 3, also adhering to the exit vector constraints, to link the second fragment (e) to the deprotected linker (d), resulting in the final molecule (f).
Figure 4
Target molecules in (a) and maximum Tanimoto similarity achieved by a molecule with an RScore above 0.5 in unconstrained design (b), fragment growing (c), and fragment linking (d) strategies. Across these use cases, GO and LO generate molecules with higher rewards (i.e., higher similarity scores to the target molecules) than REINVENT 4.
Figure 5
Analysis of RScore and the number of molecules within the TPP for the top 500 molecules generated. (a) In unconstrained design, GO produced molecules with higher synthetic accessibility compared to REINVENT 4. (b) In fragment growing, REINVENT 4 generated a substantial number of molecules within the TPP, though only half exhibited an RScore above 0.5.
Figure 6
Synthetic accessibility and score distribution for hit discovery experiments. Synthetic accessibility is measured by the number of molecules with an RScore above 0.5 among the top 500 generated molecules. The molecule score plots represent the score distribution for synthetically accessible molecules within each generation. In (a), GO consistently generates molecules with superior scores and higher synthetic accessibility compared to REINVENT 4. In boxplot (b), while REINVENT 4 achieves high reward scores for ERK2 and TRMD, it struggles to generate synthetically accessible molecules, in contrast to GO, which generates synthetically accessible molecules with high rewards in all three use cases.
Figure 7
(a) Molecular tree of the top-scoring molecule generated by GO, adhering to the generation constraints (1 Suzuki reaction). (b) Suzuki reaction template used in reaction 1 of the molecular tree. (c) Top-scoring molecule generated by REINVENT 4, adhering to the generation constraints.
Figure 8
(a–c) The three best molecules generated by REINVENT 4 in the hit discovery use case for the PIM1 target with a spiro compound constraint. (d) Molecular tree of the best molecule generated by GO, where the spiro constraint is satisfied by sampling a spiro compound (a) from the CABB dataset. No constraint is applied to the CABB dataset for sampling building block (d).
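
The evaluation quantities named in the captions of Figures 4 to 6 (Tanimoto similarity to a target, restricted to molecules with an RScore above 0.5, and the count of such molecules among the top 500) can be written down as a short sketch. The assumptions are labeled here and in the code: fingerprints are represented as sets of on-bit indices, the fingerprint type is not specified in this text, RScore values are taken as given inputs rather than computed, and the function names are hypothetical.

```python
from typing import List, Set, Tuple

# Threshold quoted in the figure captions for synthetic accessibility.
RSCORE_THRESHOLD = 0.5


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on bits."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch (assumption)
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_accessible_similarity(
    target: Set[int],
    candidates: List[Tuple[Set[int], float]],
) -> float:
    """Best similarity to the target among candidates with RScore > 0.5.

    Each candidate is a (fingerprint, rscore) pair; rscore is an input here.
    """
    sims = [tanimoto(target, fp)
            for fp, rscore in candidates
            if rscore > RSCORE_THRESHOLD]
    return max(sims, default=0.0)


def count_accessible_in_top_k(
    scored: List[Tuple[float, float]],
    k: int = 500,
) -> int:
    """Count molecules with RScore > 0.5 among the k highest-reward ones.

    Each entry is a (reward, rscore) pair.
    """
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    return sum(1 for _, rscore in top if rscore > RSCORE_THRESHOLD)


if __name__ == "__main__":
    target = {1, 2, 3}
    pool = [({1, 2, 3}, 0.2), ({2, 3, 4}, 0.9)]
    print(max_accessible_similarity(target, pool))
```

In the usage example, the exact match to the target is ignored because its RScore falls below the threshold, so the reported maximum comes from the second candidate. This is the sense in which the captions report similarity and synthetic accessibility jointly rather than similarity alone.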

