Commun Chem. 2024 Mar 7;7(1):52.
doi: 10.1038/s42004-024-01133-2.

Efficient retrosynthetic planning with MCTS exploration enhanced A* search

Dengwei Zhao et al. Commun Chem.

Abstract

Retrosynthetic planning, which aims to identify synthetic pathways for target molecules from starting materials, is a fundamental problem in synthetic chemistry. Computer-aided retrosynthesis has made significant progress, in which heuristic search algorithms, including Monte Carlo Tree Search (MCTS) and A* search, have played a crucial role. However, unreliable guiding heuristics often cause search failure due to insufficient exploration. Conversely, excessive exploration also prevents the search from reaching the optimal solution. In this paper, MCTS exploration enhanced A* (MEEA*) search is proposed to incorporate the exploratory behavior of MCTS into A* by providing a look-ahead search. Path consistency is adopted as a regularization to improve the generalization performance of heuristics. Extensive experimental results on 10 molecule datasets demonstrate the effectiveness of MEEA*. In particular, on the widely used United States Patent and Trademark Office (USPTO) benchmark, MEEA* achieves a 100.0% success rate. Moreover, for natural products, MEEA* successfully identifies bio-retrosynthetic pathways for 97.68% of test compounds.


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Translation of the chemical retrosynthetic route representation to the search tree representation.
a The chemical representation of the synthesis plan; b The search tree representation. In the search tree, the state of each node is the set of molecules needed to synthesize the target molecule, including all unsynthesized intermediate molecules as well as the building blocks. The edges represent chemical reactions applied to the parent node.
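The state and edge definitions in this caption can be illustrated with a minimal sketch. The names `State`, `apply_reaction`, and the single-letter toy molecules are hypothetical and are not taken from the paper; real states would hold SMILES strings and edges would come from a single-step retrosynthetic model.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class State:
    """A search-tree node state: every molecule the route still relies on."""
    molecules: FrozenSet[str]

    def unsolved(self, building_blocks: FrozenSet[str]) -> FrozenSet[str]:
        # Intermediates that still have to be synthesized.
        return self.molecules - building_blocks

    def is_solved(self, building_blocks: FrozenSet[str]) -> bool:
        # Solved once only building blocks remain in the state.
        return not self.unsolved(building_blocks)


def apply_reaction(state: State, product: str, precursors: Iterable[str]) -> State:
    """An edge: replace `product` in the parent state by its precursors."""
    if product not in state.molecules:
        raise ValueError(f"{product!r} is not part of this state")
    return State((state.molecules - {product}) | frozenset(precursors))


# Toy example (hypothetical molecules): T <- A + I, then I <- B + C.
building_blocks = frozenset({"A", "B", "C"})
root = State(frozenset({"T"}))
s1 = apply_reaction(root, "T", ["A", "I"])
s2 = apply_reaction(s1, "I", ["B", "C"])
```

In this toy run, `s1` still contains the unsynthesized intermediate `I`, while `s2` contains only building blocks and therefore corresponds to a complete route.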
Fig. 2. Results of methods with insufficient or excessive exploration under limited search time.
a The first failure case expands states on a non-optimal branch because of insufficient exploration; b The second failure case compulsively explores unnecessary branches and cannot go deep enough to reach the optimal solution; c The cumulative proportions of the route lengths required by the exhaustive search for molecules that the search algorithm failed to synthesize; d The failure rates for molecules grouped by the length of the synthetic route found by the exhaustive search (the source data for c and d are provided in Supplementary Data 1 and 2). Retro*+ and MCTS employ the same expansion function and value estimator (MCTS does not use the pruning technique).
Fig. 3. The overall framework of the MEEA* retrosynthetic planning search algorithm.
a The search process of the MEEA* algorithm includes three steps. (1) Simulation: conduct K_MCTS iterations of MCTS to generate the candidate set from the open set; (2) Selection: select the state with the minimum f value from the candidate set; (3) Expansion: expand the selected state using the single-step retrosynthetic model B. b Evaluation of states during the Simulation step of MEEA*. g(s) is the sum of reaction costs along the traversal path, and h(s) is estimated by the value network, considering all molecules in the state. c Single-step retrosynthetic model B used in the Expansion step. The first non-building-block molecule is selected for expansion. The top-k reaction templates are obtained from the priors provided by the policy network and are applied to the expanded molecule to generate its possible precursors.
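The three steps in this caption can be sketched as a small loop. This is an illustrative simplification, not the authors' implementation: the caption does not specify the tree policy used during Simulation, so the UCB-style exploration bonus below is an assumption, and `meea_star_sketch`, `expand_fn`, `h_fn`, `k_mcts`, and the toy reaction table are hypothetical stand-ins for the policy network, the value network, and K_MCTS.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    molecules: frozenset          # state: every molecule the route still needs
    g: float = 0.0                # summed reaction cost from the root
    parent: "Node | None" = None
    reaction: tuple = ()          # (product, precursors) that led to this state
    children: list = field(default_factory=list)
    expanded: bool = False
    dead: bool = False            # no unexpanded descendant is left below
    visits: int = 0


def meea_star_sketch(target, building_blocks, expand_fn, h_fn,
                     k_mcts=3, max_expansions=100, c=1.0):
    f = lambda n: n.g + h_fn(n.molecules)          # f(s) = g(s) + h(s)
    root = Node(frozenset([target]))
    for n_expansions in range(max_expansions):
        if root.dead:
            return None                            # every branch is a dead end
        # (1) Simulation: k_mcts descents from the root collect open states.
        candidates = []
        for _ in range(k_mcts):
            node = root
            node.visits += 1
            while node.expanded:
                live = [ch for ch in node.children if not ch.dead]
                n_parent = node.visits
                # Assumed tree policy: low f plus a UCB-style visit bonus.
                node = min(live, key=lambda ch: f(ch) - c * math.sqrt(
                    math.log(n_parent + 1) / (ch.visits + 1)))
                node.visits += 1
            candidates.append(node)
        # (2) Selection: the candidate with the minimum f value.
        best = min(candidates, key=f)
        unsolved = sorted(best.molecules - building_blocks)
        if not unsolved:                           # only building blocks remain
            route = []
            while best.parent is not None:
                route.append(best.reaction)
                best = best.parent
            return route[::-1], n_expansions
        # (3) Expansion: expand the first non-building-block molecule.
        mol = unsolved[0]
        for cost, precursors in expand_fn(mol):
            best.children.append(Node(
                (best.molecules - {mol}) | frozenset(precursors),
                best.g + cost, best, (mol, tuple(precursors))))
        best.expanded = True
        node = best                                # propagate dead ends upward
        while node is not None and all(ch.dead for ch in node.children):
            node.dead = True
            node = node.parent
    return None


# Toy single-step model (hypothetical): molecule -> [(cost, precursors), ...]
reactions = {"T": [(1.0, ["A", "I"]), (5.0, ["X"])], "I": [(1.0, ["B", "C"])]}
bb = frozenset({"A", "B", "C"})
result = meea_star_sketch("T", bb, lambda m: reactions.get(m, []),
                          lambda mols: float(len(mols - bb)))
```

Here `h_fn` simply counts unsolved molecules, standing in for the learned value network. The sketch returns the route as a list of `(product, precursors)` pairs together with the number of expansions, or `None` when the budget runs out or every branch is a dead end.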
Fig. 4. MEEA* achieves superior performance compared with A* search and MCTS.
a The learning curve of the cost estimator in Retro* illustrates the overfitting problem; b Success rate on the USPTO benchmark at different numbers of iterations. Guided by the same heuristic function, Retro*+ also achieves a 100% success rate after enough iterations, but MEEA* is much more efficient; c Success rate on Natural Products (the source data for a–c are provided in Supplementary Data 3–5); d, e Synthetic route for molecule COc1ccc2nccc(C(N)CC[C@@H]3CCN(CCSc4cccs4)C[C@@H]3C(=O)O)c2c1. MEEA* provides a shorter reaction pathway than the exhaustive search. Retro*+ fails to provide a solution, although the target can be synthesized in three steps. MCTS yields the same solution as MEEA*, but MCTS requires 35 expansions while MEEA* requires only 11. f–h Synthetic route for molecule COCCCc1cc(CN(C(=O)[C@H]2CN(C(=O)OC(C)(C)C)CC[C@@H]2c2ccc(OCCOc3c(Cl)cc(C)cc3Cl)cc2)C2CC2)cc(OC[C@@H]2C[C@H]2C(=O)OCC(=O)N(C)C)c1. The reaction pathways provided by the exhaustive search, Retro*+, and MEEA* have lengths of 16, 12, and 11, respectively. MCTS fails to provide a solution. (MEEA*, MCTS, and Retro*+ are guided by the same single-step expansion policy and cost estimator.)
