Towards efficient discovery of green synthetic pathways with Monte Carlo tree search and reinforcement learning

Chem Sci. 2020 Sep 14;11(40):10959-10972. doi: 10.1039/d0sc04184j.

Xiaoxue Wang¹,², Yujie Qian³, Hanyu Gao¹, Connor W Coley¹, Yiming Mo¹, Regina Barzilay³, Klavs F Jensen¹

Affiliations

¹ Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA kfjensen@mit.edu.
² Department of Chemical and Biomolecular Engineering, The Ohio State University Columbus Ohio 43210 USA.
³ Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology Cambridge Massachusetts 02139 USA.

PMID: 34094345
PMCID: PMC8162445
DOI: 10.1039/d0sc04184j

Xiaoxue Wang et al. Chem Sci. 2020.

Abstract

Computer-aided planning of synthetic pathways with green process conditions has become increasingly important in organic chemistry, but the large search space inherent in synthesis planning and the difficulty of predicting reaction conditions make it a significant challenge. We introduce a new Monte Carlo Tree Search (MCTS) variant that promotes a balance between exploration and exploitation across the synthesis space. Combined with a value network trained by reinforcement learning and a solvent-prediction neural network, our algorithm is comparable to the best MCTS variant (PUCT, similar to that used in Google's AlphaGo) in finding valid synthesis pathways within a fixed search time, and superior in identifying shorter routes with greener solvents under the same search conditions. In addition, at the same root compound visit count, our algorithm outperforms PUCT MCTS by 16% in finding successful routes. Overall, the success rate is improved by 19.7% compared with the upper confidence bound applied to trees (UCT) MCTS method. Moreover, we improve 71.4% of the routes proposed by the PUCT MCTS variant in pathway length and choice of green solvents. More generally, the approach enables Green Chemistry considerations to be included in computer-aided synthesis planning, with potential applications in process development for fine chemicals or pharmaceuticals.

This journal is © The Royal Society of Chemistry.


Conflict of interest statement

There are no conflicts to declare.

Figures

Fig. 1. The process of Monte Carlo Tree Search in synthesis planning. Following MDP notation, a molecule (state) is denoted s and a template (retrosynthetic disconnection action) is denoted a. In the selection phase, starting from the target molecule, the most “promising” template is chosen recursively by selecting the template with the highest upper confidence bound, UCB(s,a), until a leaf node is reached. A policy network narrows the search beam at each template-selection step. In the expansion phase, the leaf node is expanded by applying the selected template, generating new leaf nodes (precursors) that the tree expander has not visited before. In the evaluation phase, a value network estimates the values of the new leaf nodes (if a node is buyable, its value is set to 1). In the backpropagation phase, the visit count N(s,a) of each compound-template pair (s,a), or edge, is updated along the path back up the tree. The Q(s,a) value (see Table 1) is recalculated as well and used to recompute the UCB(s,a) values in the next selection step. With the updated values, the tree expander returns to the selection phase and again selects the most promising template for the target molecule (root node). Circles denote compounds: blue, not commercially available; green, commercially available.
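The backpropagation phase of Fig. 1 can be sketched as follows. This is a minimal illustration, assuming Q(s,a) is the running mean of the values backed up through edge (s,a); the paper's own definition of Q(s,a) is given in its Table 1, which is not reproduced here. The names `EdgeStats` and `backpropagate`, and the compound and template labels in the example, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EdgeStats:
    """Running statistics for one compound-template edge (s, a)."""
    visits: int = 0           # N(s, a)
    total_value: float = 0.0  # sum of values backed up through this edge

    @property
    def q(self) -> float:
        # Q(s, a) taken as the mean backed-up value (assumed definition).
        return self.total_value / self.visits if self.visits else 0.0


def backpropagate(path, leaf_value, stats):
    """Increment N(s, a) and update Q(s, a) for every edge on the selected path.

    path: (compound, template) pairs from the root down to the expanded leaf.
    leaf_value: scalar value of the expansion (1.0 if buyable, otherwise a
        value-network estimate); how several new precursors are combined
        into one scalar is left to the caller in this sketch.
    stats: dict mapping (compound, template) -> EdgeStats, updated in place.
    """
    for edge in path:
        edge_stats = stats.setdefault(edge, EdgeStats())
        edge_stats.visits += 1
        edge_stats.total_value += leaf_value
```

For example, backing up 1.0 along `[("target", "t1"), ("int_A", "t7")]` and then 0.5 along `[("target", "t1")]` leaves N = 2 and Q = 0.75 on the root edge, which the next selection step would use to recompute UCB(s,a).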

Fig. 2. Bootstrapping process and the reinforcement learning process to train a value network. (a) Bottom-up propagation of the z(s) value for bootstrapping. If the route does not terminate in buyable precursors, the z value of every non-buyable compound is zero (left). If the route terminates in buyable precursors, then starting from the buyable leaf precursors (z = 1), the z value of each compound in the tree is assigned as the average z value of its immediate precursors multiplied by a discount factor γ (0 < γ < 1). If another route under the same compound yields a higher z value than the current route, the compound's z value is updated to the larger value. Circles denote compounds: blue circles are compounds that are not commercially available and green circles are buyable compounds. Triangles denote the templates a, through which compounds are transformed into their corresponding precursors. (b) The RL process to train the value network. Using the z values sampled from MCTS in (a), a value network is trained so that its output v_θ(s) approximates z(s).
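The bootstrapping rule in panel (a) can be written as a recursion over the AND-OR structure of the search tree: a compound is resolved if any of its templates works, and a template works only if all of its precursors are resolved. The sketch below assumes an acyclic tree stored as nested dictionaries; the function name `bootstrap_z`, the data layout, and the default γ = 0.9 are hypothetical illustrations, not values taken from the paper.

```python
def bootstrap_z(root, children, buyable, gamma=0.9):
    """Assign bootstrapped z values bottom-up over an acyclic search tree.

    children: dict compound -> dict template -> list of precursor compounds.
    buyable: set of commercially available compounds (z = 1).
    gamma: discount factor, 0 < gamma < 1 (0.9 is an illustrative choice).
    Returns a dict compound -> z for every compound reachable from root.
    """
    z = {}

    def visit(s):
        if s in z:
            return z[s]
        if s in buyable:
            z[s] = 1.0
            return z[s]
        best = 0.0  # no route ending in buyable precursors -> z = 0
        for precursors in children.get(s, {}).values():
            values = [visit(p) for p in precursors]
            # A route counts only if every precursor is itself resolved.
            if values and all(v > 0 for v in values):
                # Discounted average of the immediate precursors' z values;
                # keep the larger value if another route scores higher.
                best = max(best, gamma * sum(values) / len(values))
        z[s] = best
        return best

    visit(root)
    return z
```

With target T disconnected by t1 into buyable A and non-buyable B (B disconnected by t2 into buyable C and D), plus a second template t3 leading to a dead end E, this gives z(B) = 0.9, z(T) = 0.9 × (1 + 0.9)/2 = 0.855, and z(E) = 0. Pairs (s, z(s)) collected this way are the training targets for v_θ(s) in panel (b).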

Fig. 3. The success rate of finding buyable synthesis pathways by MCTS variants. For the modified UCT with dynamic c tuning and value network (mUCT-dc-V), c is initialized to 0.1. For all other UCT-type MCTS variants, c = 0.1. For both PUCT-type MCTS variants, c = 1. The value network used here is the Round 1 RL value network. (a) Performance of MCTS expansions for 30 s on the test and training sets. The values of compounds in the buyable catalogue are set to 1, overriding the value given by the value network. The success rates of the mUCT-dc-V and PUCT-V methods stand out among all variants. (b) Success rates of MCTS expansions with a fixed root visit count of 5000 on 1000 compounds (the same test set as in (a)). mUCT-dc-V significantly outperforms all other MCTS variants.

Fig. 4. Examples of chemical routes that the mUCT-dc-V method can solve within 30 s but PUCT-V cannot. The P(s,a) value given by the policy network and the rank of the template among the top 50 templates are shown. The unique advantage of the mUCT-dc-V method is that it does not use P(s,a) explicitly; therefore, even if P(s,a) is extremely small because the policy network is imperfect, the valid template is still explored by the tree expander, which is not the case for PUCT-V. The value network here is the Round 1 RL value network. Both MCTS variants use the same policy network and the same restrictions: the top 50 templates given by the policy network are considered, the maximum depth is 10, and the minimum plausibility is 0.75 (see Methods). The affected functional groups in each step are marked in blue. The buyable compounds are framed in green.
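Why a small P(s,a) can hide a valid template from PUCT-V but not from a UCT-type search becomes clear when the exploration terms of the two selection rules are compared. The sketch assumes the textbook UCT term and the AlphaGo-style PUCT term; the paper's own formulas (its Table 1, including the mUCT modification) are not reproduced here and may differ in detail. The function names and the example visit counts are hypothetical; the c values follow Fig. 3.

```python
import math


def uct_bonus(c, parent_visits, edge_visits):
    """Textbook UCT exploration term: c * sqrt(ln N(s) / N(s, a))."""
    return c * math.sqrt(math.log(parent_visits) / edge_visits)


def puct_bonus(c, prior, parent_visits, edge_visits):
    """AlphaGo-style PUCT exploration term: c * P(s, a) * sqrt(N(s)) / (1 + N(s, a))."""
    return c * prior * math.sqrt(parent_visits) / (1 + edge_visits)


# A valid template that the policy network ranks poorly (P = 1e-4),
# visited once after 1000 visits to the parent compound.
# c = 0.1 for UCT-type variants and c = 1 for PUCT-type variants (Fig. 3).
uct = uct_bonus(c=0.1, parent_visits=1000, edge_visits=1)
puct = puct_bonus(c=1.0, prior=1e-4, parent_visits=1000, edge_visits=1)
print(f"UCT bonus:  {uct:.4f}")   # about 0.26
print(f"PUCT bonus: {puct:.6f}")  # about 0.0016
```

Because the PUCT bonus scales linearly with P(s,a), a template with a near-zero prior needs a far larger parent visit count before its bonus can compete with well-ranked templates, whereas the UCT-type term does not depend on the prior at all.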

Fig. 5. Using MCTS to find short synthesis pathways with green solvents. (a) Assigning scores to the solvents in the solvent database. (b) Using the output of the solvent prediction model to define the reaction solvent score. The top three suggested solvents are shown with their probabilities, and their solvent scores are given in parentheses. The reaction solvent score (RSS) is defined as the weighted average of the top three solvent scores. (c) Converting the reaction solvent score to a reaction solvent penalty (R solvent penalty), then defining the compound solvent score using the R solvent penalty. The compound solvent score (CSS) is defined as the maximal cumulative RSS over a valid pathway. The greenest route for a compound is the pathway that attains the CSS. Because the CSS depends on both the tree expander and the root compound, the tree expander must be optimized for the CSS to be optimized.
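The scoring scheme in panels (b) and (c) can be sketched as two small functions. Assumptions, all hypothetical: the RSS weights are the predicted probabilities, normalized to sum to one; the RSS-to-penalty conversion is not specified in this caption, so penalties are passed in directly; and, following panel (c) and Fig. 7 (which describes the CSS as accumulated reaction penalties), the CSS is taken as the largest sum of reaction penalties over all reactions of a valid route. Function names, compound labels, and numbers are illustrative only.

```python
import math


def reaction_solvent_score(top3):
    """RSS: probability-weighted average of the top-three solvent scores.

    top3: list of (predicted_probability, solvent_score) pairs.
    """
    total_p = sum(p for p, _ in top3)
    return sum(p * score for p, score in top3) / total_p


def compound_solvent_score(s, children, penalty, buyable, memo=None):
    """CSS(s): best (largest) cumulative reaction solvent penalty over valid routes.

    children: dict compound -> dict template -> list of precursors.
    penalty: dict (compound, template) -> R solvent penalty (assumed <= 0).
    Buyable compounds score 0; compounds with no valid route score -inf.
    """
    if memo is None:
        memo = {}
    if s in memo:
        return memo[s]
    if s in buyable:
        memo[s] = 0.0
        return 0.0
    best = -math.inf
    for a, precursors in children.get(s, {}).items():
        # Penalty of this reaction plus the best penalties of every precursor.
        route = penalty[(s, a)] + sum(
            compound_solvent_score(p, children, penalty, buyable, memo)
            for p in precursors
        )
        best = max(best, route)
    memo[s] = best
    return best
```

Under these assumptions every extra reaction adds another non-positive penalty, so longer routes tend toward more negative CSS values, consistent with the comparison in Fig. 7.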

Fig. 6. The greenness of the synthetic routes (compound solvent score (CSS) of the root compound) generated by mUCT-dc-V, compared with PUCT-V as the baseline method. Both algorithms use the Round 1 RL value network, and tree expansion is limited to 30 s. In 71.4% of the cases, mUCT-dc-V yields a higher root CSS than PUCT-V.

Fig. 7. Case study of the greenest routes generated by the PUCT-V and mUCT-dc-V algorithms. The root CSS is the compound solvent score (CSS) of the root compound, which reflects the overall greenness of the best route in the tree. Typically, the PUCT-V algorithm generates much longer synthetic routes, whose accumulated reaction penalties (CSS) are much more negative than those of the routes generated by the mUCT-dc-V algorithm. Orange-framed compounds are the most probable solvents suggested by the solvent prediction network. Because the reaction solvent score (RSS) is the weighted average of the top three suggested solvents, two reactions with the same top-1 solvent can still have different RSS values, and therefore different reaction penalties. Green-framed compounds are commercially available. The affected functional groups in each step are marked in blue.

Fig. 8. The minimum visit count N(s,a) required for the first switch, as a function of Q/c.

Fig. 9. The total visit count of each template before all templates have been visited, when c = (current max Q)/2. (a) Average visit counts; (b) maximal visit counts. Results are obtained from 10⁴ random simulations.

Fig. 10. The dynamic method for setting c. We define c as half of the current maximum Q(s,b) value; c is updated throughout tree expansion as the visit count of compound s increases.
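The dynamic rule can be sketched as a selection step that recomputes c from the current Q values every time the compound is visited. This is a minimal sketch under stated assumptions: the exploration term is a UCT-style term with +1 offsets so that unvisited templates keep a finite score, which is a hypothetical stand-in because the paper's mUCT formula (its Table 1) is not shown here; `dynamic_c`, `select_template`, and the example values are likewise hypothetical.

```python
import math


def dynamic_c(q_values):
    """c = max_b Q(s, b) / 2, recomputed every time compound s is selected."""
    return max(q_values.values(), default=0.0) / 2


def select_template(candidates, q_values, edge_visits, parent_visits):
    """Pick the template with the highest UCT-style score under the dynamic c.

    candidates: template ids allowed by the policy network (e.g. its top 50).
    q_values / edge_visits: dicts keyed by template; missing entries count as 0.
    The +1 offsets are a hypothetical stand-in, not necessarily the paper's
    mUCT formula.
    """
    c = dynamic_c(q_values)

    def score(a):
        q = q_values.get(a, 0.0)
        n = edge_visits.get(a, 0)
        return q + c * math.sqrt(math.log(1 + parent_visits) / (1 + n))

    return max(candidates, key=score)
```

Because c scales with the best Q(s,b) found so far, exploration is weighted relative to the current best template rather than by a fixed constant. For instance, with Q = 0.8 for t1 (10 visits) and Q = 0.3 for t2 (1 visit), c = 0.4 and t1 is still selected; if t1's Q drops to 0.2 after 50 visits while t2 reaches 0.19, c falls to 0.1 and the search switches to t2.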

Fig. 11. Architecture of the value network.


Cited by

Retrosynthetic planning with experience-guided Monte Carlo tree search.
Hong S, Zhuo HH, Jin K, Shao G, Zhou Z. Commun Chem. 2023 Jun 10;6(1):120. doi: 10.1038/s42004-023-00911-8. PMID: 37301940.
Enhancing Monte Carlo Tree Search for Retrosynthesis.
Blackshaw TM, Davies JC, Spoerer KT, Hirst JD. J Chem Inf Model. 2025 Jul 14;65(13):6537-6546. doi: 10.1021/acs.jcim.5c00417. Epub 2025 Jun 13. PMID: 40512567.
Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery.
Tu Z, Stuyver T, Coley CW. Chem Sci. 2022 Nov 28;14(2):226-244. doi: 10.1039/d2sc05089g. eCollection 2023 Jan 4. PMID: 36743887. Review.
Learning in continuous action space for developing high dimensional potential energy models.
Manna S, Loeffler TD, Batra R, Banik S, Chan H, Varughese B, Sasikumar K, Sternberg M, Peterka T, Cherukara MJ, Gray SK, Sumpter BG, Sankaranarayanan SKRS. Nat Commun. 2022 Jan 18;13(1):368. doi: 10.1038/s41467-021-27849-6. PMID: 35042872.
Artificial Intelligence Methods and Models for Retro-Biosynthesis: A Scoping Review.
Gricourt G, Meyer P, Duigou T, Faulon JL. ACS Synth Biol. 2024 Aug 16;13(8):2276-2294. doi: 10.1021/acssynbio.4c00091. Epub 2024 Jul 24. PMID: 39047143.


