Optimizing Chemical Reactions with Deep Reinforcement Learning

Zhenpeng Zhou¹, Xiaocheng Li², Richard N Zare¹

Affiliations

¹ Department of Chemistry, Stanford University, Stanford, California 94305, United States.
² Department of Management Science and Engineering, Stanford University, Stanford, California 94305, United States.

PMID: 29296675
PMCID: PMC5746857
DOI: 10.1021/acscentsci.7b00492

Zhenpeng Zhou et al. ACS Cent Sci. 2017 Dec 27;3(12):1337-1344. doi: 10.1021/acscentsci.7b00492. Epub 2017 Dec 15.

Abstract

Deep reinforcement learning was employed to optimize chemical reactions. Our model iteratively records the results of a chemical reaction and chooses new experimental conditions to improve the reaction outcome. This model outperformed a state-of-the-art blackbox optimization algorithm by using 71% fewer steps on both simulations and real reactions. Furthermore, we introduced an efficient exploration strategy by drawing the reaction conditions from certain probability distributions, which resulted in an improvement on regret from 0.062 to 0.039 compared with a deterministic policy. Combining the efficient exploration policy with accelerated microdroplet reactions, optimal reaction conditions were determined in 30 min for the four reactions considered, and a better understanding of the factors that control microdroplet reactions was reached. Moreover, our model showed a better performance after training on reactions with similar or even dissimilar underlying mechanisms, which demonstrates its learning ability.
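The loop described in the abstract (record an outcome, then choose new conditions, with exploration coming from random draws) can be illustrated with a minimal sketch. This is not the authors' deep reinforcement learning model: the policy below is a simple stochastic perturbation of the best conditions seen so far, and `reaction`, `propose`, `optimize`, and the parameter `sigma` are hypothetical names chosen for illustration. Setting `sigma=0` makes the proposal deterministic, which loosely mirrors the deterministic versus randomized comparison.

```python
import random


def reaction(s):
    """Hypothetical stand-in for the unknown reaction function r = R(s)."""
    x, y = s
    return max(0.0, 1.0 - (x - 0.7) ** 2 - (y - 0.3) ** 2)


def propose(history, sigma, rng):
    """Toy randomized policy (an assumption, not the paper's network).

    Draws the next conditions from a Gaussian centered on the best
    conditions recorded so far, clipped to the unit interval.
    """
    best_s, _ = max(history, key=lambda h: h[1])
    return tuple(min(1.0, max(0.0, rng.gauss(c, sigma))) for c in best_s)


def optimize(n_steps=30, sigma=0.1, seed=0):
    """Iteratively record outcomes and choose new conditions."""
    rng = random.Random(seed)
    s = (0.5, 0.5)  # arbitrary starting conditions
    history = [(s, reaction(s))]
    for _ in range(n_steps):
        s = propose(history, sigma, rng)
        history.append((s, reaction(s)))
    return history


if __name__ == "__main__":
    hist = optimize()
    best_s, best_r = max(hist, key=lambda h: h[1])
    print("best conditions:", best_s, "outcome:", best_r)
```

In a real setting, `reaction` would be replaced by running the experiment and measuring the outcome, so each call is expensive and the number of steps is the quantity to minimize.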

Conflict of interest statement

The authors declare no competing financial interest.

Figures

**Scheme 1. Visualization of the DRO Model Unrolled over Three Time Steps**
The chemical reaction environment is characterized by the reaction function r = R(s), which maps the experimental conditions s to the reaction outcome r.

**Figure 1**
(A) Comparison of average regret of CMA-ES, Nelder–Mead simplex method, SNOBFIT, and DRO. The average regret is calculated as the average regret on 1000 random nonconvex functions. (B) The observed regret of 10 random nonconvex functions in which each line is the regret of one function.
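The averaging described in this caption can be sketched as follows. The sketch assumes that regret at a given step is the gap between the best attainable outcome of a function and the outcome actually observed at that step; this definition, along with the helper names `regret_curve` and `average_regret`, is an assumption for illustration and is not taken from the text above.

```python
def regret_curve(rewards, optimum):
    """Per-step regret for one function: optimum minus observed reward."""
    return [optimum - r for r in rewards]


def average_regret(curves):
    """Average the regret across functions, step by step.

    `curves` is a list of equal-length regret curves, one per function.
    """
    n = len(curves)
    return [sum(step) / n for step in zip(*curves)]


if __name__ == "__main__":
    # Two made-up reward traces, each with an assumed optimum of 1.0.
    curves = [
        regret_curve([0.2, 0.5, 0.9], optimum=1.0),
        regret_curve([0.4, 0.6, 0.8], optimum=1.0),
    ]
    print(average_regret(curves))
```

Panel (A) would correspond to `average_regret` over many functions, while panel (B) corresponds to plotting the individual curves.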

**Figure 2**
Comparison of deterministic policy and randomized policy in the model of DRO.

**Scheme 2. (a) Pomeranz–Fritsch Synthesis of Isoquinoline, (b) Friedländer Synthesis of a Substituted Quinoline, (c) Synthesis of Ribose Phosphate, and (d) the Reaction between 2,6-Dichlorophenolindophenol (DCIP) and Ascorbic Acid**

**Figure 3**
Performance comparison of CMA-ES, DRO, and OVAT methods on the microdroplet reaction of (A) Pomeranz–Fritsch synthesis of isoquinoline, (B) Friedländer synthesis of a substituted quinoline, (C) synthesis of ribose phosphate, and (D) the reaction between DCIP and ascorbic acid. The signal intensity can be converted into reaction yield with calibration.

**Figure 4**
Performance comparison of CMA-ES and DRO on the bulk-phase reaction of silver nanoparticle synthesis.

**Figure 5**
(A) The performance on Friedländer synthesis of DRO before and after training on the Pomeranz–Fritsch synthesis. (B) The performance on ribose phosphate synthesis of DRO before and after training on the Pomeranz–Fritsch and Friedländer syntheses.

**Figure 6**
Possible reaction response surface of the Friedländer synthesis of a substituted quinoline, predicted from the optimization process.
