ACS Cent Sci. 2017 Dec 27;3(12):1337-1344.
doi: 10.1021/acscentsci.7b00492. Epub 2017 Dec 15.

Optimizing Chemical Reactions with Deep Reinforcement Learning

Zhenpeng Zhou et al. ACS Cent Sci.

Abstract

Deep reinforcement learning was employed to optimize chemical reactions. Our model iteratively records the results of a chemical reaction and chooses new experimental conditions to improve the reaction outcome. This model outperformed a state-of-the-art black-box optimization algorithm, using 71% fewer steps in both simulations and real reactions. Furthermore, we introduced an efficient exploration strategy that draws the reaction conditions from probability distributions, which improved the regret from 0.062 to 0.039 compared with a deterministic policy. Combining the efficient exploration policy with accelerated microdroplet reactions, optimal reaction conditions were determined in 30 min for the four reactions considered, and a better understanding of the factors that control microdroplet reactions was reached. Moreover, our model performed better after training on reactions with similar or even dissimilar underlying mechanisms, which demonstrates its learning ability.
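The record-then-choose loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' DRO model: the learned policy is replaced here by a simple "perturb the best conditions so far" rule, and `reaction`, `propose`, `optimize`, and the toy yield surface are all hypothetical names and assumptions introduced for illustration. The only idea taken from the abstract is that each new set of conditions is drawn from a probability distribution rather than chosen deterministically.

```python
import random

def reaction(s):
    """Hypothetical stand-in for a reaction outcome: a toy yield surface
    on the unit square that peaks at 1.0 when s = (0.6, 0.3)."""
    return max(0.0, 1.0 - (s[0] - 0.6) ** 2 - (s[1] - 0.3) ** 2)

def propose(best_s, sigma, rng):
    """Randomized exploration (assumed form): draw the next conditions from a
    Gaussian centred on the best conditions so far, clipped to [0, 1]."""
    return tuple(min(1.0, max(0.0, rng.gauss(x, sigma))) for x in best_s)

def optimize(steps=40, sigma=0.1, seed=0):
    rng = random.Random(seed)
    s = (rng.random(), rng.random())      # arbitrary initial conditions
    history = [(s, reaction(s))]          # record every (conditions, outcome) pair
    for _ in range(steps):
        best_s, _ = max(history, key=lambda h: h[1])
        s = propose(best_s, sigma, rng)   # choose new conditions
        history.append((s, reaction(s)))  # run the "experiment" and record it
    return history

history = optimize()
best_s, best_r = max(history, key=lambda h: h[1])
regret = 1.0 - best_r  # gap to the known optimum of the toy surface
```

Setting `sigma` to zero would make the proposal deterministic and stall the search at the starting point, which is one simple way to see why some randomness in the choice of conditions helps exploration.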

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Scheme 1. Visualization of the DRO Model Unrolled over Three Time Steps
The environment of the chemical reaction is characterized by the reaction function r = R(s).
Figure 1. (A) Comparison of the average regret of CMA-ES, the Nelder–Mead simplex method, SNOBFIT, and DRO. The average regret is computed over 1000 random nonconvex functions. (B) The observed regret on 10 random nonconvex functions; each line is the regret for one function.
Figure 2. Comparison of the deterministic policy and the randomized policy in the DRO model.
Scheme 2. (a) Pomeranz–Fritsch Synthesis of Isoquinoline, (b) Friedländer Synthesis of a Substituted Quinoline, (c) Synthesis of Ribose Phosphate, and (d) the Reaction between 2,6-Dichlorophenolindophenol (DCIP) and Ascorbic Acid
Figure 3. Performance comparison of CMA-ES, DRO, and OVAT methods on the microdroplet reaction of (A) Pomeranz–Fritsch synthesis of isoquinoline, (B) Friedländer synthesis of a substituted quinoline, (C) synthesis of ribose phosphate, and (D) the reaction between DCIP and ascorbic acid. The signal intensity can be converted into reaction yield with calibration.
Figure 4. Performance comparison of CMA-ES and DRO on the bulk-phase reaction of silver nanoparticle synthesis.
Figure 5. (A) The performance of DRO on the Friedländer synthesis before and after training on the Pomeranz–Fritsch synthesis. (B) The performance of DRO on the ribose phosphate synthesis before and after training on the Pomeranz–Fritsch and Friedländer syntheses.
Figure 6. Possible reaction response surface of the Friedländer synthesis of a substituted quinoline, predicted from the optimization process.
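
The average-regret comparison described for Figure 1A can be sketched roughly as follows. Several things here are assumptions rather than details from the source: regret at a step is taken to be the gap between the best attainable value and the value observed at that step, the random nonconvex functions are built as the upper envelope of a few Gaussian bumps (chosen so the true maximum is known exactly), and uniform random search stands in for the optimizers being compared. All function names are hypothetical.

```python
import math
import random

def random_nonconvex(rng, n_bumps=3):
    """Hypothetical test function on [0, 1]: the upper envelope of Gaussian
    bumps. Its global maximum equals the tallest bump height."""
    bumps = [(rng.uniform(0.2, 1.0), rng.random()) for _ in range(n_bumps)]  # (height, centre)

    def f(x):
        return max(h * math.exp(-((x - c) ** 2) / 0.02) for h, c in bumps)

    return f, max(h for h, _ in bumps)

def regret_curve(f, f_max, steps, rng):
    """Per-step regret for uniform random search (a stand-in optimizer)."""
    return [f_max - f(rng.random()) for _ in range(steps)]

def average_regret(n_functions=1000, steps=40, seed=0):
    """Average the per-step regret over many random test functions."""
    rng = random.Random(seed)
    totals = [0.0] * steps
    for _ in range(n_functions):
        f, f_max = random_nonconvex(rng)
        for t, r in enumerate(regret_curve(f, f_max, steps, rng)):
            totals[t] += r
    return [total / n_functions for total in totals]

curve = average_regret()  # one averaged regret value per optimization step
```

Because random search does not learn from earlier steps, its averaged curve stays roughly flat; an optimizer that uses its history would be expected to show regret falling with the step index, which is the kind of difference a plot like Figure 1A compares.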
