ACS Omega. 2024 Jun 20;9(26):27987-27997.
doi: 10.1021/acsomega.3c10422. eCollection 2024 Jul 2.

Optimal Dynamic Regimes for CO Oxidation Discovered by Reinforcement Learning

Mikhail S Lifar et al. ACS Omega. 2024.

Abstract

Metal nanoparticles are widely used as heterogeneous catalysts: they activate adsorbed molecules and lower the energy barrier of the reaction. The product yield depends on the interplay between the elementary processes of adsorption, activation, desorption, and reaction, which in turn depend on the inlet gas composition, temperature, and pressure. At steady state, active surface sites may be blocked by adsorbed reagents, so a periodic regime may improve the yield; however, the appropriate period and waveform are not known in advance. Dynamic control should therefore account for changes in both the catalyst surface and the gas atmosphere and adjust the reaction parameters according to the current state of the system and its history. In this work, we applied a reinforcement learning (RL) algorithm to control CO oxidation on a palladium catalyst. A policy gradient algorithm was trained in a theoretical environment parametrized from experimental data. The algorithm learned to maximize the CO2 formation rate by setting the CO and O2 partial pressures over several successive time steps. Within a unified approach, we found optimal stationary, periodic, and nonperiodic regimes for different problem formulations and gained insight into why a dynamic regime can be preferable. More broadly, this work helps popularize the reinforcement learning approach in catalytic science.
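The abstract names the algorithm family but not its details. As a rough illustration of training a policy gradient agent against a kinetic environment, the sketch below uses REINFORCE, the simplest policy gradient method. It uses a tabular softmax policy that depends only on the step index. Everything model-specific is an assumption: the four-step mean-field kinetics, the rate constants `K_*`, and the 3 × 3 grid of pressure levels in `ACTIONS` are hypothetical. The state- and history-dependent control described in the abstract is not attempted by this open-loop sketch.

```python
import math
import random

# Hypothetical discrete action set: (P_CO, P_O2) pairs in arbitrary units.
ACTIONS = [(p_co, p_o2) for p_co in (0.0, 0.5, 1.0) for p_o2 in (0.0, 0.5, 1.0)]
N_STEPS, STEP_S, DT = 6, 5.0, 0.05  # 6 decisions of 5 s each (a 30 s episode)
K_ADS_CO, K_DES_CO, K_ADS_O2, K_RXN = 1.0, 0.1, 1.0, 1.0  # assumed constants


def run_interval(theta_co, theta_o, p_co, p_o2):
    """Hold the pressures for STEP_S seconds in an assumed mean-field model.

    Returns the new coverages and the mean CO2 formation rate (used as reward).
    """
    produced = 0.0
    for _ in range(int(round(STEP_S / DT))):
        free = 1.0 - theta_co - theta_o
        rate = K_RXN * theta_co * theta_o
        theta_co += DT * (K_ADS_CO * p_co * free - K_DES_CO * theta_co - rate)
        theta_o += DT * (2.0 * K_ADS_O2 * p_o2 * free ** 2 - rate)
        produced += rate * DT
    return theta_co, theta_o, produced / STEP_S


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(episodes=300, lr=0.5, seed=0):
    """REINFORCE with a time-indexed softmax policy and per-step running baselines."""
    rng = random.Random(seed)
    n_act = len(ACTIONS)
    logits = [[0.0] * n_act for _ in range(N_STEPS)]
    baselines = [0.0] * N_STEPS
    returns = []
    for _ in range(episodes):
        theta_co, theta_o = 0.0, 0.0  # clean surface at the start of each episode
        trajectory = []  # (step, action index, action probabilities, reward)
        for t in range(N_STEPS):
            probs = softmax(logits[t])
            a = rng.choices(range(n_act), weights=probs)[0]
            theta_co, theta_o, reward = run_interval(theta_co, theta_o, *ACTIONS[a])
            trajectory.append((t, a, probs, reward))
        rewards = [step[3] for step in trajectory]
        for t, a, probs, _ in trajectory:
            g = sum(rewards[t:])  # undiscounted reward-to-go
            advantage = g - baselines[t]
            for j in range(n_act):
                # gradient of log pi(a) w.r.t. logit j is 1[j == a] - pi(j)
                logits[t][j] += lr * advantage * ((1.0 if j == a else 0.0) - probs[j])
            baselines[t] += 0.1 * (g - baselines[t])
        returns.append(sum(rewards))
    return logits, returns
```

After training, a greedy pressure schedule can be read off per step with `[ACTIONS[max(range(len(row)), key=row.__getitem__)] for row in logits]`. The reward-to-go and the running baselines are standard variance-reduction choices for REINFORCE, not details taken from the paper.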


Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
Simplified elementary steps of CO oxidation on a stable Pd surface.
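The caption names the elementary steps of CO oxidation on Pd but gives no rate laws. One common way to turn such a scheme into a simulation environment is a mean-field Langmuir-Hinshelwood model. The sketch below assumes four steps: reversible CO adsorption, dissociative O2 adsorption, and CO* + O* reaction. Its rate constants in `RateConstants` are made up, in arbitrary units. It should not be read as the paper's model or its Table 1 parameters.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RateConstants:
    """Hypothetical rate constants in arbitrary units (not the paper's Table 1)."""
    k_ads_co: float = 1.0  # CO + *   -> CO*
    k_des_co: float = 0.1  # CO*      -> CO + *
    k_ads_o2: float = 1.0  # O2 + 2*  -> 2 O*
    k_rxn: float = 1.0     # CO* + O* -> CO2 + 2*


def derivatives(theta_co: float, theta_o: float, p_co: float, p_o2: float,
                k: RateConstants) -> Tuple[float, float, float]:
    """Mean-field rate equations; returns d(theta_CO)/dt, d(theta_O)/dt, CO2 rate."""
    free = 1.0 - theta_co - theta_o       # fraction of empty sites
    rate = k.k_rxn * theta_co * theta_o   # Langmuir-Hinshelwood CO2 formation
    d_co = k.k_ads_co * p_co * free - k.k_des_co * theta_co - rate
    d_o = 2.0 * k.k_ads_o2 * p_o2 * free ** 2 - rate  # two O* per adsorbed O2
    return d_co, d_o, rate


def simulate(p_co: float, p_o2: float, duration: float,
             k: Optional[RateConstants] = None, dt: float = 0.01,
             theta: Tuple[float, float] = (0.0, 0.0)) -> Tuple[float, float, float]:
    """Explicit Euler integration at constant pressures.

    Returns the final CO and O coverages and the CO2 formation rate.
    """
    if k is None:
        k = RateConstants()
    theta_co, theta_o = theta
    for _ in range(int(round(duration / dt))):
        d_co, d_o, _ = derivatives(theta_co, theta_o, p_co, p_o2, k)
        theta_co += dt * d_co
        theta_o += dt * d_o
    _, _, rate = derivatives(theta_co, theta_o, p_co, p_o2, k)
    return theta_co, theta_o, rate
```

For example, with CO alone the surface relaxes to the Langmuir coverage `k_ads_co * p_co / (k_ads_co * p_co + k_des_co)` and no CO2 forms. This is the CO-poisoning limit that makes the steady-state yield sensitive to the gas composition.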
Figure 2
Two problem statements for optimization by the RL algorithm. (a) Global optimization: the partial pressures of both reagents, CO and O2, are optimized to maximize the reaction yield. (b) Response to external conditions: the O2 pressure is adjusted to a varying CO partial pressure to maximize the reaction yield. (c) The input gas pressures are controlled by the algorithm; the time intervals between pressure switches (actions) are marked with arrows.
Figure 3
Typical successful learning curve of the algorithm. Each episode lasts for 30 s, the algorithm can switch gas flows every 5 s, and the reaction rate is supplied as a reward on each step.
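The episode structure in this caption can be written as a minimal environment with `reset`/`step` methods: 30 s episodes, a decision every 5 s, and the reaction rate as the per-step reward. This is a sketch under assumptions. The kinetics and the constants `K_*` are hypothetical, and the surface starts clean. The reward is taken as the mean CO2 formation rate over each interval; the paper may define the per-step rate differently.

```python
from typing import List, Tuple

# Assumed mean-field CO oxidation model with hypothetical constants (arbitrary units).
K_ADS_CO, K_DES_CO, K_ADS_O2, K_RXN = 1.0, 0.1, 1.0, 1.0


class CoOxidationEnv:
    """Episodic environment: pressures are held constant between decisions."""

    def __init__(self, episode_s: float = 30.0, step_s: float = 5.0, dt: float = 0.01):
        self.n_steps = int(round(episode_s / step_s))  # 30 s / 5 s = 6 decisions
        self.step_s, self.dt = step_s, dt
        self.reset()

    def reset(self) -> Tuple[float, float]:
        """Start from a clean surface (an assumption)."""
        self.theta_co, self.theta_o, self.t = 0.0, 0.0, 0
        return self.theta_co, self.theta_o

    def step(self, p_co: float, p_o2: float) -> Tuple[Tuple[float, float], float, bool]:
        """Hold (p_co, p_o2) for one interval; reward = mean CO2 rate over it."""
        produced = 0.0
        for _ in range(int(round(self.step_s / self.dt))):
            free = 1.0 - self.theta_co - self.theta_o
            rate = K_RXN * self.theta_co * self.theta_o
            self.theta_co += self.dt * (K_ADS_CO * p_co * free - K_DES_CO * self.theta_co - rate)
            self.theta_o += self.dt * (2.0 * K_ADS_O2 * p_o2 * free ** 2 - rate)
            produced += rate * self.dt
        self.t += 1
        done = self.t >= self.n_steps
        return (self.theta_co, self.theta_o), produced / self.step_s, done


def rollout(env: CoOxidationEnv, schedule: List[Tuple[float, float]]) -> List[float]:
    """Apply a fixed pressure schedule and collect the per-step rewards."""
    env.reset()
    rewards = []
    for p_co, p_o2 in schedule:
        _, reward, done = env.step(p_co, p_o2)
        rewards.append(reward)
        if done:
            break
    return rewards
```

Because each reward is the mean rate over a 5 s interval, the sum of the rewards multiplied by 5 s equals the CO2 produced per site during the episode in this model. A learning curve like the one in the figure would plot that sum against the training episode.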
Figure 4
Ratio of the mean reaction rates of the solutions obtained by reinforcement learning (steady state or dynamic) and by Nelder–Mead (steady state) as a function of the rate constants k2 and k5. The RL solutions in cells 1 and 2 were obtained for parameter Sets 1 and 2 (Table 1) and are shown in Figures 5 and 6.
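The steady-state reference in this figure comes from Nelder–Mead optimization. A self-contained sketch of such a baseline follows. A textbook Nelder–Mead, with the usual coefficients 1, 2, 0.5, 0.5, maximizes an approximate steady-state CO2 rate over (P_CO, P_O2). The kinetic model and its constants are assumptions. So are the [0, 1] pressure bounds, which are enforced by clipping, and the approximation of the steady state by long integration. If the model is bistable, integrating from a clean surface selects only one branch.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed mean-field model and hypothetical constants (arbitrary units).
K_ADS_CO, K_DES_CO, K_ADS_O2, K_RXN = 1.0, 0.1, 1.0, 1.0


def steady_rate(p_co: float, p_o2: float, t_end: float = 100.0, dt: float = 0.05) -> float:
    """Approximate the steady-state CO2 rate by integrating from a clean surface."""
    theta_co = theta_o = 0.0
    for _ in range(int(round(t_end / dt))):
        free = 1.0 - theta_co - theta_o
        rate = K_RXN * theta_co * theta_o
        theta_co += dt * (K_ADS_CO * p_co * free - K_DES_CO * theta_co - rate)
        theta_o += dt * (2.0 * K_ADS_O2 * p_o2 * free ** 2 - rate)
    return K_RXN * theta_co * theta_o


def nelder_mead(f: Callable[[List[float]], float], x0: Sequence[float], step: float = 0.2,
                iters: int = 200) -> Tuple[List[float], float]:
    """Plain Nelder-Mead minimization (reflection, expansion, inside contraction, shrink)."""
    n = len(x0)
    simplex = [list(x0)] + [[x0[j] + (step if j == i else 0.0) for j in range(n)]
                            for i in range(n)]
    values = [f(x) for x in simplex]
    for _ in range(iters):
        order = sorted(range(n + 1), key=values.__getitem__)
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        centroid = [sum(x[j] for x in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        xr = [c + (c - w) for c, w in zip(centroid, worst)]  # reflection
        fr = f(xr)
        if values[0] <= fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
        elif fr < values[0]:
            xe = [c + 2.0 * (r - c) for c, r in zip(centroid, xr)]  # expansion
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        else:
            xc = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]  # contraction
            fc = f(xc)
            if fc < values[-1]:
                simplex[-1], values[-1] = xc, fc
            else:  # shrink every vertex toward the best one
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, p)]
                                    for p in simplex[1:]]
                values = [values[0]] + [f(x) for x in simplex[1:]]
    i = min(range(n + 1), key=values.__getitem__)
    return simplex[i], values[i]


def objective(x: List[float]) -> float:
    """Negative steady rate; pressures are clipped to the assumed range [0, 1]."""
    p_co, p_o2 = (min(max(v, 0.0), 1.0) for v in x)
    return -steady_rate(p_co, p_o2)
```

With `best, f_best = nelder_mead(objective, [0.5, 0.5])`, the quantity plotted in the figure would correspond to an RL agent's mean rate divided by `-f_best`. In practice, `scipy.optimize.minimize(..., method="Nelder-Mead")` is a tested alternative to the hand-written loop.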
Figure 5
Time evolution of the CO (orange) and O2 (blue) partial pressures (a) and of the coverages and reaction rate (b) in the regime found by the RL algorithm. The rate constants were taken from Set 1 (see Table 1 and Figure 4). The algorithm was allowed to change the gas pressures every 5 s, and the episode length for training was 30 s. The green line shows the CO2 formation rate. Dashed lines denote the optimal steady-state regimes found by Nelder–Mead.
Figure 6
Two dynamic regimes discovered by the algorithm when trained with 30 s episodes (a, b) and with 240 s episodes (c, d) (Set 2; see Table 1 and Figure 4). The longer episodes allowed the algorithm to find a periodic solution that prevents the asymptotic decay of the reaction yield.
Figure 7
(a) Region of accessible coverages in the steady-state regime (dashed), with the red point marking the conditions of the highest steady reaction rate. The blue line shows the trajectory of the coverages in the optimal dynamic regime discovered by RL. (b) CO and oxygen coverages in the dynamic regime (solid lines) compared with those in the optimal steady state of panel (a) (dashed lines). (c) The advantage of the periodic regime is highlighted by the integral CO2 output.
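Panel (c) compares regimes by their integral CO2 output, that is, the time integral of the formation rate. Given sampled rate traces, for example from a simulation, that integral and the gain over a constant steady-state rate can be computed with the trapezoidal rule. This small sketch does so. The function names are hypothetical, and the inputs are whatever rate series one has at hand.

```python
from typing import List, Sequence


def cumulative_output(times: Sequence[float], rates: Sequence[float]) -> List[float]:
    """Cumulative CO2 output (integral of rate over time) by the trapezoidal rule."""
    if len(times) != len(rates):
        raise ValueError("times and rates must have the same length")
    out = [0.0]
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        out.append(out[-1] + 0.5 * (rates[i] + rates[i - 1]) * dt)
    return out


def advantage(times: Sequence[float], dynamic_rates: Sequence[float],
              steady_rate: float) -> float:
    """Extra CO2 from a dynamic regime relative to a constant steady-state rate."""
    dynamic_total = cumulative_output(times, dynamic_rates)[-1]
    steady_total = steady_rate * (times[-1] - times[0])
    return dynamic_total - steady_total
```

A positive `advantage` over a whole number of periods means that the dynamic regime outproduces the best steady state, even if its instantaneous rate sometimes dips below the steady value.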
Figure 8
RL agent policy tested against two CO feed regimes: random (a, b) and triangular (c, d). The top panels (a, c) show the algorithm’s policy, and the bottom panels (b, d) show the corresponding reaction rates and coverage dynamics. The agent adjusted the O2 pressure to the supplied CO pressure variations. Model parameters were set according to Set 3 (Table 1).
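This test corresponds to problem statement (b) of Figure 2 and needs an externally imposed CO feed. Profiles of the two kinds named in the caption can be generated as shown below. The pressure range, the period, and the uniform sampling of the random feed are assumptions, since the caption does not specify them. At each control step the agent would observe the CO value and choose the O2 pressure.

```python
import random
from typing import List


def triangle_feed(n_steps: int, period: int, p_min: float = 0.0,
                  p_max: float = 1.0) -> List[float]:
    """Triangle-wave CO pressure, one value per control step (period in steps)."""
    feed = []
    for t in range(n_steps):
        phase = (t % period) / period  # position within the period, in [0, 1)
        frac = 2.0 * phase if phase < 0.5 else 2.0 * (1.0 - phase)  # 0 -> 1 -> 0
        feed.append(p_min + (p_max - p_min) * frac)
    return feed


def random_feed(n_steps: int, p_min: float = 0.0, p_max: float = 1.0,
                seed: int = 0) -> List[float]:
    """Piecewise-constant random CO pressure, drawn uniformly for each control step."""
    rng = random.Random(seed)
    return [rng.uniform(p_min, p_max) for _ in range(n_steps)]
```

Seeding the random feed makes an evaluation run reproducible. This matters when the same CO sequence should be replayed against different trained policies.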
Figure 9
Validation of the dynamic regime in the kinetic Monte Carlo model. Upon replacing the CO atmosphere with an O2 atmosphere (panel (a)), the reaction rate can increase and exceed that under optimal steady-state conditions (panel (b)). (c) CO and oxygen occupation of the surface sites at selected times of high (t = 30 s) and low (t = 55 s) conversion.
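For orientation, here is a minimal rejection-free (Gillespie/BKL-type) lattice kinetic Monte Carlo sketch of CO oxidation. At each step it lists every possible event with its rate and picks one with probability proportional to that rate. It then advances time by an exponentially distributed waiting time. The event set, the periodic square lattice, and the rate constants in `k` are assumptions, not the paper's KMC model. Rebuilding the event list at every step keeps the code short but scales poorly with lattice size.

```python
import math
import random
from typing import List, Optional, Tuple

EMPTY, CO, O = 0, 1, 2


def kmc(grid: Optional[List[List[int]]] = None, size: int = 10, p_co: float = 0.5,
        p_o2: float = 0.5, t_end: float = 5.0,
        k: Tuple[float, float, float, float] = (1.0, 0.1, 1.0, 10.0),
        seed: int = 0) -> Tuple[List[List[int]], float, int]:
    """Rejection-free lattice KMC on a periodic square lattice.

    Assumed events with hypothetical constants k: CO adsorption on an empty site,
    CO desorption, O2 adsorption on an empty nearest-neighbour pair, and
    CO* + O* reaction on a neighbouring pair. Returns (grid, elapsed time, CO2 count).
    """
    k_ads_co, k_des_co, k_ads_o2, k_rxn = k
    rng = random.Random(seed)
    if grid is None:
        grid = [[EMPTY] * size for _ in range(size)]
    n = len(grid)
    t, produced = 0.0, 0
    while t < t_end:
        events = []  # (rate, kind, site, partner site)
        for i in range(n):
            for j in range(n):
                s = grid[i][j]
                if s == EMPTY and p_co > 0.0:
                    events.append((k_ads_co * p_co, "co_ads", (i, j), None))
                elif s == CO:
                    events.append((k_des_co, "co_des", (i, j), None))
                for ni, nj in (((i + 1) % n, j), (i, (j + 1) % n)):  # each bond once
                    nb = grid[ni][nj]
                    if s == EMPTY and nb == EMPTY and p_o2 > 0.0:
                        events.append((k_ads_o2 * p_o2, "o2_ads", (i, j), (ni, nj)))
                    elif {s, nb} == {CO, O}:
                        events.append((k_rxn, "rxn", (i, j), (ni, nj)))
        total = sum(e[0] for e in events)
        if total <= 0.0:
            break  # nothing can happen (e.g. zero pressures on an empty surface)
        t += -math.log(1.0 - rng.random()) / total  # exponential waiting time
        u, acc = rng.random() * total, 0.0
        for rate, kind, (i, j), partner in events:  # pick event proportional to rate
            acc += rate
            if u < acc:
                break
        if kind == "co_ads":
            grid[i][j] = CO
        elif kind == "co_des":
            grid[i][j] = EMPTY
        elif kind == "o2_ads":
            grid[i][j] = O
            grid[partner[0]][partner[1]] = O
        else:  # CO* + O* -> CO2(g) leaves two empty sites
            grid[i][j] = EMPTY
            grid[partner[0]][partner[1]] = EMPTY
            produced += 1
    return grid, t, produced


def coverages(grid: List[List[int]]) -> Tuple[float, float]:
    """Fractions of sites occupied by CO and by O."""
    cells = [s for row in grid for s in row]
    return cells.count(CO) / len(cells), cells.count(O) / len(cells)
```

Running a CO-only phase and then continuing from the same `grid` with O2 only gives a rough analogue of the atmosphere switch in panel (a). Counting the `produced` events per unit time gives the corresponding rate.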
