Optimal Dynamic Regimes for CO Oxidation Discovered by Reinforcement Learning

Mikhail S Lifar¹, Andrei A Tereshchenko¹, Aleksei N Bulgakov¹, Sergey A Guda¹,², Alexander A Guda¹, Alexander V Soldatov¹

Affiliations

¹ The Smart Materials Research Institute, Southern Federal University, 344090 Rostov-on-Don, Russia.
² Institute for Mathematics, Mechanics and Computer Science in the name of I.I. Vorovich, Southern Federal University, 344090 Rostov-on-Don, Russia.

PMID: 38973853
PMCID: PMC11223201
DOI: 10.1021/acsomega.3c10422

Mikhail S Lifar et al. ACS Omega. 2024 Jun 20;9(26):27987-27997. doi: 10.1021/acsomega.3c10422. eCollection 2024 Jul 2.

Abstract

Metal nanoparticles are widely used as heterogeneous catalysts to activate adsorbed molecules and reduce the energy barrier of the reaction. The reaction product yield depends on the interplay between elementary processes: adsorption, activation, desorption, and reaction. These processes, in turn, depend on the inlet gas composition, temperature, and pressure. At a steady state, the active surface sites may be inaccessible because they are occupied by adsorbed reagents. A periodic regime may thus improve the yield, but the appropriate period and waveform are not known in advance. Dynamic control should account for surface and atmospheric modifications and adjust the reaction parameters according to the current state of the system and its history. In this work, we applied a reinforcement learning algorithm to control CO oxidation on a palladium catalyst. The policy gradient algorithm was trained in a theoretical environment parametrized from experimental data. The algorithm learned to maximize the CO₂ formation rate based on the CO and O₂ partial pressures over several successive time steps. Within a unified approach, we found optimal stationary, periodic, and nonperiodic regimes for different problem formulations and gained insight into why the dynamic regime can be preferable. In general, this work contributes to popularizing the reinforcement learning approach in the field of catalytic science.
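The control setting described in the abstract can be illustrated with a small toy environment. The sketch below is not the authors' model: it assumes a generic mean-field Langmuir–Hinshelwood scheme (CO adsorption and desorption, dissociative O₂ adsorption, surface reaction), and every rate constant, the pressure units, and the names `ToyCOOxidationEnv` and `run_episode` are hypothetical. Only the timing (a 30 s episode with one action every 5 s) and the use of the reaction rate as the reward follow the Figure 3 caption.

```python
from dataclasses import dataclass


@dataclass
class ToyCOOxidationEnv:
    # Hypothetical rate constants in arbitrary units; NOT the paper's fitted values.
    k_ads_co: float = 1.0   # CO adsorption
    k_des_co: float = 0.1   # CO desorption
    k_ads_o2: float = 1.0   # dissociative O2 adsorption
    k_react: float = 1.0    # CO* + O* -> CO2
    step_s: float = 5.0     # time between actions (Figure 3 caption)
    episode_s: float = 30.0 # episode length (Figure 3 caption)
    dt: float = 0.001       # explicit Euler step, an assumption of this sketch
    theta_co: float = 0.0
    theta_o: float = 0.0
    t: float = 0.0

    def reset(self):
        """Start from a clean surface and return the initial coverages."""
        self.theta_co = self.theta_o = self.t = 0.0
        return (self.theta_co, self.theta_o)

    def step(self, p_co, p_o2):
        """Hold the given pressures for step_s seconds.

        Returns (coverages, reward, done), where the reward is the
        CO2 formation rate averaged over the interval.
        """
        n = round(self.step_s / self.dt)
        total_rate = 0.0
        for _ in range(n):
            free = 1.0 - self.theta_co - self.theta_o
            rate = self.k_react * self.theta_co * self.theta_o
            d_co = self.k_ads_co * p_co * free - self.k_des_co * self.theta_co - rate
            d_o = 2.0 * self.k_ads_o2 * p_o2 * free * free - rate
            self.theta_co += self.dt * d_co
            self.theta_o += self.dt * d_o
            total_rate += rate
        self.t += self.step_s
        done = self.t >= self.episode_s - 1e-9
        return (self.theta_co, self.theta_o), total_rate / n, done


def run_episode(env, schedule):
    """Apply a fixed list of (p_co, p_o2) actions; return the mean reward."""
    env.reset()
    rewards = []
    for p_co, p_o2 in schedule:
        _, reward, done = env.step(p_co, p_o2)
        rewards.append(reward)
        if done:
            break
    return sum(rewards) / len(rewards)


if __name__ == "__main__":
    env = ToyCOOxidationEnv()
    steady = run_episode(env, [(0.5, 0.5)] * 6)
    switching = run_episode(env, [(1.0, 0.0), (0.0, 1.0)] * 3)
    print(f"steady: {steady:.4f}  switching: {switching:.4f}")
```

A learning agent would replace the fixed `schedule` with a policy that maps the observed state to the next pair of pressures. Whether a switching schedule outperforms a steady one in this toy model depends entirely on the assumed constants, so the sketch says nothing about the paper's results.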

Conflict of interest statement

The authors declare no competing financial interest.

Figures

**Figure 1**
Simplified elementary steps of CO oxidation on a stable Pd surface.

**Figure 2**
Two problem statements for optimization by the RL algorithm. (a) Global optimization: the partial pressures of both reagents, CO and O₂, are adjusted to maximize the reaction yield. (b) Response to external conditions: the O₂ pressure is adjusted to a varying CO partial pressure to maximize the reaction yield. (c) Input gas pressures are controlled by the algorithm. The time intervals between the pressure switches (actions) are highlighted with arrows.

**Figure 3**
Typical successful learning curve of the algorithm. Each episode lasts for 30 s, the algorithm can switch gas flows every 5 s, and the reaction rate is supplied as a reward on each step.

**Figure 4**
Ratio of the mean reaction rates in solutions obtained by reinforcement learning (steady state or dynamic) and Nelder–Mead (steady state), depending on rate constants k₂ and k₅. The RL solutions from cells 1 and 2 are obtained for parameter Sets 1 and 2 (Table 1) and are visualized further in Figures 5 and 6.

**Figure 5**
Temporal variations of (a) the CO (orange) and O₂ (blue) partial pressures and (b) the coverages and reaction rate found by the RL algorithm. The rate constants were taken from Set 1 (see Table 1 and Figure 4). The algorithm was allowed to vary the gas pressures every 5 s, and the episode length for training was 30 s. The green line corresponds to the CO₂ formation rate. Dashed lines denote the Nelder–Mead steady optimal regimes.

**Figure 6**
Two different dynamic regimes discovered by the algorithm with a 30 s episode for training (a, b) and 240 s (c, d) (Set 2, Table 1 and Figure 4). The longer episode length allowed the algorithm to find the periodic solution and prevent asymptotic decay of the reaction yield.

**Figure 7**
(a) Region of accessible coverages in the steady-state regime (dashed) with the red point indicating conditions for the highest steady reaction rate. The blue line shows the trajectory of coverages in the optimal dynamic regime discovered by RL. (b) CO and oxygen coverages in dynamic regime (solid lines) compared to the ones in optimal steady state in the panel (a) (dashed lines). (c) Advantage of the periodic regime is highlighted by the integral CO₂ output.

**Figure 8**
RL agent policy examined against two CO feed regimes of random (a, b) and triangle (c, d) forms. The top panels (a, c) show the algorithm’s policy, and the bottom panels (b, d) show corresponding reaction rates and coverage dynamics. The agent adjusted the O₂ pressure to the supplied CO pressure variations. Model parameters were set according to Set 3 (Table 1).

**Figure 9**
Validation of the dynamic regime in the kinetic Monte Carlo model. Upon replacing the CO atmosphere with an O₂ atmosphere (panel (a)), the reaction rate increases and becomes higher than under optimal steady-state conditions (panel (b)). (c) CO and oxygen occupation of the surface sites at selected times of high (t = 30 s) and low (t = 55 s) conversions.
