Review
J Phys Chem A. 2022 Oct 13;126(40):7051-7069.
doi: 10.1021/acs.jpca.2c06408. Epub 2022 Oct 3.

Graph-Driven Reaction Discovery: Progress, Challenges, and Future Opportunities


Idil Ismail et al. J Phys Chem A. 2022.

Abstract

Graph-based descriptors, such as bond-order matrices and adjacency matrices, offer a simple and compact way of categorizing molecular structures; furthermore, such descriptors can readily be used to catalog chemical reactions (i.e., bond-making and -breaking). As such, a number of graph-based methodologies have been developed with the goal of automating the generation of chemical reaction network models that describe the possible mechanistic chemistry of a given set of reactant species. Here, we outline the evolution of these graph-based reaction discovery schemes, with particular emphasis on recent methods that combine graph-based sampling with semiempirical and ab initio electronic structure calculations, minimum-energy path refinements, and transition state searches. Using representative examples from homogeneous catalysis and interstellar chemistry, we highlight how these schemes increasingly act as "virtual reaction vessels" for interrogating mechanistic questions. Finally, we identify where challenges remain, including chemical accuracy and calculation speed, as well as the inherent challenge of dealing with the vast size of accessible chemical reaction space.
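The central idea above, that a reaction can be cataloged as the change in a molecular bonding graph, can be sketched in a few lines. The Python below is a minimal illustration, not the authors' code: molecules are stored as labeled connectivity (0/1 adjacency) matrices with an assumed atom ordering, and the hypothetical helper `bond_changes` lists the bonds made and broken between two such matrices. Applied to bond-order matrices, the same comparison would flag any increase or decrease in bond order.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def bond_changes(before: Matrix, after: Matrix) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Return (bonds_formed, bonds_broken) as 0-based atom-index pairs (i < j).

    Hypothetical helper: compares two labeled connectivity matrices entry by entry.
    """
    n = len(before)
    formed, broken = [], []
    for i in range(n):
        for j in range(i + 1, n):
            delta = after[i][j] - before[i][j]
            if delta > 0:
                formed.append((i, j))
            elif delta < 0:
                broken.append((i, j))
    return formed, broken


# Assumed atom order: 0 = C, 1 = O, 2 = H, 3 = H.
formaldehyde = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
# CO + H2: both C-H bonds broken, one H-H bond formed.
co_plus_h2 = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

formed, broken = bond_changes(formaldehyde, co_plus_h2)
print("formed:", formed)  # formed: [(2, 3)]
print("broken:", broken)  # broken: [(0, 2), (0, 3)]
```

Because atom labels are preserved, the comparison is index by index; comparing molecules with different atom orderings would first require a canonical ordering or a graph-isomorphism check.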


Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
Chemical reaction networks (CRNs) serve to connect (a) experimental synthesis and characterization of reactive systems to (b) ab initio characterization of individual elementary reaction steps.
Figure 2
Overview of (a) potential-energy-surface (PES)-driven automated reaction discovery (ARD) schemes (e.g., AFIR, ab initio nanoreactor, SHS, and TSSCDS) and (b) graph-driven ARD schemes (e.g., single- and double-ended graph-driven sampling [SEGDS and DEGDS, respectively], NetGen, RMG, and YARP).
Figure 3
Panels (a–c) represent different regions of chemical space, naturally discretized by defining the bonding graphs shown; for example, panel (a) represents the configurational space of all systems that have the bonding graph shown (corresponding to formaldehyde). In panel (c), we note that the bonding graph shown describes both cis and trans isomers of HCOH; as such, simple bonding-graph schemes cannot distinguish conformational isomers, and further postprocessing may be required to account for conformational differences.
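One common way to obtain the bonding graph of a given geometry (assumed here, not taken from the review) is to connect atoms whose separation is below a scaled sum of covalent radii. The sketch below uses rough, hypothetical planar geometries for cis- and trans-HCOH, covalent radii of 0.31, 0.76, and 0.66 Å for H, C, and O, and an assumed scale factor of 1.2. Both conformers map to the same connectivity matrix even though their H···H separations differ, which is the degeneracy noted for panel (c).

```python
import math

# Assumed covalent radii (angstrom) and bond-detection scale factor.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "O": 0.66}
SCALE = 1.2


def connectivity(symbols, coords):
    """Connectivity matrix: atoms bonded if closer than SCALE * (r_i + r_j)."""
    n = len(symbols)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            cutoff = SCALE * (COVALENT_RADII[symbols[i]] + COVALENT_RADII[symbols[j]])
            if math.dist(coords[i], coords[j]) < cutoff:
                adj[i][j] = adj[j][i] = 1
    return adj


# Hypothetical (not optimized) planar HCOH geometries in angstrom.
# Atom order: C, O, H bonded to C, H bonded to O.
symbols = ["C", "O", "H", "H"]
cis = [(0.0, 0.0, 0.0), (1.32, 0.0, 0.0), (-0.55, 0.95, 0.0), (1.62, 0.92, 0.0)]
trans = [(0.0, 0.0, 0.0), (1.32, 0.0, 0.0), (-0.55, 0.95, 0.0), (1.62, -0.92, 0.0)]

same_graph = connectivity(symbols, cis) == connectivity(symbols, trans)
h_h = (math.dist(cis[2], cis[3]), math.dist(trans[2], trans[3]))
print(same_graph)  # True
print(f"H...H distance: cis {h_h[0]:.2f} A, trans {h_h[1]:.2f} A")
```

Distinguishing the two isomers therefore needs geometric information beyond the graph, such as a dihedral angle, which is the kind of postprocessing the caption refers to.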
Figure 4
Overview of reaction class definitions, as employed in our recent work. The left-hand panel shows the initial and final bonding graphs for three representative two- or three-atom reactions, namely, (1) dissociation, (2) association, and (3) diatomic dissociation. Here, the bonding matrices show the connectivity of atoms (i, j) or (i, j, k) before and after a given reaction is applied to a reactant set; by selecting a reaction class and the corresponding atomic indices, one can automatically induce reactions on a system’s bonding graph to generate new products. The right-hand side shows illustrative examples of this scheme. Starting from formaldehyde, applying reaction class (1) to atoms (i, j) = (1, 3) yields a valid product structure (assuming that the allowed valence range of hydrogen includes zero). Similarly, applying reaction class (3) to atoms (i, j, k) = (1, 3, 4) results in dissociation of molecular hydrogen, which is again considered a valid structure. However, applying reaction class (1) to (i, j) = (1, 2) would be rejected as invalid, assuming that the allowed valence range of oxygen does not include zero. These examples illustrate how generic reaction classes, combined with standard valence constraints, can be used to build a CRN quickly.
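The reaction-class scheme of Figure 4 maps naturally onto operations on a connectivity matrix followed by a valence check. In this minimal Python sketch, the class names, function names, and valence ranges (H: 0–1, C: 1–4, O: 1–2 bonded neighbours) are illustrative assumptions, and the atom numbering (1 = C, 2 = O, 3 and 4 = H) is inferred from the outcomes the caption describes rather than stated in it. Indices are 1-based to match the caption.

```python
from typing import List, Optional, Tuple

Matrix = List[List[int]]

# Hypothetical allowed valence ranges (number of bonded neighbours per element).
VALENCE_RANGE = {"H": (0, 1), "C": (1, 4), "O": (1, 2)}


def apply_class(adj: Matrix, cls: str, atoms: Tuple[int, ...]) -> Optional[Matrix]:
    """Apply a reaction class to 0-based atom indices; None if it cannot apply."""
    new = [row[:] for row in adj]
    if cls == "dissociation":             # (1) break bond i-j
        i, j = atoms
        if not adj[i][j]:
            return None
        new[i][j] = new[j][i] = 0
    elif cls == "association":            # (2) form bond i-j
        i, j = atoms
        if i == j or adj[i][j]:
            return None
        new[i][j] = new[j][i] = 1
    elif cls == "diatomic_dissociation":  # (3) break i-j and i-k, form j-k
        i, j, k = atoms
        if not (adj[i][j] and adj[i][k]) or adj[j][k]:
            return None
        new[i][j] = new[j][i] = 0
        new[i][k] = new[k][i] = 0
        new[j][k] = new[k][j] = 1
    else:
        raise ValueError(f"unknown reaction class: {cls}")
    return new


def satisfies_valence(symbols: List[str], adj: Matrix) -> bool:
    """True if every atom's neighbour count lies within its allowed range."""
    for a, s in enumerate(symbols):
        lo, hi = VALENCE_RANGE[s]
        if not lo <= sum(adj[a]) <= hi:
            return False
    return True


def react(symbols: List[str], adj: Matrix, cls: str,
          atoms_1based: Tuple[int, ...]) -> Optional[Matrix]:
    """Apply a reaction class using 1-based indices; None if rejected."""
    product = apply_class(adj, cls, tuple(a - 1 for a in atoms_1based))
    if product is None or not satisfies_valence(symbols, product):
        return None
    return product


# Formaldehyde; atoms 1-4 = C, O, H, H (numbering inferred from the caption).
symbols = ["C", "O", "H", "H"]
formaldehyde = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]

print(react(symbols, formaldehyde, "dissociation", (1, 3)) is not None)              # True
print(react(symbols, formaldehyde, "diatomic_dissociation", (1, 3, 4)) is not None)  # True
print(react(symbols, formaldehyde, "dissociation", (1, 2)) is not None)              # False
```

The third call is rejected only because the assumed oxygen range excludes zero neighbours; widening that range would admit it, which is why the choice of valence constraints directly controls the size of the generated CRN.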
Figure 5
Comparison of (a) SEGDS and (b) DEGDS. In the single-ended scheme, repeated application of reaction classes to different sets of reactive atomic indices generates a large number of different structures (shown here as circular nodes) connected through elementary reaction steps (shown here as edges); characterizing each generated reaction, for example with ab initio quantum chemistry or AI/ML models, ultimately provides chemical insight. In the double-ended scheme, plausible mechanisms are generated that definitively connect the input reactants to a target product; repeatedly generating and characterizing different mechanisms enables one to home in on the “most likely” reaction mechanism on thermodynamic and/or kinetic grounds.
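To make the single- versus double-ended distinction concrete, the sketch below treats each labeled bonding graph as a node and each single bond-breaking or bond-forming step as an edge. It is a deliberate simplification, not the authors' SEGDS/DEGDS: only dissociation and association are used, every step is unweighted, and the double-ended search is a plain breadth-first shortest path rather than the energy-ranked mechanism generation described in the caption. The valence ranges and all function names are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import FrozenSet, List, Optional, Set, Tuple

Graph = FrozenSet[Tuple[int, int]]  # labeled bonding graph: set of (i, j) with i < j

# Hypothetical allowed valence ranges (number of bonded neighbours per element).
VALENCE_RANGE = {"H": (0, 1), "C": (1, 4), "O": (1, 2)}


def valid(symbols: List[str], graph: Graph) -> bool:
    deg = [0] * len(symbols)
    for i, j in graph:
        deg[i] += 1
        deg[j] += 1
    return all(VALENCE_RANGE[s][0] <= d <= VALENCE_RANGE[s][1] for s, d in zip(symbols, deg))


def neighbours(symbols: List[str], graph: Graph):
    """Yield valid graphs one step away: one bond broken or one bond formed."""
    for edge in combinations(range(len(symbols)), 2):
        new = graph - {edge} if edge in graph else graph | {edge}
        if valid(symbols, new):
            yield new


def single_ended(symbols: List[str], start: Graph, max_steps: int):
    """SEGDS-like sketch: all graphs within max_steps, plus the reactions linking them."""
    depth = {start: 0}
    reactions: Set[FrozenSet[Graph]] = set()
    queue = deque([start])
    while queue:
        g = queue.popleft()
        if depth[g] == max_steps:
            continue
        for h in neighbours(symbols, g):
            reactions.add(frozenset((g, h)))
            if h not in depth:
                depth[h] = depth[g] + 1
                queue.append(h)
    return set(depth), reactions


def double_ended(symbols: List[str], start: Graph, target: Graph,
                 max_steps: int) -> Optional[List[Graph]]:
    """DEGDS-like sketch: shortest sequence of elementary steps from start to target."""
    parent = {start: None}
    depth = {start: 0}
    queue = deque([start])
    while queue:
        g = queue.popleft()
        if g == target:
            path = []
            while g is not None:
                path.append(g)
                g = parent[g]
            return path[::-1]
        if depth[g] == max_steps:
            continue
        for h in neighbours(symbols, g):
            if h not in parent:
                parent[h] = g
                depth[h] = depth[g] + 1
                queue.append(h)
    return None


symbols = ["C", "O", "H", "H"]
formaldehyde: Graph = frozenset({(0, 1), (0, 2), (0, 3)})
co_h2: Graph = frozenset({(0, 1), (2, 3)})  # CO + H2

graphs, reactions = single_ended(symbols, formaldehyde, max_steps=1)
print(len(graphs), len(reactions))  # 3 2
path = double_ended(symbols, formaldehyde, co_h2, max_steps=4)
for step, g in enumerate(path):
    print(step, sorted(g))
```

Because atoms stay labeled, breaking either C–H bond of formaldehyde gives two distinct nodes even though they are chemically equivalent; merging them would require canonical graph labeling. Likewise, ranking mechanisms on thermodynamic or kinetic grounds would require attaching energies to each step.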
Figure 6
Outline of a graph-based ARD study of cobalt-catalyzed hydroformylation of C₂H₄. The assumed reactants are shown at the top, alongside the expected products. ARD simulations based on our proposed dynamic string method generated 32 distinct molecular structures within the catalytic cycle; some of the most relevant are shown here, labeled 1–12. Structures shown in solid circles (1–8) are the key intermediates and products of the expected Heck–Breslow reaction mechanism, whereas representative side products (9–12) are shown in dashed circles.
Figure 7
Four representative reaction mechanisms forming benzene from different initial reactant species, as identified in DEGDS simulations. Mechanism (a), discovered by our ARD simulations, corresponds to the mechanism previously inferred from experimental data.
Figure 8
(a) Overview of the Strecker reaction of benzaldehyde, yielding the corresponding non-natural amino acid. (b) Representative DEGDS simulation of the same reaction; although DEGDS readily identifies reactions leading from reactants to products, these reactions are often “out of sequence” in the overall mechanism, or unrealistic intermediates, such as OH, are generated.
Figure 9
(a) Correlation plot comparing ANN-predicted and DFT-calculated activation barriers for a test set of around 6500 reactions; the ANN shown here was trained as described previously. (b) ANN-predicted versus DFT activation energies for two reactions that start from the same reactants but lead to different products. Energies are given in kcal mol⁻¹.
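Panel (a) summarizes how well ANN-predicted barriers track DFT values. Two statistics often quoted for such correlation plots, the mean absolute error (MAE) and the Pearson correlation coefficient r, can be computed as below. This is a generic sketch: the caption does not say which metrics were used, and the numbers in the usage lines are toy values, not data from the study.

```python
import math
from typing import Sequence


def mean_absolute_error(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Mean of |predicted - reference| over paired values."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def pearson_r(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Pearson correlation coefficient between predicted and reference values."""
    if len(pred) != len(ref) or len(pred) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(pred)
    mp, mr = sum(pred) / n, sum(ref) / n
    cov = sum((p - mp) * (r - mr) for p, r in zip(pred, ref))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    sr = math.sqrt(sum((r - mr) ** 2 for r in ref))
    if sp == 0 or sr == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sp * sr)


# Toy barriers in kcal/mol (illustrative only, not from the study).
dft = [10.0, 20.0, 30.0]
ann = [12.0, 19.0, 31.0]
mae = mean_absolute_error(ann, dft)
r = pearson_r(ann, dft)
print(f"MAE = {mae:.2f} kcal/mol, r = {r:.3f}")  # MAE = 1.33 kcal/mol, r = 0.989
```

The two numbers are complementary: r measures how well the ranking and trend of barriers are reproduced, whereas the MAE measures absolute accuracy in energy units, which matters when barriers feed into rate estimates.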
Figure 10
(a) Minimum-energy path (MEP) for the insertion of molecular hydrogen (H₂) at the cobalt center of HCo(CO)₃, the active catalytic species in the Heck–Breslow hydroformylation mechanism previously studied by ARD simulations. (b) Flux-side correlation functions calculated with a standard reaction-path Hamiltonian (RPH) simulation, which requires multiple Hessian matrix evaluations along the MEP, and with the approach from our recent work, in which Hessian propagation schemes (here, Powell–symmetric–Broyden [PSB]) are used to build the RPH.
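Panel (b) relies on a Powell–symmetric–Broyden (PSB) update to propagate the Hessian along the MEP instead of recomputing it at every point. The function below implements the standard textbook PSB formula; how the authors choose steps or seed the initial Hessian is not described in this caption, so the function name and the toy quadratic test surface are assumptions. With step s = x_new − x_old, gradient change y = g_new − g_old, and residual r = y − H s, the update is H_new = H + (r sᵀ + s rᵀ)/(sᵀs) − (sᵀr) s sᵀ/(sᵀs)². It keeps H symmetric and enforces the secant condition H_new s = y, which the final check confirms.

```python
import numpy as np


def psb_update(H: np.ndarray, s: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Powell-symmetric-Broyden update of an approximate Hessian (textbook form).

    H: current symmetric Hessian estimate, shape (n, n)
    s: step between geometries, x_new - x_old, shape (n,)
    y: gradient difference, g_new - g_old, shape (n,)
    """
    r = y - H @ s            # residual of the secant condition
    ss = s @ s               # squared step length
    return (H
            + (np.outer(r, s) + np.outer(s, r)) / ss
            - (s @ r) * np.outer(s, s) / ss**2)


# Toy quadratic surface E(x) = 0.5 x^T A x, so the gradient change is exactly A s.
A = np.array([[2.0, 0.3], [0.3, 1.0]])   # "true" Hessian (illustrative)
H = np.eye(2)                            # crude initial guess
s = np.array([0.1, -0.05])
y = A @ s
H_new = psb_update(H, s, y)
print(np.allclose(H_new @ s, y), np.allclose(H_new, H_new.T))  # True True
```

On a quadratic surface y = A s exactly, so a Hessian that is already correct (r = 0) is left unchanged; on a real PES, the update only approximates the true Hessian between explicit evaluations.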

References

    1. Angeli D. A Tutorial on Chemical Reaction Network Dynamics. Eur. J. Control 2009, 15, 398–406. doi:10.3166/ejc.15.398-406.
    2. Wakelam V.; Smith I. W. M.; Herbst E.; Troe J.; Geppert W.; Linnartz H.; Öberg K.; Roueff E.; Agúndez M.; Pernot P.; et al. Reaction Networks For Interstellar Chemical Modelling: Improvements and Challenges. Space Sci. Rev. 2010, 156, 13–72. doi:10.1007/s11214-010-9712-5.
    3. Rangarajan S.; Brydon R. R. O.; Bhan A.; Daoutidis P. Automated identification of energetically feasible mechanisms of complex reaction networks in heterogeneous catalysis: application to glycerol conversion on transition metals. Green Chem. 2014, 16, 813–823. doi:10.1039/C3GC41386A.
    4. Pietrucci F.; Saitta A. M. Formamide reaction network in gas phase and solution via a unified theoretical approach: Toward a reconciliation of different prebiotic scenarios. Proc. Natl. Acad. Sci. U.S.A. 2015, 112, 15030–15035. doi:10.1073/pnas.1512486112.
    5. Ulissi Z. W.; Medford A. J.; Bligaard T.; Nørskov J. K. To address surface reaction network complexity using scaling relations machine learning and DFT calculations. Nat. Commun. 2017, 8, 14621. doi:10.1038/ncomms14621.