Nature. 2024 Jan;625(7995):508-515.
doi: 10.1038/s41586-023-06854-3. Epub 2023 Nov 15.

Computational prediction of complex cationic rearrangement outcomes

Tomasz Klucznik et al. Nature. 2024 Jan.

Abstract

Recent years have seen revived interest in computer-assisted organic synthesis [1,2]. The use of reaction- and neural-network algorithms that can plan multistep synthetic pathways has revolutionized this field [1,3-7], including examples leading to advanced natural products [6,7]. Such methods typically operate on full, literature-derived 'substrate(s)-to-product' reaction rules and cannot be easily extended to the analysis of reaction mechanisms. Here we show that computers equipped with a comprehensive knowledge-base of mechanistic steps augmented by physical-organic chemistry rules, as well as quantum mechanical and kinetic calculations, can use a reaction-network approach to analyse the mechanisms of some of the most complex organic transformations: namely, cationic rearrangements. Such rearrangements are a cornerstone of organic chemistry textbooks and entail notable changes in the molecule's carbon skeleton [8-12]. The algorithm we describe and deploy at https://HopCat.allchemy.net/ generates, within minutes, networks of possible mechanistic steps, traces plausible step sequences and calculates expected product distributions. We validate this algorithm by three sets of experiments whose analysis would probably prove challenging even to highly trained chemists: (1) predicting the outcomes of tail-to-head terpene (THT) cyclizations in which substantially different outcomes are encoded in modular precursors differing in minute structural details; (2) comparing the outcome of THT cyclizations in solution or in a supramolecular capsule; and (3) analysing complex reaction mixtures. Our results support a vision in which computers no longer just manipulate known reaction types [1-7] but will help rationalize and discover new, mechanistically complex transformations.

Conflict of interest statement

Competing interests The authors declare the following competing interests: T.K., W.B., B.M.-K., M.M., S.S. and B.A.G. are consultants and/or stakeholders of Allchemy, Inc. Allchemy software and its HopCat module are property of Allchemy, Inc., USA. All queries about access options to Allchemy, including academic collaborations, should be sent to saraszymkuc@allchemy.net.

Figures

Extended Data Fig. 1 | HopCat’s mechanistic analysis of a reaction yielding a fused tetrahydropyran.
An example of a problem not “seen” by the machine during training on 715 literature examples. In the original publication, the authors focused on the double 1,5-H shifts as key steps and did not consider the full mechanism. HopCat’s calculations ran up to n = 4 generations and traced a complete and unique mechanistic pathway. This pathway starts with a series of carbonyl and allyl resonances that place the positive charge at the position available for a 1,5-H shift, followed by 1,6-olefin exo cyclization. Subsequently, the sequence of 1,5-H shift and 1,6-olefin exo cyclization steps is repeated to afford the tetrahydropyran’s bicyclic scaffold. The last two mechanistic steps along the pathway are: (i) carbonyl resonance to form an oxocarbenium species and (ii) elimination of the Lewis acid, yielding the final, quenched product. The software’s solution agrees with the partial mechanism postulated in the original publication. a, A screenshot showing a simplified network (without stereochemistry). In reality, the network was generated with full stereochemistry and comprised ~28,000 nodes, too many to visualize clearly as a miniature. b, Details of all mechanistic steps (for raw screenshots from HopCat, in traditional and atom-mapped visualization modalities, see Supplementary Section S5). Additional examples are also provided in Supplementary Section S5.
Extended Data Fig. 2 | HopCat’s mechanistic analysis of a reaction leading to a tricyclic dienone.
HopCat solves another problem not “seen” in the 715 training set. The dienone is an intermediate used in the recent synthesis of curcusone diterpenes. In the original publication, the authors included a plausible arrow-pushing scheme of electron movements for the double deprotection-aldol sequence but did not support it with a more detailed mechanistic analysis. HopCat identifies the reaction’s product in G4 and proposes a plausible and unique mechanistic route. Starting from a carbocation generated via elimination of the substrate’s tertiary alcohol (bottom row of the network), this intermediate undergoes two consecutive resonances (allyl and carbonyl) that result in the formation of an oxocarbenium cation. Subsequent retro oxa-cyclization followed by ring closure constructs the seven-membered central ring of the molecule. The last two steps describe deprotection of the enol ether. Formation of the oxocarbenium cation via carbonyl resonance makes the alkyl group on the oxygen a good leaving group, enabling its subsequent elimination and formation of the final product. The overall movement of electrons is consistent with the one proposed by the authors. a, A screenshot showing the network comprising ~2,000 nodes. b, Details of all mechanistic steps (for raw screenshots from HopCat, in traditional and atom-mapped visualization modalities, see Supplementary Section S5). Additional examples are also provided in Supplementary Section S5.
Extended Data Fig. 3 | A contested and only recently resolved biosynthesis of spiroviolene relies on a macrocyclization step (1,11-olefin endo cyclization), which does not occur in the abiotic set of carbocation transformations.
Identifying the mechanistic pathway for the biosynthesis of spiroviolene has proven a computationally challenging problem: the pathway was not found within G7, and expansions to higher generations exceeded computing power. Accordingly, we implemented a “mixed” strategy search in which 7 generations were expanded from the substrate in the forward direction and 6 generations from the product in the retrosynthetic direction (using “reversed” mechanistic rules). This strategy considerably reduces the computational cost, as the number of nodes in two smaller networks, each propagated to n generations and with branching factor m, scales as $2m^n$ vs. $m^{2n}$ for one forward network expanded to 2n generations (for n = m = 7, the difference is $m^n/2 \approx 400{,}000$ times). The algorithm then searched for common node(s) in the two networks and, when they were found, was able to concatenate a 10-step route. a, HopCat’s screenshot showing a grossly simplified network generated by a mixed forward-retro search. In reality, the network comprised 909,937 nodes that could not be clearly visualized as a miniature. HopCat’s shortest route is marked with purple lines and agrees with the recently revised pathway. In the same network, rearrangement sequences leading to three other natural products were also found: phomopsene (red lines and frame), allokutznerene (orange) and variediene (green). b, Details of all mechanistic steps for spiroviolene’s mechanistic route. For raw HopCat screenshots of the sequences leading to all four natural products, in traditional and atom-mapped visualization modalities, see Supplementary Section S5. Note: Akin to Fig. 4 and Extended Data Figs. 1, 2, none of the biosyntheses shown in this figure were considered when extracting mechanistic steps from literature examples.
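The cost argument in this caption (two half-depth networks of about 2m^n nodes versus one full-depth network of about m^(2n) nodes) and the "search for common nodes, then concatenate" step can be illustrated with a short Python sketch. This is not HopCat's implementation: it assumes a uniform branching factor, and the function names, the integer "molecules" and the toy neighbour rules are all hypothetical stand-ins for mechanistic steps and their reversed counterparts.

```python
from collections import deque


def network_sizes(m, n):
    """Approximate node counts, assuming a uniform branching factor m."""
    two_half_searches = 2 * m**n      # forward n generations + retro n generations
    one_full_search = m ** (2 * n)    # forward only, 2n generations
    return two_half_searches, one_full_search


def expand(start, neighbours, generations):
    """Breadth-first expansion; returns {node: parent} so routes can be traced."""
    parents = {start: None}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == generations:
            continue
        for nxt in neighbours(node):
            if nxt not in parents:
                parents[nxt] = node
                queue.append((nxt, depth + 1))
    return parents


def trace(parents, node):
    """Walk parent links from node back to the root of that search."""
    path = []
    while node is not None:
        path.append(node)
        node = parents[node]
    return path


def mixed_search(substrate, product, forward, backward, n_fwd, n_back):
    """Expand from both ends, then join the two networks at a shared node."""
    fwd = expand(substrate, forward, n_fwd)
    bwd = expand(product, backward, n_back)
    common = set(fwd) & set(bwd)
    if not common:
        return None
    # Pick the shared node that gives the shortest concatenated route.
    best = min(common, key=lambda c: len(trace(fwd, c)) + len(trace(bwd, c)))
    return trace(fwd, best)[::-1] + trace(bwd, best)[1:]


# Toy example: "molecules" are integers, a forward step adds 1 or 2,
# and the reversed rule subtracts 1 or 2.
route = mixed_search(0, 10,
                     forward=lambda x: (x + 1, x + 2),
                     backward=lambda x: (x - 1, x - 2),
                     n_fwd=3, n_back=3)
print(route)
print(network_sizes(7, 7))
```

Neither three-generation search alone reaches the other end of the toy problem, but the two frontiers overlap, which is the situation the mixed forward-retro strategy exploits.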
Extended Data Fig. 4 | Theoretical studies of model H- and C-shifts.
a, System setup. For all unique configurations of substituents R1-R4 (-H and -Me were considered), atom X was dragged along distance vector r so as to simulate the shift. Initial geometry was chosen such that the C-X bond was approximately perpendicular to the plane of the carbocation. All trajectories were subsequently verified by visual inspection. b, H-shifts (X = H). Top three panels represent symmetric shifts (such that the orders of initial and resulting carbocations are the same), with the order of carbocation increasing from the left to the right. In the bottom row, two leftmost panels represent shifts in which the carbocation changes order by one, whereas the rightmost panel represents an extreme example of transition between first-order and tertiary carbocations. c, C-shifts (X = Me). Top three panels represent symmetric shifts (such that the orders of initial and resulting carbocations are the same), with the order of carbocation increasing from left to right. In the bottom row, two leftmost panels represent shifts in which the carbocation changes order by one, whereas the rightmost panel represents an extreme example of transition between first-order and tertiary carbocations. d, Theoretical studies of carbocation association process. Each curve represents the SCS-MP2/aug-cc-pVDZ energy profile with PCM model of water, modelling approach of four nucleophiles (formaldehyde, water, methanol and ethene) towards CH3+ along vector R (scheme inserted in the top left of the panel). e, Boxplot representing experimental stabilities of carbocations taken from with respect to the CH3+ cation. The data was grouped according to the order of a carbocation (number of non-hydrogen atoms directly connected to the formally charged carbon atom), showing the general trend in the stability: increasing the order of a carbocation lowers the energy, on average, by 10–20 kcal/mol.
Extended Data Fig. 5 | General synthetic scheme for the preparation of the precursors employed in Fig. 5.
An alkyl bromide is converted into the corresponding organozinc reagent by sequential treatment with t-BuLi and ZnCl2. This reagent is then used in a Negishi coupling with a vinyl iodide bearing a protected alcohol group. The coupling product is then deprotected to give the free alcohol, and the corresponding acetate is prepared by acetylation of this alcohol.
Fig. 1 | Key aspects of a network-based algorithm to predict mechanisms and product distributions of complex carbocationic rearrangements.
a, One of the literature examples (from total synthesis of methyl Kadsurenin C; ref. 46) analysed by expert chemists to assign individual mechanistic steps. Bonds and atoms coloured in red span the ‘cores’ of mechanistic transforms. b, The horizontal axis counts numbers of literature examples analysed (715 in total, all examples deposited at https://HopCatResults.allchemy.net/) to derive these rules. The vertical axis plots the number of mechanistic rules identified by analysing a given number of literature examples (note that certain rules are grouped, for example, ‘Addition of water 1’ and ‘Addition of water 2’ are counted as one). Blue curve, carbocation generation rules; red, rearrangements and resonances; green, carbocation quenches. For each set, analysis was repeated 10,000 times, each time with a different and random ordering of literature examples. The solid line represents the median; the dark and light shaded areas delineate interpercentile ranges 0.25–0.75 and 0.05–0.95, respectively. All curves flatten out suggesting that our sets of mechanistic rules are nearly complete (for all rules, see Supplementary Information section 2). c, The mechanistic steps thus derived are applied iteratively to propagate reaction networks commencing from arbitrary substrates (‘parent’ node at the very bottom). d, The networks are pruned according to physical-organic constraints (Supplementary Information section 3) to reduce network size by up to roughly 1,000 times. e, Subsequently, the algorithm can trace (orange) mechanistic pathway(s) between the substrate and some known product. Already at this stage, the algorithm can solve some complex mechanistic puzzles (Fig. 3 and Extended Data Figs. 1–3 and Supplementary Figs. 41–66). f, Calculation of energies of all nodes and/or molecules and energetic barriers of all edges and/or steps (Methods) yields kinetic rate constants (here, coloured blue to red to indicate slow to fast steps). 
Solution of kinetic equations then predicts the abundances of specific products, here indicated by the sizes of the nodes.
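The saturation analysis described for panel b (count distinct rules as literature examples are added, repeat over many random orderings, then report the median and the 0.25–0.75 and 0.05–0.95 bands) can be sketched in Python as follows. This is an illustrative sketch only: the representation of each example as a set of rule names, the function names and the four toy examples are assumptions, and the repeat count is reduced from the caption's 10,000 to keep the example fast.

```python
import random
import statistics


def accumulation_curve(examples, order):
    """Number of distinct rules seen after each example, in the given order."""
    seen = set()
    curve = []
    for idx in order:
        seen.update(examples[idx])
        curve.append(len(seen))
    return curve


def accumulation_bands(examples, repeats=1000, seed=0):
    """Median and percentile bands of the curve over random orderings."""
    rng = random.Random(seed)
    indices = list(range(len(examples)))
    curves = []
    for _ in range(repeats):
        rng.shuffle(indices)
        curves.append(accumulation_curve(examples, indices))
    bands = []
    for column in zip(*curves):
        # n=20 yields 19 cut points at 5%, 10%, ..., 95%.
        q = statistics.quantiles(column, n=20, method="inclusive")
        bands.append({"p05": q[0], "p25": q[4], "median": q[9],
                      "p75": q[14], "p95": q[18]})
    return bands


# Hypothetical toy data: each literature example is the set of rules it uses.
toy_examples = [{"1,2-H shift", "allyl resonance"},
                {"allyl resonance"},
                {"1,2-C shift"},
                {"1,2-H shift", "1,2-C shift", "addition of water"}]
for step, band in enumerate(accumulation_bands(toy_examples, repeats=200), 1):
    print(step, band)
```

A curve whose median and bands stop rising well before the last example suggests, as the caption argues, that further examples add few new rules.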
Fig. 2 | Statistics of the mechanistic networks and model performance.
All analyses are based on the networks generated for the ‘715’ set. Orange lines indicate median, boxes envelop data between Q1 and Q3 quartiles; whiskers delineate the most spread pair of points within the (Q1 − 1.5 × interquartile range, Q3 + 1.5 × interquartile range) range, where the interquartile range is Q3 − Q1. a,b, Sizes of mechanistic networks do not correlate with, for example, a substrate’s molecular weight or the number of stereocentres (a) but increase with the number of multiple, non-aromatic bonds (b). c,d, Average branching factors remain similar irrespective of a network’s synthetic generation, Gn: carbocation (c) and quenching (d) branching factors. The branching factor for carbocations in a given Gn is the number of carbocations in Gn+1 divided by their number in Gn. The quench factor is the number of quenched and/or neutral products in Gn+1 divided by the number of carbocations in Gn. Performance of the kinetic model is quantified in e,f. e, The horizontal axis quantifies the absolute rank (best is zero) of the literature-reported product within the network. The vertical axis gives the percentage of the entire dataset (that is, all 715 networks) for which the predicted rank is not larger than the corresponding value on the x axis (top k statistic). Default settings use the default quench and generation parametrization with time and temperature either taken from literature or set to 298 K and 12 h. The best settings modify these parameters within 30% of the default values (0.7, 0.8, 1.2 and 1.3) and missing time data (here, 2 h and 12 h) that give the best top k statistics. The worst settings correspond to the worst result obtained with such modifications. f, Bars and the left axis quantify the dependence of the top ten statistics on the number of synthetic generations. The right axis and black line plot the average network size. As the networks become very large, the accuracy in predicting the literature product within the top ten decreases markedly.
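The two branching factors and the top k statistic defined in this caption reduce to simple ratios and cumulative counts. The Python sketch below follows those definitions directly; the function names and the toy counts and ranks are hypothetical and are not taken from the paper's data.

```python
def branching_factors(carbocations, quenched):
    """Per-generation factors from node counts indexed by generation G_n.

    carbocation factor for G_n = carbocations in G_(n+1) / carbocations in G_n
    quench factor for G_n      = quenched products in G_(n+1) / carbocations in G_n
    """
    last = len(carbocations) - 1
    carbocation_factor = [carbocations[i + 1] / carbocations[i] for i in range(last)]
    quench_factor = [quenched[i + 1] / carbocations[i] for i in range(last)]
    return carbocation_factor, quench_factor


def top_k_curve(ranks, max_rank):
    """Percentage of networks whose literature product has rank <= k (best is 0)."""
    total = len(ranks)
    return [100.0 * sum(r <= k for r in ranks) / total
            for k in range(max_rank + 1)]


# Hypothetical counts for generations G0, G1, G2 of a single network.
print(branching_factors(carbocations=[1, 4, 12], quenched=[0, 2, 8]))
# Hypothetical ranks of the literature product in five networks.
print(top_k_curve([0, 0, 1, 3, 12], max_rank=3))
```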
Fig. 3 | HopCat’s analysis of a carbocationic rearrangement in Kobayashi’s synthesis of a taxanine derivative.
a, Reaction of β-4(20)-epoxy-5-O-triethylsilyltaxinine A, T1, which on treatment with BF3·OEt2 gives compound T3 containing a cyclobutane ring marked for clarity with green dotted bonds. Although the authors explained the mechanism for the T1 ➔ T2 step (Lewis acid-promoted liberation of formaldehyde followed by elimination of silyl ether and formation of 1,3-dioxane), step T2 ➔ T3 was referred to as curious and had no mechanistic explanation even in a recent review (published 20 years after the original paper). b, Screenshot of a fragment of the 3,404-node network, generated within a few minutes, illustrates HopCat’s analysis starting with LA-activated T2. Resonances correspond to horizontal connections in each synthetic generation Gn; rearrangements are connections between generations. Blue nodes are carbocationic intermediates; green are quenched molecules. The mechanistic route to the experimentally observed product is traced by purple lines. Miniatures showing the intermediates are overlaid on the network (Supplementary Figs. 1–9). Note that an alternative but longer pathway was also found, which replaces 1,4-olefin exo cyclization with a sequence of 1,5-olefin endo cyclization and 1,2-C shift, and then replaces carbonyl resonance and elimination of the Lewis acid with elimination and enolization. For additional problems solved by HopCat on similar time scales, see Extended Data Figs. 1 and 2 and Supplementary Figs. 41–43 and 50–66. The reader is encouraged to use https://HopCat.allchemy.net/ to solve other mechanistic riddles, starting from substrates and carbocations of their choice. OTES, triethylsilyl ether.
Fig. 4 | Experimental versus predicted product distributions emerging from rearrangements of linalool and fenchol at different temperatures.
a,b, The data for linalool for which experimental conditions were: linalool 1 (0.65 mmol), TsOH•H2O (0.65 mmol), MS 4 Å, dry benzene, Ar, 16 h. c,d, The data for fenchol for which experimental conditions were: fenchol 2 (1.95 mmol), KHSO4 (1.95 mmol), MS 4 Å, neat, Ar, 16 h. For all experimental details, see Supplementary Information section 7. a,c, Screenshots of HopCat’s networks, both propagated up to G4. Purple nodes are products observed experimentally and node sizes correspond to relative abundances of the products (see also the second part of Supplementary Video 1). Previously unreported products are in red frames. b,d, Comparisons of experimental versus predicted product distributions at different temperatures. Vertical axes quantify percentages of specific products in the reaction mixture (whenever applicable, in the model and in experiment, these percentages are sums of values for enantiomers and diastereoisomers). In the experiments, the crude mixture was analysed by GC–MS (see Methods and Supplementary Information section 7 for details).
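To see why predicted product distributions shift with temperature, consider the simplest possible case: one carbocation with two competing, irreversible first-order steps. The Python sketch below is not the kinetic model used in the paper (which is described in its Methods). It assumes the standard Eyring equation, temperature-independent free-energy barriers, and invented barrier values and product names, purely for illustration.

```python
import math

R_KCAL = 1.987204e-3     # gas constant, kcal mol^-1 K^-1
K_B = 1.380649e-23       # Boltzmann constant, J K^-1
H_PLANCK = 6.62607015e-34  # Planck constant, J s


def eyring_rate(barrier_kcal, temperature_k):
    """Eyring rate constant k = (k_B T / h) * exp(-dG / (R T)), in s^-1."""
    prefactor = K_B * temperature_k / H_PLANCK
    return prefactor * math.exp(-barrier_kcal / (R_KCAL * temperature_k))


def branching_fractions(barriers, temperature_k):
    """Fractions of competing irreversible steps leaving one intermediate.

    For parallel first-order steps, the fraction of each product is k_i / sum(k).
    """
    rates = {name: eyring_rate(b, temperature_k) for name, b in barriers.items()}
    total = sum(rates.values())
    return {name: k / total for name, k in rates.items()}


# Hypothetical barriers (kcal/mol) for two competing steps.
barriers = {"product_A": 12.0, "product_B": 13.0}
for temperature in (298.15, 353.15):
    print(temperature, branching_fractions(barriers, temperature))
```

With a fixed 1 kcal/mol gap between the barriers, the lower-barrier product dominates at both temperatures, but its share falls as the temperature rises, because the ratio of the two rate constants is exp(ΔΔG/RT).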
Fig. 5 | Experimental versus predicted outcomes for the THT cyclizations and uncertainty of theoretical predictions.
a, Table illustrating the building blocks (light-blue and red fragments) and the corresponding cyclized products, if observed. Methyl groups and the double bonds differing between the fragments are highlighted in grey. Within each tile, dark blue shows yields for solution experiments (determined by GC); orange shows yields for experiments performed in the capsule (isolated yields). The algorithm-predicted top k rankings are listed below the structures. Values in parentheses are rankings of the correct skeleton being formed (that is, with all stereochemical information and all double bonds removed). Unless otherwise noted, capsule reaction conditions were 20 mol% supramolecular capsule, 3 mol% HCl, CHCl3 solvent and 40 °C. b, Scheme of a similar probability and thus high-uncertainty branching along a hypothetical fragment of a mechanistic route. In the graph, the vertical axis quantifies such kinetic uncertainty as the number of branchings (within 4 kcal/mol) normalized by the maximum value within the set. The horizontal axis plots the normalized numbers of products in each network whose energies are within 4 kcal/mol of the correct one; this thermodynamic measure of uncertainty is not predictive. c, Mechanistic routes from two precursors differing only in the distal part (marked in grey in the lower structure). For the lower precursor, rosadiene was predicted as the top one (solution)/top one (capsule) outcome. Indeed, it was obtained in experiments in 33% yield. Note that despite seemingly similar precursors, the two mechanistic routes are markedly different. Footnotes: a, GC yield. b, 10 mol% of capsule used, and reaction carried out at 30 °C. c, Product isolated after preparative scale reaction of alcohol substrate; in all these cases, the same main product is observed in the reaction of the acetate substrate. d, Substrate is an equimolar mixture of diastereomers. e, HCl (1.0 equiv.). f, BF3·OEt2 (1.0 equiv.), solution conditions.

References

    1. Szymkuć S et al. Computer-assisted synthetic planning: the end of the beginning. Angew. Chem. Int. Ed. 55, 5904–5937 (2016).
    2. Corey EJ & Wipke WT. Computer-assisted design of complex organic syntheses. Science 166, 178–192 (1969).
    3. Klucznik T et al. Efficient syntheses of diverse, medicinally relevant targets planned by computer and executed in the laboratory. Chem 4, 522–532 (2018).
    4. Segler MHS, Preuss M & Waller MP. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018).
    5. Coley CW, Green WH & Jensen KF. Machine learning in computer-aided synthesis planning. Acc. Chem. Res. 51, 1281–1289 (2018).