Review
Chem Sci. 2025 Mar 4;16(13):5383-5412.
doi: 10.1039/d5sc00541h. eCollection 2025 Mar 26.

Computational tools for the prediction of site- and regioselectivity of organic reactions

Lukas M Sigmund et al. Chem Sci.

Abstract

Regio- and site-selectivity is one of the most important aspects of organic reactions in synthesis planning. Consequently, substantial research effort has been invested in computational models for regio- and site-selectivity prediction, and the introduction of machine learning to the chemical sciences within the past decade has added a new dimension to these endeavors. This review walks through the currently available predictive tools for regio- and site-selectivity, with a particular focus on machine learning models, and is organized by the individual reaction classes of organic chemistry. The respective featurization techniques and model architectures are described and compared; applications of the tools to critical real-world examples are highlighted. The review aims to give an overview of the field's status quo both for the intended users of the tools, that is, synthetic chemists, and for developers looking for new research avenues.

Conflict of interest statement

There are no conflicts to declare.

Figures

Fig. 1. Site- and regioselectivity of organic reactions. (A) Iridium-catalyzed site-selective borylation that proceeds primarily at one of the three possible C(aromatic)–H groups. (B) Copper-catalyzed regioselective Diels–Alder reaction. (C) Rhodium-catalyzed hydroformylation of myrcene with high site- and regioselectivity. In all cases, the main reaction product is shown first, after which additional possible isomers are given half-transparently.
Fig. 2. Overview of (A) molecular features, descriptors, and representations and (B) model types for site- and regioselectivity prediction.
Fig. 3. (A) Diels–Alder reaction of acrylonitrile and 2-methoxybuta-1,3-diene with the main reaction product shown first and the alternative possible regioisomer depicted half-transparently second. The Molecular Transformer failed to recognize the Diels–Alder reaction. (B) Two alkene epoxidation reactions that are both predicted correctly by the Molecular Transformer. (C) Friedel–Crafts acylation of fluorobenzene with the main reaction product shown first and an alternative possible isomer depicted half-transparently second, which was correctly predicted by the Molecular Transformer (top). Expected meta-directing influence of the nitro group during the Friedel–Crafts acylation of nitrobenzene and the respective Molecular Transformer output that predicts para-substitution (bottom).
Fig. 4. (A) Iron-catalyzed C(sp³)–H oxidation of (+)-artemisinin with the experimentally observed site-selectivity and the respective prediction from a linear model. (B) DFT-computed and ML-predicted hydrogen atom abstraction selectivities at (+)-camptothecin with three different abstraction agents.
Fig. 5. Dirhodium complex-catalyzed formation of (A) C(sp³)–N and (B) C(sp³)–C bonds by insertion reactions into C(sp³)–H bonds. (C) SMART featurization approach to model the spatial accessibility of the reactive cavity of the Rh₂-tetracarboxylate catalysts through the conformational flexibility of the macrocyclic thioether probe attached to the dirhodium catalytic system.
Fig. 6. (A) Site-selective deprotonation of an allyl group that determines the selectivity of the following oxidation reaction (cf. ref. 149). (B) Aldol reaction followed by oxidation with the Dess–Martin periodinane (DMP). The kinetically controlled reaction product is formed due to deprotonation of the methylene group.
Fig. 7. (A) Schematic reaction mechanism of an electrophilic aromatic substitution reaction (SEAr). (B) The relative stability of protonated aryl substrates can be used as a surrogate of the real Wheland intermediate for site-selectivity predictions. (C) Shell-wise local featurization of atomic positions. (D) Multitask site-selectivity prediction in which the Weisfeiler-Lehman encoder learns molecular embeddings which are passed to separate feed-forward neural networks for reaction-specific site-selectivity prediction. During training, the entire model (graph encoder + readout networks) is optimized simultaneously.
Fig. 8. (A) Schematic reaction mechanism of a radical C(aromatic)–H substitution reaction. Predictions of respective functionalization reactions with (B) the Fukui index for radical attack, f(0), and (C) with a GNN ML model and comparisons to experimental observations.
Fig. 9. (A) Schematic reaction mechanism of a C–H activation-mediated C(aromatic)–H substitution reaction. (B) A single directing group favoring two different sites (left) and two different directing groups favoring two different sites (right) during C–H activation. (C) Example of the palladacycle intermediate used by Tomberg et al. to computationally construct a scale for directing group strength. The example directing group is highlighted in green. (D) Borylation site-selectivity predictions made by a three-dimensional GNN model and comparison to the experimental observations. The predicted percentages and respective standard deviations were obtained by applying the model to ten different conformers of the substrate molecule. (E) Schematic representation of the Site of Borylation (SoBo) model architecture and (F) two of its prediction examples in comparison to the experimental observations.
Fig. 10. Ring-opening reaction of oxiranes with azide as the nucleophile with the possibility of the formation of two different regioisomers.
Fig. 11. (A) General reaction scheme of a cross-coupling reaction between an aryl halide and an arene or alkene with an appropriate leaving group (LG) catalyzed by a transition metal complex. (B) Handy and Zhang's ¹H NMR chemical shift model for site-selectivity prediction of cross-coupling reactions. The larger ¹H NMR chemical shift in the surrogate molecule indicates the reactive position. (C) Application of the Regio-MPNN tool to a Buchwald–Hartwig coupling and comparison to the erroneous prediction of a retrosynthesis planning software.
Fig. 12. (A) Schematic reaction mechanism of a nucleophilic aromatic substitution reaction (SNAr) either through a concerted or stepwise mechanism including a Meisenheimer intermediate. (B) Nucleophilic aromatic substitution reactions and their predicted site-selectivity from an MLR model. (C) SNAr site-selectivity prediction workflow as developed by Guan et al. The reaction site-similarity provides a confidence score and is calculated as the distance in their latent space representations in the last layer of the GNN. (D) Application of the model shown in (C) to an SNAr reaction of a difluoroarene with thiophenol as the nucleophile, which was incorrectly predicted by the ML part of the workflow, although with a low confidence score. This low confidence score triggered DFT optimization of the two individual transition states that corrected the initial erroneous prediction.
Fig. 13. (A) Regioselectivity-determining alkene insertion step of the Mizoroki–Heck reaction leading to the two different regioisomers. (B) Reaction of a polyolefin that is part of Wang et al.'s Mizoroki–Heck dataset and that allows for the formation of site-isomeric products.
Fig. 14. (A) General reaction scheme of a hydroformylation reaction of a terminal olefin catalyzed by a phosphine-ligated transition metal central atom and the two possible regioisomeric reaction products. (B) Schematic representation of Wang et al.'s hydroformylation regioselectivity model for terminal olefin substrates.
Fig. 15. Aryl iodide-catalyzed difluorination of alkenes and the two possible isomeric reaction products.
Fig. 16. Diels–Alder reaction en route to the total synthesis of rippertenol with the two possible regioisomers (cf. ref. 348). The experimentally observed regioselectivity was correctly predicted by an RF model based on the Hammett constants and the topological steric effect indices of the dienophile's and diene's substituents.
Fig. 17. (A) Titanium-catalyzed [2 + 2 + 1]-cycloaddition of 1-phenyl-1-propyne and azobenzene to give pyrrole derivatives 10 to 12. (B) Catalyst optimization to maximize the production of the desired regioisomer 10. The reported selectivities refer to 10/(11 + 12).
Fig. 18. (A) Palladium-catalyzed annulation of ortho-borylaryl triflates and the two possible reaction products with experimentally observed regioselectivities and the respective predictions from a linear model. The Hammett and Charton parameters of the R substituent and the cone angle of the applied ligand at palladium were used to predict regioselectivity. (B) The key intermediate of the reaction shown in (A), which is the palladium complex of the in situ-generated aryne.
Fig. 19. Schematic representation of how generic synthesis planning software (including retrosynthesis tools) can work in cooperation with explicit regio- and site-selectivity models for the overall improved prediction of synthetic pathways (left part). In the future, increasing generalization of reaction selectivity but also feasibility tools could be sought to evaluate each predicted synthetic step (right part).
Lukas M. Sigmund
Michele Assante
Magnus J. Johansson
Per-Ola Norrby
Kjell Jorner
Mikhail Kabeshov

References

    1. Ley S. V., Fitzpatrick D. E., Ingham R. J., Myers R. M. Angew. Chem., Int. Ed. 2015;54:3449–3464. doi: 10.1002/anie.201410744.
    2. Mahjour B., Shen Y., Cernak T. Acc. Chem. Res. 2021;54:2337–2346. doi: 10.1021/acs.accounts.1c00119.
    3. Biyani S. A., Moriuchi Y. W., Thompson D. H. Chem. Methods. 2021;1:323–339. doi: 10.1002/cmtd.202100023.
    4. Berritt S., Christensen M., Johansson M. J., Krska S. W., Newman S. G., Sampson J., Simmons E. M., Wang Y., Strotman N. A., in The Power of High-Throughput Experimentation: General Topics and Enabling Technologies for Synthesis and Catalysis (Volume 1), American Chemical Society, 2022, vol. 1419, ch. 1, pp. 3–9.
    5. Tom G., Schmid S. P., Baird S. G., Cao Y., Darvish K., Hao H., Lo S., Pablo-García S., Rajaonson E. M., Skreta M., Yoshikawa N., Corapi S., Akkoc G. D., Strieth-Kalthoff F., Seifrid M., Aspuru-Guzik A. Chem. Rev. 2024;124:9633–9732. doi: 10.1021/acs.chemrev.4c00055.