Review
Chemistry. 2023 Jan 27;29(6):e202202834.
doi: 10.1002/chem.202202834. Epub 2022 Nov 27.

Bridging Chemical Knowledge and Machine Learning for Performance Prediction of Organic Synthesis

Shuo-Qing Zhang et al. Chemistry.

Abstract

Recent years have witnessed a boom in machine learning (ML) applications in chemistry, revealing the potential of data-driven prediction of synthesis performance. Digitalization and ML modelling are the key strategies for fully exploiting the synergistic interplay between experimental data and the robust prediction of performance and selectivity. A series of exciting studies have demonstrated the importance of implementing chemical knowledge in ML, which improves a model's capability to make challenging predictions that often go beyond human abilities. This Minireview summarizes the cutting-edge embedding techniques and model designs in synthetic performance prediction reported up to June 2022, elaborating how chemical knowledge can be incorporated into machine learning. By merging organic synthesis tactics and chemical informatics, we hope this Review can provide a guide map and encourage chemists to revisit the digitalization and computerization of organic chemistry principles.

Keywords: machine learning; molecular embedding; organic synthesis; performance prediction; synthetic dataset.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Scheme 1
A general definition of the performance prediction problem in organic synthesis. y1, y2, and y3 are reaction performances of interest such as yields, stereoselectivities, etc.
Figure 1
Characteristics of reaction performance prediction by human chemist and ML.
Figure 2
Key components of ML in reaction performance prediction.
Figure 3
Typical scenarios where chemical knowledge is critical for the statistical pattern of synthetic performance. A) Nucleophilicity vs. basicity of selected pyrrolidines and imidazolidinones. pKaH are the corresponding Brønsted basicities in acetonitrile and N are the Mayr's nucleophilicity parameters. B) Nonlinear Hammett relationship observed in aminolysis of Y‐substituted‐phenyl 2‐methoxybenzoates in acetonitrile.
Figure 4
Application of OHE in yield prediction of Suzuki–Miyaura coupling reactions. A) The target Suzuki–Miyaura coupling reaction and the defined reaction space. B) OHE encoding details. C) The model performance.
Figure 5
Application of molecular fingerprint representation in reaction performance prediction of organic synthesis. A) The target Buchwald–Hartwig coupling reaction and the defined reaction space. B) The target asymmetric imine addition reaction and the defined reaction space. C) The MFF representation technique. D) The model performance.
Figure 6
Yield prediction of Buchwald–Hartwig coupling reactions using chemically meaningful descriptors. A) The target Buchwald–Hartwig coupling reaction and the defined reaction space. B) The chemically meaningful descriptors and dimension. C) The model performance.
Figure 7
Enantioselectivity prediction of phosphoric acid‐catalyzed asymmetric imine addition reactions using designed descriptors for chiral environment. A) The target asymmetric imine addition reaction and the defined reaction space. B) The chemically meaningful molecular representation by ASO and ESP. C) The model performance.
Figure 8
Multivariate linear prediction of enantioselectivity of asymmetric nucleophilic addition of imines using designed chemical descriptors. A) The target asymmetric nucleophilic addition reaction of imines and the defined reaction space. B) The chemically meaningful encoding by several selected parameters. C) The model performance.
Figure 9
Molecular representation based on transition‐state‐like geometry and its application in stereoselectivity prediction of Michael addition and Diels‐Alder cycloaddition. A) The Michael addition and Diels‐Alder cycloaddition reactions. B) The ACV embedding technique. C) The model performance.
Figure 10
ML prediction of the regioselectivity of radical C−H functionalization of heteroarenes based on mechanism‐based computational statistics. A) The virtual radical C−H functionalization reactions. B) The reaction mechanism‐based ML loop. C) The model performance.
Figure 11
ML prediction of regioselectivity in organic transformations using on‐the‐fly generated quantum chemical descriptors. A) The target reactions. B) The model design of quantum mechanics descriptors incorporated with GNN. C) The model performance.
Figure 12
Application of transfer learning strategy in stereoselectivity prediction of carbohydrate transformation. A) The dataset and target reactions. B) The model design for transfer learning. C) The model performance.
Figure 13
Application of hierarchical learning strategy in stereoselectivity prediction of asymmetric hydrogenation of olefins. A) The general reactions in the curated database of asymmetric hydrogenation of olefins. B) The model design for hierarchical learning. C) The model performance.
Figure 14
General pipeline for reaction performance prediction with chemical‐aware ML.
