ACS Cent Sci. 2017 May 24;3(5):434-443.
doi: 10.1021/acscentsci.7b00064. Epub 2017 Apr 18.

Prediction of Organic Reaction Outcomes Using Machine Learning

Connor W Coley et al. ACS Cent Sci.

Abstract

Computer assistance in synthesis design has existed for over 40 years, yet retrosynthesis planning software has struggled to achieve widespread adoption. One critical challenge in developing high-quality pathway suggestions is that proposed reaction steps often fail when attempted in the laboratory, despite initially seeming viable. The true measure of success for any synthesis program is whether the predicted outcome matches what is observed experimentally. We report a model framework for anticipating reaction outcomes that combines the traditional use of reaction templates with the flexibility in pattern recognition afforded by neural networks. Using 15 000 experimental reaction records from granted United States patents, a model is trained to select the major (recorded) product by ranking a self-generated list of candidates where one candidate is known to be the major product. Candidate reactions are represented using a unique edit-based representation that emphasizes the fundamental transformation from reactants to products, rather than the constituent molecules' overall structures. In a 5-fold cross-validation, the trained model assigns the major product rank 1 in 71.8% of cases, rank ≤3 in 86.7% of cases, and rank ≤5 in 90.8% of cases.
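
The rank-based metrics quoted above (rank 1, rank ≤3, rank ≤5) can be illustrated with a small sketch. The scores below are invented toy numbers, not results from the paper, and `rank_of_true` and `top_k_accuracy` are hypothetical helper names. Counting ties against the recorded product is an assumption of this sketch, not something the abstract specifies.

```python
from typing import List, Sequence, Tuple


def rank_of_true(scores: Sequence[float], true_index: int) -> int:
    """Return the 1-based rank of the recorded product among all candidates.

    Ties are counted against the recorded product (a pessimistic convention
    assumed for this sketch).
    """
    true_score = scores[true_index]
    higher_or_equal = sum(
        1 for i, s in enumerate(scores) if i != true_index and s >= true_score
    )
    return higher_or_equal + 1


def top_k_accuracy(ranks: Sequence[int], k: int) -> float:
    """Fraction of examples whose recorded product is ranked k or better."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Toy data: (candidate scores, index of the recorded product). Invented values.
examples: List[Tuple[List[float], int]] = [
    ([2.1, 0.3, -1.0, 0.9], 0),
    ([0.2, 1.5, 0.7, 0.1], 2),
    ([0.0, 0.4, 0.9, 1.3, 0.6, -0.2], 0),
    ([1.0, 3.0], 1),
]

ranks = [rank_of_true(scores, true_index) for scores, true_index in examples]

for k in (1, 3, 5):
    print(f"rank <= {k}: {top_k_accuracy(ranks, k):.2f}")
```

In a cross-validation setting, the same calculation would be applied to the held-out fold of each split and the results pooled or averaged.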

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
Model framework combining forward enumeration and candidate ranking. The primary aim of this work is the creation of the parametrized scoring model, which is trained to maximize the probability assigned to the recorded experimental outcome.
Figure 2
Depiction of the top five most popular forward synthetic templates extracted from 1.1 million USPTO reactions. C[al] denotes any aliphatic carbon.
Figure 3
Edit-based model architecture for scoring candidate reactions. Reactions are represented by four types of edits. Initial atom- and bond-level attributes are converted into feature representations, which are summed and used to calculate that candidate reaction’s likelihood score.
Figure 4
Performance of the three reaction prediction models as indicated by the (a) histogram of probabilities assigned to true outcomes; (b) histogram of ranks assigned to true outcomes, truncated to ranks 1–10; and (c) overall success rate as a function of the minimum acceptable assigned rank. In each case, the model is attempting to select the true product out of several hundred possible reaction products.
Figure 5
Mean model accuracy as a function of the binned model confidence, where the model confidence refers to the probability assigned to the highest-ranked candidate.
Figure 6
Reaction examples where the hybrid model assigned rank 1 to the recorded product. Recorded/predicted reactions: (a) chlorination; (b) amide synthesis; (c) isoxazole synthesis; (d) sulfamide synthesis; (e) etherification; (f) Suzuki coupling; (g) Grignard addition; (h) azidation; (i) alkylation.
Figure 7
Reaction examples where the hybrid model did not assign rank 1 to the recorded product. Recorded [predicted] reactions: (a) amidation [amidation of different substrate]; (b) hydrolysis [hydrolysis at different ether]; (c) deprotection [nitration]; (d) oxidation [bromination]; (e) hydrogenation [dehalogenation]; (f) iodination [iodination at different site].
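
The scoring idea in the captions for Figures 1 and 3 (per-edit feature representations summed into one vector, a scalar score per candidate, and training to maximize the probability of the recorded outcome) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the edit-type names, the dimensions, the tanh embedding, the linear readout, the random weights, and the toy attribute vectors are all hypothetical.

```python
import math
import random
from typing import Dict, List

# Assumed placeholder names for the four edit types; the caption does not list them.
EDIT_TYPES = ("h_lost", "h_gained", "bond_lost", "bond_gained")
ATTR_DIM = 3  # hypothetical size of an edit's raw attribute vector
FEAT_DIM = 4  # hypothetical size of the learned feature representation

Vector = List[float]
Candidate = Dict[str, List[Vector]]  # edit type -> attribute vectors of its edits


def make_params(seed: int = 0):
    """Random stand-ins for trained weights (one embedding matrix per edit type)."""
    rng = random.Random(seed)
    embed = {
        t: [[rng.uniform(-0.5, 0.5) for _ in range(ATTR_DIM)] for _ in range(FEAT_DIM)]
        for t in EDIT_TYPES
    }
    readout = [rng.uniform(-0.5, 0.5) for _ in range(FEAT_DIM)]
    return embed, readout


def embed_edit(matrix: List[Vector], attrs: Vector) -> Vector:
    """Map one edit's attributes to a feature vector (single tanh layer, assumed)."""
    return [math.tanh(sum(w * a for w, a in zip(row, attrs))) for row in matrix]


def score_candidate(cand: Candidate, embed, readout: Vector) -> float:
    """Sum the feature vectors of all edits, then reduce the sum to a scalar score."""
    total = [0.0] * FEAT_DIM
    for edit_type in EDIT_TYPES:
        for attrs in cand.get(edit_type, []):
            feat = embed_edit(embed[edit_type], attrs)
            total = [x + f for x, f in zip(total, feat)]
    return sum(w * x for w, x in zip(readout, total))


def softmax(scores: List[float]) -> List[float]:
    """Convert candidate scores to probabilities (max-shifted for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


# Toy candidate reactions for one reactant set; index 0 plays the recorded product.
candidates: List[Candidate] = [
    {"h_lost": [[1.0, 0.0, 0.2]], "bond_gained": [[0.3, 1.0, 0.0]]},
    {"bond_lost": [[0.5, 0.5, 0.1]], "bond_gained": [[0.0, 1.0, 0.7]]},
    {"h_gained": [[0.2, 0.1, 0.9]], "h_lost": [[0.4, 0.0, 0.3], [0.1, 0.8, 0.0]]},
]

embed, readout = make_params()
scores = [score_candidate(c, embed, readout) for c in candidates]
probs = softmax(scores)
loss = -math.log(probs[0])  # negative log-likelihood of the recorded outcome

print("probabilities:", [round(p, 3) for p in probs])
print("loss:", round(loss, 3))
```

Because the per-edit features are summed, the score does not depend on the order in which a candidate's edits are listed. Minimizing the negative log-likelihood over many recorded reactions is one common way to maximize the probability assigned to the recorded outcome.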
