Review
J Chem Inf Model. 2022 May 9;62(9):2111-2120.
doi: 10.1021/acs.jcim.1c01065. Epub 2022 Jan 15.

Improving Few- and Zero-Shot Reaction Template Prediction Using Modern Hopfield Networks

Philipp Seidl et al. J Chem Inf Model.

Abstract

Finding synthesis routes for molecules of interest is essential in the discovery of new drugs and materials. To find such routes, computer-assisted synthesis planning (CASP) methods are employed, which rely on a single-step model of chemical reactivity. In this study, we introduce a template-based single-step retrosynthesis model based on Modern Hopfield Networks, which learn an encoding of both molecules and reaction templates in order to predict the relevance of templates for a given molecule. The template representation allows generalization across different reactions and significantly improves the performance of template relevance prediction, especially for templates with few or zero training examples. With inference speed up to orders of magnitude faster than baseline methods, we improve or match the state-of-the-art performance for top-k exact match accuracy for k ≥ 3 in the retrosynthesis benchmark USPTO-50k. Code to reproduce the results is available at github.com/ml-jku/mhn-react.
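The association step described in the abstract can be illustrated with a minimal sketch. It assumes the molecule and the templates have already been encoded as fixed-length vectors; the toy vectors, the function names `template_relevance` and `top_k_hit`, and the scaling parameter `beta` are hypothetical and are not taken from the authors' code. In the actual model the encoders are learned, and the reported top-k exact match accuracy is evaluated on reactants; ranking templates directly, as done here, is a simplification.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def template_relevance(xi, stored_patterns, beta=1.0):
    """Score each stored (template) pattern against the state pattern xi.

    Computes softmax(beta * <x, xi>) over all stored patterns x, which is
    the association step of a modern Hopfield layer. `beta` is an assumed
    scaling hyperparameter.
    """
    scores = [beta * sum(a * b for a, b in zip(xi, x)) for x in stored_patterns]
    return softmax(scores)


def top_k_hit(probs, true_index, k):
    """Return True if the true template is among the k highest-scoring ones."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return true_index in ranked[:k]


# Toy example with hypothetical 2-dimensional encodings.
xi = [1.0, 0.0]                            # encoded input molecule
X = [[0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]   # encoded templates
probs = template_relevance(xi, X, beta=4.0)
```

Because templates are scored through their encodings rather than through a fixed output layer, a template with no training examples can still receive a relevance score, which is the property the abstract highlights for the few- and zero-shot setting.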

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
Simplified depiction of our approach. Standard approaches only encode the molecule and predict a fixed set of templates. In our modern Hopfield network (MHN)-based approach, the templates are also encoded and transformed to stored patterns via the template encoder. The Hopfield layer learns to associate the encoded input molecule, the state pattern ξ, with the memory of encoded templates, the stored patterns X. Multiple Hopfield layers can operate in parallel or can be stacked using different encoders.
Figure 2
Histogram showing the fraction of samples for different template frequencies. The leftmost red bar indicates that over 40% of chemical reactions of USPTO-lg have a unique reaction template. The majority of reaction templates are rare.
Figure 3
Top-100 accuracy for different template popularity on the USPTO-sm/USPTO-lg datasets. The gray bars represent the proportion of samples in the test set. Error bars represent 95% confidence intervals on binomial proportion. Our method performs especially well on samples with reaction templates with few training examples.
Figure 4
Reactant top-k accuracy versus inference speed for different values of k. Upper left is better. For Transformer/GLN, the points represent different beam sizes; for the Transformer, these are {1, 3, 5, 10, 20, 50, 75, 100}, from left to right. For MHN/NeuralSym, the points reflect different numbers of generated reactant sets, namely, {1, 3, 5, 10, 20, 50}.
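
The Figure 3 caption reports 95% confidence intervals on a binomial proportion without naming the interval method. As an illustration only, the sketch below uses the Wilson score interval, which is one common choice; this is an assumption and may differ from what the authors used. The function name `wilson_interval` and the example counts are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion.

    `z` defaults to the two-sided 95% normal quantile. Returns (low, high),
    clamped to [0, 1].
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts: 50 correct top-k predictions out of 100 test samples.
low, high = wilson_interval(50, 100)
```

An interval of this kind widens as the number of test samples in a template-frequency bin shrinks, so error bars are expected to be largest for bins holding few samples.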
