Sci Rep. 2021 Feb 4;11(1):3178.
doi: 10.1038/s41598-021-81889-y.

Discovery of novel chemical reactions by deep generative recurrent neural network

William Bort et al. Sci Rep. 2021.

Abstract

The "creativity" of Artificial Intelligence (AI) in generating de novo molecular structures has opened a new paradigm in compound design, notwithstanding its weaknesses (stability and feasibility issues of such structures). Here we show that "creative" AI can just as successfully be taught to enumerate novel chemical reactions that are stoichiometrically coherent. Furthermore, when coupled to reaction space cartography, de novo reaction design can be focused on a desired reaction class. A sequence-to-sequence autoencoder with bidirectional Long Short-Term Memory layers was trained on purpose-built "SMILES/CGR" strings encoding reactions of the USPTO database. The autoencoder latent space was visualized on a generative topographic map. Novel latent space points were sampled around a map area populated by Suzuki reactions and decoded to the corresponding reactions. These can be critically analyzed by an expert, cleaned of irrelevant functional groups and eventually attempted experimentally, thereby enlarging the synthetic scope of popular synthetic pathways.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
An example of a Suzuki coupling reaction (top) and its condensed graph of reaction (CGR, bottom). The reaction SMILES and SMILES/CGR are given underneath. The reaction SMILES features reactants (in orange) and products (in purple). Atom-to-atom mapping is not provided. In the SMILES/CGR, broken single bonds are encoded as [->.] (in red), while the created C–C bond is [.>-] (in green). The colon (:) represents aromatic bonds. See Supporting Information for the details.
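The dynamic-bond notation described in this caption can be illustrated with a small tokenizer. This is a minimal sketch, not the authors' code: it assumes a dynamic bond is written as a bracketed `before>after` pair of bond symbols (with optional spaces), and both the function name `classify_dynamic_bonds` and the toy input string are hypothetical.

```python
import re

# Assumed token shape: [before>after], where each side is one bond symbol:
# "." (no bond), "-", "=", "#" or ":" (aromatic). Spaces are tolerated.
DYNAMIC_BOND = re.compile(r"\[\s*([.\-=#:])\s*>\s*([.\-=#:])\s*\]")


def classify_dynamic_bonds(smiles_cgr: str) -> dict:
    """Count broken, formed and order-changed bonds in a SMILES/CGR-like string."""
    counts = {"broken": 0, "formed": 0, "changed": 0}
    for before, after in DYNAMIC_BOND.findall(smiles_cgr):
        if before == after:
            continue  # not a real change
        if after == ".":
            counts["broken"] += 1   # e.g. [->.]
        elif before == ".":
            counts["formed"] += 1   # e.g. [.>-]
        else:
            counts["changed"] += 1  # e.g. [=>-]
    return counts


# Toy string for illustration only; it is not taken from the paper.
print(classify_dynamic_bonds("CC[.>-]C[->.]Br"))
```

Only bond tokens are handled here; any encoding of atom-level changes is outside the scope of this sketch.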
Figure 2
The modeling workflow for the generation of new reactions consists of five main steps: (1) training a sequence-to-sequence autoencoder on the USPTO database of chemical reactions; (2) building a Generative Topographic Map (GTM) from the autoencoder latent variables and preparing the GTM class landscape; (3) selecting on the GTM a zone populated by Suzuki coupling reactions and identifying the related autoencoder latent vectors; (4) sampling from the autoencoder latent space and generating new reactions; and (5) post-processing. On the Generative Topographic Map, larger transparency levels correspond to lower density. The color code renders the (binary: Suzuki vs Other) reaction class distribution. Thus, zones in dark blue are exclusively populated by Suzuki reactions and zones in dark red are exclusively populated by other types of reactions, while intermediate colors correspond to reaction space areas hosting both categories in various ratios. The red circle indicates the zone from which virtual Suzuki reactions were sampled.
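Step (4) of this workflow, sampling new latent points near those of known Suzuki reactions, can be sketched as follows. The caption does not state how the sampling was done, so the Gaussian perturbation used here is an assumption, as are the function name `sample_around`, the `sigma` parameter and the toy two-dimensional vectors. Decoding the samples back into reactions would require the trained autoencoder and is not shown.

```python
import random


def sample_around(seed_vectors, n_samples, sigma=0.1, rng=None):
    """Draw latent points near known ones by adding Gaussian noise.

    seed_vectors: latent vectors of reactions located in the selected map zone.
    sigma: assumed noise scale; the value actually used is not given in the text.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    samples = []
    for _ in range(n_samples):
        seed = rng.choice(seed_vectors)
        samples.append([x + rng.gauss(0.0, sigma) for x in seed])
    return samples


# Hypothetical latent vectors standing in for reactions in the sampled zone.
suzuki_like = [[0.2, -1.0], [0.3, -0.9], [0.25, -1.1]]
for point in sample_around(suzuki_like, n_samples=3):
    print(point)
```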
Figure 3
Example of a generated chemical reaction with a new reaction center: as generated (A), balanced by the addition of a water molecule as a reactant (B), and in its simplified form (C). Notice that the aminobenzylic leaving group suggested by the autoencoder for the generated reaction looks unrealistic.
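Balancing a generated reaction, as in panel (B), amounts to comparing atom counts on both sides and checking whether the difference corresponds to a small molecule such as water. The sketch below illustrates the idea on molecular formulas; it is an assumption about how such a check could be coded, not the authors' post-processing, and the helper names are hypothetical. The example is ordinary ester hydrolysis, not the reaction shown in the figure.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple molecular formula such as 'C2H6O' (no brackets)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def missing_from_reactants(reactants, products) -> Counter:
    """Atoms present in the products but not accounted for by the reactants."""
    left = sum((parse_formula(f) for f in reactants), Counter())
    right = sum((parse_formula(f) for f in products), Counter())
    return right - left  # Counter subtraction keeps positive counts only


# Ethyl acetate -> acetic acid + ethanol is unbalanced without water.
gap = missing_from_reactants(["C4H8O2"], ["C2H4O2", "C2H6O"])
print(gap == parse_formula("H2O"))
```

If the gap matches the formula of water, adding one water molecule to the reactant side restores stoichiometric coherence.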
Figure 4
Reaction novelty detection workflow. Substructural motifs S_gen (RC, RC + 1, RC + 2, …) are extracted from the query CGR and compared with those of known reactions, {S_known}. In this way, motifs belonging to novel reactions are easily identified.
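The comparison in this workflow reduces to set membership tests at increasing environment size. The following is a minimal sketch under the assumption that each motif has already been reduced to a comparable signature (a plain string here); the function name `first_novel_level` and the toy signatures are hypothetical.

```python
def first_novel_level(query_motifs, known_by_level):
    """Return the smallest level n at which the query motif RC + n is unseen.

    query_motifs: signatures of the query reaction, ordered [RC, RC+1, RC+2, ...].
    known_by_level: one set of known signatures per level, in the same order.
    Returns None when every level of the query is already known.
    """
    for level, (motif, known) in enumerate(zip(query_motifs, known_by_level)):
        if motif not in known:
            return level
    return None


# Toy signatures standing in for motifs of known reactions.
known = [{"rc_a", "rc_b"}, {"rc_a_env1", "rc_b_env1"}]

print(first_novel_level(["rc_c", "rc_c_env1"], known))  # new reaction center
print(first_novel_level(["rc_a", "rc_a_envX"], known))  # known center, new environment
print(first_novel_level(["rc_a", "rc_a_env1"], known))  # nothing new
```

A result of 0 indicates a new reaction center, while a larger value means the center is known and only its wider environment is new.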
Figure 5
Preparation of a collection of reaction signatures as hash codes. From the CGR generated for a given reaction, substructural motifs containing the reaction center (RC), or the reaction center with n neighboring bonds and atoms (RC + n, here n = 1), can be extracted. Each motif is encoded by a hashing function into a unique hash code, its reaction signature. The ensemble of unique hash codes for all reactions in the database is stored in a hash table.
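The extraction and hashing steps in this caption can be sketched on a toy graph. Everything below is illustrative: the graph, the bond labels and the helper names are hypothetical, and the signature is a hash of the sorted bond descriptors, which is a simplification and not a true canonical graph hash (two different motifs with the same bonds would collide). The hashing function used by the authors is not specified in this text.

```python
import hashlib

# Toy CGR: atom index -> element, and bond (i, j) -> label.
# Labels containing ">" are dynamic bonds (changed by the reaction).
atoms = {0: "C", 1: "C", 2: "Br", 3: "C", 4: "B", 5: "O"}
bonds = {
    (0, 1): ":",    # aromatic bond, unchanged
    (1, 2): "->.",  # C-Br bond broken
    (1, 3): ".>-",  # C-C bond formed
    (3, 4): "->.",  # C-B bond broken
    (4, 5): "-",    # B-O bond, unchanged
}


def reaction_center(bonds):
    """Atoms taking part in at least one dynamic bond."""
    return {a for pair, label in bonds.items() if ">" in label for a in pair}


def expand(atom_set, bonds, n):
    """Grow the atom set by n shells of directly bonded neighbors (RC + n)."""
    current = set(atom_set)
    for _ in range(n):
        grown = set(current)
        for i, j in bonds:
            if i in current or j in current:
                grown.update((i, j))
        current = grown
    return current


def signature(atom_set, atoms, bonds):
    """Hash the sorted bond descriptors of the motif (simplified invariant)."""
    edges = sorted(
        "|".join((min(atoms[i], atoms[j]), label, max(atoms[i], atoms[j])))
        for (i, j), label in bonds.items()
        if i in atom_set and j in atom_set
    )
    return hashlib.sha256(";".join(edges).encode()).hexdigest()[:16]


rc = reaction_center(bonds)
# Hash table of signatures for RC and RC + 1 of this single toy reaction.
hash_table = {signature(expand(rc, bonds, n), atoms, bonds) for n in (0, 1)}
print(sorted(rc), len(hash_table))
```

In a full database, the same loop would run over every reaction and the resulting set would serve as the lookup table for known signatures.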
