Review

J Chem Inf Model. 2022 May 9;62(9):2101-2110.

doi: 10.1021/acs.jcim.1c00975. Epub 2021 Nov 4.

Machine Learning of Reaction Properties via Learned Representations of the Condensed Graph of Reaction

Esther Heid¹, William H Green¹

Esther Heid et al. J Chem Inf Model. 2022.

Affiliation

¹ Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.

PMID: 34734699
PMCID: PMC9092344
DOI: 10.1021/acs.jcim.1c00975

Abstract

The estimation of chemical reaction properties such as activation energies, rates, or yields is a central topic of computational chemistry. In contrast to molecular properties, where machine learning approaches such as graph convolutional neural networks (GCNNs) have excelled for a wide variety of tasks, no general and transferable adaptations of GCNNs for reactions have been developed yet. We therefore combined a popular cheminformatics reaction representation, the so-called condensed graph of reaction (CGR), with a recent GCNN architecture to arrive at a versatile, robust, and compact deep learning model. The CGR is a superposition of the reactant and product graphs of a chemical reaction and thus an ideal input for graph-based machine learning approaches. The model learns to create a data-driven, task-dependent reaction embedding that does not rely on expert knowledge, similar to current molecular GCNNs. Our approach outperforms current state-of-the-art models in accuracy, is applicable even to imbalanced reactions, and possesses excellent predictive capabilities for diverse target properties, such as activation energies, reaction enthalpies, rate constants, yields, or reaction classes. We furthermore curated a large set of atom-mapped reactions along with their target properties, which can serve as benchmark data sets for future work. All data sets and the developed reaction GCNN model are available online, free of charge, and open source.

Conflict of interest statement

The authors declare no competing financial interest.

Figures

**Figure 1**
Schematic depiction of the CGR (middle) for the dissociation of water, constructed from the atom-mapped reactants (left) and the atom-mapped products (right). (Top) Example of a balanced reaction. (Bottom) Example of an imbalanced reaction. In the CGR, each atom and each bond has two labels, one corresponding to the reactants and the other to the products. For imbalanced reactions, the features of an imbalanced atom can either be imputed or set to zero (indicated by the striped area).

**Figure 2**
Architecture of a standard graph convolutional neural net (top) and its adaptation to reactions via input of the condensed graph of reaction (bottom). Each atom and bond fingerprint now consists of two parts, one describing the reactants (gray) and the other the products (red). If a bond does not exist in the reactants or the products, the corresponding part of the fingerprint (white, crossed out) is set to zero. If an atom is missing in an imbalanced reaction, its features can be either imputed or set to zero. The white vectors correspond to the hidden atomic and molecular representations.

**Figure 3**
Comparison of test set R² scores between different models for the ωB97X-D3 computational activation energy data set with pretraining on B97-D3 activation energies. Error bars correspond to the standard deviation between five folds. Best model system highlighted in red; line corresponds to best performance.

**Figure 4**
Comparison of test set R² scores between different models for the E2/S_N2 computational activation energy data set. Error bars correspond to the standard deviation between five folds. Best model system highlighted in red; line corresponds to best performance.

**Figure 5**
Comparison of test set R² scores between different models for the S_NAr experimental activation energy data set. Error bars correspond to the standard deviation between five folds. Best model system highlighted in red; line corresponds to best performance.

**Figure 6**
Mean absolute errors of the CGR GCNN model on subsets of the E_a ωB97X-D3 data set without pretraining.

**Figure 7**
Comparison of test set R² scores between different models for the computational rate constants data set. Error bars correspond to the standard deviation between five folds. Best model system highlighted in red; line corresponds to best performance.

**Figure 8**
Comparison of test set R² scores between different models for the experimental phosphatase reaction yield data set. Error bars correspond to the standard deviation between five folds. Best model system highlighted in red; line corresponds to best performance.

**Figure 9**
Comparison of accuracies between different models for the classification of name reactions via the USPTO-1K-TPL data set (top) or the Pistachio data set (bottom). Error bars correspond to the standard deviation between five folds. The red dot and line correspond to the performance achieved by ref (52).
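
The construction described in the captions of Figures 1 and 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `build_cgr`, the dictionary-based graph layout, and the one-element feature vectors (atomic number for atoms, bond order for bonds) are assumptions chosen for illustration only. The sketch concatenates reactant and product features for every atom-map number and every bond, pads bonds that are absent on one side with zeros, and either zero-fills or imputes the features of atoms missing from an imbalanced reaction.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical layout: atom-map number -> feature vector,
# and {map_i, map_j} -> bond feature vector.
AtomTable = Dict[int, List[float]]
BondTable = Dict[FrozenSet[int], List[float]]


def _feature_length(*tables: dict) -> int:
    """Length of the first feature vector found, or 0 if all tables are empty."""
    for table in tables:
        for feats in table.values():
            return len(feats)
    return 0


def build_cgr(
    reac_atoms: AtomTable,
    reac_bonds: BondTable,
    prod_atoms: AtomTable,
    prod_bonds: BondTable,
    impute_missing_atoms: bool = False,
) -> Tuple[AtomTable, BondTable]:
    """Superimpose reactant and product graphs: [reactant part] + [product part]."""
    atom_dim = _feature_length(reac_atoms, prod_atoms)
    bond_dim = _feature_length(reac_bonds, prod_bonds)

    cgr_atoms: AtomTable = {}
    for idx in sorted(set(reac_atoms) | set(prod_atoms)):
        r = reac_atoms.get(idx)
        p = prod_atoms.get(idx)
        # An atom missing on one side (imbalanced reaction) is either
        # imputed from the other side or set to zero.
        if r is None:
            r = list(p) if impute_missing_atoms else [0.0] * atom_dim
        if p is None:
            p = list(r) if impute_missing_atoms else [0.0] * atom_dim
        cgr_atoms[idx] = list(r) + list(p)

    cgr_bonds: BondTable = {}
    for key in set(reac_bonds) | set(prod_bonds):
        # A bond that does not exist on one side gets a zero vector there.
        r = reac_bonds.get(key, [0.0] * bond_dim)
        p = prod_bonds.get(key, [0.0] * bond_dim)
        cgr_bonds[key] = list(r) + list(p)

    return cgr_atoms, cgr_bonds


# Water dissociation with atom maps O:1, H:2, H:3; the O1-H3 bond breaks.
reac_atoms = {1: [8.0], 2: [1.0], 3: [1.0]}
reac_bonds = {frozenset({1, 2}): [1.0], frozenset({1, 3}): [1.0]}
prod_atoms = {1: [8.0], 2: [1.0], 3: [1.0]}
prod_bonds = {frozenset({1, 2}): [1.0]}

atoms, bonds = build_cgr(reac_atoms, reac_bonds, prod_atoms, prod_bonds)

# Imbalanced variant: H:3 is not recorded among the products.
prod_atoms_imb = {1: [8.0], 2: [1.0]}
atoms_zero, _ = build_cgr(reac_atoms, reac_bonds, prod_atoms_imb, prod_bonds)
atoms_imputed, _ = build_cgr(
    reac_atoms, reac_bonds, prod_atoms_imb, prod_bonds, impute_missing_atoms=True
)
```

In this toy example the broken O1-H3 bond carries the reactant bond order in the first half of its vector and zero in the second half, while the unchanged O1-H2 bond carries the same value in both halves. A real featurization would use much richer atom and bond descriptors than the single numbers assumed here.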

References

1. Gilmer J.; Schoenholz S. S.; Riley P. F.; Vinyals O.; Dahl G. E. Neural Message Passing for Quantum Chemistry. In International Conference on Machine Learning, 2017; pp 1263–1272.
2. Klicpera J.; Groß J.; Günnemann S. Directional Message Passing for Molecular Graphs. arXiv preprint, arXiv:2003.03123, 2020.
3. Zhang S.; Liu Y.; Xie L. Molecular Mechanics-Driven Graph Neural Network with Multiplex Graph for Molecular Structures. arXiv preprint, arXiv:2011.07457, 2020.
4. Alperstein Z.; Cherkasov A.; Rolfe J. T. All Smiles Variational Autoencoder. arXiv preprint, arXiv:1905.13343, 2019.
5. Zaslavskiy M.; Jégou S.; Tramel E. W.; Wainrib G. ToxicBlend: Virtual Screening of Toxic Compounds with Ensemble Predictors. Comp. Toxicol. 2019, 10, 81–88. doi: 10.1016/j.comtox.2019.01.001.
