J Chem Inf Model. 2024 Jan 22;64(2):327-339.
doi: 10.1021/acs.jcim.3c00594. Epub 2024 Jan 10.

Invariant Molecular Representations for Heterogeneous Catalysis

Jawad Chowdhury et al. J Chem Inf Model.

Abstract

Catalyst screening is a critical step in the discovery and development of heterogeneous catalysts, which are vital for a wide range of chemical processes. In recent years, computational catalyst screening, primarily through density functional theory (DFT), has gained significant attention as a method for identifying promising catalysts. However, computing adsorption energies for all likely chemical intermediates in complex surface chemistries is costly, both because the calculations themselves are expensive and because of the intrinsic idiosyncrasies of the methods or data sets used. This study introduces a novel machine learning (ML) method to learn adsorption energies from multiple DFT functionals by using invariant molecular representations (IMRs). To do this, we first extract molecular fingerprints for the reaction intermediates and then use a Siamese-neural-network-based training strategy to learn IMRs across all available functionals. Our Siamese network-based representations demonstrate superior performance in predicting adsorption energies compared with other molecular representations. Notably, against mean absolute adsorption energies of 0.43 eV (PBE-D3), 0.46 eV (BEEF-vdW), 0.81 eV (RPBE), and 0.37 eV (SCAN+rVV10), our IMR method achieved the lowest mean absolute errors (MAEs) of 0.18, 0.10, 0.16, and 0.18 eV, respectively. These findings indicate the efficacy and robustness of the proposed ML approach in predicting adsorption energies, specifically for propane dehydrogenation on a platinum catalyst surface.
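
To put the reported errors in context, the short Python sketch below defines the mean absolute error and divides each reported MAE by the mean absolute adsorption energy of the same functional. The numbers are the ones quoted in the abstract; the function name, the dictionary layout, and the idea of reporting a ratio are illustrative assumptions, not part of the original study.

```python
def mean_absolute_error(predicted, reference):
    """Average absolute deviation between predictions and reference values."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)

# Values quoted in the abstract, in eV:
# functional -> (mean absolute adsorption energy, reported MAE of the IMR method)
reported = {
    "PBE-D3": (0.43, 0.18),
    "BEEF-vdW": (0.46, 0.10),
    "RPBE": (0.81, 0.16),
    "SCAN+rVV10": (0.37, 0.18),
}

# Illustrative ratio (an assumption of this sketch): MAE relative to the
# typical magnitude of the quantity being predicted.
relative_mae = {name: mae / scale for name, (scale, mae) in reported.items()}

for name, ratio in relative_mae.items():
    print(f"{name}: MAE is {ratio:.0%} of the mean absolute adsorption energy")
```

Read this way, the error is smallest relative to the energy scale for RPBE and BEEF-vdW and largest for SCAN+rVV10.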


Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
Flat molecular fingerprint of length 24 for the species CH3CH2CH3. Here, C0 denotes carbon atoms with no free valence (saturated carbon), whereas C1, C2, and C3 denote carbon atoms with one, two, and three free valences, respectively. This type of fingerprint encodes the numbers of saturated and unsaturated atoms and the numbers of bonds between them.
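
The counting idea behind such a fingerprint can be sketched in a few lines of Python. This is a simplified illustration only: the 16-entry feature layout, the input format (hydrogens per carbon plus a list of C-C bonds), and the treatment of every C-C bond as a single bond are assumptions of this sketch and do not reproduce the 24 entries used in the paper.

```python
from collections import Counter
from itertools import combinations_with_replacement

CARBON_TYPES = ["C0", "C1", "C2", "C3"]
# Hypothetical layout: atom counts first, then counts of C-C bonds by carbon type.
BOND_TYPES = [f"{a}-{b}" for a, b in combinations_with_replacement(CARBON_TYPES, 2)]
FEATURES = ["H", "C"] + CARBON_TYPES + BOND_TYPES

def fingerprint(h_counts, cc_bonds):
    """Count-based fingerprint for a C/H species.

    h_counts: number of hydrogens attached to each carbon.
    cc_bonds: list of (i, j) index pairs, each treated as a single C-C bond.
    """
    degree = [0] * len(h_counts)
    for i, j in cc_bonds:
        degree[i] += 1
        degree[j] += 1
    # Free valence = 4 minus the bonds to H and to other carbons.
    free = [4 - h - d for h, d in zip(h_counts, degree)]
    if any(f < 0 or f > 3 for f in free):
        raise ValueError("free valence outside the C0..C3 range")
    labels = [f"C{f}" for f in free]

    counts = Counter({"H": sum(h_counts), "C": len(h_counts)})
    counts.update(labels)
    for i, j in cc_bonds:
        a, b = sorted((labels[i], labels[j]))
        counts[f"{a}-{b}"] += 1
    return [counts[name] for name in FEATURES]

# Propane, CH3CH2CH3: three saturated carbons joined by two C-C bonds.
propane = dict(zip(FEATURES, fingerprint([3, 2, 3], [(0, 1), (1, 2)])))
# A dehydrogenated intermediate, CH3CHCH3: the middle carbon has one free valence.
propyl = dict(zip(FEATURES, fingerprint([3, 1, 3], [(0, 1), (1, 2)])))
```

In this sketch, removing one hydrogen from the middle carbon moves one count from C0 to C1 and turns both C0-C0 bonds into C0-C1 bonds, which is the kind of change the fingerprint is meant to capture.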
Figure 2
Three different methods were adopted to generate molecular representations from molecular fingerprints. Top: (Original) No transformation or modification is done on the original or raw fingerprints. Middle: (PCA) Molecular representations generated based on principal component analysis. Bottom: (IMR) Molecular representations generated using our trained Siamese neural network model.
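
For the PCA route, a minimal NumPy sketch is given below. It centers a fingerprint matrix and projects it onto the leading principal components via a singular value decomposition. The toy matrix, the function name `pca_transform`, and the choice of two components are hypothetical; the paper's actual settings are not stated in the caption.

```python
import numpy as np

def pca_transform(X, n_components):
    """Project rows of X onto the top principal components (via SVD)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                      # center each feature column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]     # rows are orthonormal directions
    scores = Xc @ components.T         # low-dimensional representation
    return scores, components, mean

# Hypothetical count-based fingerprints (rows: species, columns: features).
X = np.array([
    [8, 3, 3, 0],
    [7, 3, 2, 1],
    [6, 3, 1, 2],
    [6, 3, 2, 0],
    [5, 3, 1, 1],
])
Z, components, mean = pca_transform(X, n_components=2)
```

Each row of `Z` is then a two-dimensional representation of one species, which could be passed to a downstream regressor in place of the raw fingerprint.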
Figure 3
Two major steps of our proposed method/pipeline: (a) training the Siamese neural network to generate invariant molecular representations (IMR) across different functionals using relative energy differences between species and (b) predictive modeling of adsorption energies using IMR generated by the Siamese model trained in step (a).
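
The weight-sharing idea in step (a) can be illustrated with a forward pass in NumPy. The sketch below is an assumption-laden toy, not the authors' architecture: the two-layer encoder, its layer sizes, and the loss that matches embedding distance to the absolute relative energy difference of a pair are all hypothetical choices made for illustration, and no training loop is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_encoder(n_in, n_hidden, n_out):
    """Randomly initialized two-layer encoder (hypothetical sizes)."""
    return {
        "W1": rng.normal(scale=0.1, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.1, size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def encode(params, x):
    """Map fingerprints (n_samples, n_in) to embeddings (n_samples, n_out)."""
    hidden = np.tanh(x @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]

def pair_loss(params, x_a, x_b, delta_e):
    """Siamese-style loss: both branches use the SAME parameters.

    Assumed objective: the distance between the two embeddings should match
    the absolute relative energy difference |delta_e| of the species pair.
    """
    z_a = encode(params, x_a)
    z_b = encode(params, x_b)
    distance = np.linalg.norm(z_a - z_b, axis=-1)
    return float(np.mean((distance - np.abs(delta_e)) ** 2))

# Toy batch: 5 pairs of 16-dimensional fingerprints with made-up energy gaps (eV).
params = init_encoder(n_in=16, n_hidden=8, n_out=4)
x_a = rng.integers(0, 4, size=(5, 16)).astype(float)
x_b = rng.integers(0, 4, size=(5, 16)).astype(float)
delta_e = rng.normal(scale=0.5, size=5)
loss = pair_loss(params, x_a, x_b, delta_e)
```

Once such an encoder has been trained, step (b) would amount to calling `encode` on each fingerprint and fitting a separate regression model on the resulting embeddings.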
Figure 4
Feature contribution analysis across different training strategies and DFT functionals. This figure illustrates the mean absolute contribution of various molecular fingerprints (e.g., H, C, and C0) in a matrix format, where rows and columns represent training strategies for the Siamese network and DFT functionals, respectively. The first, second, and third rows represent the FFM, BEM, and FSM training strategies, while the first to fourth columns correspond to PBE-D3, BEEF-vdW, RPBE, and SCAN + rVV10 functionals, respectively. A dotted red line in each plot marks a threshold set at 50% of the maximum contribution value for that specific scenario, delineating the top contributing fingerprints. Fingerprints with negligible contributions were omitted for clarity. This analysis underscores the significant fingerprints contributing to adsorption energies across various training strategies and functionals.
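
The 50% threshold rule described in the caption can be expressed as a short Python sketch. The contribution values below are made up for illustration, and treating a value exactly at the threshold as a top contributor is an assumption of this sketch.

```python
def top_contributors(contributions, fraction=0.5):
    """Return features whose mean absolute contribution reaches
    `fraction` of the largest contribution, plus the threshold itself."""
    threshold = fraction * max(contributions.values())
    selected = [name for name, value in contributions.items() if value >= threshold]
    return selected, threshold

# Hypothetical mean absolute contributions for one strategy/functional panel.
contributions = {"H": 0.30, "C": 0.12, "C0": 0.25, "C1": 0.05}
selected, threshold = top_contributors(contributions)
print(selected, threshold)
```

Because the threshold is relative to the maximum within each panel, the same feature can be a top contributor for one functional and fall below the line for another.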
