Sci Data. 2023 Nov 8;10(1):783.
doi: 10.1038/s41597-023-02690-2.

MultiXC-QM9: Large dataset of molecular and reaction energies from multi-level quantum chemical methods

Surajit Nandi et al. Sci Data. 2023.

Abstract

Well-curated, extensive datasets have helped spur intense development of molecular machine learning (ML) methods over the last few years, encouraging non-chemists to join the effort as well. The QM9 dataset is one of the benchmark databases for small molecules, with molecular energies based on the B3LYP functional. G4MP2-based energies of these molecules were published later. To enable a wide variety of ML tasks with QM9 molecules, such as transfer learning, delta learning, and multitask learning, in this article we introduce a new dataset with QM9 molecule energies estimated with 76 different DFT functionals and three different basis sets (228 energy values for each molecule). We additionally enumerated all possible A ↔ B monomolecular interconversions within the QM9 dataset and provide the reaction energies for these 76 functionals and three basis sets. Lastly, we provide the bond changes for all 162 million reactions in the dataset to enable ML-based tools that predict reaction energies from structures and bonds.
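The quantities named in the abstract can be illustrated with a minimal Python sketch. It assumes that an A ↔ B interconversion pairs two molecules with the same molecular formula (atoms must be conserved), that the reaction energy is E(B) − E(A) at a given functional and basis set, and that a delta-learning target is the difference between a higher-level and a lower-level energy of the same molecule. The record layout, function names, and energy values are hypothetical placeholders, not the dataset's actual schema or data.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (molecule index, formula, {(functional, basis): energy in eV}).
# The energies are made-up placeholders, not values from the dataset.
MOLECULES = [
    (1, "C2H6O", {("PBE", "TZP"): -10.0, ("B3LYP", "TZP"): -10.5}),
    (2, "C2H6O", {("PBE", "TZP"): -9.5, ("B3LYP", "TZP"): -10.25}),
    (3, "C3H8", {("PBE", "TZP"): -12.0, ("B3LYP", "TZP"): -12.5}),
]


def energies_per_molecule(n_functionals, n_basis_sets):
    """One energy per (functional, basis set) combination."""
    return n_functionals * n_basis_sets


def enumerate_reactions(molecules):
    """Pair molecules that share a formula and compute E(B) - E(A) per level.

    Assumption: each unordered pair is listed once as A -> B; the reverse
    reaction energy is simply the negative of the forward one.
    """
    by_formula = defaultdict(list)
    for index, formula, energies in molecules:
        by_formula[formula].append((index, energies))

    reactions = []
    for group in by_formula.values():
        for (a_index, a_energies), (b_index, b_energies) in combinations(group, 2):
            shared_levels = a_energies.keys() & b_energies.keys()
            delta = {lvl: b_energies[lvl] - a_energies[lvl] for lvl in shared_levels}
            reactions.append((a_index, b_index, delta))
    return reactions


def delta_learning_targets(molecules, low_level, high_level):
    """Per-molecule correction E(high) - E(low) that a delta-learning model would fit."""
    return {
        index: energies[high_level] - energies[low_level]
        for index, _formula, energies in molecules
    }


if __name__ == "__main__":
    print(energies_per_molecule(76, 3))
    print(enumerate_reactions(MOLECULES))
    print(delta_learning_targets(MOLECULES, ("PBE", "TZP"), ("B3LYP", "TZP")))
```

In this toy set only molecules 1 and 2 share a formula, so a single reaction is produced; propane (index 3) has no isomer partner and contributes none.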

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Semi-automated workflow diagram of the database preparation.
Fig. 2
Entity relationship diagram of the molecule and reaction databases. The “index” and “rxnindex” fields are the primary keys of the molecule and reaction data, respectively. The “reactindex” and “pdtindex” fields are foreign keys that refer to the “index” of the molecule database.
Fig. 3
Atomization energy distributions for three density functionals: PBE (GGA type), B3LYP(VWN5) (hybrid functional), and M06-2X (highly parametrized meta-hybrid functional), each with three different basis sets (SZ, DZP, and TZP). The last plot shows the difference in atomization energy distribution between the GFN2-xTB and G4MP2 methods.
Fig. 4
Distribution of the error in reaction energies (eV) with respect to the G4MP2 method for different functionals.
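
The key relationship described in the Fig. 2 caption can be sketched with Python's standard `sqlite3` module. Only the field names `index`, `rxnindex`, `reactindex`, and `pdtindex` come from the caption; the table names, the `energy` column, the stored values, and the use of SQLite at all are assumptions made for illustration, and the dataset's actual file format may differ.

```python
import sqlite3


def build_toy_database():
    """Create an in-memory database mirroring the keys named in the Fig. 2 caption."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE molecules (
            "index" INTEGER PRIMARY KEY,  -- primary key of the molecule data
            energy  REAL NOT NULL         -- hypothetical single energy column (eV)
        );
        CREATE TABLE reactions (
            rxnindex   INTEGER PRIMARY KEY,  -- primary key of the reaction data
            reactindex INTEGER NOT NULL REFERENCES molecules("index"),
            pdtindex   INTEGER NOT NULL REFERENCES molecules("index")
        );
        """
    )
    # Placeholder values, not taken from the dataset.
    conn.executemany("INSERT INTO molecules VALUES (?, ?)", [(1, -10.0), (2, -9.5)])
    conn.executemany("INSERT INTO reactions VALUES (?, ?, ?)", [(1, 1, 2), (2, 2, 1)])
    return conn


def reaction_energies(conn):
    """Resolve both foreign keys and return (rxnindex, E(product) - E(reactant))."""
    query = """
        SELECT r.rxnindex, p.energy - a.energy
        FROM reactions AS r
        JOIN molecules AS a ON a."index" = r.reactindex
        JOIN molecules AS p ON p."index" = r.pdtindex
        ORDER BY r.rxnindex
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(reaction_energies(build_toy_database()))
```

Joining the molecule table twice, once per foreign key, is what lets a reaction row stay small: it stores only two molecule indices, and any per-molecule property is looked up when needed.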
