Sci Data. 2019 Oct 18;6(1):213.
doi: 10.1038/s41597-019-0237-9.

QM-sym, a symmetrized quantum chemistry database of 135 kilo molecules


Jiechun Liang et al. Sci Data. 2019.

Abstract

Applying deep learning methods in materials science research is an important way of reducing the computational cost of typical ab initio quantum chemistry methods, but because of the size of large molecules, large uncharted areas still exist. Exploiting symmetry information can significantly reduce the computational complexity of structures, as they can be simplified to their minimum symmetric units. Because few quantum chemistry databases include symmetry information, we constructed a new one, named QM-sym, by designing an algorithm to generate 135k organic molecules with Cnh symmetry. The generated molecules were optimized to a stable state using Gaussian 09. The geometric, electronic, energetic, and thermodynamic properties of the molecules were calculated, including the degeneracy and symmetry of the orbitals near the HOMO and LUMO. The basic symmetric units are also included. This database provides consistent and comprehensive quantum chemical properties for structures with Cnh symmetries. QM-sym can be used as a benchmark for machine learning models in quantum chemistry or as a dataset for training new symmetry-based models.
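
The idea of reducing a structure to its minimum symmetric unit can be illustrated with a short sketch. The function below is not the authors' generation algorithm; it is a minimal illustration that assumes the principal axis lies along z and the horizontal mirror plane is z = 0. The name `expand_cnh`, the tolerance, and the example coordinates are all hypothetical.

```python
import math


def expand_cnh(unit, n, tol=1e-6):
    """Expand a minimal symmetric unit into a full C_nh-symmetric atom list.

    unit: list of (element, x, y, z) tuples.
    Assumption: the C_n axis is the z axis and sigma_h is the plane z = 0.
    """
    atoms = []
    for element, x, y, z in unit:
        for k in range(n):
            # Rotate by 2*pi*k/n about the z axis.
            angle = 2.0 * math.pi * k / n
            c, s = math.cos(angle), math.sin(angle)
            rx, ry = c * x - s * y, s * x + c * y
            # Apply the identity and the horizontal mirror (z -> -z).
            for rz in (z, -z):
                duplicate = any(
                    e == element and math.dist((px, py, pz), (rx, ry, rz)) < tol
                    for e, px, py, pz in atoms
                )
                # Atoms on the axis or in the mirror plane map onto themselves.
                if not duplicate:
                    atoms.append((element, rx, ry, rz))
    return atoms


# Hypothetical unit: one in-plane atom, one general-position atom,
# and one atom at the origin (coordinates are illustrative only).
unit = [("C", 1.4, 0.0, 0.0), ("H", 1.0, 0.5, 0.8), ("B", 0.0, 0.0, 0.0)]
molecule = expand_cnh(unit, n=3)
print(len(molecule))
```

An atom in a general position yields 2n copies, an in-plane atom yields n, and an atom at the origin yields one, which is why storing only the symmetric unit is more compact than storing the full structure.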

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Generation map of some molecules in the QM-sym database by reducing or retaining symmetry through replacements. The left part is the generation map of some C2h and C3h molecules. Starting from benzene (D6h), the symmetry group can be reduced to C2h or C3h or be retained correspondingly depending on whether replacing or lengthening is carried out during generation. The right part is a generation map of some C2h and C4h molecules. Starting from a C4h molecule, the symmetry group can be reduced to C2h by replacing two atoms or be retained by replacing four atoms. Grey, white, red, and blue balls denote carbon, hydrogen, boron, and fluorine atoms, respectively.
Fig. 2
An overview of the QM-sym database. The proportion of each point group in the QM-sym database is shown at the top. The left inset shows the distribution of molecules by size (number of atoms) and point group. The right inset plots the rotation radius of some of the C3h molecules against their cohesive energy and shows a distinct tendency of the cohesive energy to decrease as the rotation radius increases.
Fig. 3
Flow chart of the geometry check.
Fig. 4
Sketch of the excitations between orbitals with different energy levels. The degeneracies of the HOMO, HOMO − 2 and LUMO are 2, 1 and 1, respectively. According to both group theory and the Gaussian 09 results, the transition from the HOMO to the LUMO is dark, while that from HOMO − 2 to the LUMO is bright; the latter is a singlet excitation of AU symmetry with an energy of 5.4 eV. An example molecule used for the spectral transition probability calculation is shown at the top left.
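
The group-theory argument behind dark and bright transitions can be sketched in a few lines. The caption does not give the irreducible representations of the orbitals involved, so the example below uses the C2h point group purely as an illustration, because all of its irreps are one-dimensional. It does not handle degenerate (E) irreps such as the doubly degenerate HOMO described above. The function names are hypothetical; the characters are the standard C2h table in the order E, C2, i, σh.

```python
# Standard C2h character table, operations ordered as (E, C2, i, sigma_h).
C2H_CHARACTERS = {
    "Ag": (1, 1, 1, 1),
    "Bg": (1, -1, 1, -1),
    "Au": (1, 1, -1, -1),
    "Bu": (1, -1, -1, 1),
}

# In C2h the dipole component z transforms as Au, and x, y transform as Bu.
DIPOLE_IRREPS = {"Au", "Bu"}


def product_irrep(a, b):
    """Direct product of two one-dimensional C2h irreps."""
    chars = tuple(x * y for x, y in zip(C2H_CHARACTERS[a], C2H_CHARACTERS[b]))
    for label, reference in C2H_CHARACTERS.items():
        if reference == chars:
            return label
    raise ValueError("product is not a C2h irrep")


def is_dipole_allowed(initial, final):
    """A one-electron transition is bright if the product of the two
    orbital irreps matches the irrep of some dipole component."""
    return product_irrep(initial, final) in DIPOLE_IRREPS


for pair in [("Ag", "Au"), ("Ag", "Bg"), ("Au", "Bg"), ("Bu", "Bu")]:
    state = "bright" if is_dipole_allowed(*pair) else "dark"
    print(pair[0], "->", pair[1], ":", state)
```

For point groups with degenerate irreps, the product is generally reducible and would need the full reduction formula rather than the direct lookup used here.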
