Chem Sci. 2017 Oct 1;8(10):6924-6935.
doi: 10.1039/c7sc02267k. Epub 2017 Aug 10.

Machine learning molecular dynamics for the simulation of infrared spectra

Michael Gastegger et al. Chem Sci. 2017.

Abstract

Machine learning has emerged as an invaluable tool in many research areas. In the present work, we harness this power to predict highly accurate molecular infrared spectra with unprecedented computational efficiency. To account for vibrational anharmonic and dynamical effects, which are typically neglected by conventional quantum chemistry approaches, we base our machine learning strategy on ab initio molecular dynamics simulations. While these simulations are usually extremely time-consuming even for small molecules, we overcome these limitations by leveraging a variety of machine learning techniques, not only accelerating simulations by several orders of magnitude, but also greatly extending the size of systems that can be treated. To this end, we develop a molecular dipole moment model based on environment-dependent neural network charges and combine it with the neural network potential approach of Behler and Parrinello. Contrary to the prevalent big data philosophy, we are able to obtain very accurate machine learning models for the prediction of infrared spectra based on only a few hundred electronic structure reference points. This is made possible through the use of molecular forces during neural network potential training and the introduction of a fully automated sampling scheme. We demonstrate the power of our machine learning approach by applying it to model the infrared spectra of a methanol molecule, n-alkanes containing up to 200 atoms and the protonated alanine tripeptide, which at the same time represents the first application of machine learning techniques to simulate the dynamics of a peptide. In all of these case studies we find excellent agreement between the infrared spectra predicted via machine learning models and the respective theoretical and experimental spectra.
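The dipole moment model mentioned in the abstract builds the molecular dipole from atomic charges. A minimal sketch of that idea follows, assuming the dipole is the charge-weighted sum of atomic positions measured from the centre of mass. The function name `molecular_dipole`, the choice of origin and the toy charges are assumptions made for illustration; in the paper the charges come from environment-dependent neural networks, which are not reproduced here.

```python
import numpy as np

def molecular_dipole(charges, positions, masses):
    """Dipole moment as the charge-weighted sum of atomic position vectors.

    Positions are taken relative to the centre of mass (an assumption of this
    sketch); for a neutral molecule the result does not depend on the origin.
    """
    charges = np.asarray(charges, dtype=float)
    positions = np.asarray(positions, dtype=float)
    masses = np.asarray(masses, dtype=float)
    centre_of_mass = masses @ positions / masses.sum()
    return charges @ (positions - centre_of_mass)

# Toy example (hypothetical values): two opposite point charges one length
# unit apart along x. In a real model the charges would be predicted per atom.
toy_dipole = molecular_dipole(
    charges=[0.5, -0.5],
    positions=[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    masses=[1.0, 1.0],
)
```

The dipole vector points from the negative towards the positive charge, and its magnitude is the charge times the separation.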

Figures

Fig. 1. Schematic representation of a high-dimensional neural network potential (HDNNP). The Cartesian coordinates R are transformed into many-body symmetry functions {Gi} describing an atom’s chemical environment. Based on these functions, an NN then predicts the energy contribution Ei associated with atom i. The potential energy Epot of the whole molecule is obtained by summing over all individual atomic energies.
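The data flow in Fig. 1 can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: only radial symmetry functions of the Behler-Parrinello form are used (angular terms are omitted), the per-element networks have a single hidden layer, and all weights, the `etas` values and the coordinates are random placeholders.

```python
import numpy as np

def cutoff(r, r_c):
    """Cosine cutoff that decays smoothly to zero at r_c."""
    return np.where(r < r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)

def radial_symmetry_functions(coords, etas, r_c=6.0):
    """G_i(eta) = sum over j != i of exp(-eta * r_ij^2) * f_c(r_ij)."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    not_self = ~np.eye(len(coords), dtype=bool)  # exclude the j == i term
    g = np.exp(-etas[None, None, :] * dist[:, :, None] ** 2)
    g = g * cutoff(dist, r_c)[:, :, None] * not_self[:, :, None]
    return g.sum(axis=1)  # shape (n_atoms, n_etas)

def atomic_energy(g, params):
    """Small feed-forward network: descriptor vector -> atomic energy E_i."""
    w1, b1, w2, b2 = params
    return np.tanh(g @ w1 + b1) @ w2 + b2

def total_energy(coords, elements, nets, etas):
    """E_pot as the sum of atomic energies, one network per element."""
    g = radial_symmetry_functions(coords, etas)
    return sum(atomic_energy(g[i], nets[el]) for i, el in enumerate(elements))

# Placeholder setup: random weights and coordinates, for illustration only.
rng = np.random.default_rng(0)
etas = np.array([0.5, 1.0, 2.0])

def random_net(n_in, n_hidden=4):
    return (rng.normal(size=(n_in, n_hidden)), rng.normal(size=n_hidden),
            rng.normal(size=n_hidden), rng.normal())

nets = {el: random_net(len(etas)) for el in ("C", "O", "H")}
elements = ["C", "O", "H", "H", "H", "H"]
coords = rng.normal(scale=1.5, size=(6, 3))
energy = total_energy(coords, elements, nets, etas)
```

Because the descriptors depend only on interatomic distances and atoms of the same element share one network, the summed energy is unchanged by translating the molecule or swapping two equivalent atoms.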
Fig. 2. A typical run of the adaptive selection scheme starts by using a small set of initial reference data points to train a preliminary ensemble of HDNNPs. These HDNNPs are then used to sample new molecular conformations (e.g. via molecular dynamics simulations). During sampling, the predictions of the individual potentials are monitored and if divergence is detected, the sampling run is stopped. The conformation for which the HDNNPs disagree is computed with the electronic structure reference method and added to the set of reference points. Subsequently, the HDNNP ensemble is retrained on the expanded data set and sampling is continued with the new potential. This procedure is repeated iteratively until the divergence no longer exceeds a predetermined threshold.
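The loop in Fig. 2 can be illustrated with a one-dimensional toy problem. Everything below is a stand-in chosen for brevity: `reference` plays the role of the electronic structure method, the ensemble consists of Gaussian kernel ridge models with different widths rather than HDNNPs, a fixed grid of points replaces the molecular dynamics trajectory, and the threshold and point limit are arbitrary.

```python
import numpy as np

def reference(x):
    """Stand-in for the expensive electronic structure calculation."""
    return np.sin(3.0 * x)

def fit_rbf(x_train, y_train, width, ridge=1e-6):
    """Gaussian kernel ridge regression; returns a prediction function."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    kernel = np.exp(-(x_train[:, None] - x_train[None, :]) ** 2 / (2.0 * width**2))
    alpha = np.linalg.solve(kernel + ridge * np.eye(len(x_train)), y_train)

    def predict(x):
        return float(np.exp(-(x - x_train) ** 2 / (2.0 * width**2)) @ alpha)

    return predict

def train_ensemble(x_train, y_train, widths=(0.3, 0.6, 1.2)):
    """Ensemble members differ only in kernel width (a toy source of diversity)."""
    return [fit_rbf(x_train, y_train, w) for w in widths]

def adaptive_sampling(x_init, trajectory, threshold=0.05, max_points=30):
    x_ref = list(x_init)
    y_ref = [reference(x) for x in x_ref]
    ensemble = train_ensemble(x_ref, y_ref)
    while len(x_ref) < max_points:
        added = False
        for x in trajectory:
            predictions = [model(x) for model in ensemble]
            if np.std(predictions) > threshold:
                # Ensemble disagrees: compute the reference value, extend the
                # data set, retrain, then continue sampling with the new models.
                x_ref.append(float(x))
                y_ref.append(reference(x))
                ensemble = train_ensemble(x_ref, y_ref)
                added = True
                if len(x_ref) >= max_points:
                    break
        if not added:
            break  # a full pass stayed below the threshold
    return x_ref, y_ref

x_ref, y_ref = adaptive_sampling([0.3, 0.6], np.linspace(0.0, 3.0, 61))
```

The point of the design is that reference calculations are spent only where the ensemble members disagree, so the data set grows in poorly described regions rather than uniformly.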
Fig. 3. In order to generate molecular fragments, first all atoms beyond a predetermined cutoff radius from the central atom are removed. Afterwards, free valencies are saturated with hydrogen atoms, unless the valency itself is situated on a hydrogen or corresponds to a double bond in the unfragmented molecule. In either of these cases, the heavy atom connected to this atom in the original molecule is included in the fragment and the process is repeated iteratively. This procedure is performed for the whole system, leading to one fragment per atom.
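The fragmentation rule of Fig. 3 can be sketched on a simple bond graph. The data layout (a dict mapping atom-index pairs to bond orders), the function name `fragment` and the toy four-carbon chain are assumptions of this sketch. Capping hydrogens are only recorded as the bonds they would replace, not placed in space, and bonds of order higher than two are treated like double bonds, which the caption does not specify.

```python
import numpy as np

def fragment(center, elements, coords, bonds, cutoff):
    """Return (kept atom indices, cut bonds to be capped with hydrogen)."""
    n_atoms = len(elements)
    neighbours = {i: {} for i in range(n_atoms)}
    for (a, b), order in bonds.items():
        neighbours[a][b] = order
        neighbours[b][a] = order

    # Step 1: keep only atoms within the cutoff radius of the central atom.
    dist = np.linalg.norm(coords - coords[center], axis=1)
    kept = {i for i in range(n_atoms) if dist[i] <= cutoff}

    # Step 2: a free valency on a hydrogen, or one that was a double bond,
    # pulls the outside atom into the fragment; repeat until nothing changes.
    changed = True
    while changed:
        changed = False
        for a in sorted(kept):
            for b, order in neighbours[a].items():
                if b not in kept and (elements[a] == "H" or order >= 2):
                    kept.add(b)
                    changed = True

    # Step 3: every remaining cut bond is saturated with a hydrogen atom.
    caps = [(a, b) for a in sorted(kept) for b in sorted(neighbours[a])
            if b not in kept]
    return sorted(kept), caps

# Toy chain C0=C1-C2-C3 along the x axis (hypothetical geometry).
elements = ["C", "C", "C", "C"]
coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0],
                   [3.0, 0.0, 0.0], [4.5, 0.0, 0.0]])
bonds = {(0, 1): 2, (1, 2): 1, (2, 3): 1}

# One fragment per atom, as in the caption.
fragments = [fragment(c, elements, coords, bonds, cutoff=1.6)
             for c in range(len(elements))]
```

For the fragment centred on atom 2, the cutoff alone would cut the C0=C1 double bond, so atom 0 is pulled back in and no capping hydrogen is needed; for the fragment centred on atom 1, the single bond to atom 3 is cut and capped.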
Fig. 4. Distribution of errors between the ML model based on the adaptive sampling scheme and the BP86 reference (blue). The deviations were computed based on the energies, forces and dipole moments (from top to bottom) of 60 000 configurations of methanol sampled with an AIMD simulation. The deviations obtained with a ML model trained on data points selected at random from a force field simulation are shown in grey (see ESI†).
Fig. 5. IR spectra of the methanol molecule. The ML spectrum (red) is able to reproduce the AIMD spectrum (blue) obtained with BP86 with high accuracy. In addition, both theoretical spectra agree well with the experimental one (grey), recorded in the region from 600 cm⁻¹ to 4100 cm⁻¹.
Fig. 6. IR spectrum of the C69H140 alkane as predicted by the ML model based on the B2PLYP method.
Fig. 7. IR spectrum of n-butane obtained via the ML model (red), compared to the static quantum mechanical spectrum computed at the B2PLYP level (blue) and convoluted with Gaussians. The peak positions in the ML and B2PLYP spectra agree closely, suggesting that the observed deviations from the experimental spectrum (grey) are not caused by the ML approximation. The overall structure of the peaks is reproduced much better by the ML accelerated AIMD simulation, especially in the region of the C–H stretching vibrations (see inset).
Fig. 8. IR spectra of the protonated alanine tripeptide. The top panel shows the experimental spectrum (grey), as well as the ML spectra based on the BLYP (blue) and BP86 (red) reference methods. The lower panels depict the structures of the three main Ala3+ conformers, along with their respective contributions to the averaged BLYP ML spectrum.
Fig. 9. Reaction barriers associated with the proton transfer from the N-terminal NH3 group in the NH3 conformer of Ala3+ to the neighboring carbonyl. The reaction coordinate is the distance between the transferred NH3 hydrogen and the carbonyl oxygen. The barriers computed with the electronic structure reference methods are shown as solid lines colored red for the BLYP method and blue in the case of the BP86 method. The dashed curves correspond to the predictions of the respective ML models, maintaining the above color scheme.
