Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2019 Aug 9;5(8):eaav6490.
doi: 10.1126/sciadv.aav6490. eCollection 2019 Aug.

Accurate and transferable multitask prediction of chemical properties with an atoms-in-molecules neural network

Affiliations

Accurate and transferable multitask prediction of chemical properties with an atoms-in-molecules neural network

Roman Zubatyuk et al. Sci Adv. .

Abstract

Atomic and molecular properties could be evaluated from the fundamental Schrodinger's equation and therefore represent different modalities of the same quantum phenomena. Here, we present AIMNet, a modular and chemically inspired deep neural network potential. We used AIMNet with multitarget training to learn multiple modalities of the state of the atom in a molecular system. The resulting model shows on several benchmark datasets state-of-the-art accuracy, comparable to the results of orders of magnitude more expensive DFT methods. It can simultaneously predict several atomic and molecular properties without an increase in the computational cost. With AIMNet, we show a new dimension of transferability: the ability to learn new targets using multimodal information from previous training. The model can learn implicit solvation energy (SMD method) using only a fraction of the original training data and an archive median absolute deviation error of 1.1 kcal/mol compared to experimental solvation free energies in the MNSol database.

PubMed Disclaimer

Figures

Fig. 1
Fig. 1. Architecture of the AIMNet model.
The model uses atomic numbers Z and coordinates R as input features. The coordinates are transformed with ANI-type symmetry functions into AEVs. Atom types are represented with learnable atomic feature vectors (AFV), which are used as embedding vectors for AEVs. The interaction of an atom with its environment produces the AIM representation of the atom used to predict a set of target atom properties {pk}. The environment-dependent update to AFV within N iterations is used to make the embedding vectors for each atom consistent with its environment. Input data are colored in blue, predicted endpoints are in orange, and neural network blocks are in green.
Fig. 2
Fig. 2. Performance of the AIMNet model on the DrugBank subset of the COMP6-SFCl benchmark.
Plots show correlation between ground-truth DFT (x axes) and predicted with AIMNet (y axes) values for total molecular energies, components of force vectors on atoms (∂E/∂R), atomic charges, and volumes. For each plot, units for both axes and for RMSD and mean absolute deviation values are the same. Logarithm of point density is shown with color.
Fig. 3
Fig. 3. AIMNet predictions with different number of iterative passes t evaluated on the DrugBank subset of the COMP6-SFCl benchmark.
(A) Comparison of AIMNet performance at different t values with ANI-1x model trained on exactly the same dataset for relative conformer energies (ΔE), total energies (E), and atomic forces (F). (B) AIMNet accuracy in prediction of total energies (E), relative conformer energies (ΔE), atomic forces (F), charges (q), and volumes (V) at different t values. Relative RMSE is calculated as ratio of RMSE at given t divided by RMSE at t = 3 (the values used to train model).
Fig. 4
Fig. 4. DFT ωB97x/def2-TZVPP atomic charges on the sulfur atom of substituted thioaldehyde and AIMNet prediction with a different number of iterative passes t.
Fig. 5
Fig. 5. pt-SNE of AFVs (t = 3) for a set of drug-like molecules.
(B) Feature vectors for different chemical elements. Values of feature vectors at t = 1 (before SCF-like update) marked with cross symbol. (A and C) Feature vectors for several of the most common types of the carbon and hydrogen atom environments.
Fig. 6
Fig. 6. Performance of the AIMNet model on several benchmark sets compared to MM, semiempirical, DFT, and ab initio QM methods.
(A and B) Torsion benchmark of Sellers et al. (29) for gas phase and solvent models. The dots correspond to mean absolute error (MAE) for each of the 62 torsion profiles for each method. (C) The dots correspond to absolute error for relative conformer energy for each of the 196 conformers in the dataset. (A to C) The boxes represent the upper and lower quartiles, while the whiskers represent 1.5 times the interquartile range or the minimum/maximum values. The methods are ordered in descending median errors from top to bottom. Boxes colored by the class of the computational method (ML, MM, SQM, DFT, and ab initio). The basis sets used to obtain reference energies are 6-311+G**, for (A) and (B), and def2-TZVPD, for (C), where applicable but not specified. (D) Correlation plot of experimental solvation energies for 238 neutral molecules from the MNSol database (31) and AIMNet predictions, calculated as the difference between the prediction of DFT and DFT(SMD) energies.

References

    1. Rupp M., Tkatchenko A., Müller K.-R., von Lilienfeld O. A., Fast and accurate modeling of molecular atomization energies with machine learning. Phys. Rev. Lett. 108, 058301 (2012). - PubMed
    1. Yao K., Herr J. E., Toth D. W., Mckintyre R., Parkhill J., The TensorMol-0.1 model chemistry: A neural network augmented with long-range physics. Chem. Sci. 9, 2261–2269 (2018). - PMC - PubMed
    1. Smith J. S., Isayev O., Roitberg A. E., ANI-1: An extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 8, 3192–3203 (2017). - PMC - PubMed
    1. Gastegger M., Behler J., Marquetand P., Machine learning molecular dynamics for the simulation of infrared spectra. Chem. Sci. 8, 6924–6935 (2017). - PMC - PubMed
    1. Chmiela S., Tkatchenko A., Sauceda H. E., Poltavsky I., Schütt K. T., Müller K. R., Machine learning of accurate energy-conserving molecular force fields. Sci. Adv. 3, e1603015 (2017). - PMC - PubMed