Biophys J. 2022 Oct 18;121(20):3883-3895.
doi: 10.1016/j.bpj.2022.08.045. Epub 2022 Sep 3.

Predicting accurate ab initio DNA electron densities with equivariant neural networks

Alex J Lee et al. Biophys J.

Abstract

One of the fundamental limitations of accurately modeling biomolecules like DNA is the inability to perform quantum chemistry calculations on large molecular structures. We present a machine learning model based on an equivariant Euclidean neural network framework to obtain accurate ab initio electron densities for arbitrary DNA structures that are much too large for conventional quantum methods. The model is trained on representative B-DNA basepair steps that capture both base pairing and base stacking interactions. The model produces accurate electron densities for arbitrary B-DNA structures with typical errors of less than 1%. Crucially, the error does not increase with system size, which suggests that the model can extrapolate to large DNA structures with negligible loss of accuracy. The model also generalizes reasonably to other DNA structural motifs such as the A- and Z-DNA forms, despite being trained on only B-DNA configurations. The model is used to calculate electron densities of several large-scale DNA structures, and we show that the computational scaling for this model is essentially linear. We also show that this machine learning electron density model can be used to calculate accurate electrostatic potentials for DNA. These electrostatic potentials produce more accurate results compared with classical force fields and do not show the usual deficiencies at short range.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

Figure 1
Schematic for training a machine learning model for DNA. Illustrations were made using Discovery Studio Visualizer (63) and VMD (64,65). (A) Molecular dynamics simulations are run on B-DNA 12-mers. Quantum calculations are performed on representative snapshots extracted from the central two basepairs to obtain training densities. (B) The e3nn model takes atomic coordinates as input, converts them into a graph, which is fed through the network, and outputs the electron density for the given input coordinates. The model is trained on all ten combinations of basepair steps that make up a full DNA base sequence.
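The coordinates-to-graph step in panel (B) can be illustrated with a minimal sketch. This is not the authors' e3nn pipeline: the function name `radius_graph`, the cutoff value, and the toy coordinates are all hypothetical, and the sketch only shows the general idea of connecting atoms that lie within a distance cutoff.

```python
import numpy as np

def radius_graph(coords, cutoff):
    """Connect every ordered pair of distinct atoms closer than `cutoff`.

    Returns source indices, destination indices, and the edge vectors r_dst - r_src.
    (Hypothetical helper for illustration; the cutoff used in the paper is not stated here.)
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[None, :, :] - coords[:, None, :]      # diff[i, j] = r_j - r_i
    dist = np.linalg.norm(diff, axis=-1)                # pairwise distances
    mask = (dist < cutoff) & ~np.eye(len(coords), dtype=bool)  # drop self-edges
    src, dst = np.nonzero(mask)
    return src, dst, diff[src, dst]

# Three made-up atoms on a line; only the first two are within the cutoff.
coords = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
src, dst, vec = radius_graph(coords, cutoff=1.5)
edges = list(zip(src.tolist(), dst.tolist()))
```

In an equivariant network the edge vectors, rather than raw coordinates, are what the layers consume, which is why the sketch returns them alongside the index pairs.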
Figure 2
Training and test results for the DNA machine learning model. (A) Learning curves for increasing numbers of training samples. The errors are calculated against the test set of 2-mers. (B) Error ε_ρ^true as a function of base sequence length in the test set. The same model, trained on 4000 samples, was used in each case. Error bars represent the standard deviation over the last 50 training epochs, sampled every 5 epochs.
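The page does not define the density error reported in this caption. A common choice for a percent density error, assumed here and not confirmed by the source, is the integrated absolute difference between predicted and reference densities divided by the integrated reference density. The sketch below evaluates that quantity on a quadrature grid; the function name, the grid weights, and the sample values are hypothetical.

```python
import numpy as np

def density_error_percent(rho_pred, rho_true, weights):
    """Assumed metric: 100 * sum_g w_g |rho_pred - rho_true| / sum_g w_g rho_true.

    `weights` are quadrature weights of the grid points (volume elements).
    """
    rho_pred = np.asarray(rho_pred, dtype=float)
    rho_true = np.asarray(rho_true, dtype=float)
    weights = np.asarray(weights, dtype=float)
    numerator = np.sum(weights * np.abs(rho_pred - rho_true))
    denominator = np.sum(weights * rho_true)   # integrates to the electron count
    return 100.0 * numerator / denominator

# Made-up densities on a three-point grid with uniform weights.
rho_true = [1.0, 2.0, 1.0]
rho_pred = [1.1, 1.9, 1.0]
weights = [0.5, 0.5, 0.5]
err = density_error_percent(rho_pred, rho_true, weights)
```

Because the denominator is the total electron count, a metric of this form is normalized per electron, which is one way an error could stay flat as the system grows.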
Figure 3
Comparing machine learning predicted densities to an isolated atom model for a representative test set 4-mer (base sequence: ATCT). Illustrations were made using VMD (64,65). (A) Machine learning predicted and isolated atom densities at an isovalue of 0.15 a.u. The density prediction errors are ε_ρ^true = 1.02% and 9.68%, respectively. (B) Density differences relative to the true quantum density, shown at an isovalue of 0.01 a.u. Yellow and cyan surfaces represent positive and negative values, respectively.
Figure 4
Machine learning densities for large DNA structures. (A) Drew-Dickerson dodecamer, 758 atoms, 3780 electrons, (PDB: 4c64 (92)). (B) Stacked four-way junction, 1260 atoms, 6280 electrons (PDB: 1dcw (93)). (C) Timings for running the machine learning models. The machine learning model scales linearly, N=1.17. (D) Nucleosome core particle, 147 basepairs, 9346 atoms, 46,980 electrons (PDB: 1kx5 (94)). (E) DNA origami triangle (49,54), 340 basepairs, 21,658 atoms, 108,654 electrons. Illustrations were made using VMD (64,65).
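Assuming N in panel (C) denotes the exponent in a power law t ∝ n^N relating run time to system size, such an exponent is typically obtained from a straight-line fit in log-log space. The sketch below shows that fit. The atom counts are taken from the caption, but the timings are synthetic, generated from an exact power law purely to demonstrate that the fit recovers the exponent; they are not measured values from the paper.

```python
import numpy as np

def fit_scaling_exponent(sizes, times):
    """Least-squares fit of log(t) = N * log(n) + log(c); returns (N, c)."""
    log_n = np.log(np.asarray(sizes, dtype=float))
    log_t = np.log(np.asarray(times, dtype=float))
    slope, intercept = np.polyfit(log_n, log_t, 1)
    return slope, np.exp(intercept)

# Atom counts quoted in the caption for the four structures.
n_atoms = np.array([758, 1260, 9346, 21658], dtype=float)

# SYNTHETIC timings (arbitrary units) from t = c * n**1.17 with an arbitrary prefactor.
synthetic_times = 1e-3 * n_atoms ** 1.17

exponent, prefactor = fit_scaling_exponent(n_atoms, synthetic_times)
```

With real timings the points will scatter around the line, so the fitted exponent carries an uncertainty that a four-point fit estimates only roughly.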
Figure 5
Electrostatic potentials derived from the machine learning density model. Electrostatic potentials for the (A) A/T and (B) G/C basepairs. The dark red portions at the ends of the structures are the negatively charged phosphate groups. (C) RMSDs against quantum reference calculations for the machine learning electrostatic potential and the classical BSC1 force field (7,10). To get a sense of the range of interaction, the van der Waals radius for hydrogen is plotted at r=1.2 Å (dotted line), and one half of the distance for DNA base stacking is plotted at r=1.7 Å (dashed line). Error bars represent the standard deviation over the last 50 training epochs, sampled every 5 epochs. (D) Machine-learned electrostatic potentials on the major and minor groove sides of a DNA structure with an A-tract (PDB: 264d (97)). The isovalue of the density is set such that the potential is calculated at an average distance of 1.7 Å from the nearest atom. Electrostatic potential plots were made using Plotly (98). The units for the potential are given in a.u. The scales were selected manually to highlight key features and do not represent the ESP ranges for the structures.
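For orientation, the electrostatic potential of a molecule follows from its nuclei and electron density as V(r) = Σ_A Z_A/|r − R_A| − ∫ ρ(r′)/|r − r′| dr′ in atomic units. The sketch below evaluates this expression with the integral replaced by a sum over grid points. It is a minimal illustration, not the procedure used in the paper: the function name, the grid representation, and the toy system are hypothetical, and the evaluation point must not coincide with a nucleus or grid point.

```python
import numpy as np

def electrostatic_potential(point, nuclei_pos, nuclei_charge, grid_pos, grid_rho, grid_w):
    """ESP at `point` (atomic units) from point nuclei and a gridded electron density.

    Nuclear term:   sum_A Z_A / |r - R_A|
    Electron term: -sum_g w_g * rho_g / |r - r_g|   (quadrature of the Coulomb integral)
    """
    point = np.asarray(point, dtype=float)
    d_nuc = np.linalg.norm(np.asarray(nuclei_pos, dtype=float) - point, axis=1)
    d_grid = np.linalg.norm(np.asarray(grid_pos, dtype=float) - point, axis=1)
    v_nuc = np.sum(np.asarray(nuclei_charge, dtype=float) / d_nuc)
    v_el = np.sum(np.asarray(grid_w, dtype=float) * np.asarray(grid_rho, dtype=float) / d_grid)
    return v_nuc - v_el

# Toy system: a Z = 2 nucleus at the origin and one electron lumped on a single
# grid point at the origin (w * rho = 1), probed at a distance of 2 bohr.
v = electrostatic_potential(
    point=[2.0, 0.0, 0.0],
    nuclei_pos=[[0.0, 0.0, 0.0]],
    nuclei_charge=[2.0],
    grid_pos=[[0.0, 0.0, 0.0]],
    grid_rho=[1.0],
    grid_w=[1.0],
)
```

In the toy case the potential is 2/2 − 1/2 = 0.5 a.u. Fixed point charges are a common classical approximation to the electron term, whereas a continuous density keeps its spatial extent, which matters most at short range.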
