Nat Commun. 2022 Nov 19;13(1):7101.
doi: 10.1038/s41467-022-34603-z.

Deep learning to decompose macromolecules into independent Markovian domains


Andreas Mardt et al. Nat Commun. 2022.

Abstract

The increasing interest in modeling the dynamics of ever larger proteins has revealed a fundamental problem with models that describe the molecular system as being in a global configuration state. This notion limits our ability to gather sufficient statistics of state probabilities or state-to-state transitions because for large molecular systems the number of metastable states grows exponentially with size. In this manuscript, we approach this challenge by introducing a method that combines our recent progress on independent Markov decomposition (IMD) with VAMPnets, a deep learning approach to Markov modeling. We establish a training objective that quantifies how well a given decomposition of the molecular system into independent subdomains with Markovian dynamics approximates the overall dynamics. By constructing an end-to-end learning framework, the decomposition into such subdomains and their individual Markov state models are simultaneously learned, providing a data-efficient and easily interpretable summary of the complex system dynamics. While learning the dynamical coupling between Markovian subdomains is still an open issue, the present results are a significant step towards learning Ising models of large molecular complexes from simulation data.
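As background to the training objective mentioned above: the VAMPnets referenced here are trained by maximizing a variational (VAMP) score of learned fuzzy state assignments. The sketch below computes a standard VAMP-2 score from assignment time series. It illustrates the general scoring idea only, not the paper's exact iVAMPnet objective (which combines per-subsystem scores with an independence constraint); the function name and the regularizer `eps` are assumptions.

```python
import numpy as np

def vamp2_score(chi_t, chi_tau, eps=1e-6):
    """VAMP-2 score of fuzzy state assignments chi(x_t), chi(x_{t+tau}).

    Score = squared Frobenius norm of the whitened Koopman matrix
    C00^{-1/2} C01 C11^{-1/2}, computed from mean-free features.
    """
    chi_t = chi_t - chi_t.mean(axis=0)
    chi_tau = chi_tau - chi_tau.mean(axis=0)
    n = len(chi_t)
    c00 = chi_t.T @ chi_t / n      # instantaneous covariance at time t
    c11 = chi_tau.T @ chi_tau / n  # instantaneous covariance at t + tau
    c01 = chi_t.T @ chi_tau / n    # time-lagged covariance

    def inv_sqrt(c):
        # Regularized inverse matrix square root via eigendecomposition.
        w, v = np.linalg.eigh(c)
        w = np.maximum(w, eps)
        return v @ np.diag(w ** -0.5) @ v.T

    k = inv_sqrt(c00) @ c01 @ inv_sqrt(c11)
    return (k ** 2).sum()

# Example: a perfectly self-correlated 2-state assignment has one
# nontrivial process, so the score approaches 1.
rng = np.random.default_rng(0)
x = (rng.uniform(size=500) < 0.5).astype(float)
chi = np.stack([x, 1 - x], axis=1)
score = vamp2_score(chi, chi)
```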


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Fig. 1. The iVAMP concept as visualized by modeling dynamics of a protein that has two independent, flexible regions separated by a rigid barrel.
iVAMPnets learn an assignment of the C- (blue/top) and N-termini (green/bottom) into independent subsystems from molecular dynamics trajectories (left column). Moreover, the dynamics of both termini are modeled with statistically independent VAMPnets (right column).
Fig. 2
Fig. 2. Architecture of an iVAMPnet for N subsystems, where trainable parts are shaded green.
Two network lobes process the configuration pairs xt (light) and xt+τ (dark) with shared weights. First, the input features are weighted element-wise, Ȳt = G ⊙ xt, with a mask G ∈ ℝ^(D×N), where each subsystem learns its individual weighting. The mask values can be interpreted as the probabilities that an input feature belongs to a given subsystem. To prevent the subsequent neural networks from reversing the effect of the mask, we draw for each input feature i and subsystem j an independent, normally distributed random variable ϵij ~ N(0, σ(1 − Gij)) and add this noise to the weighted features, Yt = Ȳt + ϵ. The attention weight thereby interpolates linearly between input feature and Gaussian noise: if Gij = 1, Yij carries exclusively the input feature xi; if Gij = 0, Yij is pure Gaussian noise. Afterwards, the transformed feature vector is split per subsystem, Yt = [Yt^1, ..., Yt^N], and passed through the subsystem-specific neural network ηi. We call the whole transformation for subsystem i the fuzzy state assignment χi(xt) = ηi(Yt^i).
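The masking step described in this caption can be sketched in NumPy. The dimensions `D` and `N`, the helper `mask_with_noise`, and the noise scale `sigma` are illustrative assumptions; whether σ(1 − Gij) denotes a variance or a standard deviation is an implementation choice (the sketch treats it as a variance).

```python
import numpy as np

# Hypothetical sizes: D input features, N subsystems.
D, N = 6, 2
rng = np.random.default_rng(0)

# Trainable mask G in R^{D x N}; rows normalized so entries act as
# per-feature subsystem-membership probabilities.
G = rng.uniform(size=(D, N))
G /= G.sum(axis=1, keepdims=True)

def mask_with_noise(x, G, sigma=1.0, rng=rng):
    """Weight features by the mask and blend with Gaussian noise.

    For feature i and subsystem j: Y_ij = G_ij * x_i + eps_ij with
    eps_ij ~ N(0, sigma * (1 - G_ij)). G_ij = 1 keeps the feature
    intact; G_ij = 0 yields pure noise, so downstream networks
    cannot undo the masking.
    """
    weighted = G * x[:, None]                          # element-wise weighting
    eps = rng.normal(0.0, np.sqrt(sigma * (1.0 - G)))  # noise grows as G_ij -> 0
    return weighted + eps                              # shape (D, N)

x_t = rng.normal(size=D)
Y_t = mask_with_noise(x_t, G)
# Split per subsystem: column Y_t[:, j] feeds the subsystem network eta_j.
subsystem_inputs = [Y_t[:, j] for j in range(N)]
```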
Fig. 3
Fig. 3. Hidden Markov state model as a benchmark example for independent subsystems.
a Two subsystems with 2 and 3 states emit independently onto the x and y axes, respectively. The corresponding 2D space embeds all 6 global states. b The learned mask, depicted in gray-scale from 0 (white) to 1 (black), shows that each subsystem focuses on one input dimension. c The estimated subsystem transition matrices are compared with the ground truth (in percent). d Subsystem eigenfunctions (color-coded) and corresponding eigenvalues (printed numbers) as found by the iVAMPnet. The independent processes are recovered from the 2D data.
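This benchmark setup can be illustrated with a short NumPy sketch. The transition matrices `T1` and `T2` and the helper `simulate` are invented for illustration, not the paper's actual parameters; the key point is that, for independent subsystems, the 2 × 3 = 6-state global transition matrix factorizes as a Kronecker product.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical transition matrices: a 2-state and a 3-state subsystem.
T1 = np.array([[0.9, 0.1],
               [0.2, 0.8]])
T2 = np.array([[0.8, 0.1, 0.1],
               [0.1, 0.8, 0.1],
               [0.05, 0.15, 0.8]])

def simulate(T, n_steps, rng):
    """Sample a discrete Markov chain with transition matrix T."""
    states = np.empty(n_steps, dtype=int)
    states[0] = 0
    for t in range(1, n_steps):
        states[t] = rng.choice(len(T), p=T[states[t - 1]])
    return states

n = 2000
s1, s2 = simulate(T1, n, rng), simulate(T2, n, rng)

# Global state index over the 2 x 3 = 6 product states.
global_states = s1 * 3 + s2

# Independence means the global transition matrix is T1 (x) T2.
T_global = np.kron(T1, T2)   # shape (6, 6)
```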
Fig. 4
Fig. 4. Hidden Markov state model with 1024 global states forming a 10D hypercube embedded in a 20D space.
a The hypercube is composed of ten independent 2-state subsystems. Each pair of two subsystems lives in a common rotated 2D manifold; therefore, two subsystems need the same input features to be well approximated. b 2D depiction of the hypercube in an orthographic projection, where the global system can jump freely between all 1024 vertices, and the ten 2-state models retrieved from it by the iVAMPnet (colors denote subsystem identity). c Learned mask, depicted in gray-scale from 0 (white) to 1 (black), assigning inputs to subsystems (color-coded). For each subsystem, the network assigns two highly important input features which are shared with exactly one other subsystem, mirroring the rotated input space. Noise dimensions (x10-x19) are assigned low importance values. d Implied timescales as a function of the model lag time (both in arbitrary units, a.u.) of all ten subsystems learned by our method (dots) approximate the underlying true timescales (lines). Timescales are color-coded by index.
Fig. 5
Fig. 5. iVAMPnet of synaptotagmin-C2A with two subsystems and twelve and six states, respectively.
a Importance values of the trainable mask depicted on the color-coded protein secondary structure, indicating assignment to subsystem I (green) or subsystem II (blue). b Implied timescales of the two subsystems with 90% percentile intervals over 20 runs (dot markers denote means), color-coded by index. c Superposed representative structures of both extrema of the slowest resolved eigenfunctions of each subsystem (residues not assigned a high importance value or not showing significant movement are omitted for clarity). The slowest process of subsystem I exchanges between the green and gray structures, showing an orchestrated movement of the full Calcium Binding Region (CBR1, CBR2, and CBR3). The slowest process of subsystem II occurs between the blue and gray structures and describes a combined movement of the loops C78 and C34.
Fig. 6
Fig. 6. Attention scheme for amino acid chain.
Windows of size B are placed along the chain with a step size of s, resulting in W windows. A trainable weight g ∈ ℝ^(W×N) is assigned to each window in each subsystem; the weights are made positive and normalized along the window axis through a softmax, ḡ = softmax(g, dim=0). Here a window size of B = 4 and a step size of s = 2 are chosen. As a consequence, the weight of the amino acid glutamine (Q) is given as the product of the two windows it is part of, g(Q) = ḡi ⊙ ḡi+1, where the multiplication is executed element-wise for each subsystem. The choice of the step size determines how many neighboring amino acids have exactly the same weight within a subsystem, which applies here to the tyrosine (Y). Together with the window size, this regulates how many residues share parts of their weights. Hence, the serine (S) shares the weight ḡi+1 with the previous two amino acids, g(S) = ḡi+1 ⊙ ḡi+2, which has a smoothing effect on the attention mechanism along the chain.
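A minimal NumPy sketch of this windowed attention scheme follows. The function `window_attention` and the specific chain length are illustrative assumptions; it shows how residues covered by the same set of windows end up with identical weights.

```python
import numpy as np

def window_attention(g, L, B=4, s=2):
    """Per-residue attention weights from overlapping window weights.

    g: raw trainable weights, shape (W, N) for W windows and N subsystems.
    Each residue's weight is the product of the softmax-normalized
    weights of all windows that cover it.
    """
    # Softmax along the window axis, independently per subsystem.
    g_bar = np.exp(g) / np.exp(g).sum(axis=0, keepdims=True)
    W, N = g_bar.shape
    weights = np.ones((L, N))
    for w in range(W):
        start = w * s
        # Residues start .. start+B-1 lie in window w; multiply in its weight.
        weights[start:start + B] *= g_bar[w]
    return weights

# Hypothetical chain of L = 10 residues, B = 4, s = 2 -> W = 4 windows.
L, B, s = 10, 4, 2
W = (L - B) // s + 1
rng = np.random.default_rng(2)
g = rng.normal(size=(W, 2))
res_weights = window_attention(g, L, B, s)
# Residues 0 and 1 are covered only by window 0, so their weights match;
# residues 2 and 3 share windows 0 and 1, giving the smoothing effect.
```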

