Proc Natl Acad Sci U S A. 2023 Mar 21;120(12):e2221048120.
doi: 10.1073/pnas.2221048120. Epub 2023 Mar 15.

Building insightful, memory-enriched models to capture long-time biochemical processes from short-time simulations

Anthony J Dominic 3rd et al. Proc Natl Acad Sci U S A.

Abstract

The ability to predict and understand complex molecular motions occurring over diverse timescales ranging from picoseconds to seconds and even hours in biological systems remains one of the largest challenges to chemical theory. Markov state models (MSMs), which provide a memoryless description of the transitions between different states of a biochemical system, have provided numerous important physically transparent insights into biological function. However, constructing these models often necessitates performing extremely long molecular simulations to converge the rates. Here, we show that by incorporating memory via the time-convolutionless generalized master equation (TCL-GME), one can build a theoretically transparent and physically intuitive memory-enriched model of biochemical processes with up to a three-order-of-magnitude reduction in the simulation data required while also providing a higher temporal resolution. We derive the conditions under which the TCL-GME provides a more efficient means to capture slow dynamics than MSMs and rigorously prove when the two provide equally valid and efficient descriptions of the slow configurational dynamics. We further introduce a simple averaging procedure that enables our TCL-GME approach to quickly converge and accurately predict long-time dynamics even when parameterized with noisy reference data arising from short trajectories. We illustrate the advantages of the TCL-GME using alanine dipeptide, the human argonaute complex, and FiP35 WW domain.
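To make the "memoryless" property of an MSM concrete, the following is a minimal sketch, not the authors' code. It assumes the standard Chapman-Kolmogorov relation C(n τL) = C(τL)^n for a row-stochastic transition probability matrix (TPM). The two-state matrix `T` and the helper names are hypothetical and chosen only for illustration.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    """Return the n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def msm_propagate(tpm_lag, n_steps):
    """Return [C(0), C(tau_L), ..., C(n_steps * tau_L)].

    Memoryless assumption: each step multiplies by the same TPM,
    so C(n * tau_L) = C(tau_L) ** n.
    """
    out = [identity(len(tpm_lag))]
    for _ in range(n_steps):
        out.append(matmul(out[-1], tpm_lag))
    return out


# Hypothetical two-state TPM evaluated at the lag time tau_L.
T = [[0.9, 0.1],
     [0.2, 0.8]]

traj = msm_propagate(T, 50)
# traj[n][0][0] is the probability of being in state 1 after n lag times,
# given a start in state 1.
```

Because every step reuses the same matrix, the quality of such a model depends entirely on how long a lag time is needed before the dynamics are effectively Markovian, which is the cost the abstract says memory-enriched models reduce.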

Keywords: Markov state models; biomolecular dynamics; generalized master equations; memory effects; protein folding.

Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
Application of the TCL-GME to alanine dipeptide with comparisons to the MSM and qMSM. (A) Root mean square error (RMSE) curves for the MSM, qMSM, and TCL-GME quantifying the deviation from the MD data (open circles) as the model is parameterized with increasing amounts of data (RMSE Analysis). Vertical lines show the errors associated with cutoffs (τ) of 1.5 ps and 10 ps. Alanine dipeptide is shown (2 residues). (B) State 1 TPM dynamics, C11(t), computed with MSM, qMSM, and TCL-GME approaches parameterized with 1.5 ps of MD data, i.e., τL = τK = τR = 1.5 ps. (C) State 1 TPM dynamics computed with τL = τK = τR = 10 ps. The 4-state TPMs parameterized with τK = τR = 1.5 ps and τL = 10 ps are shown in SI Appendix, Fig. S1. MD error bars were obtained using a bootstrapping approach as discussed in ref. .
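The RMSE curves in panel (A) compare model-predicted TPMs against the MD reference. The sketch below shows one plausible way to compute such an error. It assumes the RMSE is taken over all time points and all matrix elements, which may differ from the exact definition used in the paper; the function name `tpm_rmse` is hypothetical.

```python
import math


def tpm_rmse(model, reference):
    """RMSE between two TPM time series.

    Each argument is a list of square matrices (lists of rows), one per
    time point. Assumption: the error is averaged over every time point
    and every matrix element.
    """
    if len(model) != len(reference):
        raise ValueError("time series must have the same length")
    squared_error, count = 0.0, 0
    for c_model, c_ref in zip(model, reference):
        for row_model, row_ref in zip(c_model, c_ref):
            for x, y in zip(row_model, row_ref):
                squared_error += (x - y) ** 2
                count += 1
    return math.sqrt(squared_error / count)


# Illustrative data: a reference series and a model offset by 0.1 everywhere.
reference = [[[0.9, 0.1], [0.2, 0.8]], [[0.83, 0.17], [0.34, 0.66]]]
shifted = [[[x + 0.1 for x in row] for row in c] for c in reference]

error_same = tpm_rmse(reference, reference)
error_shifted = tpm_rmse(shifted, reference)
```

Repeating this calculation while increasing the cutoff used to parameterize the model would trace out a curve of the kind described in the caption.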
Fig. 2.
Demonstration that the massive spatial and temporal scales of the argonaute protein present a challenge to MSMs. Left: Implied timescales (ITS) plot of Eq. 3, for the three nonunitary eigenvalues, whose plateau time corresponds to the Markovian lag time, τL. Diamonds show the choice of τL in Fig. 3, but one can appreciate that no choice for this window of MD data would be satisfactory. Using the U-GME approach (discussed in this section), Markovianity is found to require ∼1,200 times as much simulation data. Right: Rendering of the argonaute protein containing the mRNA strand used to obtain the MD data. The protein itself is composed of 831 residues.
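Eq. 3 is not reproduced on this page. The sketch below assumes it corresponds to the standard implied-timescale formula t_i(τ) = -τ / ln|λ_i(τ)|, where λ_i(τ) is a nonunit eigenvalue of the TPM at lag time τ. To stay free of external libraries it is restricted to a hypothetical two-state system, where the single nonunit eigenvalue equals trace - 1. A multi-state case such as the one in the figure would need a numerical eigenvalue solver.

```python
import math


def two_state_tpm(k12, k21, tau):
    """Exact TPM at lag time tau for a two-state Markov jump process
    with rates k12 (state 1 -> 2) and k21 (state 2 -> 1)."""
    k = k12 + k21
    decay = math.exp(-k * tau)
    p1, p2 = k21 / k, k12 / k  # stationary populations
    return [[p1 + p2 * decay, p2 - p2 * decay],
            [p1 - p1 * decay, p2 + p1 * decay]]


def implied_timescale_2state(tpm, tau):
    """Implied timescale -tau / ln|lambda| for a 2 x 2 stochastic matrix.

    One eigenvalue of a stochastic matrix is 1, so for a 2 x 2 matrix
    the other eigenvalue is trace - 1.
    """
    lam = tpm[0][0] + tpm[1][1] - 1.0
    return -tau / math.log(abs(lam))


# Hypothetical rates; the exact relaxation time is 1 / (k12 + k21) = 2.0.
lag_times = (0.5, 1.0, 2.0)
timescales = [implied_timescale_2state(two_state_tpm(0.3, 0.2, tau), tau)
              for tau in lag_times]
```

For this exactly Markovian toy model the implied timescale is the same at every lag time. In simulation data with memory, the curve instead rises with τ and only flattens near τL, which is the plateau the caption refers to.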
Fig. 3.
Instability of the qMSM and TCL-GME in the case of the argonaute protein and demonstration of the robustness of our U-GME approach. (A) The transparent line shows the state 2 memory kernel K22(t) as a function of time. From the RMSE analysis (SI Appendix, Fig. S2A), we observe that K(t) converges by 35 ns. The solid line shows the replacement of K22(t) with zero after this time. (B) Time-dependent conditional probability of starting in state 2 and remaining in state 2 (state 2 dynamics) predicted using the qMSM with τK ∈ {25, 35, 45, 55} ns, where increasing transparency corresponds to decreasing values of τK. (C) Similar to (A), the transparent line shows the state 2 time-local generator R22(t) as a function of time, and the solid line shows the replacement of R(t) with R(τR) after τR = 30 ns. (D) State 2 dynamics predicted using the TCL-GME with τR ∈ {25, 35, 45, 55} ns, where increasing transparency corresponds to decreasing values of τR. (E) Like (C), the transparent line shows R22(t) as a function of time. Here, the solid line instead illustrates the replacement of R(t) with its time average over the window [20, 30] ns after τR = 30 ns, i.e., (tr, τR) = (20, 30) ns. (F) Dynamics predicted using the R-GME. (G) The transparent line shows propagator U22(t) as a function of time, and the solid line shows the replacement of U(t) with its average over the window [20, 30] ns after τR = 30 ns. (H) Dynamics predicted using the U-GME. In (B), (D), (F), and (H), we show an MSM parameterized with τL = 50 ns. The MD data and error bars were computed using the bootstrapping approach (ref. for details).
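The windowed averaging described in panels (G) and (H) can be illustrated with a small sketch. It is not the authors' implementation. It assumes a discrete-time propagator convention C_{k+1} = U_k C_k, a window expressed in step indices rather than nanoseconds, and synthetic two-state data with alternating noise. All names (`ugme_propagate`, `k_start`, `k_cut`) are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def average_matrices(mats):
    """Element-wise mean of a non-empty list of equally sized matrices."""
    n = len(mats[0])
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(n)]
            for i in range(n)]


def ugme_propagate(u_series, c0, k_start, k_cut, n_steps):
    """Propagate C_{k+1} = U_k C_k (assumed convention).

    For k < k_cut the supplied U_k is used. From k_cut onward, U_k is
    replaced by its average over the window k_start..k_cut (inclusive).
    """
    u_avg = average_matrices(u_series[k_start:k_cut + 1])
    traj = [c0]
    for k in range(n_steps):
        u = u_series[k] if k < k_cut else u_avg
        traj.append(matmul(u, traj[-1]))
    return traj


# Synthetic noisy propagators: a fixed matrix plus sign-alternating noise.
T = [[0.9, 0.1], [0.2, 0.8]]
eps = 0.02
noise = [[eps, -eps], [-eps, eps]]
u_series = [[[T[i][j] + (-1) ** k * noise[i][j] for j in range(2)]
             for i in range(2)] for k in range(8)]

u_avg = average_matrices(u_series[4:8])  # window k = 4..7
traj = ugme_propagate(u_series, [[1.0, 0.0], [0.0, 1.0]], 4, 7, 20)
```

In this toy case the noise cancels exactly inside the window, so the averaged propagator recovers `T`. With real trajectory data the cancellation is only approximate, and the window has to be chosen after the propagator has stopped drifting.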
Fig. 4.
Ability of our U-GME to accurately predict the dynamics of the FiP35 WW domain. (A) RMSE curves for the MSM and the U-GME as a function of τL and τR, while varying choices of tr to illustrate convergence. The structure of the FiP35 WW domain is shown (35 residues). (B) TPM dynamics (C22(t)) computed using U-GME and MSM approaches with τR = 25 ns (ℓ = 5 ns) and τL = 25 ns. (C) The propagator U22(t) as a function of time, showing that U has been replaced with its average at 25 ns.
