Proc Natl Acad Sci U S A. 2023 Mar 21;120(12):e2221048120.

doi: 10.1073/pnas.2221048120. Epub 2023 Mar 15.

Building insightful, memory-enriched models to capture long-time biochemical processes from short-time simulations

Anthony J Dominic 3rd¹, Thomas Sayer¹, Siqin Cao², Thomas E Markland³, Xuhui Huang², Andrés Montoya-Castillo¹

Affiliations

¹ Department of Chemistry, University of Colorado, Boulder, CO 80309.
² Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706.
³ Department of Chemistry, Stanford University, Stanford, CA 94305.

PMID: 36920924
PMCID: PMC10041170
DOI: 10.1073/pnas.2221048120

Anthony J Dominic 3rd et al. Proc Natl Acad Sci U S A. 2023.

Abstract

The ability to predict and understand complex molecular motions occurring over diverse timescales ranging from picoseconds to seconds and even hours in biological systems remains one of the largest challenges to chemical theory. Markov state models (MSMs), which provide a memoryless description of the transitions between different states of a biochemical system, have provided numerous important physically transparent insights into biological function. However, constructing these models often necessitates performing extremely long molecular simulations to converge the rates. Here, we show that by incorporating memory via the time-convolutionless generalized master equation (TCL-GME) one can build a theoretically transparent and physically intuitive memory-enriched model of biochemical processes with up to a three order of magnitude reduction in the simulation data required while also providing a higher temporal resolution. We derive the conditions under which the TCL-GME provides a more efficient means to capture slow dynamics than MSMs and rigorously prove when the two provide equally valid and efficient descriptions of the slow configurational dynamics. We further introduce a simple averaging procedure that enables our TCL-GME approach to quickly converge and accurately predict long-time dynamics even when parameterized with noisy reference data arising from short trajectories. We illustrate the advantages of the TCL-GME using alanine dipeptide, the human argonaute complex, and FiP35 WW domain.

Keywords: Markov state models; biomolecular dynamics; generalized master equations; memory effects; protein folding.
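
As a rough illustration of the memoryless description mentioned in the abstract, the sketch below propagates a toy Markov state model by repeated application of a lag-time transition matrix and computes its implied timescale from the standard relation t_i = -τ_L / ln λ_i. The matrix `T_toy`, the lag time `tau_L`, and the function names are hypothetical placeholders, not data or code from the paper.

```python
import numpy as np

def msm_propagate(T_lag, n_steps):
    """Markovian prediction: T(n * tau_L) = T(tau_L) ** n."""
    return np.linalg.matrix_power(T_lag, n_steps)

def implied_timescales(T_lag, lag_time):
    """Implied timescales t_i = -tau_L / ln(lambda_i), skipping the unit eigenvalue."""
    eigvals = np.sort(np.real(np.linalg.eigvals(T_lag)))[::-1]
    return -lag_time / np.log(eigvals[1:])

# Toy 2-state row-stochastic transition matrix (made-up numbers, assumption).
T_toy = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
tau_L = 1.0  # lag time in arbitrary units (assumption)

T_long = msm_propagate(T_toy, 50)        # dynamics after 50 lag times
its = implied_timescales(T_toy, tau_L)   # one slow timescale for 2 states
print(T_long)
print(its)
```

In practice the lag time is chosen where the implied timescales plateau, which is why an MSM may need long trajectories before it becomes reliable.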

Conflict of interest statement

The authors declare no competing interest.

Figures

**Fig. 1.**
Application of the TCL-GME to alanine dipeptide with comparisons to the MSM and qMSM. (A) Root mean square error (RMSE) curves for the MSM, qMSM, and TCL-GME quantifying the deviation from the MD data (open circles) as the model is parameterized with increasing amounts of data (*RMSE Analysis*). Vertical lines show the errors associated with cutoffs (τ) of 1.5 ps and 10 ps. Alanine dipeptide is shown (2 residues). (B) State 1 TPM dynamics, $C_{11}(t)$, computed with MSM, qMSM, and TCL-GME approaches parameterized with 1.5 ps of MD data, i.e., τ_L = τ_K = τ_R = 1.5 ps. (C) State 1 TPM dynamics computed with τ_L = τ_K = τ_R = 10 ps. The 4-state TPMs parameterized with τ_K = τ_R = 1.5 ps and τ_L = 10 ps are shown in *SI Appendix*, Fig. S1. MD error bars were obtained using a bootstrapping approach as discussed in ref. .

**Fig. 2.**
Demonstration that the massive spatial and temporal scales of the argonaute protein present a challenge to MSMs. *Left*: Implied timescales (ITS) plot of Eq. 3, for the three nonunitary eigenvalues, whose plateau time corresponds to the Markovian lag time, τ_L. Diamonds show the choice of τ_L in Fig. 3, but one can appreciate that no choice for this window of MD data would be satisfactory. Using the $\langle U \rangle$-GME approach (discussed in this section), Markovianity is found to require ∼1,200 times as much simulation data. *Right*: Rendering of the argonaute protein containing the mRNA strand used to obtain the MD data. The protein itself is composed of 831 residues.

**Fig. 3.**
Instability of the qMSM and TCL-GME in the case of the argonaute protein and demonstration of the robustness of our $\langle U \rangle$-GME approach. (A) The transparent line shows the state 2 memory kernel $K_{22}(t)$ as a function of time. From the RMSE in *SI Appendix*, Fig. S2A, we observe that $K(t)$ converges by 35 ns. The solid line shows the replacement of $K_{22}(t)$ with zero after this time. (B) Time-dependent conditional probability of starting in state 2 and remaining in state 2 (state 2 dynamics) predicted using the qMSM with τ_K ∈ {25, 35, 45, 55} ns, where increasing transparency corresponds to decreasing values of τ_K. (C) Similar to (A), the transparent line shows the state 2 time-local generator $R_{22}(t)$ as a function of time, and the solid line shows the replacement of $R(t)$ with $R(\tau_R)$ after τ_R = 30 ns. (D) State 2 dynamics predicted using the TCL-GME with τ_R ∈ {25, 35, 45, 55} ns, where increasing transparency corresponds to decreasing values of τ_R. (E) Like (C), the transparent line shows $R_{22}(t)$ as a function of time. Here, the solid line instead illustrates the replacement of $R(t)$ with its time average over the window [20, 30] ns after τ_R = 30 ns, i.e., (t_r, τ_R) = (20, 30) ns. (F) Dynamics predicted using the $\langle R \rangle$-GME. (G) The transparent line shows the propagator $U_{22}(t)$ as a function of time, and the solid line shows the replacement of $U(t)$ with its average over the window [20, 30] ns after τ_R = 30 ns. (H) Dynamics predicted using the $\langle U \rangle$-GME. In (B), (D), (F), and (H), we show an MSM parameterized with τ_L = 50 ns. The MD data and error bars were computed using the bootstrapping approach (ref. for details).

**Fig. 4.**
Ability of our $\langle U \rangle$-GME to accurately predict the dynamics of the FiP35 WW domain. (A) RMSE curves for the MSM and the $\langle U \rangle$-GME as a function of τ_L and τ_R, while varying choices of t_r to illustrate convergence. The structure of the FiP35 WW domain is shown (35 residues). (B) TPM dynamics ($C_{22}(t)$) computed using $\langle U \rangle$-GME and MSM approaches with τ_R = 25 ns (ℓ = 5 ns) and τ_L = 25 ns. (C) The propagator $U_{22}(t)$ as a function of time, showing that $U$ has been replaced with its average at 25 ns.
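
The captions for Figs. 3 and 4 describe replacing the propagator $U(t)$ with its average over a time window and then using that average to predict later dynamics. A minimal sketch of that idea follows, assuming a discrete-time propagator defined by C(t + δt) = U(t) C(t); this definition, the multiplication order, the function names, and the toy data are all assumptions made for illustration and are not taken from this record.

```python
import numpy as np

def averaged_propagator(C, k_start, k_end):
    """Average U_k = C[k+1] @ inv(C[k]) over k_start <= k < k_end (assumed definition)."""
    U = [C[k + 1] @ np.linalg.inv(C[k]) for k in range(k_start, k_end)]
    return np.mean(U, axis=0)

def extrapolate(C, U_avg, k_end, n_extra):
    """Continue the dynamics past index k_end by repeatedly applying the averaged propagator."""
    out = [C[k_end]]
    for _ in range(n_extra):
        out.append(U_avg @ out[-1])
    return np.array(out)

# Toy reference data: exact Markov dynamics C[k] = T**k (made-up numbers, noise-free).
T_ref = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
C_ref = [np.linalg.matrix_power(T_ref, k) for k in range(8)]

U_avg = averaged_propagator(C_ref, 3, 6)     # averaging window, in steps
C_pred = extrapolate(C_ref, U_avg, 6, 4)     # predict steps 6 through 10
print(C_pred[-1])
```

For this noise-free Markovian toy data the averaged propagator simply recovers the transition matrix; with noisy short-trajectory data the average would instead smooth out fluctuations in the individual U values.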

Grants and funding

CHE-2154291/National Science Foundation (NSF)