J Chem Phys. 2009 Sep 28;131(12):124101.
doi: 10.1063/1.3216567.

Progress and challenges in the automated construction of Markov state models for full protein systems

Gregory R Bowman et al. J Chem Phys.

Abstract

Markov state models (MSMs) are a powerful tool for modeling both the thermodynamics and kinetics of molecular systems. In addition, they provide a rigorous means to combine information from multiple sources into a single model and to direct future simulations and experiments so as to minimize uncertainties in the model. However, constructing MSMs is challenging because it requires decomposing the extremely high-dimensional and rugged free energy landscape of a molecular system into long-lived states, also called metastable states. As a result, their application has generally required significant chemical intuition and hand-tuning. To address this limitation, we have developed a toolkit for automating the construction of MSMs called MSMBUILDER (available at https://simtk.org/home/msmbuilder). In this work, we demonstrate the application of MSMBUILDER to the villin headpiece (HP-35 NleNle), one of the smallest and fastest-folding proteins. We show that the resulting MSM captures both the thermodynamics and kinetics of the original molecular dynamics of the system. As a first step toward experimental validation of our methodology, we show that our model provides accurate structure prediction and that the longest timescale events correspond to folding.
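
To make the idea of an MSM concrete, the following is a minimal sketch of the generic procedure: count transitions between discrete states at a fixed lag time, row-normalize the counts into a transition probability matrix, and convert an eigenvalue into an implied timescale through t = -τ / ln(λ). It is not the MSMBUILDER implementation. The function names, the toy trajectory, and the restriction to a two-state system (where the second eigenvalue of a row-stochastic matrix equals its trace minus one) are all assumptions made for illustration.

```python
import math
from collections import Counter


def count_transitions(trajs, lag):
    """Count observed transitions i -> j separated by `lag` frames."""
    counts = Counter()
    for traj in trajs:
        for t in range(len(traj) - lag):
            counts[(traj[t], traj[t + lag])] += 1
    return counts


def transition_matrix(counts, n_states):
    """Row-normalize transition counts into a row-stochastic matrix."""
    T = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        row_total = sum(counts[(i, j)] for j in range(n_states))
        if row_total == 0:
            # Illustrative choice: a state with no outgoing counts stays put.
            T[i][i] = 1.0
            continue
        for j in range(n_states):
            T[i][j] = counts[(i, j)] / row_total
    return T


def two_state_implied_timescale(T, lag_time):
    """Implied timescale t = -lag_time / ln(lambda_2) for a 2x2 matrix.

    The eigenvalues of a 2x2 row-stochastic matrix are 1 and trace - 1.
    """
    lam2 = T[0][0] + T[1][1] - 1.0
    if not 0.0 < lam2 < 1.0:
        raise ValueError("second eigenvalue must lie strictly between 0 and 1")
    return -lag_time / math.log(lam2)


# Hypothetical discretized trajectory: 0 = unfolded, 1 = folded.
toy_trajs = [[0, 0, 0, 1, 1, 1, 1, 0, 0, 0]]
toy_counts = count_transitions(toy_trajs, lag=1)
toy_T = transition_matrix(toy_counts, n_states=2)
toy_timescale = two_state_implied_timescale(toy_T, lag_time=1.0)
print(toy_T, toy_timescale)
```

For systems with thousands of states, as in the models described here, the eigenvalues would come from a numerical eigensolver rather than the closed-form two-state expression, and the timescale is reported in the same units as the lag time.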

Figures

Figure 1
Scatter plots of the free energy of each microstate (in kcal/mol) vs its RMSD. (a) The initial 10 000 state model, (b) the 30 000 state model, (c) the final 10 000 state model, and (d) the final 10 000 state model except that the average RMSD across five structures in each state is used instead of the RMSD of the state center.
Figure 2
Top ten implied timescales for the initial 10 000 state model.
Figure 3
Three representative structures for (a) the lowest RMSD state in the final model and (b) the most probable state in the final model overlaid with the crystal structure (red). The phenylalanine core is shown explicitly for each molecule.
Figure 4
Top ten implied timescales for the final model. (a) The implied timescales at intervals of 1 ns. (b) The implied timescales with error bars obtained by doing five iterations of bootstrapping at an interval of 5 ns.
Figure 5
The average RMSD of each state in the final model vs its left eigenvector component in the longest timescale transition, showing that this transition corresponds to folding.
Figure 6
Comparison between the time evolution of the native population in the MSM (blue) and the raw data (black) for the entire data set. The error bars represent the standard error.
Figure 7
Comparison between the time evolution of the RMSD in the MSM (blue), the reduced representation (yellow), and the raw data (black) for (a) an example of good agreement and (b) an example of the worst case scenario. The error bars represent one standard deviation in the RMSD.
Figure 8
Improved agreement between the MSM and raw data for the example of poor agreement from Fig. 7(b), obtained by building the transition probability matrix from simulations started from this starting structure alone. The error bars represent one standard deviation in the RMSD.
Figure 9
Graph depiction of the model system defined in Appendix B with edges labeled by (a) their probability and (b) their average timescale under a two-state assumption.
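
The time-evolution comparisons in Figures 6 to 8 rest on propagating a state population vector through the transition probability matrix, p(t + τ) = p(t) T. The sketch below shows that generic operation only. The two-state matrix, its values, and the function name are hypothetical and are not taken from the paper.

```python
def propagate(p0, T, n_steps):
    """Return [p(0), p(tau), ..., p(n_steps * tau)] with p(t + tau) = p(t) T."""
    n = len(p0)
    p = list(p0)
    history = [list(p)]
    for _ in range(n_steps):
        # Row vector times row-stochastic matrix.
        p = [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]
        history.append(p)
    return history


# Hypothetical two-state matrix: state 0 = non-native, state 1 = native.
example_T = [[0.9, 0.1],
             [0.2, 0.8]]
example_history = propagate([1.0, 0.0], example_T, n_steps=200)

# Native population as a function of the number of lag-time steps.
native_population = [p[1] for p in example_history]
print(native_population[:3], native_population[-1])
```

With this example matrix the native population relaxes toward the stationary value of 1/3. In practice the MSM curve would be compared against the same observable averaged directly over the raw trajectories.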
