Review
JACS Au. 2021 Aug 4;1(9):1330-1341.
doi: 10.1021/jacsau.1c00254. eCollection 2021 Sep 27.

Markov State Models to Study the Functional Dynamics of Proteins in the Wake of Machine Learning

Kirill A Konovalov et al. JACS Au.

Abstract

Markov state models (MSMs) based on molecular dynamics (MD) simulations are routinely employed to study protein folding; however, their application to functional conformational changes of biomolecules is still limited. In the past few years, the field of computational chemistry has experienced a surge of advancements stemming from machine learning algorithms, and MSMs have not been left out. Unlike global processes, such as protein folding, the application of MSMs to functional conformational changes is challenging because they mostly consist of localized structural transitions. Therefore, it is critical to properly select a subset of structural features that can describe the slowest dynamics of these functional conformational changes. To address this challenge, we recommend several automatic feature selection methods such as Spectral-oASIS. To identify states in MSMs, the chosen features can be subjected to dimensionality reduction methods such as TICA or deep-learning-based VAMPNets to project MD conformations onto a few collective variables for subsequent clustering. Another challenge for the application of MSMs to the study of functional conformational changes is the ability to comprehend their biophysical mechanisms, as MSMs built for these processes often require a large number of states. We recommend the recently developed quasi-MSMs (qMSMs) to address this issue. Compared to MSMs, qMSMs encode the non-Markovian dynamics via the generalized master equation and can significantly reduce the number of states. As a result, qMSMs can be built with a handful of states to facilitate the interpretation of functional conformational changes. In the wake of machine learning, we believe that the rapid advancement in the MSM methodology will lead to their wider application in studying functional conformational changes of biomolecules.

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
Key steps in MSM construction for studying functional conformational changes in proteins. (A) Pathways between two or more end points of the functional conformational changes are generated and optimized to obtain minimum free energy pathways. (B) Extensive MD simulations are performed starting from these pathways. (C) Several relevant features or physical coordinates are selected. (D) Dimensionality reduction is performed using the selected features as input. (E) Reduced dimension data is discretized to obtain microstates and the MSM is estimated. (F) Kinetic lumping is performed to group microstates to macrostates.
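Step (E) of this workflow, estimating an MSM from discretized data, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `estimate_msm` and `implied_timescales`, the three-state toy transition matrix, and the naive count symmetrization used to enforce reversibility are all assumptions made for illustration. Production MSM packages typically use more careful estimators (for example, maximum-likelihood reversible estimation).

```python
import numpy as np

def estimate_msm(dtraj, n_states, lag):
    """Row-normalized transition matrix from a discrete trajectory at lag `lag`."""
    counts = np.zeros((n_states, n_states))
    # Count transitions i -> j separated by `lag` frames (sliding window).
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    # Assumption: naive symmetrization to impose detailed balance.
    counts = 0.5 * (counts + counts.T)
    return counts / counts.sum(axis=1, keepdims=True)

def implied_timescales(T, lag):
    """Implied timescales t_i = -lag / ln|lambda_i| for the non-stationary eigenvalues."""
    eigvals = np.sort(np.real(np.linalg.eigvals(T)))[::-1]
    return -lag / np.log(np.abs(eigvals[1:]))

# Hypothetical toy system: a reversible three-state chain standing in for microstates.
rng = np.random.default_rng(0)
T_true = np.array([[0.980, 0.015, 0.005],
                   [0.015, 0.970, 0.015],
                   [0.005, 0.015, 0.980]])
dtraj = np.empty(30000, dtype=int)
dtraj[0] = 0
for t in range(1, len(dtraj)):
    dtraj[t] = rng.choice(3, p=T_true[dtraj[t - 1]])

T_est = estimate_msm(dtraj, n_states=3, lag=1)
its = implied_timescales(T_est, lag=1)
print(np.round(T_est, 3))
print(its)  # in units of trajectory frames
```

In practice the implied timescales would be recomputed over a range of lag times, and a lag would be chosen where they level off.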
Figure 2
Examples of functional conformational changes elucidated by MSMs. (A) MSMs describe the mechanism of thymine DNA glycosylase sliding along double-stranded DNA to detect the mismatched pair (target site, S5). (B) Conformational change of the NarK transporter during substrate exchange is shown. Panel (A) is reproduced with permission from ref (68). Oxford University Press, 2021. Panel (B) is adapted with permission from ref (69). Elsevier, 2021.
Figure 3
Feature selection for functional conformational change. (A) Overview of the Spectral-oASIS algorithm. (B) Time scales of the first three TICs of the trypsin-benzamidine system are calculated using a subset of features selected by Spectral-oASIS. The optimal number of features is selected when the time scales plot levels off, which is at around 5000 out of 24 533 features. (C) Active site opening in trypsin-benzamidine can be described by the first TIC, which is calculated by using the selected features from Spectral-oASIS. The motion of the critical Trp215 is shown with sticks. (D) Overview of the feature importance selection algorithm. (E) The accuracy of T4 lysozyme is plotted as a function of the number of discarded features. Individual curves correspond to a different number of metastable states in the partitioning of the dynamics. The selected essential features are the ones after the accuracy plot begins to drop. (F) The functional change of T4 lysozyme is shown by the essential feature set. Panels (B) and (C) are reproduced from ref (39). Copyright 2018 American Chemical Society. Panels (E) and (F) are reproduced from ref (40). Copyright 2018 American Chemical Society.
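The TIC time scales referred to in panel (B) come from a time-lagged independent component analysis (TICA) of the selected features. The sketch below shows one common way to compute TICA components and their time scales with NumPy alone. It does not reproduce Spectral-oASIS or the trypsin-benzamidine analysis; the function name `tica`, the symmetrized covariance estimator, and the two-feature synthetic data are assumptions for illustration.

```python
import numpy as np

def tica(X, lag, n_components=2, eps=1e-10):
    """Minimal TICA: solve C_tau v = lambda C_0 v with symmetrized covariances."""
    X = X - X.mean(axis=0)
    X0, Xt = X[:-lag], X[lag:]
    n = len(X0)
    C0 = (X0.T @ X0 + Xt.T @ Xt) / (2 * n)    # instantaneous covariance
    Ct = (X0.T @ Xt + Xt.T @ X0) / (2 * n)    # symmetrized time-lagged covariance
    # Whiten with C0^{-1/2}, discarding near-singular directions.
    s, U = np.linalg.eigh(C0)
    keep = s > eps * s.max()
    W = U[:, keep] / np.sqrt(s[keep])
    lam, V = np.linalg.eigh(W.T @ Ct @ W)
    order = np.argsort(lam)[::-1][:n_components]
    lam = lam[order]
    components = W @ V[:, order]               # columns project features onto TICs
    timescales = -lag / np.log(np.abs(lam))    # t_i = -lag / ln|lambda_i|
    return lam, components, timescales

# Hypothetical data: one slow AR(1) coordinate and one fast noise coordinate, linearly mixed.
rng = np.random.default_rng(1)
n_frames = 20000
noise = rng.normal(size=n_frames)
slow = np.zeros(n_frames)
for t in range(1, n_frames):
    slow[t] = 0.95 * slow[t - 1] + noise[t]
fast = rng.normal(size=n_frames)
features = np.column_stack([slow + fast, slow - 2.0 * fast])

lam, components, timescales = tica(features, lag=5)
print(lam, timescales)
```

The first TIC should recover the slow coordinate, with an eigenvalue close to its autocorrelation at the chosen lag; the second corresponds to fast noise and has an eigenvalue near zero.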
Figure 4
VAMPNets-based collective variables (CVs) offer superior performance compared to TICA. (A) Schematic of the VAMPNets architecture. (B) Structure of the Trp-cage protein. The green spheres highlight Cα atoms. Representative pairwise distance features between some of the Cα atoms are shown as yellow dashed lines. (C) MD simulation structures of Trp-cage proteins are projected onto TICA coordinates and colored according to the eigenvectors discovered by state-free reversible VAMPnets, SRV (Top), and TICA-MSM (Bottom). (D) Model performance is scored based on cross-validation with VAMP-2. Panels (C) and (D) are reproduced with permission from ref (83). Copyright 2019 American Chemical Society.
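The VAMP-2 score used in panel (D) can be sketched for a fixed set of features as the squared Frobenius norm of the whitened time-lagged covariance matrix. The code below is a minimal illustration under stated assumptions, not the scoring code used in ref (83): the function names are hypothetical, features are mean-centered (so the constant function's contribution of 1, which some implementations add, is omitted), and no cross-validation split is performed.

```python
import numpy as np

def inv_sqrt(C, eps=1e-10):
    """Symmetric inverse square root of a covariance matrix, dropping tiny eigenvalues."""
    s, U = np.linalg.eigh(C)
    keep = s > eps * s.max()
    return (U[:, keep] / np.sqrt(s[keep])) @ U[:, keep].T

def vamp2_score(X, lag):
    """VAMP-2 score ||C00^{-1/2} C0t Ctt^{-1/2}||_F^2 for mean-centered features X."""
    X0 = X[:-lag] - X[:-lag].mean(axis=0)
    Xt = X[lag:] - X[lag:].mean(axis=0)
    n = len(X0)
    C00 = X0.T @ X0 / n
    Ctt = Xt.T @ Xt / n
    C0t = X0.T @ Xt / n
    K = inv_sqrt(C00) @ C0t @ inv_sqrt(Ctt)
    return float(np.sum(K ** 2))

# Hypothetical comparison: a slowly decorrelating feature versus white noise.
rng = np.random.default_rng(2)
n_frames = 20000
noise = rng.normal(size=n_frames)
slow = np.zeros(n_frames)
for t in range(1, n_frames):
    slow[t] = 0.9 * slow[t - 1] + noise[t]
white = rng.normal(size=n_frames)

score_slow = vamp2_score(slow[:, None], lag=1)
score_white = vamp2_score(white[:, None], lag=1)
print(score_slow, score_white)
```

A feature set that captures slow dynamics scores higher than one that only captures fast fluctuations, which is why the score is useful for comparing CVs. For model selection it should be evaluated on held-out trajectories, as the caption indicates.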
Figure 5
qMSMs afford precise models with a handful of states. (A) Schematic of a simple three-state model. (B) Memory kernel tensor (K) of the three-state model. (C) The mechanism of the bacterial RNA polymerase clamp domain opening is shown, where four macrostates and the mean first passage times (MFPTs) between them are identified and estimated by the qMSM. (D) Chapman–Kolmogorov tests of the qMSM and four-state MSM are compared to MD simulations. (E) MFPTs from S4 to S1 estimated using the qMSM (left) and four-state MSM (right) are shown as a function of lag time. Panels (A) and (B) are reproduced with permission from ref (43). AIP Publishing, 2020. Panels (C)–(E) are adapted with permission from ref (90). National Academy of Sciences, 2021.
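The Chapman–Kolmogorov test mentioned in panel (D) checks whether a model estimated at lag time τ predicts the transition probabilities observed directly at kτ. For a plain MSM the prediction is the matrix power T(τ)^k. The sketch below illustrates that check only; it does not implement a qMSM or its memory kernel, and the function names and the two-state toy chain are assumptions for illustration.

```python
import numpy as np

def transition_matrix(dtraj, n_states, lag):
    """Row-normalized transition counts at a given lag (no reversibility constraint)."""
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)

def chapman_kolmogorov(dtraj, n_states, lag, k):
    """Return (predicted, observed) transition matrices at lag k*lag."""
    predicted = np.linalg.matrix_power(transition_matrix(dtraj, n_states, lag), k)
    observed = transition_matrix(dtraj, n_states, k * lag)
    return predicted, observed

# Hypothetical two-state Markov chain; a truly Markovian process should pass the test.
rng = np.random.default_rng(3)
T_true = np.array([[0.9, 0.1],
                   [0.2, 0.8]])
dtraj = np.empty(20000, dtype=int)
dtraj[0] = 0
for t in range(1, len(dtraj)):
    dtraj[t] = rng.choice(2, p=T_true[dtraj[t - 1]])

predicted, observed = chapman_kolmogorov(dtraj, n_states=2, lag=1, k=5)
print(np.round(predicted, 3))
print(np.round(observed, 3))
```

When a few coarse macrostates hide slower internal relaxation, the predicted and observed matrices diverge at short lag times. That is the situation in which the caption compares a four-state MSM against a qMSM.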

References

    1. Henzler-Wildman K.; Kern D. Dynamic personalities of proteins. Nature 2007, 450 (7172), 964–972. 10.1038/nature06522.
    2. Bahar I.; Lezon T. R.; Yang L. W.; Eyal E. Global Dynamics of Proteins: Bridging Between Structure and Function. Annu. Rev. Biophys. 2010, 39, 23–42. 10.1146/annurev.biophys.093008.131258.
    3. Wei G. H.; Xi W. H.; Nussinov R.; Ma B. Y. Protein Ensembles: How Does Nature Harness Thermodynamic Fluctuations for Life? The Diverse Functional Roles of Conformational Ensembles in the Cell. Chem. Rev. 2016, 116 (11), 6516–6551. 10.1021/acs.chemrev.5b00562.
    4. Zimmerman M. I.; Porter J. R.; Ward M. D.; Singh S.; Vithani N.; Meller A.; Mallimadugula U. L.; Kuhn C. E.; Borowsky J. H.; Wiewiora R. P.; Hurley M. F. D.; Harbison A. M.; Fogarty C. A.; Coffland J. E.; Fadda E.; Voelz V. A.; Chodera J. D.; Bowman G. R. SARS-CoV-2 simulations go exascale to predict dramatic spike opening and cryptic pockets across the proteome. Nat. Chem. 2021, 13, 651–659. 10.1038/s41557-021-00707-0.
    5. Silva D. A.; Weiss D. R.; Avila F. P.; Da L. T.; Levitt M.; Wang D.; Huang X. H. Millisecond dynamics of RNA polymerase II translocation at atomic resolution. Proc. Natl. Acad. Sci. U. S. A. 2014, 111 (21), 7665–7670. 10.1073/pnas.1315751111.