Review

Biophys J. 2017 May 23;112(10):2021-2029.

doi: 10.1016/j.bpj.2017.04.027.

An Introduction to Infinite HMMs for Single-Molecule Data Analysis

Ioannis Sgouralis¹, Steve Pressé²

Affiliations

¹ Department of Physics, Arizona State University, Tempe, Arizona.
² Department of Physics, Arizona State University, Tempe, Arizona; Department of Molecular Sciences, Arizona State University, Tempe, Arizona. Electronic address: spresse@asu.edu.

PMID: 28538142
PMCID: PMC5448313
DOI: 10.1016/j.bpj.2017.04.027

Ioannis Sgouralis et al. Biophys J. 2017.

Abstract

The hidden Markov model (HMM) has been a workhorse of single-molecule data analysis and is now commonly used as a stand-alone tool in time series analysis or in conjunction with other analysis methods such as tracking. Here, we provide a conceptual introduction to an important generalization of the HMM, which is poised to have a deep impact across the field of biophysics: the infinite HMM (iHMM). As a modeling tool, iHMMs can analyze sequential data without a priori setting a specific number of states as required for the traditional (finite) HMM. Although the current literature on the iHMM is primarily intended for audiences in statistics, the idea is powerful and the iHMM's breadth in applicability outside machine learning and data science warrants a careful exposition. Here, we explain the key ideas underlying the iHMM, with a special emphasis on implementation, and provide a description of a code we are making freely available. In a companion article, we provide an important extension of the iHMM to accommodate complications such as drift.

Figures

**Figure 1**
A synthetic time trace illustrating measurements of a hypothetical biomolecule that undergoes conformational transitions. (*Left*) The state space consists of conformations depicted discretely as $σ_{1}, σ_{2}, \dots$. (*Middle*) Time series of noisy observations, $x_{n}$, produced by the biomolecule (*blue*) and the corresponding noiseless trace (*red*). Over the time course of the measurements, the biomolecule attains only conformations $σ_{1}$–$σ_{5}$, though additional conformations might be visited at subsequent times. For the sake of concreteness only, we label these states in order of appearance from 1 through 5. (*Right*) Binning the collected observations reveals “emission distributions,” $F_{σ_{k}}$, associated with each conformation. These distributions are highlighted with red lines. The centers (mean values) of the emission distributions are used to obtain the noiseless trace in the middle panel. The illustration on the left is created using data from (47) (PDB: 2N4G). To see this figure in color, go online.

**Figure 2**
Graphical representation of the HMM. In the HMM, a biomolecule of interest transitions between unobserved states $s_{n}$ according to the probability vectors ${\tilde{π}}_{s_{n}}$ and generates observations $x_{n}$ according to the probability distributions $F_{s_{n}}$ that depend on the parameter $ϕ_{s_{n}}$. Here, following convention, the $x_{n}$ values are shaded to denote that these quantities are observed, whereas the $s_{n}$ values are hidden. Arrows denote the dependences among the model variables and red lines denote the model parameters. To see this figure in color, go online.
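
As a rough illustration of the generative picture in this caption, the following sketch draws hidden states $s_{n}$ from per-state transition vectors and observations $x_{n}$ from state-dependent emission distributions. It is not the authors' code: the function name `simulate_hmm`, the Gaussian form of the emission distributions, the shared noise level `sigma`, and all numerical values are assumptions chosen only for illustration.

```python
import random

def simulate_hmm(pi, mu, sigma, n_steps, s0=0, seed=0):
    """Draw hidden states s_n and observations x_n from a finite HMM.

    pi[k]  : transition probability vector out of state k (sums to 1)
    mu[k]  : mean of the emission distribution of state k
             (Gaussian emissions are an assumption of this sketch)
    sigma  : emission standard deviation, shared by all states (assumption)
    """
    rng = random.Random(seed)
    states, observations = [], []
    s = s0
    for _ in range(n_steps):
        states.append(s)
        # Observation depends only on the current hidden state.
        observations.append(rng.gauss(mu[s], sigma))
        # Next hidden state depends only on the current one.
        s = rng.choices(range(len(pi)), weights=pi[s])[0]
    return states, observations

# Hypothetical three-state example; state 0 and state 2 are not directly connected.
pi = [[0.95, 0.05, 0.00],
      [0.03, 0.94, 0.03],
      [0.00, 0.05, 0.95]]
mu = [0.0, 1.0, 2.0]
states, obs = simulate_hmm(pi, mu, sigma=0.3, n_steps=200)
```

Note that the number of states (here three) has to be fixed before any data are generated or analyzed, which is the restriction the iHMM is meant to lift.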

**Figure 3**
Graphical representation of the iHMM. The hidden Markov model that formulates the observations to be analyzed (*black lines*) is shown together with its priors (*red lines*). For completeness, we also show the concentration parameters α and γ and the prior probability distribution on the emission parameters, H, that fully characterize the iHMM. The key difference from the HMM shown in Fig. 2 is that now the model parameters ${\tilde{π}}_{σ_{k}}$ and $ϕ_{σ_{k}}$ are treated as random variables similar to the hidden states, $s_{n}$, and observations, $x_{n}$. For details, see the main text. To see this figure in color, go online.
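
To make the role of the priors more concrete, here is a minimal sketch of drawing transition vectors and emission parameters from a truncated approximation of a hierarchical Dirichlet process prior. It assumes the common convention in which γ controls a shared set of base weights (written `beta` below) and α controls how closely each transition vector follows them; the caption itself does not spell out this assignment. The truncation level `k_max`, the Gaussian choice for H, and every function name are hypothetical and are not taken from the authors' code.

```python
import random

def stick_breaking(gamma, k_max, rng):
    """Truncated stick-breaking weights with concentration gamma."""
    weights, remaining = [], 1.0
    for _ in range(k_max - 1):
        frac = rng.betavariate(1.0, gamma)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)  # leftover mass goes to the last component
    return weights

def dirichlet(concentrations, rng):
    """Dirichlet draw via normalized gamma variates."""
    # The small floor keeps the gamma shape strictly positive.
    draws = [rng.gammavariate(max(a, 1e-9), 1.0) for a in concentrations]
    total = sum(draws)
    return [d / total for d in draws]

def sample_ihmm_prior(alpha, gamma, k_max, h_mean, h_std, seed=0):
    """Draw (beta, pi, phi) from a truncated iHMM-style prior.

    beta   : shared base weights over states
    pi[k]  : transition vector out of state k, centered on beta
    phi[k] : emission parameter of state k, drawn from H
             (H is taken to be Gaussian here, purely as an assumption)
    """
    rng = random.Random(seed)
    beta = stick_breaking(gamma, k_max, rng)
    pi = [dirichlet([alpha * b for b in beta], rng) for _ in range(k_max)]
    phi = [rng.gauss(h_mean, h_std) for _ in range(k_max)]
    return beta, pi, phi

beta, pi, phi = sample_ihmm_prior(alpha=5.0, gamma=2.0, k_max=10,
                                  h_mean=0.0, h_std=1.0)
```

Because every transition vector is centered on the same base weights, states that are popular under `beta` tend to be reachable from all other states. The truncation at `k_max` is only a computational convenience in this sketch; the iHMM itself places no upper bound on the number of states.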

**Figure 4**
Synthetic data sets resembling a hypothetical biomolecule undergoing transitions between discrete states that we analyzed with the iHMM. (*Left*) Time series $\bar{x} = (x_{1}, \dots, x_{N})$ of noisy observations. During the measuring period, the biomolecule attains five conformations, $σ_{1}, \dots, σ_{5}$. The number of conformations is a priori unknown and the iHMM seeks to determine the probability over the number of states, as well as their properties, given the data available. In data set 1, the biomolecule transitions often through every state. By contrast, in data set 2, transitions to some states are rare. As a result, all states in data set 1 are almost equally visited throughout the experiment time course, whereas in data set 2, higher states are visited, by chance, only toward the end of the trace. (*Right*) The corresponding emission distributions, $F_{σ_{k}}$, as obtained by simply binning the observations (*blue*) and plotting the exact ones used for the simulations (*red*). For both data sets, the emission distributions show significant overlap. In all panels, dotted lines indicate the exact mean values, $μ_{σ_{k}}$, of the emission distributions. To see this figure in color, go online.

**Figure 5**
After some iterations, the sampler used in the iHMM to analyze data set 1 of Fig. 4 eventually converges to the correct number of states. The number of visited states, $K^{(r)}$ (*top*), and the means of the emission distributions, $μ_{σ_{k}}^{(r)}$ (*bottom*), change throughout the sampler’s iterations. Unlike the HMM, which uses a finite and fixed state space, the iHMM learns the number of available states and grows/shrinks the state space as required by the data.

**Figure 6**
We may use samples from the iHMM posterior probability to infer the size of the state space and the location of each state. In particular, we illustrate histograms for $P(K \mid \bar{x})$ (*top*) and $P(μ_{σ_{k}} \mid \bar{x})$ (*bottom*) using data set 1 of Fig. 4. In both panels, dashed lines indicate the exact (ground-truth) values used to produce the data in Fig. 4. To see this figure in color, go online.

**Figure 7**
We may use the iHMM to estimate portions of the complete state space such as those contained in different segments of data set 2 provided in Fig. 4. (*Upper*) Estimated noiseless traces for two cases: 1) using a limited segment of the full trace; and 2) using the full trace. Although only the latter case allows an estimate of all five states, both cases provide similar estimates over those states that they mutually visit. (*Lower*) Corresponding estimates of the number of states contained in each trace. To see this figure in color, go online.
