Multivariate Behav Res. 2018 May-Jun;53(3):430-451.
doi: 10.1080/00273171.2018.1428892. Epub 2018 Feb 9.

Bayesian Latent Class Analysis Tutorial


Yuelin Li et al. Multivariate Behav Res. 2018 May-Jun.

Abstract

This article is a how-to guide on Bayesian computation using Gibbs sampling, demonstrated in the context of Latent Class Analysis (LCA). It is written for students in quantitative psychology or related fields who have a working knowledge of Bayes' theorem and conditional probability and have experience writing computer programs in the statistical language R. The overall goals are to provide an accessible and self-contained tutorial, along with a practical computation tool. We begin with how Bayesian computation is typically described in academic articles. Technical difficulties are addressed with a hypothetical, worked-out example. We show how Bayesian computation can be broken down into a series of simpler calculations, which can then be assembled to complete a computationally more complex model. The details are described much more explicitly than is typical in elementary introductions to Bayesian modeling, so that readers are not overwhelmed by the mathematics. Moreover, the provided computer program shows how Bayesian LCA can be implemented with relative ease. The computer program is then applied to a large, real-world data set and explained line by line. We outline the general steps for extending these considerations to other methodological applications. We conclude with suggestions for further reading.
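As a concrete illustration of the kind of computation the abstract describes, here is a minimal Gibbs sampler for an LCA model with binary items. This is a sketch, not the tutorial's own program (which is written in R): the two-class toy data, the flat Dirichlet and Beta priors, and all variable names below are assumptions, implemented in Python/NumPy for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_lca(Y, K, n_iter=500, alpha=1.0, beta_prior=(1.0, 1.0)):
    """One possible Gibbs sampler for LCA with binary items.

    Y: (N, J) array of 0/1 responses; K: number of latent classes.
    Priors (assumed): Dirichlet(alpha) on class weights pi,
    Beta(beta_prior) on each item-endorsement probability theta[k, j].
    """
    N, J = Y.shape
    pi = np.full(K, 1.0 / K)
    theta = rng.uniform(0.2, 0.8, size=(K, J))
    pi_trace = np.empty((n_iter, K))
    for t in range(n_iter):
        # 1. Sample each person's class membership z_i given pi and theta.
        logp = np.log(pi) + Y @ np.log(theta).T + (1 - Y) @ np.log(1 - theta).T
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        z = np.array([rng.choice(K, p=row) for row in p])
        # 2. Sample pi given z (Dirichlet-multinomial conjugacy).
        counts = np.bincount(z, minlength=K)
        pi = rng.dirichlet(counts + alpha)
        # 3. Sample theta[k, :] given z and Y (Beta-binomial conjugacy).
        for k in range(K):
            Yk = Y[z == k]
            hits = Yk.sum(axis=0)
            theta[k] = rng.beta(beta_prior[0] + hits,
                                beta_prior[1] + len(Yk) - hits)
        pi_trace[t] = pi
    return pi_trace, theta

# Toy data: two latent classes with distinct endorsement profiles.
N, J = 200, 5
true_z = rng.integers(0, 2, size=N)
true_theta = np.array([[0.9, 0.8, 0.9, 0.2, 0.1],
                       [0.1, 0.2, 0.1, 0.8, 0.9]])
Y = (rng.uniform(size=(N, J)) < true_theta[true_z]).astype(int)
pi_trace, theta = gibbs_lca(Y, K=2)
print(pi_trace[-100:].mean(axis=0))  # posterior means of the class weights
```

Each sweep cycles through the three conditional draws, which is exactly the "series of simpler calculations" the abstract refers to; each draw uses only conjugate updates, so no general-purpose MCMC machinery is needed.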

Keywords: Bayesian analysis; Gibbs sampling; Latent Class Analysis; Markov chain Monte Carlo.


Conflict of interest statement

Conflict of Interest Disclosures: Each author signed a form for disclosure of potential conflicts of interest. No authors reported any financial or other conflicts of interest in relation to the work described.

Figures

Figure 1.
Examples of Beta distributions representing varying degrees of evidence strength over all possible values of the proportion. Subplot (a) shows a variable in which 3 of the 10 persons in the first latent group endorse the symptom; thus the peak is at around 30%. There is uncertainty because an observation of 3 out of 10 can still arise from an underlying probability of 50% (although that is much less likely). Subplot (b) is flatter, indicating weaker evidence from only 4 persons. Subplot (c) shows a flat Beta prior representing complete uncertainty about the proportion. Subplots (d) to (f) show how the priors above converge to posteriors of a similar shape when combined with a new, larger sample of 120 patients, 45 of whom endorse the symptom.
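The conjugate Beta updating behind this caption can be checked with a few lines of arithmetic. The priors below are assumptions echoing subplots (a)-(c) (panel (b)'s "1 of 4" count in particular is invented for illustration); the point is that all three posteriors land near 45/120 = 0.375 once the larger sample is folded in.

```python
# Conjugate Beta updating: a Beta(a, b) prior combined with y endorsements
# out of n yields a Beta(a + y, b + n - y) posterior,
# whose mean is (a + y) / (a + b + n).
def beta_posterior(a, b, y, n):
    return a + y, b + (n - y)

# Hypothetical priors echoing subplots (a)-(c); observed counts enter as
# pseudo-counts on top of a flat Beta(1, 1) baseline.
priors = {"3 of 10 endorse": (1 + 3, 1 + 7),
          "weak, n = 4 (assumed 1 of 4)": (1 + 1, 1 + 3),
          "flat prior": (1, 1)}

y, n = 45, 120  # the new sample from subplots (d)-(f)
for label, (a, b) in priors.items():
    a_post, b_post = beta_posterior(a, b, y, n)
    print(f"{label}: posterior mean = {a_post / (a_post + b_post):.3f}")
```

All three posterior means fall within about 0.37-0.38, which is why the three posteriors in subplots (d)-(f) look so similar: the 120 new observations dominate the small prior pseudo-counts.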
Figure 2.
A Dirichlet distribution for a multinomial variable with three categories. Subplot (a) shows the Dirichlet density in three dimensions, where the overall shape of the distribution is defined by the prior sample size u = [10, 6, 4], plotted over the first and second percentages, labeled p1 and p2, respectively. In (b), the contour of the three-dimensional surface is plotted to visualize the mode of the distribution near (0.5, 0.3, 0.2).
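The location of the mode in this caption can be verified directly: for a Dirichlet(u) with every u_k > 1, the mode is (u_k - 1) / (sum(u) - K). The following NumPy check (not from the tutorial) also confirms the sample means by Monte Carlo.

```python
import numpy as np

u = np.array([10.0, 6.0, 4.0])  # prior sample sizes from Figure 2

# Closed-form mode of Dirichlet(u) when all u_k > 1.
mode = (u - 1) / (u.sum() - len(u))
print(mode.round(3))  # [0.529 0.294 0.176], near the caption's (0.5, 0.3, 0.2)

# Monte Carlo check: the component means equal u / sum(u) = (0.5, 0.3, 0.2).
rng = np.random.default_rng(1)
draws = rng.dirichlet(u, size=100_000)
print(draws.mean(axis=0).round(3))
```

The mode and the mean differ slightly because the density is skewed toward the simplex corners; both sit near (0.5, 0.3, 0.2), as the contour in subplot (b) suggests.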
Figure 3.
Illustration of the label switching problem. Plotted here are MCMC simulation traces for πj using a randomly selected subset of 200 observations from the full Add Health data. We fitted a simpler model with 3 latent classes instead of 4 to show a more visible pattern. Subplots (a)-(c) are the traces for the original MCMC chain. Subplots (d)-(f) are the relabeled chains. Signs of label switching are present in (a), where the chain first hovers around π1 ≈ 0.60 and then drifts to the proximity of 0.15 near the 2,000th iteration. This is corroborated by (c), where π3 drifts up to 0.60 exactly when π1 drifts down to 0.15, an indication that π1 and π3 switched labels during simulation. Similarly, signs of label switching are visible in the π2 chain, although not as pronounced. The relabeled MCMC chains in (d)-(f) show visibly improved simulation traces.
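The switch-and-relabel pattern in this caption can be reproduced on synthetic traces. The sketch below simulates a swap between classes 1 and 3 halfway through the chain and then applies one simple relabeling strategy, an ordering constraint π1 > π2 > π3, which is not necessarily the algorithm used for subplots (d)-(f); the toy trace values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 3-class trace whose labels 1 and 3 swap at iteration 2000,
# mimicking the drift visible in subplots (a) and (c).
T = 4000
true_pi = np.array([0.60, 0.25, 0.15])
trace = true_pi + rng.normal(0, 0.01, size=(T, 3))
trace[2000:] = trace[2000:, [2, 1, 0]]  # the label switch

# Post-hoc relabeling by ordering constraint: sort each iteration's
# class weights in descending order so pi_1 > pi_2 > pi_3 throughout.
relabeled = np.sort(trace, axis=1)[:, ::-1]

print(trace[:2000, 0].mean().round(2), trace[2000:, 0].mean().round(2))
print(relabeled[:, 0].mean().round(2))  # steady near 0.60 throughout
```

Ordering constraints are known to distort posteriors when the class weights are close together, which is one reason more elaborate relabeling algorithms exist; here the weights are well separated, so the constraint cleanly undoes the switch.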
