J Math Psychol. 2019 Apr;89:67-86.
doi: 10.1016/j.jmp.2019.01.005. Epub 2019 Feb 13.

Thermodynamic Integration and Steppingstone Sampling Methods for Estimating Bayes Factors: A Tutorial

Jeffrey Annis et al. J Math Psychol. 2019 Apr.

Abstract

One of the more principled methods of performing model selection is via Bayes factors. However, calculating Bayes factors requires marginal likelihoods, which are integrals over the entire parameter space, making estimation of Bayes factors for models with more than a few parameters a significant computational challenge. Here, we provide a tutorial review of two Monte Carlo techniques rarely used in psychology that efficiently compute marginal likelihoods: thermodynamic integration (Friel & Pettitt, 2008; Lartillot & Philippe, 2006) and steppingstone sampling (Xie, Lewis, Fan, Kuo, & Chen, 2011). The methods are general and can be easily implemented in existing MCMC code; we provide both the details for implementation and associated R code for the interested reader. While Bayesian toolkits implementing standard statistical analyses (e.g., JASP Team, 2017; Morey & Rouder, 2015) often compute Bayes factors for the researcher, those using Bayesian approaches to evaluate cognitive models are usually left to compute Bayes factors for themselves. Here, we provide examples of the methods by computing marginal likelihoods for a moderately complex model of choice response time, the Linear Ballistic Accumulator model (Brown & Heathcote, 2008), and compare them to findings of Evans and Brown (2017), who used a brute force technique. We then present a derivation of TI and SS within a hierarchical framework, provide results of a model recovery case study using hierarchical models, and show an application to empirical data. A companion R package is available at the Open Science Framework: https://osf.io/jpnb4.
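As a rough illustration of the two estimators named in the abstract, the sketch below applies thermodynamic integration (TI) and steppingstone sampling (SS) to a toy problem. It is not the authors' R package or their LBA analysis. Everything specific here is an assumption made for illustration: the conjugate normal model (known `sigma`, normal prior on the mean), the power-law temperature schedule, and the function names. The toy model is chosen because its power posterior can be sampled exactly and its log marginal likelihood has a closed form, so no MCMC is needed and the estimates can be checked.

```python
import numpy as np


def log_lik(mu, y, sigma):
    """ln p(D | mu) for each value in the 1-D array mu (normal data, known sigma)."""
    resid = y[None, :] - mu[:, None]
    return (-0.5 * y.size * np.log(2 * np.pi * sigma**2)
            - 0.5 * np.sum(resid**2, axis=1) / sigma**2)


def sample_power_posterior(t, y, sigma, m0, s0, size, rng):
    """Exact draws from p(mu | D, t), proportional to p(D | mu)^t * p(mu).

    For this conjugate toy model the power posterior is normal, so no MCMC
    is required. A real cognitive model would need an MCMC run per temperature.
    """
    prec = 1.0 / s0**2 + t * y.size / sigma**2
    mean = (m0 / s0**2 + t * y.sum() / sigma**2) / prec
    return rng.normal(mean, np.sqrt(1.0 / prec), size=size)


def estimate_log_ml(y, sigma, m0, s0, n_rungs=30, n_samples=5000, seed=0):
    """Return (TI estimate, SS estimate) of ln p(D)."""
    rng = np.random.default_rng(seed)
    # Assumed schedule: temperatures packed near 0 (the prior end), where the
    # expected log-likelihood changes fastest.
    temps = np.linspace(0.0, 1.0, n_rungs + 1) ** 5
    # ll[k] holds log-likelihoods of draws from the power posterior at temps[k].
    ll = np.array([
        log_lik(sample_power_posterior(t, y, sigma, m0, s0, n_samples, rng),
                y, sigma)
        for t in temps
    ])

    # TI: trapezoid rule on the mean log-likelihood as a function of temperature.
    mean_ll = ll.mean(axis=1)
    ti = np.sum(np.diff(temps) * (mean_ll[:-1] + mean_ll[1:]) / 2.0)

    # SS: sum of log ratios of adjacent normalizing constants, each estimated
    # by importance sampling from the lower temperature (log-sum-exp for stability).
    ss = 0.0
    for k in range(1, temps.size):
        w = (temps[k] - temps[k - 1]) * ll[k - 1]
        ss += w.max() + np.log(np.mean(np.exp(w - w.max())))
    return ti, ss


def exact_log_ml(y, sigma, m0, s0):
    """Closed-form ln p(D) for the toy model, used only as a check."""
    n = y.size
    d = y - m0
    quad = (np.sum(d**2) - s0**2 * d.sum()**2 / (sigma**2 + n * s0**2)) / sigma**2
    return (-0.5 * n * np.log(2 * np.pi * sigma**2)
            - 0.5 * np.log1p(n * s0**2 / sigma**2)
            - 0.5 * quad)


if __name__ == "__main__":
    data = np.random.default_rng(1).normal(1.0, 1.0, size=20)
    ti_est, ss_est = estimate_log_ml(data, sigma=1.0, m0=0.0, s0=2.0)
    print("TI:", ti_est, "SS:", ss_est,
          "exact:", exact_log_ml(data, 1.0, 0.0, 2.0))
```

Both estimators reuse the same draws at each temperature, so in this sketch the only extra cost over ordinary posterior sampling is the additional temperature rungs.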

Figures

Figure 1.
Estimated number of days to collect the corresponding sample size using a brute-force Monte Carlo approach to estimating the marginal likelihood with a GPU vs. a CPU. The points represent actual data and the dotted line represents the predictions. Evans and Brown (2017) found sample sizes of approximately 1e8 are sufficient for accurate estimates of marginal likelihoods for single participant LBA models with 6 parameters. For hierarchical models, more may be needed.
Figure 2.
The evolution of several superimposed MCMC chains running under different temperatures. Black lines represent the location of each temperature index along the chains. The dashed lines represent the burn-in period after initializing each temperature. The initial temperature is 1 (posterior sampling) and the final temperature is 0 (prior sampling). The samples become increasingly spread out as the posterior transitions to the prior.
Figure 3.
The mean log-likelihood computed under samples drawn from a posterior raised to the corresponding temperature. The area under the curve, shown in grey, is the estimate of the log marginal likelihood produced by the thermodynamic integration method. Note that the Monte Carlo standard error bars are not plotted because they are too small to be displayed.
Figure 4.
The estimated log marginal likelihood, ln p(D), plotted as a function of the number of temperature rungs, estimation method, and model type. The solid and dashed black lines show the estimated mean and standard deviation of ln p(D), respectively, from Evans and Brown (2017), who used a brute-force GPU method. All means and standard deviations are based on 10 independent replications of the respective method. All error bars represent standard deviations.
Figure 5.
Evidence for and against the complex model plotted in terms of the log Bayes factor as a function of the number of temperature rungs. All error bars represent standard deviations.
Figure 6.
The top panel plots the marginal likelihood obtained for each model under the different methods given a null data set. The bottom panel plots the Bayes factor relative to the null model across temperature rungs and methods. Negative log Bayes factors represent evidence against the corresponding model when compared to the null model. All error bars represent standard deviations.
Figure 7.
The top panel plots the marginal likelihood obtained for each model under the different methods given a data set in which drift rate varied across conditions. The bottom panel plots the Bayes factor relative to the drift rate model across temperature rungs and methods. Negative log Bayes factors represent evidence against the corresponding model when compared to the drift rate model. All error bars represent standard deviations.
Figure 8.
The top panel plots the marginal likelihood for each model under the different methods given the Rae et al. data set. The bottom panel plots the Bayes factor obtained for each method and model. Negative values represent evidence against the corresponding model when compared to the drift rate + threshold model. All error bars represent standard deviations.
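
The quantities plotted in the figures above can be connected through the standard identities behind the two methods. The block below is a summary in generic notation, not a transcription of the paper's derivation: the symbols θ (parameters), t (temperature), t_k and K (temperature rungs), and M_1, M_2 (the two models being compared) are assumed here for illustration. The first line defines the power posterior sampled at each temperature, the second is the thermodynamic integration identity (the area under the mean log-likelihood curve), the third is the steppingstone identity, and the last is the log Bayes factor.

```latex
\begin{align}
p(\theta \mid D, t) &\propto p(D \mid \theta)^{t}\, p(\theta), \qquad t \in [0, 1], \\
\ln p(D) &= \int_{0}^{1} \mathbb{E}_{\theta \mid D, t}\!\left[\ln p(D \mid \theta)\right] dt, \\
\ln p(D) &= \sum_{k=1}^{K} \ln \mathbb{E}_{\theta \mid D, t_{k-1}}\!\left[p(D \mid \theta)^{\,t_k - t_{k-1}}\right],
  \qquad 0 = t_0 < t_1 < \dots < t_K = 1, \\
\ln \mathrm{BF}_{12} &= \ln p(D \mid M_1) - \ln p(D \mid M_2).
\end{align}
```

At t = 0 the power posterior reduces to the prior and at t = 1 to the posterior, which matches the temperature endpoints described for Figure 2.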

References

    1. Annis J, & Palmeri TJ (2017). Bayesian statistical approaches to evaluating cognitive models. Wiley Interdisciplinary Reviews: Cognitive Science, 9(April), e1458. doi: 10.1002/wcs.1458
    2. Brooks S, Gelman A, Jones G, & Meng X-L (2011). Handbook of Markov Chain Monte Carlo. Boca Raton: Chapman & Hall/CRC Press.
    3. Brown SD, & Heathcote A (2008). The simplest complete model of choice response time: Linear ballistic accumulation. Cognitive Psychology, 57(3), 153–178. doi: 10.1016/j.cogpsych.2007.12.002
    4. Busemeyer JR, & Diederich A (2010). Cognitive Modeling. Thousand Oaks, CA: Sage Publications.
    5. Carlin BP, & Chib S (1995). Bayesian model choice via Markov Chain Monte Carlo methods. Journal of the Royal Statistical Society Series B (Statistical Methodology), 57(3), 473–484.
