Review
Psychon Bull Rev. 2025 Jun;32(3):1007-1031.
doi: 10.3758/s13423-024-02590-5. Epub 2024 Nov 7.

Model-averaged Bayesian t tests

Maximilian Maier et al. Psychon Bull Rev. 2025 Jun.

Abstract

One of the most common statistical analyses in experimental psychology concerns the comparison of two means using the frequentist t test. However, frequentist t tests do not quantify evidence and require various assumption tests. Recently popularized Bayesian t tests do quantify evidence, but these were developed for scenarios where the two populations are assumed to have the same variance. As an alternative to both methods, we outline a comprehensive t test framework based on Bayesian model averaging. This new t test framework simultaneously takes into account models that assume equal and unequal variances, and models that use t-likelihoods to improve robustness to outliers. The resulting inference is based on a weighted average across the entire model ensemble, with higher weights assigned to models that predicted the observed data well. This new t test framework provides an integrated approach to assumption checks and inference by applying a series of pertinent models to the data simultaneously rather than sequentially. The integrated Bayesian model-averaged t tests achieve robustness without having to commit to a single model following a series of assumption checks. To facilitate practical applications, we provide user-friendly implementations in JASP and via the RoBTT package in R. A tutorial video is available at https://www.youtube.com/watch?v=EcuzGTIcorQ.

Keywords: t-likelihood; t test; Bayes factor; Bayesian model-averaging; Robust inference; Unequal variances.
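
The weighting described in the abstract can be illustrated with a minimal Python sketch. It is not the RoBTT or JASP implementation: the four-model layout (difference in means crossed with unequal variances), the equal prior model probabilities, and the log marginal likelihoods are all assumptions chosen for illustration. The sketch turns marginal likelihoods into posterior model probabilities and then computes inclusion Bayes factors, i.e., the change from prior to posterior odds for the set of models that include a given component.

```python
from math import exp, isclose

# Models indexed as (difference_in_means, unequal_variances).
MODELS = [(False, False), (False, True), (True, False), (True, True)]

# Assumption: every model gets the same prior probability.
prior = {m: 0.25 for m in MODELS}

# Hypothetical log marginal likelihoods (made-up numbers, not from the paper).
log_ml = {
    (False, False): -105.0,
    (False, True): -101.0,
    (True, False): -104.0,
    (True, True): -100.0,
}


def posterior_model_probs(prior, log_ml):
    """Bayes' rule over models: posterior is proportional to prior x marginal likelihood."""
    shift = max(log_ml.values())  # subtract the max for numerical stability
    unnorm = {m: prior[m] * exp(log_ml[m] - shift) for m in prior}
    total = sum(unnorm.values())
    return {m: w / total for m, w in unnorm.items()}


def inclusion_bf(prior, post, included):
    """Posterior odds divided by prior odds for the models in `included`."""
    prior_in = sum(prior[m] for m in included)
    prior_out = sum(p for m, p in prior.items() if m not in included)
    post_in = sum(post[m] for m in included)
    post_out = sum(p for m, p in post.items() if m not in included)
    return (post_in / post_out) / (prior_in / prior_out)


post = posterior_model_probs(prior, log_ml)
bf_effect = inclusion_bf(prior, post, [m for m in MODELS if m[0]])
bf_unequal_var = inclusion_bf(prior, post, [m for m in MODELS if m[1]])

for model, p in post.items():
    print(model, round(p, 4))
print("Inclusion BF, difference in means:", round(bf_effect, 3))
print("Inclusion BF, unequal variances:", round(bf_unequal_var, 3))
```

With these made-up inputs, the models that predicted the data best receive most of the posterior weight, and the inclusion Bayes factors summarize the evidence for each component across the whole ensemble rather than within a single selected model.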

Conflict of interest statement

Declarations. Competing Interests: The authors declare no competing interests.

Figures

Fig. 1
Social punishment in the control condition vs. the biological markets condition. Data from Pleasant and Barclay (2018), available at https://tinyurl.com/mwpuhpx8. Figure created in JASP (JASP Team, 2022)
Fig. 2
Results from a sequential Bayesian equal-variance t test applied to the data from Pleasant and Barclay (2018). The left panel shows the Bayes factor in favor of an effect and the right panel shows the probability of H1 and H0 as the data accumulate
Fig. 3
Prior and posterior distribution for Cohen’s δ for the Bayesian equal-variance t test under H1
Fig. 4
Results from a sequential Bayesian Welch t test applied to the data from Pleasant and Barclay (2018). The left panel shows the Bayes factor in favor of an effect and the right panel shows the probability of H1 and H0 as the data accumulate
Fig. 5
Prior and posterior distribution for Cohen’s δ and ρ for the Bayesian Welch t test under H1. The left panel shows the prior and posterior distribution for δ under H1; the right panel shows the prior and posterior distribution for the standard deviation ratio (note the logarithmic scaling of the x-axis)
Fig. 6
Default prior model probabilities of the model-averaged Bayesian t test. Marginal model probabilities are displayed on the nodes and conditional model probabilities are displayed on the edges
Fig. 7
Results from a sequential model-averaged Bayesian t test applied to the data from Pleasant and Barclay (2018). The left panel shows the inclusion Bayes factor in favor of a difference in means and in favor of unequal variances. The right panel shows the probability of the four different models as the data accumulate. Note that the last two Bayes factors in favor of unequal variances are 147.33 and 256.00 and therefore outside the plotting range
Fig. 8
Posterior model probabilities of the model-averaged Bayesian t test applied to the data from Pleasant and Barclay (2018). Total probabilities are displayed on the nodes and conditional probabilities on the edges. H0 denotes the models assuming the null hypotheses to be true, H1 denotes the models assuming the alternative hypotheses to be true; Hρ¯ denotes equal-variance models, and Hρ denotes the unequal-variance models
Fig. 9
Prior and posterior distribution for Cohen’s δ and ρ for the Bayesian model-averaged t test. The left panel shows the conditional prior and posterior distribution for δ assuming an effect to be present; the right panel shows the conditional prior and posterior distribution for the standard deviation ratio assuming unequal variances (note the logarithmic scaling of the x-axis)
Fig. 10
The t-distribution has thicker tails than the normal distribution
Fig. 11
Prior model probabilities of the robust model-averaged t test. Marginal model probabilities are displayed on the nodes and conditional model probabilities on the edges. H0 denotes the models assuming the null hypotheses of equal means to be true, H1 denotes the models assuming the alternative hypotheses to be true. Hρ¯ denotes equal-variance models, Hρ denotes the unequal-variance models. Ht denotes the models using t-likelihoods and Hn denotes the models using normal likelihood
Fig. 12
Results from a sequential robust model-averaged Bayesian t test applied to the data from Pleasant and Barclay (2018). The left panel shows the inclusion Bayes factor in favor of an effect, unequal variances, and outliers. The right panel shows the probability of the eight different models as the data accumulate
Fig. 13
Posterior model probabilities of the robust model-averaged Bayesian t test applied to the data from Pleasant and Barclay (2018). Total probabilities are displayed on the nodes and conditional probabilities on the edges. H0 denotes the models assuming the null hypotheses to be true, H1 denotes the models assuming the alternative hypotheses to be true. Hρ¯ denotes equal-variance models, Hρ denotes the unequal-variance models. Ht denotes the models using t-likelihoods and Hn denotes the models using normal likelihood
Fig. 14
Prior and posterior distribution for Cohen’s δ, ρ, and t for the robust model-averaged Bayesian t test. All panels show the conditional prior and posterior distributions assuming the parameter to be present
Fig. 15
Discernment scores for the accuracy nudge treatment and control condition are better captured by a t-distribution than a normal distribution. t-distributions are displayed as full lines and normal distributions as dashed lines. Data from Roozenbeek et al. (2021)
Fig. 16
Evidence distortion of the Bayes factor for the difference in means for different methods and conditions under equal sample sizes. The four methods are the Student’s t test (green), the Welch t test (yellow), a model-averaged version of the t test that combines Student’s and Welch’s t tests (MB t test; blue), and a version that also incorporates uncertainty about the outliers (RoMB t test; red). Whenever the difference in means is present (second row), δ = 0.5. Whenever the variances are unequal (columns 2 and 3), the standard deviation ratio (SDR) is 2; whenever the data are simulated from a t-distribution (column 3), ν = 5 degrees of freedom were used
Fig. 17
Evidence distortion of the Bayes factor for the difference in means for different methods and conditions under unequal sample sizes. The four methods are the Student’s t test (green), the Welch t test (yellow), a model-averaged version of the t test that combines Student’s and Welch’s t tests (MB t test; blue), and a version that also incorporates uncertainty about the outliers (RoMB t test; red). Whenever the difference in means is present (second row), δ = 0.5. Whenever the variances are unequal (columns 2 and 3), the standard deviation ratio (SDR) is 2; whenever the data are simulated from a t-distribution (column 3), ν = 5 degrees of freedom were used
Fig. 18
Comparison of Bayes factors from the Bayesian Student’s t test and the Bayesian Welch t test on a sample with unequal group sizes. Top left: Bayesian Student’s t test; top right: Bayesian Welch t test; bottom left: ratio of evidence from the two tests (log scaled). Positive mean differences and standard deviation ratios larger than one correspond to larger means and standard deviations in the larger group
Fig. 19
Prior posterior plot for Cohen’s δ, ρ, and t. All panels show the conditional prior and posterior distributions assuming the parameter to be present
Fig. 20
Root mean squared error (RMSE with 95% CI) of the posterior δ for different methods and conditions. The root mean squared error (and 95% CI) for the posterior distributions of the effect size δ (y-axis) across sample sizes (x-axis) for different versions of the Bayesian t test (in color): Student’s t test (green), Welch t test (yellow), the model-averaged version of the t test that combines Student’s and Welch t tests (MB; blue), and the version that also incorporates uncertainty about the outliers (RoMB; red). Panel A corresponds to a condition with an effect (δ = 0.5), equal variances, and no outliers; Panel B corresponds to a condition with an effect (δ = 0.5), unequal variances (SDR = 2), and no outliers; and Panel C corresponds to a condition with an effect (δ = 0.5), unequal variances (SDR = 2), and outliers (data simulated from a Student-t distribution with ν = 5 degrees of freedom)
Fig. 21
The four methods are the Student’s t test (green), the Welch t test (yellow), a model-averaged version of the t test that combines Student’s and Welch t tests (MB t test; blue), and a version that also incorporates uncertainty about the outliers (RoMB t test; red). Whenever the difference in means is present (second row), δ = 0.5. Whenever the variances are unequal (columns 2 and 3), the standard deviation ratio (SDR) is 2; whenever the data are simulated from a t-distribution (column 3), ν = 5 degrees of freedom were used
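
The caption of Fig. 10 states that the t-distribution has thicker tails than the normal distribution, which is why t-likelihoods are less sensitive to outliers. A minimal Python sketch of that comparison follows; it uses the standard density formulas and ν = 5 degrees of freedom (the value used in the simulated conditions above), while the function names and the evaluation points are illustrative choices rather than anything taken from the paper.

```python
from math import exp, gamma, pi, sqrt


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return exp(-0.5 * x * x) / sqrt(2 * pi)


def student_t_pdf(x, nu):
    """Density of the standard Student-t distribution with nu degrees of freedom."""
    coef = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    return coef * (1 + x * x / nu) ** (-(nu + 1) / 2)


NU = 5  # degrees of freedom used in the simulated t-distribution conditions

# Ratio of t density to normal density at increasingly extreme values.
for x in (0.0, 2.0, 4.0, 6.0):
    ratio = student_t_pdf(x, NU) / normal_pdf(x)
    print(f"x = {x}: t/normal density ratio = {ratio:.3g}")
```

The ratio is below one at the center and grows rapidly in the tails, so an extreme observation is far less surprising under a t-likelihood than under a normal likelihood.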

References

    1. Alipourfard, N., Arendt, B., Benjamin, D. M., Benkler, N., Bishop, M., Burstein, M., ... Clark, C., et al. (2021). Systematizing confidence in open research and evidence (SCORE).
    2. Barbieri, A., Marin, J. M., & Florin, K. (2016). A fully objective Bayesian approach for the Behrens-Fisher problem using historical studies. arXiv:1611.06873
    3. Bartolucci, A. A., Blanchard, P. D., Howell, W. M., & Singh, K. P. (1998). A Bayesian Behrens-Fisher solution to a problem in taxonomy. Environmental Modelling & Software, 13(1), 25–29. 10.1016/S1364-8152(97)00033-9
    4. Bartoš, F., & Maier, M. (2022). RoBTT: An R package for robust Bayesian t-test. https://CRAN.R-project.org/package=RoBTT. (R package)
    5. Bartoš, F., Gronau, Q. F., Timmers, B., Otte, W. M., Ly, A., & Wagenmakers, E. J. (2021). Bayesian model-averaged meta-analysis in medicine. Statistics in Medicine, 40(30), 6743–6761. 10.1002/sim.9170