Review

Psychon Bull Rev. 2025 Jun;32(3):1007-1031.

doi: 10.3758/s13423-024-02590-5. Epub 2024 Nov 7.

Model-averaged Bayesian t tests

Maximilian Maier^#¹,², František Bartoš^#³,⁴, Daniel S Quintana⁵,⁶,⁷, Fabian Dablander⁸,⁹, Don van den Bergh³, Maarten Marsman³, Alexander Ly³,¹⁰, Eric-Jan Wagenmakers³

Affiliations

¹ Department of Experimental Psychology, University College London, 26 Bedford Way 129-B, WC1H 0AP, London, UK. maximilian.maier.20@ucl.ac.uk.
² Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands. maximilian.maier.20@ucl.ac.uk.
³ Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands.
⁴ Institute of Computer Science, Czech Academy of Sciences, Prague, Czechia.
⁵ Department of Psychology, University of Oslo, Oslo, Norway.
⁶ NevSom, Department of Rare Disorders, Oslo University Hospital, Oslo, Norway.
⁷ KG Jebsen Centre for Neurodevelopmental Disorders, University of Oslo, Oslo, Norway.
⁸ Institute for Advanced Study, University of Amsterdam, Amsterdam, Netherlands.
⁹ Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, Netherlands.
¹⁰ Machine Learning Group, CWI Amsterdam, Amsterdam, The Netherlands.

^# Contributed equally.

PMID: 39511109
PMCID: PMC12092555
DOI: 10.3758/s13423-024-02590-5

Maximilian Maier et al. Psychon Bull Rev. 2025 Jun.


Abstract

One of the most common statistical analyses in experimental psychology concerns the comparison of two means using the frequentist t test. However, frequentist t tests do not quantify evidence and require various assumption tests. Recently popularized Bayesian t tests do quantify evidence, but these were developed for scenarios where the two populations are assumed to have the same variance. As an alternative to both methods, we outline a comprehensive t test framework based on Bayesian model averaging. This new t test framework simultaneously takes into account models that assume equal and unequal variances, and models that use t-likelihoods to improve robustness to outliers. The resulting inference is based on a weighted average across the entire model ensemble, with higher weights assigned to models that predicted the observed data well. This new t test framework provides an integrated approach to assumption checks and inference by applying a series of pertinent models to the data simultaneously rather than sequentially. The integrated Bayesian model-averaged t tests achieve robustness without having to commit to a single model following a series of assumption checks. To facilitate practical applications, we provide user-friendly implementations in JASP and via the RoBTT package in R. A tutorial video is available at https://www.youtube.com/watch?v=EcuzGTIcorQ.

Keywords: t-likelihood; t test; Bayes factor; Bayesian model-averaging; Robust inference; Unequal variances.
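
The weighting scheme described in the abstract can be illustrated with a minimal sketch. It is not the RoBTT or JASP implementation. The four-model ensemble (null vs. effect, equal vs. unequal variances), the uniform prior model probabilities, the marginal likelihoods, and the conditional effect-size means below are all hypothetical values chosen for illustration, and the function names are invented for this example.

```python
# Minimal sketch of Bayesian model averaging over a small model ensemble.
# All numbers are hypothetical; they are NOT taken from the article.

def posterior_model_probs(priors, marginal_likelihoods):
    """Posterior model probability is proportional to prior * marginal likelihood."""
    weights = {m: priors[m] * marginal_likelihoods[m] for m in priors}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}

def inclusion_bayes_factor(priors, posteriors, included):
    """Posterior inclusion odds divided by prior inclusion odds."""
    prior_in = sum(priors[m] for m in included)
    post_in = sum(posteriors[m] for m in included)
    prior_odds = prior_in / (1.0 - prior_in)
    post_odds = post_in / (1.0 - post_in)
    return post_odds / prior_odds

# Models are keyed as (mean hypothesis, variance assumption).
models = [("H0", "equal"), ("H0", "unequal"), ("H1", "equal"), ("H1", "unequal")]

# Assumption: uniform prior model probabilities.
priors = {m: 0.25 for m in models}

# Hypothetical marginal likelihoods (how well each model predicted the data).
marginal_likelihoods = {
    ("H0", "equal"): 1.0,
    ("H0", "unequal"): 4.0,
    ("H1", "equal"): 2.0,
    ("H1", "unequal"): 8.0,
}

posteriors = posterior_model_probs(priors, marginal_likelihoods)

# Inclusion Bayes factors for a difference in means and for unequal variances.
bf_effect = inclusion_bayes_factor(
    priors, posteriors, [m for m in models if m[0] == "H1"]
)
bf_unequal_var = inclusion_bayes_factor(
    priors, posteriors, [m for m in models if m[1] == "unequal"]
)

# Hypothetical posterior means of the effect size under each model
# (the effect is fixed at zero under the null models).
conditional_delta = {
    ("H0", "equal"): 0.0,
    ("H0", "unequal"): 0.0,
    ("H1", "equal"): 0.40,
    ("H1", "unequal"): 0.50,
}

# Model-averaged estimate: weight each model's estimate by its posterior probability.
averaged_delta = sum(posteriors[m] * conditional_delta[m] for m in models)

print("Posterior model probabilities:", {m: round(p, 3) for m, p in posteriors.items()})
print("Inclusion BF (effect):", round(bf_effect, 3))
print("Inclusion BF (unequal variances):", round(bf_unequal_var, 3))
print("Model-averaged effect size:", round(averaged_delta, 3))
```

In this sketch, models that predicted the data better receive more posterior weight, so the averaged estimate is pulled toward the unequal-variance effect model without committing to it outright.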


Conflict of interest statement

Declarations. Competing Interests: The authors declare no competing interests.

Figures

**Fig. 1**
Social punishment in the control condition vs. the biological markets condition. Data from Pleasant and Barclay (2018), available at https://tinyurl.com/mwpuhpx8. Figure created in JASP (JASP Team, 2022)

**Fig. 2**
Results from a sequential Bayesian equal-variance t test applied to the data from Pleasant and Barclay (2018). The *left panel* shows the Bayes factor in favor of an effect and the *right panel* shows the probability of $H_{1}$ and $H_{0}$ as the data accumulate

**Fig. 3**
Prior and posterior distribution for Cohen’s $δ$ for the Bayesian equal-variance t test under $H_{1}$

**Fig. 4**
Results from a sequential Bayesian Welch t test applied to the data from Pleasant and Barclay (2018). The *left panel* shows the Bayes factor in favor of an effect and the *right panel* shows the probability of $H_{1}$ and $H_{0}$ as the data accumulate

**Fig. 5**
Prior and posterior distribution for Cohen’s $δ$ and $ρ$ for the Bayesian Welch t test under $H_{1}$. The *left panel* shows the prior and posterior distribution for $δ$ under $H_{1}$; the *right panel* shows the prior and posterior distribution for the standard deviation ratio (note the logarithmic scaling of x-axis)

**Fig. 6**
Default prior model probabilities of the model-averaged Bayesian t test. Marginal model probabilities are displayed on the nodes and conditional model probabilities are displayed on the edges

**Fig. 7**
Results from a sequential model-averaged Bayesian t test applied to the data from Pleasant and Barclay (2018). The *left panel* shows the inclusion Bayes factor in favor of a difference in means and in favor of unequal variances. The *right panel* shows the probability of the four different models as the data accumulate. Note that the last two Bayes factors in favor of unequal variances are 147.33 and 256.00 and therefore outside the plotting range

**Fig. 8**
Posterior model probabilities of the model-averaged Bayesian t test applied to the data from Pleasant and Barclay (2018). Total probabilities are displayed on the nodes and conditional probabilities on the edges. $H_{0}$ denotes the models assuming the null hypotheses to be true, $H_{1}$ denotes the models assuming the alternative hypotheses to be true; $H^{\bar{ρ}}$ denotes equal-variance models, and $H^{ρ}$ denotes the unequal-variance models

**Fig. 9**
Prior and posterior distribution for Cohen’s $δ$ and $ρ$ for the Bayesian model-averaged t test. The *left panel* shows the conditional prior and posterior distribution for $δ$ assuming an effect to be present; the *right panel* shows the conditional prior and posterior distribution for the standard deviation ratio assuming unequal variance (note the logarithmic scaling of x-axis)

**Fig. 10**
The t-distribution has thicker tails than the normal distribution

**Fig. 11**
Prior model probabilities of the robust model-averaged t test. Marginal model probabilities are displayed on the nodes and conditional model probabilities on the edges. $H_{0}$ denotes the models assuming the null hypotheses of equal means to be true, $H_{1}$ denotes the models assuming the alternative hypotheses to be true. $H^{\bar{ρ}}$ denotes equal-variance models, $H^{ρ}$ denotes the unequal-variance models. $H^{t}$ denotes the models using t-likelihoods and $H^{n}$ denotes the models using normal likelihood

**Fig. 12**
Results from a sequential robust model-averaged Bayesian t test applied to the data from Pleasant and Barclay (2018). The *left panel* shows the inclusion Bayes factor in favor of an effect, unequal variances, and outliers. The *right panel* shows the probability of the eight different models as the data accumulate

**Fig. 13**
Posterior model probabilities of the robust model-averaged Bayesian t test applied to the data from Pleasant and Barclay (2018). Total probabilities are displayed on the nodes and conditional probabilities on the edges. $H_{0}$ denotes the models assuming the null hypotheses to be true, $H_{1}$ denotes the models assuming the alternative hypotheses to be true. $H^{\bar{ρ}}$ denotes equal-variance models, $H^{ρ}$ denotes the unequal-variance models. $H^{t}$ denotes the models using t-likelihoods and $H^{n}$ denotes the models using normal likelihood

**Fig. 14**
Prior and posterior distribution for Cohen’s $δ$, $ρ$, and $ν$ for the robust model-averaged Bayesian t test. All panels show the conditional prior and posterior distributions assuming the parameter to be present

**Fig. 15**
Discernment scores for the accuracy nudge treatment and control condition are better captured by a t-distribution than a normal distribution. t-distributions are displayed as *full lines* and normal distributions as *dashed lines*. Data from Roozenbeek et al. (2021)

**Fig. 16**
Evidence distortion of the Bayes factor for the difference in means for different methods and conditions under equal sample sizes. The four methods are the Student’s t test (*green*), the Welch t test (*yellow*), a model-averaged version of t test that combines Student’s and Welch’s t test (MB t test; *blue*) and a version that also incorporates uncertainty about the outliers (RoMB t test; *red*). Whenever the difference in means is present (second row) then $δ$ = 0.5. Whenever the variances are unequal (columns 2 and 3) SDR is 2, whenever the data are simulated from a t-distribution (column 3) this was done with $ν$ = 5 degrees of freedom

**Fig. 17**
Evidence distortion of the Bayes factor for the difference in means for different methods and conditions under unequal sample sizes. The four methods are the Student’s t test (*green*), the Welch t test (*yellow*), a model-averaged version of t test that combines Student’s and Welch’s t test (MB t test; *blue*) and a version that also incorporates uncertainty about the outliers (RoMB t test; *red*). Whenever the difference in means is present (second row) then $δ$ = 0.5. Whenever the variances are unequal (columns 2 and 3) SDR is 2, whenever the data are simulated from a t-distribution (column 3) this was done with $ν$ = 5 degrees of freedom

**Fig. 18**
Comparison of Bayes factors from Bayesian Student’s t test and Bayesian Welch t test on a sample with unequal sample size. *Top left*: Bayesian Student’s t test; *top right*: Bayesian Welch t test; *bottom left*: ratio of evidence from the two tests (log scaled). Positive mean difference and standard deviation ratios larger than one correspond to larger means and standard deviations in the larger sample group

**Fig. 19**
Prior and posterior plot for Cohen’s $δ$, $ρ$, and $ν$. All panels show the conditional prior and posterior distributions assuming the parameter to be present

**Fig. 20**
Root mean squared error (RMSE with 95% CI) of the posterior $δ$ for different methods and conditions. The root mean squared error (and 95% CI) for the posterior distributions of the effect size $δ$ (y-axis) across sample sizes (x-axis) for different versions of the Bayesian t tests (in color): Student’s t test (*green*), Welch t test (*yellow*), and the model-averaged version of t test that combines Student’s and Welch t test (MB; *blue*) and also incorporates uncertainty about the outliers (RoMB; *red*). Panel A corresponds to a condition with an effect ($δ$ = 0.5), equal variances, and no outliers; Panel B corresponds to a condition with an effect ($δ$ = 0.5), unequal variances (SDR = 2), and the absence of outliers; and Panel C corresponds to a condition with an effect ($δ$ = 0.5), unequal variances (SDR = 2), and outliers (data simulated from a Student-t distribution with $ν$ = 5 degrees of freedom)

**Fig. 21**
The four methods are the Student’s t test (*green*), the Welch t test (*yellow*), a model-averaged version of t test that combines Student’s and Welch t test (MB t test; *blue*) and a version that also incorporates uncertainty about the outliers (RoMB t test; *red*). Whenever the difference in means is present (second row) then $δ$ = 0.5. Whenever the variances are unequal (columns 2 and 3) SDR is 2, whenever the data are simulated from a t-distribution (column 3) this was done with $ν$ = 5 degrees of freedom


References

1. Alipourfard, N., Arendt, B., Benjamin, D. M., Benkler, N., Bishop, M., Burstein, M., ... Clark, C., et al. (2021). Systematizing confidence in open research and evidence (SCORE).
2. Barbieri, A., Marin, J. M., & Florin, K. (2016). A fully objective Bayesian approach for the Behrens-Fisher problem using historical studies. arXiv:1611.06873
3. Bartolucci, A. A., Blanchard, P. D., Howell, W. M., & Singh, K. P. (1998). A Bayesian Behrens-Fisher solution to a problem in taxonomy. Environmental Modelling & Software, 13(1), 25–29. 10.1016/S1364-8152(97)00033-9
4. Bartoš, F., & Maier, M. (2022). RoBTT: An R package for robust Bayesian t-test. https://CRAN.R-project.org/package=RoBTT. (R package)
5. Bartoš, F., Gronau, Q. F., Timmers, B., Otte, W. M., Ly, A., & Wagenmakers, E. J. (2021). Bayesian model-averaged meta-analysis in medicine. Statistics in Medicine, 40(30), 6743–6761. 10.1002/sim.9170
