Review
Psychon Bull Rev. 2023 Apr;30(2):516-533.
doi: 10.3758/s13423-022-02069-1. Epub 2022 Aug 15.

A Bayesian perspective on severity: risky predictions and specific hypotheses

Noah van Dongen et al. Psychon Bull Rev. 2023 Apr.

Abstract

A tradition that goes back to Sir Karl R. Popper assesses the value of a statistical test primarily by its severity: was there an honest and stringent attempt to prove the tested hypothesis wrong? For "error statisticians" such as Mayo (1996, 2018), and frequentists more generally, severity is a key virtue in hypothesis tests. Conversely, failure to incorporate severity into statistical inference, as allegedly happens in Bayesian inference, counts as a major methodological shortcoming. Our paper pursues a double goal: First, we argue that the error-statistical explication of severity has substantive drawbacks: specifically, it neglects the research context and the specificity of the hypothesis's predictions. Second, we argue that severity matters for Bayesian inference via the value of specific, risky predictions: severity boosts the expected evidential value of a Bayesian hypothesis test. We illustrate severity-based reasoning in Bayesian statistics by means of a practical example and discuss its advantages and potential drawbacks.

Keywords: Bayes factors; Deborah Mayo; Error statistics; Karl Popper; Null hypothesis significance testing; Severity; Statistical test.

Conflict of interest statement

There are no conflicts of interest.

Figures

Fig. 1
SEV results for the original water plant example. In this case, H0: μ = 150, C: μ = 153, x̄ = 152, and SEV = 0.159. This image is a screenshot from the Severity Demonstration application. This Shiny App was developed by Morey (2020) and can be accessed via https://richarddmorey.shinyapps.io/severity/?mu0=150&mu1=153&sigma=10&n=100&xbar=152&xmin=150&xmax=155&alpha=0.025&dir=%3E
Fig. 2
SEV results for a modified water plant example. In this case, H0: μ = 100, C: μ = 153, x̄ = 152, and SEV = 0.159. This image is a screenshot from the Severity Demonstration application. This Shiny App was developed by Morey (2020) and can be accessed via https://richarddmorey.shinyapps.io/severity/?mu0=100&mu1=153&sigma=10&n=100&xbar=152&xmin=150&xmax=155&alpha=0.025&dir=%3E
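The SEV value shared by Figs. 1 and 2 follows from a short calculation. A minimal sketch, assuming Mayo's one-sided z-test of H0: μ ≤ μ0 against μ > μ0 with known σ = 10 and n = 100 (the sigma and n values in the app URLs), and reading the claim C as μ > 153 (the app's dir=> setting). Under these assumptions, the severity of C after a rejection is P(X̄ ≤ x̄; μ = 153) = Φ((152 − 153)/(10/√100)) = Φ(−1) ≈ 0.159. The function names are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_value_greater(xbar: float, mu0: float, sigma: float, n: int) -> float:
    """One-sided p-value for H0: mu <= mu0 against mu > mu0 (z-test, known sigma)."""
    se = sigma / math.sqrt(n)
    return 1.0 - normal_cdf((xbar - mu0) / se)


def severity_greater(xbar: float, mu1: float, sigma: float, n: int) -> float:
    """Severity of the claim mu > mu1 after the test has rejected H0:
    P(Xbar <= xbar; mu = mu1). Note that mu0 does not appear here."""
    se = sigma / math.sqrt(n)
    return normal_cdf((xbar - mu1) / se)


XBAR, MU1, SIGMA, N, ALPHA = 152.0, 153.0, 10.0, 100, 0.025

for mu0 in (150.0, 100.0):  # the H0 values of Fig. 1 and Fig. 2, respectively
    p = p_value_greater(XBAR, mu0, SIGMA, N)
    sev = severity_greater(XBAR, MU1, SIGMA, N)
    print(f"mu0 = {mu0:g}: p = {p:.3g}, reject = {p < ALPHA}, SEV(mu > {MU1:g}) = {sev:.3f}")
```

Both null values lead to rejection at α = 0.025, and the severity of C comes out at about 0.159 in both cases, because μ0 does not enter the severity formula.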
Fig. 3
Four possible relations between data and theory. Measures P and Q are both measures of some observable. The axes cover the range of possible values. The highlighted areas indicate the outcomes that are consistent with the theory. Standard errors of the observations are indicated by the error bars. In this example, the theory fits the data, though only when both theory and data are sufficiently constrained (upper left) does this provide significant evidence for the theory (this figure is published under CC-BY 4.0 and is adapted from: Roberts & Pashler, , p. 360)
Fig. 4
Data predictions for the vague hypothesis. If twenty patients were tested under each of Treatment A and Treatment B, then according to Hv we would expect to see these numbers of successes with these probabilities.
Fig. 5
Data predictions for the specific hypothesis. If twenty patients were tested under each of Treatment A and Treatment B, then according to Hs we would expect to see these numbers of successes with these probabilities.
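The predictions in Figs. 4 and 5 are prior predictive distributions: each hypothesis assigns a probability to every pair of success counts (S_A, S_B). A minimal sketch, assuming (hypothetically) that each hypothesis restricts the success rates θ_A and θ_B to an interval with a uniform prior inside it, and that the two treatments are independent. The interval bounds and function names below are illustrative choices, not taken from the paper.

```python
import math

N = 20  # patients per treatment, as in Figs. 4 and 5


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """Regularized incomplete beta I_x(a, b) for positive integers a and b,
    via the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a)."""
    m = a + b - 1
    return sum(math.comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(a, m + 1))


def predictive(s: int, n: int, lo: float, hi: float) -> float:
    """P(S = s) when theta ~ Uniform(lo, hi) and S | theta ~ Binomial(n, theta).
    Integrating the binomial pmf over [lo, hi] gives (I_hi - I_lo) / (n + 1)."""
    a, b = s + 1, n - s + 1
    return (beta_cdf_int(hi, a, b) - beta_cdf_int(lo, a, b)) / ((n + 1) * (hi - lo))


def joint_predictive(box, n):
    """Joint predictive probabilities of (S_A, S_B) for independent treatments."""
    (la, ua), (lb, ub) = box
    pa = [predictive(s, n, la, ua) for s in range(n + 1)]
    pb = [predictive(s, n, lb, ub) for s in range(n + 1)]
    return {(sa, sb): pa[sa] * pb[sb] for sa in range(n + 1) for sb in range(n + 1)}


def outcomes_for_mass(pred, level=0.95):
    """Smallest number of outcome pairs whose predictive probability reaches `level`."""
    total, count = 0.0, 0
    for p in sorted(pred.values(), reverse=True):
        total += p
        count += 1
        if total >= level:
            break
    return count


H_S = ((0.7, 0.8), (0.2, 0.3))  # hypothetical specific hypothesis: narrow intervals
H_V = ((0.5, 1.0), (0.0, 0.5))  # hypothetical vague hypothesis: wide intervals
pred_s = joint_predictive(H_S, N)
pred_v = joint_predictive(H_V, N)
print("Hs: outcome pairs covering 95% of predictions:", outcomes_for_mass(pred_s))
print("Hv: outcome pairs covering 95% of predictions:", outcomes_for_mass(pred_v))
```

The specific hypothesis concentrates its predictions on far fewer outcome pairs, which is what makes its predictions risky: most possible outcomes would contradict it.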
Fig. 6
Four relations between hypothesis and data. The top two graphs show the results for the specific hypothesis Hs; the bottom two graphs show the results for the vague hypothesis Hv. The gray gradient areas depict the probability mass of the hypotheses' predicted outcomes (a top view of Figs. 5 and 4). The gray dotted lines visualize the hypotheses' restrictions on the parameter values. The point estimates and standard errors are shown as black crosses. The graphs in the left column display the results for the large data set (N_i = 20) and the graphs in the right column display the results for the small data set (N_i = 4). In this example, the data align perfectly with the hypotheses. The evidential support for each hypothesis relative to the encompassing model is quantified as a Bayes factor (top left of each plot). From top left to bottom right, these Bayes factors are 16.53, 4.37, 3.89, and 2.64, respectively.
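The Bayes factors in Fig. 6 compare each hypothesis with an encompassing model. A minimal sketch, assuming the encompassing-prior approach (the Bayes factor of a constrained hypothesis against the encompassing model equals the posterior probability of the constraint divided by its prior probability), independent Uniform(0, 1) priors on θ_A and θ_B, and the hypothetical interval constraints below. "Aligned" data are taken to be 75% and 25% success rates; these data values and the function names are also assumptions.

```python
import math


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """Regularized incomplete beta I_x(a, b) for positive integers a and b,
    via the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a)."""
    m = a + b - 1
    return sum(math.comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(a, m + 1))


def bf_box_vs_encompassing(successes, n, box):
    """Encompassing-prior Bayes factor: posterior mass of the box divided by its
    prior mass, with independent uniform priors on each success rate."""
    bf = 1.0
    for s, (lo, hi) in zip(successes, box):
        a, b = s + 1, n - s + 1  # posterior is Beta(s + 1, n - s + 1) under a uniform prior
        posterior_mass = beta_cdf_int(hi, a, b) - beta_cdf_int(lo, a, b)
        prior_mass = hi - lo
        bf *= posterior_mass / prior_mass
    return bf


H_S = ((0.7, 0.8), (0.2, 0.3))  # hypothetical specific hypothesis
H_V = ((0.5, 1.0), (0.0, 0.5))  # hypothetical vague hypothesis
DATA = {20: (15, 5), 4: (3, 1)}  # (S_A, S_B) per N_i: 75% and 25% success

for name, box in (("Hs", H_S), ("Hv", H_V)):
    for n, successes in DATA.items():
        print(f"{name}, N_i = {n}: BF = {bf_box_vs_encompassing(successes, n, box):.2f}")
```

With these assumed constraints and data, the sketch prints Bayes factors of 16.53, 4.37, 3.89, and 2.64, the same values reported in the caption. This suggests, but does not confirm, that the assumptions correspond to the paper's set-up. Because the maximum Bayes factor is bounded by one over the prior mass of the constraint, the narrow Hs can earn far more support than the wide Hv.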
Fig. 7
Specific hypotheses can yield more evidence than vague hypotheses, though for most possible outcomes this evidence is against the hypothesis. The graphs display the size of the Bayes factor for every possible combination of outcomes, for the specific hypothesis Hs (left) and the vague hypothesis Hv (right). The x-axis and y-axis indicate, for Treatment A and Treatment B respectively, the number of successful treatments (S_i) out of the 20 participants who are treated (N_i = 20). The values within the lattice indicate the evidence for the hypothesis that results from each particular combination of S_A and S_B. This evidence is quantified as the Bayes factor for the hypothesis (left: specific; right: vague) relative to the encompassing model. The size of the Bayes factor is also indicated by color intensity.
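The trade-off in Fig. 7 can be checked by computing the Bayes factor for every cell of the 21 × 21 outcome lattice. A minimal sketch, assuming encompassing-prior Bayes factors with independent uniform priors and the hypothetical interval constraints below (illustrative choices, not taken from the paper). Under uniform priors, every success count from 0 to 20 is equally likely a priori (the beta-binomial with both shape parameters equal to 1 is discrete uniform), so the share of cells with BF < 1 is also the encompassing model's prior predictive probability of evidence against the hypothesis.

```python
import math

N = 20  # participants per treatment, as in Fig. 7


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """Regularized incomplete beta I_x(a, b) for positive integers a and b,
    via the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a)."""
    m = a + b - 1
    return sum(math.comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(a, m + 1))


def bf_box(sa: int, sb: int, n: int, box) -> float:
    """Encompassing-prior Bayes factor of an interval constraint on (theta_A, theta_B)."""
    bf = 1.0
    for s, (lo, hi) in zip((sa, sb), box):
        a, b = s + 1, n - s + 1
        bf *= (beta_cdf_int(hi, a, b) - beta_cdf_int(lo, a, b)) / (hi - lo)
    return bf


def lattice(box, n=N):
    """Bayes factors for all (S_A, S_B) combinations."""
    return [[bf_box(sa, sb, n, box) for sb in range(n + 1)] for sa in range(n + 1)]


def summarize(grid):
    """Largest Bayes factor and share of outcomes that count against the hypothesis."""
    values = [bf for row in grid for bf in row]
    share_against = sum(bf < 1 for bf in values) / len(values)
    return max(values), share_against


H_S = ((0.7, 0.8), (0.2, 0.3))  # hypothetical specific hypothesis
H_V = ((0.5, 1.0), (0.0, 0.5))  # hypothetical vague hypothesis

for name, box in (("Hs", H_S), ("Hv", H_V)):
    best, against = summarize(lattice(box))
    print(f"{name}: largest BF = {best:.2f}, share of outcomes with BF < 1 = {against:.2f}")
```

The specific hypothesis attains a much larger maximum Bayes factor, but a larger share of the lattice yields BF < 1 for it than for the vague hypothesis.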
Fig. 8
Graphical illustration of the distribution of Bayes factors and probability of misleading evidence in the Bayesian framework for N = 36, a two-sided t-test and hypothesized effects of d = 0 and d = 0.4. Figure produced with the BFDA app https://shinyapps.org/apps/BFDA/ based on Schönbrodt and Wagenmakers (2018)
Fig. 9
Graphical illustration of the distribution of Bayes factors and probability of misleading evidence in the Bayesian framework for N = 190, a two-sided t-test and hypothesized effects of d = 0 and d = 0.4. Figure produced with the BFDA app https://shinyapps.org/apps/BFDA/ based on Schönbrodt and Wagenmakers (2018)
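A Bayes factor design analysis estimates by simulation how often a study of a given size would produce misleading evidence. A minimal sketch of the idea under a deliberate simplification: instead of the default Bayesian t-test used by the BFDA app, it assumes a one-sample design with known unit variance and a normal prior δ ~ N(0, 0.707²) under H1, for which BF10 has a closed form. The evidence threshold of 3, the prior width, and the function names are assumptions. The sketch illustrates, rather than reproduces, Figs. 8 and 9.

```python
import math
import random

TAU = 0.707  # assumed prior SD of the effect size delta under H1


def bf10_z(z: float, n: int, tau: float = TAU) -> float:
    """BF10 for H1: delta ~ N(0, tau^2) vs H0: delta = 0, given the z-statistic of
    n observations with known unit variance (a simplification, not the JZS t-test)."""
    k = n * tau**2
    return math.exp(0.5 * z * z * k / (1 + k)) / math.sqrt(1 + k)


def misleading_rates(n: int, d1: float = 0.4, threshold: float = 3.0,
                     sims: int = 20000, seed: int = 1):
    """Monte Carlo estimates of P(BF10 > threshold | d = 0) and
    P(BF10 < 1/threshold | d = d1)."""
    rng = random.Random(seed)
    false_h1 = 0  # true effect is 0, yet the evidence favors H1
    false_h0 = 0  # true effect is d1, yet the evidence favors H0
    for _ in range(sims):
        if bf10_z(rng.gauss(0.0, 1.0), n) > threshold:
            false_h1 += 1
        if bf10_z(rng.gauss(d1 * math.sqrt(n), 1.0), n) < 1 / threshold:
            false_h0 += 1
    return false_h1 / sims, false_h0 / sims


for n in (36, 190):
    under_h0, under_h1 = misleading_rates(n)
    print(f"N = {n}: P(BF10 > 3 | d = 0) ~ {under_h0:.3f}, "
          f"P(BF10 < 1/3 | d = 0.4) ~ {under_h1:.3f}")
```

In this simplified model, both kinds of misleading evidence become less frequent as N grows from 36 to 190; the larger design rarely produces strong evidence for the false hypothesis.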
