2024 Aug 6;121(32):e2403490121.
doi: 10.1073/pnas.2403490121. Epub 2024 Jul 30.

Heterogeneity in effect size estimates

Felix Holzmeister et al. Proc Natl Acad Sci U S A.

Abstract

A typical empirical study involves choosing a sample, a research design, and an analysis path. Variation in such choices across studies leads to heterogeneity in results that introduces an additional layer of uncertainty, limiting the generalizability of published scientific findings. We provide a framework for studying heterogeneity in the social sciences and divide heterogeneity into population, design, and analytical heterogeneity. Our framework suggests that after accounting for heterogeneity, the probability that the tested hypothesis is true for the average population, design, and analysis path can be much lower than implied by nominal error rates of statistically significant individual studies. We estimate each type's heterogeneity from 70 multilab replication studies, 11 prospective meta-analyses of studies employing different experimental designs, and 5 multianalyst studies. In our data, population heterogeneity tends to be relatively small, whereas design and analytical heterogeneity are large. Our results should, however, be interpreted cautiously due to the limited number of studies and the large uncertainty in the heterogeneity estimates. We discuss several ways to parse and account for heterogeneity in the context of different methodologies.

Keywords: generalizability; heterogeneity; metascience.

Conflict of interest statement

Competing interests statement: The authors declare no competing interest.

Figures

Fig. 1.
The figure illustrates the effective PSPG, i.e., the ratio of true positive results to the total number of positive classifications in the presence of heterogeneity, for different prior probabilities for the alternative hypothesis being genuinely true (ϕ), as a function of the heterogeneity factor H for a two-tailed z-test with nominal statistical power of π = 90% at the nominal α = 5% level.
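The quantity plotted in Fig. 1 can be approximated with a short calculation. The sketch below is not taken from the paper: it assumes that heterogeneity inflates the spread of the z-statistic by the factor H while the test still uses the nominal critical value, and that only rejections in the direction of the true effect count as true positives. The function name `effective_pspg` and its parameters are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def effective_pspg(prior, h, alpha=0.05, power=0.90):
    """Share of positive classifications that are true positives.

    Assumption (not from the source): the z-statistic has standard
    deviation h instead of 1, but is still compared against the
    nominal two-tailed critical value.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Mean of the z-statistic that yields the nominal power when h = 1.
    delta = z_crit + STD_NORMAL.inv_cdf(power)
    # Effective false positive rate: both tails, null mean 0, spread h.
    alpha_eff = 2 * (1 - STD_NORMAL.cdf(z_crit / h))
    # Effective power: upper tail only (correct-sign rejections).
    power_eff = 1 - STD_NORMAL.cdf((z_crit - delta) / h)
    true_pos = prior * power_eff
    false_pos = (1 - prior) * alpha_eff
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    for h in (1.0, 1.15, 1.41, 2.0):
        print(h, round(effective_pspg(prior=0.5, h=h), 3))
```

Under these assumptions the ratio equals ϕπ / (ϕπ + (1 − ϕ)α) when H = 1 and falls as H grows, which matches the qualitative pattern the caption describes.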
Fig. 2.
Empirical estimates of population, design, and analytical heterogeneity. (A) The figure shows estimates of the heterogeneity factor H for 70 estimates from 13 papers isolating population heterogeneity (–66), 11 estimates from two papers isolating design heterogeneity (67, 68), and five estimates from three papers isolating analytical heterogeneity (–71). The vertical reference lines indicate benchmark levels for small, medium, and large heterogeneity based on I² values of 25% (H = 1.15), 50% (H = 1.41), and 75% (H = 2), respectively. (B) The figure shows box plots of the distribution of heterogeneity factors H, separated by the source of heterogeneity, illustrated in panel (A).
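The benchmark values in the Fig. 2 caption are consistent with the standard meta-analytic relation I² = (H² − 1) / H², which gives H = 1 / √(1 − I²). A minimal sketch of that conversion follows; the helper name `h_from_i2` is illustrative, and the use of this exact formula is an assumption inferred from the caption's numbers.

```python
import math


def h_from_i2(i2):
    """Convert I² (as a proportion in [0, 1)) to the heterogeneity factor H.

    Uses I² = (H² - 1) / H², rearranged to H = 1 / sqrt(1 - I²).
    """
    if not 0 <= i2 < 1:
        raise ValueError("i2 must be a proportion in [0, 1)")
    return 1 / math.sqrt(1 - i2)


if __name__ == "__main__":
    for i2 in (0.25, 0.50, 0.75):
        print(f"I2 = {i2:.0%} -> H = {h_from_i2(i2):.2f}")
```

Rounded to two decimals, the three benchmarks come out as 1.15, 1.41, and 2.00, matching the reference lines described in the caption.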
