J Am Stat Assoc. 2017;112(517):1-10.
doi: 10.1080/01621459.2016.1240079. Epub 2016 Oct 7.

On the Reproducibility of Psychological Science

Valen E Johnson et al. J Am Stat Assoc. 2017.

Abstract

Investigators from a large consortium of scientists recently performed a multi-year study in which they replicated 100 psychology experiments. Although statistically significant results were reported in 97% of the original studies, statistical significance was achieved in only 36% of the replicated studies. This article presents a reanalysis of these data based on a formal statistical model that accounts for publication bias by treating outcomes from unpublished studies as missing data, while simultaneously estimating the distribution of effect sizes for those studies that tested nonnull effects. The resulting model suggests that more than 90% of tests performed in eligible psychology experiments tested negligible effects, and that publication biases based on p-values caused the observed rates of nonreproducibility. The results of this reanalysis provide a compelling argument for both increasing the threshold required for declaring scientific discoveries and for adopting statistical summaries of evidence that account for the high proportion of tested hypotheses that are false. Supplementary materials for this article are available online.

Keywords: Bayes factor; Null hypothesis significance test; Posterior model probability; Publication bias; Reproducibility; Significance test.

Figures

Figure 1
Normal moment prior. This density function is used to model the marginal distribution of z-transformed effect sizes when the alternative hypothesis is true. The curves in blue, black, and red represent the moment priors corresponding to τ = 0.060, 0.088, and 0.125, respectively. These values correspond to the lower (blue) and upper (red) boundaries of the 95% credible interval and posterior mean (black) for τ based on the OSC data.
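
The shape described in this caption can be sketched numerically. The block below assumes the first-order normal moment parameterization, density(z | τ) = (z²/τ) · Normal(z; 0, τ), with τ acting as a variance-scale parameter. The page does not print the density itself, so this form is an assumption, and the function names `moment_prior_pdf` and `moment_prior_mode` are illustrative only. The three τ values are the ones quoted in the caption.

```python
import math

def moment_prior_pdf(z, tau):
    """Assumed first-order normal moment density: (z^2 / tau) * Normal(z; 0, tau).

    tau plays the role of a variance; the density is zero at z = 0,
    so it puts no mass on exactly null effect sizes.
    """
    normal = math.exp(-z * z / (2.0 * tau)) / math.sqrt(2.0 * math.pi * tau)
    return (z * z / tau) * normal

def moment_prior_mode(tau):
    """Positive mode of the density above; setting the derivative to zero gives z^2 = 2 * tau."""
    return math.sqrt(2.0 * tau)

# tau values quoted in the Figure 1 caption (lower bound, posterior mean, upper bound)
for tau in (0.060, 0.088, 0.125):
    mode = moment_prior_mode(tau)
    print(f"tau={tau:.3f}  modes at +/-{mode:.3f}  density at mode={moment_prior_pdf(mode, tau):.3f}")
```

Under this assumed form, a larger τ moves the two modes farther from zero, which matches the ordering of the blue, black, and red curves described in the caption.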
Figure 2
Cells used to compute Pearson’s chi-squared goodness-of-fit statistic. Left panel: Cells used for moment prior on nonnull effect sizes. Right panel: Cells used for normal prior on nonnull effect sizes. Under both models, the probability assigned to the cells A, B, and C is 1/3.
Figure 3
Histogram of posterior samples of Pearson’s chi-squared test for goodness of fit under the moment (left panel) and normal prior (right panel) models for the nonnull effect sizes.
Figure 4
Posterior probabilities of null hypotheses versus p-values, based on a moment prior model for the nonnull effect sizes and on the posterior means of the parameters π0 and τ estimated from the OSC data. The sample sizes upon which the comparisons are based (n = 10, 30, or 100) are indicated in the plot. The curve labeled UMPBT was obtained by replacing the moment prior density on the nonnull effect sizes with the uniformly most powerful Bayesian test alternative that has the same rejection region as a frequentist test of size 0.005.
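
A rough sketch of how a curve of this kind could be computed is given below. It is not the authors' code and rests on several assumptions that are not stated on this page: the observed statistic is a Fisher z-transformed correlation with sampling variance 1/(n − 3); the nonnull effect sizes follow the first-order normal moment density (z²/τ) · Normal(z; 0, τ); the p-value is two-sided; and π0 = 0.9 is a placeholder chosen only because the abstract reports "more than 90%", not the estimated posterior mean. The value τ = 0.088 is the posterior mean quoted in the Figure 1 caption. All function names are hypothetical.

```python
import math
from statistics import NormalDist

def normal_pdf(x, var):
    """Density of Normal(0, var) evaluated at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def marginal_alt(z, s2, tau):
    """Marginal density of z when the effect size has the assumed moment prior.

    Closed form of the integral of Normal(z; effect, s2) * (effect^2 / tau) * Normal(effect; 0, tau):
    the normal marginal times the second moment of the conditional effect, divided by tau.
    """
    v = s2 * tau / (s2 + tau)   # conditional variance of the effect given z
    m = z * tau / (s2 + tau)    # conditional mean of the effect given z
    return normal_pdf(z, s2 + tau) * (v + m * m) / tau

def posterior_null_prob(z, n, tau=0.088, pi0=0.9):
    """P(null | z) by Bayes' rule; pi0 = 0.9 is an illustrative placeholder."""
    s2 = 1.0 / (n - 3)          # assumed sampling variance of Fisher z
    m0 = normal_pdf(z, s2)      # marginal density of z under the null
    m1 = marginal_alt(z, s2, tau)
    return pi0 * m0 / (pi0 * m0 + (1.0 - pi0) * m1)

def z_for_two_sided_p(p, n):
    """Observed z that yields a two-sided p-value p under the assumed sampling model."""
    return NormalDist().inv_cdf(1.0 - p / 2.0) * math.sqrt(1.0 / (n - 3))

for n in (10, 30, 100):
    for p in (0.05, 0.005):
        z = z_for_two_sided_p(p, n)
        print(f"n={n:3d}  p={p:<5}  P(null | data)={posterior_null_prob(z, n):.3f}")
```

The printed values depend directly on the placeholder π0 and on the assumed sampling model, so they illustrate the mechanics of the calculation rather than reproduce the figure.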
Figure 5
Posterior probabilities of null hypotheses versus p-values based on the posterior means of the parameters π0 and τ estimated from the OSC data. Similar to Figure 4, except that a normal distribution was imposed on the distribution of the nonnull effect sizes.
