Can Assoc Radiol J. 2023 Aug;74(3):497-507. doi: 10.1177/08465371221139418. Epub 2022 Nov 22.

Is There Evidence of P-Hacking in Imaging Research?


Paul Rooprai et al. Can Assoc Radiol J. 2023 Aug.

Abstract

Background: P-hacking, the practice of running selective analyses until the results become statistically significant, is prevalent in many scientific disciplines.

Purpose: This study aims to assess whether there is evidence of p-hacking in imaging research.

Methods: The protocol, data, and code are available at https://osf.io/xz9ku/?view_only=a9f7c2d841684cb7a3616f567db273fa. We searched imaging journals in Ovid MEDLINE from 1972 to 2021. A Python text-mining script was used to collect metadata: journal, publication year, title, abstract, and P-values from abstracts. One P-value was randomly sampled per abstract. We assessed for evidence of p-hacking using a p-curve, checking for a concentration of P-values just below .05. We conducted a one-tailed binomial test (α = .05) to assess whether more P-values fell in the upper range (.045 < P < .05) than in the lower range (.04 < P < .045). To assess the variation introduced by randomly sampling a single P-value per abstract, we repeated the sampling process 1000 times and pooled results across the samples. A time-trend analysis, divided into 10-year periods, was conducted to determine whether p-hacking practices evolved over time.
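
The study's actual protocol, data, and code are at the OSF link above; the following is only a minimal illustrative sketch of the extraction and sampling steps described here. The regex, function names, and toy abstract are our assumptions, not the published script.

    # Illustrative sketch only; the study's real code is in its OSF repository.
    import random
    import re

    # Match equality-expressed P-values such as "p = 0.042" or "P=.047".
    # Inequality expressions ("p < .05") are deliberately not matched.
    P_EQ = re.compile(r"\bp\s*=\s*(0?\.\d+)", re.IGNORECASE)

    def extract_p_values(abstract):
        """Return all equality-expressed P-values found in one abstract."""
        return [float(m) for m in P_EQ.findall(abstract)]

    def sample_one_per_abstract(abstracts, lo=0.04, hi=0.05, seed=None):
        """Randomly sample one eligible P-value (lo < P < hi) per abstract."""
        rng = random.Random(seed)
        sampled = []
        for text in abstracts:
            eligible = [p for p in extract_p_values(text) if lo < p < hi]
            if eligible:
                sampled.append(rng.choice(eligible))
        return sampled

    # Repeat the random sampling 1000 times and pool, as the Methods describe.
    abstracts = ["Sensitivity improved (p = 0.042); specificity did not (p = 0.31)."]
    pooled = [p for seed in range(1000)
              for p in sample_one_per_abstract(abstracts, seed=seed)]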

Results: Our search of 136 journals identified 967,981 abstracts. Text mining identified 293,687 P-values, and a total of 4105 randomly sampled P-values were included in the p-hacking analysis; in all, 108/136 (80%) journals and 4105/967,981 (0.4%) abstracts contributed to the analysis. P-values did not concentrate just under .05: there were more P-values in the lower range (.04 < P < .045) than in the upper range (.045 < P < .05), indicating a lack of evidence for p-hacking. Time-trend analysis did not identify p-hacking in any of the five 10-year periods.
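
As a concrete illustration of the binomial test behind these results (with made-up bin counts, not the study's data), the test asks whether the upper bin holds more than half of the P-values in the .04 to .05 window:

    # Hypothetical counts, for illustration only; the actual bin counts
    # are in the study's OSF repository.
    from scipy.stats import binomtest

    n_upper = 190  # P-values with .045 < P < .05 (assumed count)
    n_lower = 240  # P-values with .04 < P < .045 (assumed count)

    # One-tailed test: is the upper bin overrepresented vs a 50/50 split?
    result = binomtest(n_upper, n_upper + n_lower, p=0.5, alternative="greater")
    print(round(result.pvalue, 3))  # a large value means no evidence of p-hacking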

Conclusion: We did not identify evidence of p-hacking in abstracts published in over 100 imaging journals since 1972. These analyses cannot detect all forms of p-hacking, and other forms of bias, such as publication bias and selective outcome reporting, may still exist in imaging research.

Keywords: epidemiology; evidence-based practice; reporting bias; statistics.


Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Figure 1.
Hypothetical distributions of effects across studies and the effect of p-hacking on the p-curve. Adapted from Head et al.1 Copyright: © 2015 Head et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. (A) Black line—the hypothetical distribution of P-values is uniform when the effect size for a studied phenomenon is zero; red line—the effect of p-hacking shifts the distribution from being flat to left skewed. (B) Black line—the hypothetical distribution of P-values is exponential with right skew when the true effect size is nonzero; red line—the effect of p-hacking results in an exponential distribution with right skew, but there is an overrepresentation of P-values in the tail of the distribution just below the significance threshold (P = .05).
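
To make the two panel shapes concrete, here is a small simulation of our own (an assumption-laden sketch, not the figure's code): two-sample t-tests under a zero and a nonzero effect reproduce the flat and right-skewed p-curves, respectively.

    # Simulating the p-curve shapes in Figure 1; all parameters are assumptions.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    def simulate_p_values(effect_size, n_per_group=30, n_studies=20000):
        """P-values from two-sample t-tests across many simulated studies."""
        a = rng.normal(0.0, 1.0, size=(n_studies, n_per_group))
        b = rng.normal(effect_size, 1.0, size=(n_studies, n_per_group))
        return stats.ttest_ind(a, b, axis=1).pvalue

    for d, label in [(0.0, "zero effect: flat p-curve"),
                     (0.5, "true effect: right-skewed p-curve")]:
        p = simulate_p_values(d)
        counts, _ = np.histogram(p[p < 0.05], bins=np.arange(0, 0.051, 0.01))
        print(label, counts)  # counts per .01-wide bin below .05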
Figure 2.
Filtering process. The number of journals, abstracts, and P-values included after each step in the filtering process is outlined above. The process retains P-values that are <.05, that use an equality expression (i.e., “p =”), and that fall in the range designated for the p-hacking analysis (.04 < P < .05), followed by random sampling of one P-value per abstract.
Figure 3.
P-curve for a subset of the dataset (P < .05). P-curve constructed from over 100,000 P-values demonstrates a right skew. The shape of the p-curve is consistent with no p-hacking.

References

    1. Head ML, Holman L, Lanfear R, Kahn AT, Jennions MD. The extent and consequences of p-hacking in science. PLoS Biol. 2015;13(3):e1002106.
    2. Rosenthal R. The file drawer problem and tolerance for null results. Psychol Bull. 1979;86(3):638–641.
    3. Mathur M. Sensitivity analysis for p-hacking in meta-analyses. OSF Preprints; 2022. https://osf.io/ezjsx/
    4. Simonsohn U, Nelson LD, Simmons JP. P-curve: a key to the file-drawer. J Exp Psychol Gen. 2014;143(2):534–547.
    5. de Winter JC, Dodou D. A surge of p-values between 0.041 and 0.049 in recent decades (but negative results are increasing rapidly too). PeerJ. 2015;3:e733.