PeerJ. 2016 Feb 18;4:e1715. doi: 10.7717/peerj.1715. eCollection 2016.

Problems in using p-curve analysis and text-mining to detect rate of p-hacking and evidential value



Dorothy V M Bishop et al. PeerJ.

Abstract

Background. The p-curve is a plot of the distribution of p-values reported in a set of scientific studies. Comparisons between ranges of p-values have been used to evaluate fields of research in terms of the extent to which studies have genuine evidential value, and the extent to which they suffer from bias in the selection of variables and analyses for publication, a practice known as p-hacking.

Methods. p-hacking can take various forms. Here we used R code to simulate the use of ghost variables, where an experimenter gathers data on several dependent variables but reports only those with statistically significant effects. We also examined a text-mined dataset used by Head et al. (2015) and assessed its suitability for investigating p-hacking.

Results. We show that when there is ghost p-hacking, the shape of the p-curve depends on whether the dependent variables are intercorrelated. For uncorrelated variables, simulated p-hacked data do not give the "p-hacking bump" just below .05 that is regarded as evidence of p-hacking, though there is a negative skew when simulated variables are intercorrelated. The way p-curves vary according to features of the underlying data poses problems when automated text mining is used to detect p-values in heterogeneous sets of published papers.

Conclusions. The absence of a bump in the p-curve is not indicative of a lack of p-hacking. Furthermore, while studies with evidential value will usually generate a right-skewed p-curve, we cannot treat a right-skewed p-curve as an indicator of the extent of evidential value unless we have a model specific to the type of p-values entered into the analysis. We conclude that it is not feasible to use the p-curve to estimate the extent of p-hacking and evidential value unless there is considerable control over the type of data entered into the analysis. In particular, p-hacking with ghost variables is likely to be missed.
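The ghost-variable scenario described in the Methods can be sketched as follows. This is a minimal Python illustration rather than the authors' R code; the group sizes, number of dependent variables, and compound-symmetric correlation structure are assumptions for demonstration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def ghost_hacked_p(n_per_group=20, n_vars=8, effect=0.0, rho=0.0):
    """Simulate one 'ghost variable' experiment: measure n_vars (possibly
    correlated) DVs in two groups, t-test each, and report only the smallest p."""
    # Compound-symmetric correlation matrix for the DVs (assumption).
    cov = np.full((n_vars, n_vars), rho)
    np.fill_diagonal(cov, 1.0)
    grp1 = rng.multivariate_normal(np.zeros(n_vars), cov, size=n_per_group)
    grp2 = rng.multivariate_normal(np.full(n_vars, effect), cov, size=n_per_group)
    pvals = [stats.ttest_ind(grp1[:, j], grp2[:, j]).pvalue for j in range(n_vars)]
    return min(pvals)  # the p-hacker reports only the best result

# p-curve: distribution of reported significant p-values across many studies.
ps = np.array([ghost_hacked_p() for _ in range(2000)])
sig = ps[ps < .05]
counts, _ = np.histogram(sig, bins=np.arange(0, .06, .01))
print(counts)  # with a null effect and uncorrelated DVs, the counts typically
               # decline toward .05 rather than bump up just below it
```

Selecting the minimum of several independent p-values under the null yields a Beta(1, n_vars) distribution, which slopes gently downward across (0, .05), consistent with the paper's finding that this form of p-hacking produces no bump just below .05.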

Keywords: Correlation; Ghost variables; Power; Reproducibility; Simulation; Text-mining; p-curve; p-hacking.


Conflict of interest statement

Dorothy V. Bishop is an Academic Advisor and an Academic Editor for PeerJ.

Figures

Figure 1. P-curve: expected distribution of p-values when no effect (null) vs true effect size of 0.3 with low (N = 20 per group) or high power (N = 200 per group).
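The comparison in Figure 1 can be reproduced with a small simulation: under the null, significant p-values are uniform over (0, .05); with a true effect the curve is right-skewed, and more steeply so at high power. This is a Python sketch; the simulation count and seed are arbitrary choices, not from the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def sim_pvals(n_per_group, effect, n_sims=5000):
    """p-values from n_sims two-sample t-tests with the given effect size."""
    a = rng.normal(0.0, 1.0, (n_sims, n_per_group))
    b = rng.normal(effect, 1.0, (n_sims, n_per_group))
    return stats.ttest_ind(a, b, axis=1).pvalue

for label, p in [("null, N=20", sim_pvals(20, 0.0)),
                 ("d=0.3, N=20", sim_pvals(20, 0.3)),
                 ("d=0.3, N=200", sim_pvals(200, 0.3))]:
    sig = p[p < .05]
    hist, _ = np.histogram(sig, bins=np.arange(0, .06, .01))
    # Proportion of significant p-values per .01-wide bin: flat under the
    # null, increasingly concentrated near zero as power rises.
    print(label, hist / hist.sum())
```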
Figure 2. P-curve for ghost p-hacked data when true effect size is zero (A and C) versus when true effect is 0.3 (B and D).
Continuous line for low power (N = 20 per group) and dashed line for high power (N = 200 per group). Different levels of correlation between variables are colour coded.
Figure 3. Illustration of how right skew showing evidential value can be masked if there is a high proportion of p-hacked studies and low statistical power.
Colours show N, and continuous line is non-hacked, dotted line is p-hacked.
Figure 4. Power curve for detecting difference between near and far p-value bins in case with null effect, 100% ghost p-hacking, and eight variables with intercorrelation of 0.8.
N.B. the saw-tooth pattern is typical for this kind of power curve (Chernick & Liu, 2002).

References

    1. Academy of Medical Sciences, BBSRC, MRC, Wellcome Trust. Reproducibility and reliability of biomedical research: improving research practice. London: Academy of Medical Sciences; 2015. Available at http://www.acmedsci.ac.uk/policy/policy-projects/reproducibility-and-rel...
    2. Altman DG. Statistics in medical journals: developments in the 1980s. Statistics in Medicine. 1991;10:1897–1913. doi: 10.1002/sim.4780101206.
    3. Begg CB, Berlin JA. Publication bias: a problem in interpreting medical data. Journal of the Royal Statistical Society: Series A. 1988;151(3):419–463. doi: 10.2307/2982993.
    4. Bishop DV, Thompson PA. Problems in using text-mining and p-curve analysis to detect rate of p-hacking. PeerJ PrePrints. 2015;3:e1643. doi: 10.7287/peerj.preprints.1266v2.
    5. Chernick MR, Liu CY. The saw-toothed behavior of power versus sample size and software solutions. The American Statistician. 2002;56:149–155. doi: 10.1198/000313002317572835.
