Review

Can I trust this paper?

Andrey Anikin. Psychon Bull Rev. 2025 Dec;32(6):2633-2647. doi: 10.3758/s13423-025-02740-3. Epub 2025 Jul 16.

Abstract

After a decade of data falsification scandals and replication failures in psychology and related empirical disciplines, there are urgent calls for open science and structural reform in the publishing industry. In the meantime, however, researchers need to learn how to recognize tell-tale signs of methodological and conceptual shortcomings that make a published claim suspect. I review four key problems and propose simple ways to detect them. First, the study may be fake; if in doubt, inspect the authors' and journal's profiles and request to see the raw data to check for inconsistencies. Second, there may be too little data; low precision of effect sizes is a clear warning sign of this. Third, the data may not be analyzed correctly; excessive flexibility in data analysis can be deduced from signs of data dredging and convoluted post hoc theorizing in the text, while violations of model assumptions can be detected by examining plots of observed data and model predictions. Fourth, the conclusions may not be justified by the data; common issues are inappropriate acceptance of the null hypothesis, biased meta-analyses, over-generalization across unmodeled variance, hidden confounds, and unspecific theoretical predictions. The main takeaways, when citing empirical work, are to verify that the methodology is robust and to distinguish between what the results actually are and what the authors claim they mean. Critical evaluation of published evidence is an essential skill to develop, as it can keep researchers from pursuing unproductive avenues and improve the trustworthiness of science as a whole.
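As a concrete illustration of the second warning sign (low precision), the minimal Python sketch below, mine rather than the paper's, shows how wide the 95% confidence interval around an effect size is when samples are small; the sample sizes and the large-sample standard-error approximation for Cohen's d are assumptions chosen for illustration.

import numpy as np

def d_ci(d, n1, n2):
    # Approximate 95% CI for Cohen's d (large-sample SE, Hedges & Olkin)
    se = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d - 1.96 * se, d + 1.96 * se

lo, hi = d_ci(d=0.5, n1=15, n2=15)
print(f"d = 0.50, 95% CI [{lo:.2f}, {hi:.2f}]")  # about [-0.23, 1.23]: compatible with
                                                 # no effect and with a large effect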

Keywords: Power; Replication; Research integrity; Statistics.


Conflict of interest statement

Declarations. Ethics approval: Not applicable. Consent to participate: Not applicable. Consent for publication: Not applicable. Competing interests: No competing interests.

Figures

Fig. 1
Model fit can be checked by plotting the observations (black points) together with fitted values (blue lines) and model predictions (blue points). Linear regression with a fixed standard deviation in this case underfits the data, whereas a Generalized Additive Model with spline smoothing captures the underlying sinusoidal trend perfectly. If only 10 datapoints are sampled from the same generative process, LOESS smoothing with a fixed span of 0.5 overfits the data. Inference from severely underfit or overfit models is highly suspect. Note that Pearson’s correlation assumes a linear relationship between variables, so it would be meaningless to report Pearson’s r between these two variables
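A hedged Python sketch of the fit check described in this caption (my own toy example, not the code behind the figure): simulate a sinusoidal generative process, overlay the observations with a straight-line fit and a LOESS smooth, and judge visually which model captures the trend.

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 4 * np.pi, 200))
y = np.sin(x) + rng.normal(0, 0.3, size=x.size)   # sinusoidal generative process

slope, intercept = np.polyfit(x, y, deg=1)        # straight line: underfits this trend
smooth = lowess(y, x, frac=0.2)                   # local smoother: tracks the trend

plt.scatter(x, y, s=10, color="black", label="observations")
plt.plot(x, intercept + slope * x, label="linear fit (underfit)")
plt.plot(smooth[:, 0], smooth[:, 1], label="LOESS smooth")
plt.legend()
plt.show()

With only 10 points and a narrow span, the same LOESS call would chase noise instead, which is the overfitting case the caption warns about.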
Fig. 2
A simulation demonstrating some perils of inappropriate hypothesis testing in combination with insufficient data. It is tempting to interpret the non-significant p-value or the 95% credible interval (CI) that includes zero as evidence of no effect in group 2, but these are not valid ways to prove the null hypothesis (Bayes factors or equivalence testing could be used instead). Note also that the power is just 18% and the CIs are very wide, suggesting that there is simply too little data to estimate the effects in both groups with any certainty. If we specifically want to compare the effect in groups 1 and 2, we need to test for a treatment x group interaction or obtain the posterior distribution of the difference in the effect between groups, as shown above. NHST = null-hypothesis significance testing. Code: https://osf.io/n7r2y/
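The paper's own simulation code is at the OSF link above; the sketch below is an assumed, simplified Python re-creation of the caption's logic: identical true effects in both groups, separate per-group tests that can easily disagree by chance, and the treatment x group interaction as the valid group comparison.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 20                                            # small n per cell -> low power
df = pd.DataFrame({
    "group":     np.repeat(["g1", "g2"], 2 * n),
    "treatment": np.tile(np.repeat([0, 1], n), 2),
})
df["y"] = 0.8 * df["treatment"] + rng.normal(0, 1, len(df))  # same true effect in both groups

# Separate per-group tests: one may come out "significant" and the other not by chance alone
for g, sub in df.groupby("group"):
    print(g, smf.ols("y ~ treatment", data=sub).fit().pvalues["treatment"])

# The valid comparison of the two effects is the interaction term in a single model
m = smf.ols("y ~ treatment * group", data=df).fit()
print(m.pvalues["treatment:group[T.g2]"])         # does the effect differ between groups?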
Fig. 3
Meta-analyses can reach overly optimistic conclusions if studies with significant and large effects are more likely to be published. Salami slicing a single dataset into multiple publications and including fraudulent data can further skew the results. Source of images: https://clipart-library.com
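A toy Python simulation of this publication filter (my own illustration, not taken from the paper): many small studies of a modest true effect are run, only the significant ones are "published", and the average published effect ends up well above the truth.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_d, n = 0.2, 20                               # modest true effect, small studies
published = []
for _ in range(5000):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(true_d, 1.0, n)
    _, p = stats.ttest_ind(b, a)
    d = (b.mean() - a.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    if p < 0.05:                                  # publication filter: significant results only
        published.append(d)

print(f"true d = {true_d}, mean published d = {np.mean(published):.2f}")
# Only lucky overestimates pass the filter, so the published literature inflates the effect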
