Review

Rev Peru Med Exp Salud Publica. 2025 Jan 31;41(4):422-430.

doi: 10.17843/rpmesp.2024.414.14285.

The null hypothesis significance test and the dichotomization of the p-value: Errare Humanum Est

[Article in Spanish, English]

Edward Mezones-Holguín¹, Ali Al-Kassab-Córdova¹, Percy Soto-Becerra², Sonia Hernández-Díaz³, Jay S Kaufman⁴

Affiliations

¹ Centro de Excelencia en Investigaciones Económicas y Sociales en Salud, Universidad San Ignacio de Loyola, Lima, Peru.
² Vicerrectorado de Investigación, Universidad Continental, Huancayo, Peru.
³ Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, USA.
⁴ Department of Epidemiology, Biostatistics, & Occupational Health, McGill University, Montreal, Canada.

PMID: 39936767
PMCID: PMC11797584
DOI: 10.17843/rpmesp.2024.414.14285.

Edward Mezones-Holguín et al. Rev Peru Med Exp Salud Publica. 2025.

Abstract

Decision-making in healthcare is complex and must be based on the best available scientific evidence. In this process, information derived from the statistical analysis of data is crucial; such analysis can be approached from either a frequentist or a Bayesian perspective. Within the frequentist framework, the null hypothesis significance test (NHST) and its p-value are among the most widely used techniques across disciplines. However, NHST has been questioned from various academic standpoints, to the point of being regarded as one of the causes of the so-called replicability crisis in science. In this review article, we provide a brief historical account of its development, summarize the underlying methods, describe some controversies and limitations, address its misuse and misinterpretation, and finally offer some perspectives and reflections in the context of biomedical research.
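The title's central concern, dichotomizing the p-value, can be illustrated with a minimal sketch that is not taken from the article. It assumes a two-sided z-test whose statistic follows a standard normal distribution under the null hypothesis, and the conventional threshold α = 0.05; the function name `two_sided_p_value` and the z values are hypothetical. Two almost identical test statistics receive opposite verdicts once the p-value is forced into "significant" versus "not significant".

```python
import math

def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for a z statistic under a standard normal null (assumption).

    P(|Z| >= |z|) = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2))

ALPHA = 0.05  # conventional, and arbitrary, significance threshold

# Hypothetical test statistics that differ only in the second decimal place.
for z in (1.95, 1.97):
    p = two_sided_p_value(z)
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"z = {z:.2f}  p = {p:.4f}  -> {verdict}")
```

With z = 1.95 the p-value falls just above 0.05 and with z = 1.97 just below it, although the evidence carried by the two results is practically the same; the binary label hides that continuity.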

Conflict of interest statement

The authors declare that they have no conflicts of interest.

Figures

**Figure 1. Probability distribution of randomly obtained sample test statistics in scenarios where the null or the alternative hypothesis is true.**

**Figure 2. Probability curve, under the null hypothesis, of all possible values of the test statistic for one-tailed and two-tailed tests.**

**Figure 3. Clinically irrelevant or statistically significant scenarios, illustrated with an example of changes in arterial hypertension.**

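Figures 2 and 3 concern one-tailed versus two-tailed test statistics and the gap between statistical and clinical significance. The sketch below is a hedged illustration, not the authors' method: it assumes a z-test for a difference in means between two equal-sized groups with a known common standard deviation, and the blood pressure values (a 1 mmHg difference, SD of 15 mmHg), sample sizes, and function names are all hypothetical.

```python
import math

def p_values(z: float) -> tuple[float, float]:
    """Return (one-tailed upper, two-tailed) p-values for z under a N(0, 1) null."""
    one_tailed = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)
    two_tailed = math.erfc(abs(z) / math.sqrt(2))   # P(|Z| >= |z|)
    return one_tailed, two_tailed

def z_for_mean_difference(diff: float, sd: float, n_per_group: int) -> float:
    """z statistic for a difference in means: two equal groups, known common SD (assumption)."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    return diff / standard_error

# Hypothetical values: a 1 mmHg difference in systolic blood pressure with a
# common SD of 15 mmHg. The effect size stays fixed; only the sample size changes.
DIFF_MMHG, SD_MMHG = 1.0, 15.0
for n in (100, 1000, 10000):
    z = z_for_mean_difference(DIFF_MMHG, SD_MMHG, n)
    one, two = p_values(z)
    print(f"n = {n:>5} per group  z = {z:5.2f}  "
          f"one-tailed p = {one:.4f}  two-tailed p = {two:.4f}")
```

With the hypothetical 1 mmHg difference held fixed, the two-tailed p-value falls below 0.05 only because the sample grows; whether such a difference matters clinically is a separate question that the p-value does not answer. For a positive z, the one-tailed p-value is half the two-tailed one, since the two-tailed test counts both extremes of the null curve.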
