Combining exchangeable P-values
- PMID: 40085658
- PMCID: PMC11929381
- DOI: 10.1073/pnas.2410849122
Erratum in
- Correction for Gasparin et al., Combining exchangeable P-values. Proc Natl Acad Sci U S A. 2025 Apr 22;122(16):e2507343122. doi: 10.1073/pnas.2507343122. Epub 2025 Apr 18. PMID: 40249792.
Abstract
The problem of combining P-values is an old and fundamental one, and the classic assumption of independence is often violated or unverifiable in applications. There are many well-known rules that can combine a set of arbitrarily dependent P-values (for the same hypothesis) into a single P-value. We show that essentially all these existing rules can be strictly improved when the P-values are exchangeable, or when external randomization is allowed (or both). For example, we derive randomized and/or exchangeable improvements of well-known rules like "twice the median" and "twice the average," as well as geometric and harmonic means. Exchangeable P-values are often produced one at a time (for example, under repeated tests involving data splitting), and our rules can combine them sequentially as they are produced, stopping when the combined P-values stabilize. Our work also improves rules for combining arbitrarily dependent P-values, since the latter become exchangeable if they are presented to the analyst in a random order. The main technical advance is to show that all existing combination rules can be obtained by calibrating the P-values to e-values (using an α-dependent calibrator), averaging those e-values, converting to a level-α test using Markov's inequality, and finally obtaining P-values by combining this family of tests; the improvements are delivered via recent randomized and exchangeable variants of Markov's inequality.
Keywords: dependent P-values; e-values; global null testing; multiple testing; randomization.
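The calibrate, average, and threshold recipe in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The calibrator f_α(p) = (2 − 2p/α)₊/α is an assumed form, chosen because it integrates to 1 over [0, 1] and its plain Markov test rejects whenever "twice the average" does. The function names, the bisection search, and the tolerance are hypothetical. The exchangeable variant thresholds every running average of the e-values, and the randomized variant replaces the threshold 1/α with u/α for an independent uniform draw u.

```python
import numpy as np

def calibrate(p, alpha):
    """Assumed alpha-dependent calibrator: f_alpha(p) = (2 - 2p/alpha)_+ / alpha."""
    p = np.asarray(p, dtype=float)
    return np.maximum(2.0 - 2.0 * p / alpha, 0.0) / alpha

def markov_test(p, alpha):
    """Plain Markov: reject when the average e-value is at least 1/alpha."""
    return bool(calibrate(p, alpha).mean() >= 1.0 / alpha)

def exchangeable_test(p, alpha):
    """Exchangeable variant: reject when any running average reaches 1/alpha."""
    e = calibrate(p, alpha)
    running = np.cumsum(e) / np.arange(1, e.size + 1)
    return bool(np.any(running >= 1.0 / alpha))

def randomized_test(p, alpha, u):
    """Randomized variant: threshold u/alpha, with u drawn from Uniform(0, 1)
    independently of the P-values (supplied by the caller)."""
    return bool(calibrate(p, alpha).mean() >= u / alpha)

def combined_p_value(test, p, tol=1e-9):
    """Smallest alpha at which `test` rejects, found by bisection.

    Relies on the tests above being monotone in alpha, which holds for
    this calibrator because (2 - 2p/alpha)_+ is nondecreasing in alpha.
    """
    lo, hi = 0.0, 1.0
    if not test(p, hi):
        return 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if test(p, mid):
            hi = mid
        else:
            lo = mid
    return hi

# Illustrative P-values, taken in the order they were produced.
p_values = [0.01, 0.2, 0.03, 0.5]
twice_average = 2.0 * float(np.mean(p_values))
p_plain = combined_p_value(markov_test, p_values)
p_exch = combined_p_value(exchangeable_test, p_values)
p_rand = combined_p_value(lambda p, a: randomized_test(p, a, u=0.5), p_values)
```

In this sketch the exchangeable P-value can never exceed the plain one, because the full-sample average is one of the running averages being checked. The exchangeable result depends on the order of the P-values, so it is only meaningful when that order is exchangeable, for instance after a random shuffle.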
Conflict of interest statement
Competing interests statement: The authors declare no competing interest.
Grants and funding
- RGPIN-2024-03728/Canadian Government | Natural Sciences and Engineering Research Council of Canada (NSERC)
- CRC-2022-00141/Canadian Government | Natural Sciences and Engineering Research Council of Canada (NSERC)
- FG-2024-22012/Alfred P. Sloan Foundation (APSF)
- DMS-2310718/NSF | MPS | Division of Mathematical Sciences (DMS)
