Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures
- PMID: 33012899
- PMCID: PMC7531765
- DOI: 10.1080/01621459.2018.1554485
Abstract
Combining individual p-values to aggregate multiple small effects has long been of interest in statistics, dating back to the classic Fisher's combination test. In modern large-scale data analysis, correlation and sparsity are common features, and efficient computation is necessary for dealing with massive data. To overcome these challenges, we propose a new test that takes advantage of the Cauchy distribution. Our test statistic has a simple form and is defined as a weighted sum of Cauchy transformations of individual p-values. We prove a non-asymptotic result that the tail of the null distribution of our proposed test statistic can be well approximated by a Cauchy distribution under arbitrary dependency structures. Based on this theoretical result, the p-value calculation of our proposed test is not only accurate but also as simple as that of the classic z-test or t-test, making our test well suited for analyzing massive data. We further show that the power of the proposed test is asymptotically optimal in a strong sparsity setting. Extensive simulations demonstrate that the proposed test has both strong power against sparse alternatives and good accuracy in p-value calculation, especially for very small p-values. The proposed test has also been applied to a genome-wide association study of Crohn's disease and compared with several existing tests.
Keywords: Cauchy distribution; Correlation matrix; Global hypothesis testing; High dimensional data; Non-asymptotic approximation; Sparse alternative.
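The abstract describes the statistic only in words, so the following is a minimal sketch of how such a test could be computed, not the authors' implementation. It assumes the Cauchy transformation of a p-value p is tan((0.5 − p)π), that the weights are nonnegative and are normalized to sum to one, and that the combined p-value is read from the upper tail of a standard Cauchy distribution. The function name `cauchy_combination` and its arguments are hypothetical.

```python
import math


def cauchy_combination(pvalues, weights=None):
    """Combine p-values with a weighted sum of Cauchy-transformed values.

    Assumptions (not stated in the abstract): the transformation is
    tan((0.5 - p) * pi), and weights are nonnegative and rescaled to sum to 1.
    Returns (statistic, combined_p_value).
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("at least one p-value is required")
    if any(p < 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in [0, 1]")

    if weights is None:
        weights = [1.0] * n  # equal weights by default
    if len(weights) != n:
        raise ValueError("weights and pvalues must have the same length")
    if any(w < 0.0 for w in weights):
        raise ValueError("weights must be nonnegative")

    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must not all be zero")
    weights = [w / total for w in weights]  # normalize to sum to 1

    # Weighted sum of Cauchy-transformed p-values.
    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues)
    )

    # Upper-tail probability of a standard Cauchy distribution at `statistic`.
    combined_p = 0.5 - math.atan(statistic) / math.pi
    return statistic, combined_p


if __name__ == "__main__":
    stat, p = cauchy_combination([0.001, 0.4, 0.7, 0.25])
    print(f"statistic = {stat:.4f}, combined p-value = {p:.6f}")
```

Because the combined p-value is a closed-form function of the statistic, no resampling or correlation estimate is needed, which is the computational advantage the abstract emphasizes. With a single p-value the sketch returns that p-value unchanged, which serves as a quick sanity check.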