J Am Stat Assoc. 2020;115(529):393-402.
doi: 10.1080/01621459.2018.1554485. Epub 2019 Apr 25.

Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures

Yaowu Liu et al. J Am Stat Assoc. 2020.

Abstract

Combining individual p-values to aggregate multiple small effects has long been of interest in statistics, dating back to the classic Fisher's combination test. In modern large-scale data analysis, correlation and sparsity are common features, and efficient computation is necessary for dealing with massive data. To overcome these challenges, we propose a new test that takes advantage of the Cauchy distribution. Our test statistic has a simple form and is defined as a weighted sum of Cauchy transformations of individual p-values. We prove a non-asymptotic result that the tail of the null distribution of our proposed test statistic can be well approximated by a Cauchy distribution under arbitrary dependency structures. Based on this theoretical result, the p-value calculation of our proposed test is not only accurate but also as simple as that of the classic z-test or t-test, making our test well suited for analyzing massive data. We further show that the power of the proposed test is asymptotically optimal in a strong sparsity setting. Extensive simulations demonstrate that the proposed test has both strong power against sparse alternatives and good accuracy in p-value calculation, especially for very small p-values. The proposed test has also been applied to a genome-wide association study of Crohn's disease and compared with several existing tests.
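As an illustration of how simple the calculation is, the following is a minimal Python sketch of a Cauchy combination of p-values. The abstract does not spell out the formula; the sketch assumes the usual form of the statistic, T = sum_i w_i tan((0.5 - p_i) * pi) with nonnegative weights summing to 1, and approximates the combined p-value by the standard Cauchy tail probability P(C > T). The function name `cauchy_combination`, the default of equal weights, and the use of `atan2` for numerical stability in the far tail are choices made for this example, not details taken from the paper.

```python
import math


def cauchy_combination(pvalues, weights=None):
    """Combine p-values with a weighted sum of Cauchy transformations.

    Returns (statistic, combined_p). Hypothetical helper for illustration.
    """
    pvalues = list(pvalues)
    if not pvalues:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p < 1.0) for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")

    n = len(pvalues)
    if weights is None:
        # Assumption for this sketch: equal weights when none are given.
        weights = [1.0 / n] * n
    else:
        weights = list(weights)
        if len(weights) != n or any(w < 0 for w in weights):
            raise ValueError("weights must be nonnegative, one per p-value")
        total = sum(weights)
        if total <= 0:
            raise ValueError("weights must not all be zero")
        weights = [w / total for w in weights]  # normalize to sum to 1

    # Each p-value is mapped to a standard Cauchy variate under the null.
    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues)
    )

    # Standard Cauchy upper tail: 0.5 - atan(T) / pi.
    # atan2(1, T) equals pi/2 - atan(T) and keeps precision for large T.
    combined_p = math.atan2(1.0, statistic) / math.pi
    return statistic, combined_p


if __name__ == "__main__":
    stat, p = cauchy_combination([0.01, 0.5, 0.3, 0.8])
    print(f"statistic = {stat:.4f}, combined p-value = {p:.6f}")
```

With a single p-value the transformation is inverted exactly, so the combined p-value equals the input; this is a convenient sanity check when adapting the sketch.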

Keywords: Cauchy distribution; Correlation matrix; Global hypothesis testing; High dimensional data; Non-asymptotic approximation; Sparse alternative.

Figures

Figure 1:
The ratio of empirical size to significance level for Models 1–4, summarized by boxplots. The x-axis is the significance level α = 10^-1, 10^-2, 10^-3, 10^-4, 10^-5.
Figure 2:
The ratio of empirical size to significance level (dashed lines) for Model 5. The straight line in each plot is the reference line. The plot on the right is a zoomed-in view of the plot on the left. Note that the non-smoothness and fluctuation of the dashed curve in the right plot are due to Monte Carlo error.
Figure 3:
Power comparison of CCT, MinP, HC and BJ. The x-axis is the correlation strength ρ. The columns from left to right correspond to the dimension d = 20, 40, 60. The rows from top to bottom correspond to the signal percentage 5%, 10% and 20%. The signal strength is chosen to make the power in every setting comparable.

