Proc Natl Acad Sci U S A. 2008 Dec 2;105(48):18718-23.
doi: 10.1073/pnas.0808709105. Epub 2008 Nov 24.

A general framework for multiple testing dependence

Jeffrey T Leek et al. Proc Natl Acad Sci U S A. 2008.

Abstract

We develop a general framework for performing large-scale significance testing in the presence of arbitrarily strong dependence. We derive a low-dimensional set of random vectors, called a dependence kernel, that fully captures the dependence structure in an observed high-dimensional dataset. This result shows a surprising reversal of the "curse of dimensionality" in the high-dimensional hypothesis testing setting. We show theoretically that conditioning on a dependence kernel is sufficient to render statistical tests independent regardless of the level of dependence in the observed data. This framework for multiple testing dependence has implications in a variety of common multiple testing problems, such as in gene expression studies, brain imaging, and spatial epidemiology.
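
The conditioning statement can be made concrete with a linear-model sketch. The symbols below ($x_i$, $b_i$, $S$, $\gamma_i$, $G$, $u_i$, $m$) are assumptions introduced here for illustration; the abstract itself does not define any notation.

```latex
% Assumed notation: x_i is the data vector for test i, S holds the primary
% variables of interest, G is a low-dimensional dependence kernel shared by
% all tests, and u_i is noise assumed independent across tests.
\[
  x_i = b_i S + \gamma_i G + u_i, \qquad i = 1, \dots, m .
\]
% All cross-test dependence enters through the shared term \gamma_i G.
% Once G is conditioned on, the only remaining randomness is u_i, so
\[
  x_i \perp x_k \mid G \qquad \text{for } i \neq k .
\]
```

Under these assumptions, including $G$ as a covariate when fitting each test's model leaves test statistics that depend only on the independent noise terms, which is the sense in which the tests become independent.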

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
A schematic of the general steps of multiple hypothesis testing. We directly account for multiple testing dependence in the model-fitting step, so that all downstream steps in the analysis are unaffected by dependence and have the same operating characteristics as independent tests. Our approach differs from current methods, which address dependence indirectly by modifying the test statistics, adaptively modifying the null distribution, or altering significance cutoffs. For these downstream methods the multiple testing dependence is not directly modeled from the data, so distortions of the signal of interest and the null distribution may be present regardless of which correction is implemented.
Fig. 2.
Simulated examples of multiple testing dependence. A and B consist of spatial dependence examples as simplified versions of that encountered in brain imaging, and C and D consist of latent structure examples as encountered in gene expression studies. In all examples, the data and the null P values are plotted both before and after subtracting the dependence kernel. The data are plotted in the form of a heat map (red, high numerical value; white, middle; blue, low). The signal is clearer and the true null tests' P values are unbiased after the dependence kernel is subtracted. (A and B) Each point in the heat map represents the data for one spatial variable. The two true signals are in the diamond and circle shapes, and there is autoregressive spatial dependence between the pixels. (A) An example where the spatial dependence confounds the true signal, and the null P values are anticonservatively biased. (B) An example where the spatial dependence is nearly orthogonal to the true signal, and the null P values are conservatively biased. (C and D) Each row of the heat map corresponds to a gene's expression values, where the first 400 rows are genes simulated to be truly associated with the dichotomous primary variable. Dependence across tests is induced by common unmodeled variables that also influence expression, as described in the text. (C) An example where dependence due to latent structure confounds the true signal, and the null P values are anticonservatively biased. (D) An example where dependence due to latent structure is nearly orthogonal to the true signal, and the null P values are conservatively biased.
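
The latent-structure setting in panels C and D can be imitated with a small simulation. The following is a minimal sketch, not the authors' procedure: it assumes a single latent variable `z` that is known exactly (in practice a dependence kernel would have to be estimated from the data), treats every test as a true null, and uses hypothetical names (`simulate`, `residualize`, `mean_abs_correlation`) and parameter values chosen only for illustration. It measures how correlated the tests are before and after the latent variable is regressed out of each row.

```python
import random


def simulate(n_tests=30, n_samples=20, loading=2.0, seed=0):
    """Simulate null data whose rows all depend on one shared latent variable z."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n_samples)]
    rows = []
    for _ in range(n_tests):
        sign = rng.choice([-1.0, 1.0])  # each test loads on z with a random sign
        rows.append([sign * loading * zj + rng.gauss(0, 1) for zj in z])
    return z, rows


def residualize(row, z):
    """Return residuals from a least-squares fit of row on an intercept and z."""
    n = len(row)
    mean_x = sum(row) / n
    mean_z = sum(z) / n
    szz = sum((zj - mean_z) ** 2 for zj in z)
    sxz = sum((xj - mean_x) * (zj - mean_z) for xj, zj in zip(row, z))
    slope = sxz / szz
    return [(xj - mean_x) - slope * (zj - mean_z) for xj, zj in zip(row, z)]


def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def mean_abs_correlation(rows):
    """Average absolute correlation over all pairs of rows (tests)."""
    total, pairs = 0.0, 0
    for i in range(len(rows)):
        for k in range(i + 1, len(rows)):
            total += abs(correlation(rows[i], rows[k]))
            pairs += 1
    return total / pairs


z, rows = simulate()
before = mean_abs_correlation(rows)
after = mean_abs_correlation([residualize(r, z) for r in rows])
print(f"mean |correlation| between tests: before={before:.2f}, after={after:.2f}")
```

In this toy setting the residual rows retain only independent noise, so the average correlation between tests should drop sharply once the shared variable is removed. The sketch omits the primary variable and the P value calculation shown in the figure; adding a two-group comparison per row would be the natural next step.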
