A general framework for multiple testing dependence

Jeffrey T Leek¹, John D Storey

PMID: 19033188
PMCID: PMC2586646
DOI: 10.1073/pnas.0808709105

Jeffrey T Leek et al. Proc Natl Acad Sci U S A. 2008 Dec 2;105(48):18718-23. doi: 10.1073/pnas.0808709105. Epub 2008 Nov 24.

Affiliation

¹ Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA.

Abstract

We develop a general framework for performing large-scale significance testing in the presence of arbitrarily strong dependence. We derive a low-dimensional set of random vectors, called a dependence kernel, that fully captures the dependence structure in an observed high-dimensional dataset. This result shows a surprising reversal of the "curse of dimensionality" in the high-dimensional hypothesis testing setting. We show theoretically that conditioning on a dependence kernel is sufficient to render statistical tests independent regardless of the level of dependence in the observed data. This framework for multiple testing dependence has implications in a variety of common multiple testing problems, such as in gene expression studies, brain imaging, and spatial epidemiology.
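The central claim, that conditioning on a low-dimensional set of vectors can remove dependence between tests, can be illustrated with a toy simulation. The sketch below is not the authors' estimation procedure: it assumes a single latent variable `g` that is known exactly and stands in for the dependence kernel, whereas in real data the kernel would have to be estimated. All names (`simulate`, `residualize`, `mean_abs_correlation`) and the simulation settings are hypothetical choices for illustration.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def pearson(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def residualize(row, g):
    """Residuals of a least-squares fit of row on g (with intercept)."""
    mg, mr = mean(g), mean(row)
    slope = (sum((a - mg) * (b - mr) for a, b in zip(g, row))
             / sum((a - mg) ** 2 for a in g))
    return [b - mr - slope * (a - mg) for a, b in zip(g, row)]

def mean_abs_correlation(rows):
    """Average |correlation| over all pairs of features."""
    total, count = 0.0, 0
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            total += abs(pearson(rows[i], rows[j]))
            count += 1
    return total / count

def simulate(m=100, n=50, seed=1):
    """m features on n samples; every feature loads on one shared latent g."""
    rng = random.Random(seed)
    g = [rng.gauss(0, 1) for _ in range(n)]  # assumed-known kernel stand-in
    rows = []
    for _ in range(m):
        loading = rng.gauss(0, 2)  # feature-specific effect of g
        rows.append([loading * gj + rng.gauss(0, 1) for gj in g])
    return g, rows

g, rows = simulate()
before = mean_abs_correlation(rows)
after = mean_abs_correlation([residualize(r, g) for r in rows])
print(f"mean |correlation| between features: before={before:.2f}, after={after:.2f}")
```

In this toy setting the features are strongly correlated with one another only through `g`, so removing the part of each feature explained by `g` leaves residuals whose pairwise correlations are at the level expected from independent noise.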

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Fig. 1.**
A schematic of the general steps of multiple hypothesis testing. We directly account for multiple testing dependence in the model-fitting step, where all the downstream steps in the analysis are not affected by dependence and have the same operating characteristics as independent tests. Our approach differs from current methods, which address dependence indirectly by modifying the test statistics, adaptively modifying the null distribution, or altering significance cutoffs. For these downstream methods the multiple testing dependence is not directly modeled from the data, so distortions of the signal of interest and the null distribution may be present regardless of which correction is implemented.

**Fig. 2.**
Simulated examples of multiple testing dependence. A and B consist of spatial dependence examples as simplified versions of that encountered in brain imaging, and C and D consist of latent structure examples as encountered in gene expression studies. In all examples, the data and the null P values are plotted both before and after subtracting the dependence kernel. The data are plotted in the form of a heat map (red, high numerical value; white, middle; blue, low). The signal is clearer and the true null tests' P values are unbiased after the dependence kernel is subtracted. (A and B) Each point in the heat map represents the data for one spatial variable. The two true signals are in the diamond and circle shapes, and there is autoregressive spatial dependence between the pixels. (A) An example where the spatial dependence confounds the true signal, and the null P values are anticonservatively biased. (B) An example where the spatial dependence is nearly orthogonal to the true signal, and the null P values are conservatively biased. (C and D) Each row of the heat map corresponds to a gene's expression values, where the first 400 rows are genes simulated to be truly associated with the dichotomous primary variable. Dependence across tests is induced by common unmodeled variables that also influence expression, as described in the text. (C) An example where dependence due to latent structure confounds the true signal, and the null P values are anticonservatively biased. (D) An example where dependence due to latent structure is nearly orthogonal to the true signal, and the null P values are conservatively biased.
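The anticonservative bias described for the confounded latent-structure case can be reproduced in a much smaller toy example. The sketch below is not the simulation used for the figure: it assumes one latent variable `g` that is correlated with a dichotomous primary variable and is known exactly, and it adjusts for `g` by regression rather than by estimating and subtracting a dependence kernel. The cutoff of 2.02 is used as an approximate two-sided 5% critical value for the t statistics involved; all names and settings are hypothetical.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def pearson(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def residualize(y, g):
    """Residuals of a least-squares fit of y on g (with intercept)."""
    mg, my = mean(g), mean(y)
    slope = (sum((a - mg) * (b - my) for a, b in zip(g, y))
             / sum((a - mg) ** 2 for a in g))
    return [b - my - slope * (a - mg) for a, b in zip(g, y)]

def t_from_r(r, df):
    """t statistic corresponding to a (partial) correlation r with df degrees of freedom."""
    return r * (df / (1.0 - r * r)) ** 0.5

rng = random.Random(7)
n, m, cutoff = 40, 1000, 2.02
group = [0.0] * (n // 2) + [1.0] * (n // 2)      # dichotomous primary variable
g = [s + 0.5 * rng.gauss(0, 1) for s in group]   # latent variable confounded with group
group_adj = residualize(group, g)

naive_hits = adjusted_hits = 0
for _ in range(m):
    loading = rng.gauss(0, 1)
    # True null feature: depends on g and noise, never directly on group.
    x = [loading * gj + rng.gauss(0, 1) for gj in g]
    t_naive = t_from_r(pearson(x, group), n - 2)
    t_adj = t_from_r(pearson(residualize(x, g), group_adj), n - 3)
    naive_hits += abs(t_naive) > cutoff
    adjusted_hits += abs(t_adj) > cutoff

naive_rate = naive_hits / m
adjusted_rate = adjusted_hits / m
print(f"null rejection rate: naive={naive_rate:.3f}, adjusted for g={adjusted_rate:.3f}")
```

Because every feature here is a true null, a well-calibrated test should reject about 5% of them. The naive test rejects far more often, since `g` carries group information into the features, while the test that conditions on `g` returns to roughly the nominal rate.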

Grants and funding

R01 HG002913/HG/NHGRI NIH HHS/United States
