Comparative Study

BMC Bioinformatics. 2008 Feb 25:9:114.

doi: 10.1186/1471-2105-9-114.

Effects of dependence in high-dimensional multiple testing problems

Kyung In Kim¹, Mark A van de Wiel

PMID: 18298808
PMCID: PMC2375137
DOI: 10.1186/1471-2105-9-114

Kyung In Kim et al. BMC Bioinformatics. 2008.

Affiliation

¹ Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands. k.i.kim@tue.nl

Abstract

Background: We consider the effects of dependence among variables of high-dimensional data in multiple hypothesis testing problems, in particular on False Discovery Rate (FDR) control procedures. Recent simulation studies consider only simple correlation structures among variables, which hardly reflect features of real data. Our aim is to systematically study the effects of several network features, such as sparsity and correlation strength, by imposing dependence structures among variables using random correlation matrices.

Results: We study the robustness against dependence of several FDR procedures that are popular in microarray studies, such as Benjamini-Hochberg FDR, Storey's q-value, SAM and resampling-based FDR procedures. False Non-discovery Rates and estimates of the number of null hypotheses are computed from those methods and compared. Our simulation study shows that methods such as SAM and the q-value do not adequately control the FDR at the claimed level under dependence conditions. On the other hand, the adaptive Benjamini-Hochberg procedure seems to be the most robust while remaining conservative. Finally, the estimates of the number of true null hypotheses under various dependence conditions are variable.
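As a reference point for the procedures named above, the following is a minimal sketch of the standard Benjamini-Hochberg step-up rule, with the Benjamini-Yekutieli dependence correction as an option. The function name `benjamini_hochberg`, its parameters, and the example p-values are illustrative assumptions and are not taken from the study.

```python
def benjamini_hochberg(pvalues, alpha=0.1, dependence_correction=False):
    """Return a list of booleans, True where the hypothesis is rejected.

    With dependence_correction=True the level is divided by sum(1/i),
    which gives the Benjamini-Yekutieli (BY) variant.
    """
    m = len(pvalues)
    if dependence_correction:
        alpha = alpha / sum(1.0 / i for i in range(1, m + 1))
    # Indices sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up rule: find the largest rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
bh_rejected = benjamini_hochberg(pvals, alpha=0.05)
by_rejected = benjamini_hochberg(pvals, alpha=0.05, dependence_correction=True)
print(sum(bh_rejected), sum(by_rejected))
```

On these example values BH rejects the two smallest p-values and BY rejects only the smallest, which illustrates why BY is the more conservative of the two.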

Conclusion: We discuss a new method for efficient guided simulation of dependent data that satisfy imposed network constraints, expressed as conditional independence structures. Our simulation set-up allows for a structural study of the effect of dependencies on multiple testing criteria and is useful for testing a potentially new method for π₀ or FDR estimation in a dependency context.
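To make the idea of network-constrained dependent data concrete, here is a minimal sketch that is not the authors' method. It assumes a Gaussian model in which a zero entry of the precision (inverse covariance) matrix encodes conditional independence, uses diagonal dominance to guarantee positive definiteness, and samples via the Cholesky factor of the precision matrix. All function names and parameter values are hypothetical, and the resulting variables are not rescaled to unit variance.

```python
import math
import random


def random_sparse_precision(m, edge_prob, rng):
    """Symmetric, strictly diagonally dominant precision matrix.

    A zero off-diagonal entry (i, j) means variables i and j are
    conditionally independent given the rest (no edge in the network).
    """
    omega = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            if rng.random() < edge_prob:
                omega[i][j] = omega[j][i] = rng.uniform(-1.0, 1.0)
    for i in range(m):
        # Strict diagonal dominance implies positive definiteness.
        omega[i][i] = sum(abs(v) for v in omega[i]) + 1.0
    return omega


def cholesky(a):
    """Lower-triangular factor low with a = low * low^T."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_gaussian(low, rng):
    """Draw x ~ N(0, Omega^{-1}), where Omega = low * low^T."""
    n = len(low)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Solve low^T x = z by back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (z[i] - s) / low[i][i]
    return x


rng = random.Random(0)
omega = random_sparse_precision(m=6, edge_prob=0.3, rng=rng)
low = cholesky(omega)
samples = [sample_gaussian(low, rng) for _ in range(5)]
```

Raising `edge_prob` makes the network denser, which loosely corresponds to moving along the "proportion of edges" axis used in the figures.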

Figures

**Figure 1**
**Average FDR results under dependence when π₀ = 0.8**. The x-axis corresponds to the proportion of edges in the network and the y-axis corresponds to FDR estimates for various procedures. The testing cut-off c is tuned such that the true FDR is 0.1 under independence. FDR(c) (solid black) represents true FDR values in terms of (5) using the fixed c. The FDR procedures and corresponding lines in this figure are: BH (dashed red), BY (dotted green), SAM (dot-dashed blue), Qvalue (dashed cyan), ABH (purple), the upper limit RBH (dashed black), the point RBH (dotted red).

**Figure 2**
**Average FDR results under dependence when π₀ = 0.95**. See Figure 1 for explanation.

**Figure 3**
**Average FNR results under dependence when π₀ = 0.8**. The y-axis corresponds to FNR estimates for various procedures. For further explanation, see Figure 1.
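In a simulation where the true status of each hypothesis is known, the realized false discovery and false non-discovery proportions can be computed per data set and then averaged to approximate FDR and FNR. The sketch below assumes the common definitions (false rejections over rejections, and missed non-nulls over non-rejections, with 0 when the denominator is empty); the study's own equation (5) is not reproduced here, and the function names and example vectors are hypothetical.

```python
def false_discovery_proportion(rejected, is_null):
    """Fraction of rejected hypotheses that are true nulls."""
    n_rejected = sum(rejected)
    if n_rejected == 0:
        return 0.0
    false_pos = sum(r and h for r, h in zip(rejected, is_null))
    return false_pos / n_rejected


def false_nondiscovery_proportion(rejected, is_null):
    """Fraction of non-rejected hypotheses that are true alternatives."""
    n_accepted = sum(not r for r in rejected)
    if n_accepted == 0:
        return 0.0
    false_neg = sum((not r) and (not h) for r, h in zip(rejected, is_null))
    return false_neg / n_accepted


# Hypothetical outcome of one simulated data set.
rejected = [True, True, True, False, False]
is_null = [False, True, False, False, True]
fdp = false_discovery_proportion(rejected, is_null)
fnp = false_nondiscovery_proportion(rejected, is_null)
print(fdp, fnp)
```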

**Figure 4**
**Average π₀ estimates under dependence when π₀ = 0.8**. The x-axis corresponds to the proportion of edges in the network and the y-axis corresponds to π₀ estimates for various procedures. The π₀ estimators and corresponding lines are SAM (solid black), Qvalue (dashed red), ABH (dotted green) and the convex estimator of Langaas et al. [10] (dot-dashed).
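For orientation, one widely used π₀ estimator is Storey's: the share of p-values above a threshold λ, divided by (1 − λ). The sketch below uses a fixed λ, which is a simplifying assumption and not necessarily the variant behind any curve in the figure; the function name `storey_pi0` and the example p-values are hypothetical.

```python
def storey_pi0(pvalues, lam=0.5):
    """Storey-type estimate of the proportion of true null hypotheses.

    Under the null, p-values are uniform, so about pi0 * m * (1 - lam)
    of them are expected to exceed lam.
    """
    m = len(pvalues)
    n_above = sum(p > lam for p in pvalues)
    # Truncate at 1 because a proportion cannot exceed 1.
    return min(1.0, n_above / (m * (1.0 - lam)))


# Hypothetical p-values, for illustration only.
pvals = [0.01, 0.02, 0.03, 0.04, 0.2, 0.4, 0.6, 0.7, 0.8, 0.95]
pi0_hat = storey_pi0(pvals, lam=0.5)
print(pi0_hat)
```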

**Figure 5**
**Variances of correlations and FDR(c) when π₀ = 0.8**. The solid line represents the variance of the correlations and the dashed line represents FDR(c). For comparison, we transform *var*(ρ_ij) to *var*(ρ_ij)/10 + 0.1 so that the two quantities are on the same scale.

**Figure 6**
**FDR(c) with different M values**. For various M - m values, FDR(c) is computed. The M - m values and corresponding lines are 1001 (solid black), 1010 (dashed red), 1025 (dotted green), 1046 (dot-dashed blue) and 1073 (dashed cyan).

**Figure 7**
**Graphical representation of conditional independence structures when m = 4**. A sequence of possible nested structures is depicted for four nodes. The leftmost graph represents complete independence between variables and the rightmost graph represents complete dependence between variables. The dependence structure of each graph is contained in that of the graph to its right.

References

1. Wille A, Zimmermann P, Vranova E, Furholz A, Laule O, Bleuler S, Hennig L, Prelic A, von Rohr P, Thiele L, Zitzler E, Gruissem W, Buhlmann P. Sparse graphical Gaussian modeling of the isoprenoid gene network in Arabidopsis thaliana. Genome Biol. 2004;5:R92. doi: 10.1186/gb-2004-5-11-r92.
2. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Statist Soc Ser B. 1995;57:289–300.
3. Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. Ann Statist. 2001;29:1165–1188. doi: 10.1214/aos/1013699998.
4. Storey JD. The positive false discovery rate: a Bayesian interpretation and the q-value. Ann Statist. 2003;31:2013–2035. doi: 10.1214/aos/1074290335.
5. Storey J, Tibshirani R. Estimating false discovery rates under dependence, with applications to DNA microarrays. Tech Rep 2001–12. Stanford University; 2001. http://www-stat.stanford.edu/reports/papers2001.html
