Estimation of false discovery proportion under general dependence
- PMID: 17046978
- DOI: 10.1093/bioinformatics/btl527
Abstract
Motivation: Wide-scale correlations between genes are commonly observed in gene expression data, for both biological and technical reasons. These correlations increase the variability of the standard estimate of the false discovery rate (FDR). We highlight the false discovery proportion (FDP), rather than the FDR, as the suitable quantity for assessing differential expression in microarray data, demonstrate the deleterious effects of correlation on FDP estimation and propose an improved estimation method that accounts for the correlations.
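The distinction between the FDP (the realized fraction of false discoveries in one experiment) and the FDR (its expectation) can be illustrated with a small simulation. The following is a minimal sketch and not the authors' code: the one-factor correlation model, the function names `simulate_fdp` and `fdp_summary`, and all parameter values are assumptions chosen for illustration. Null statistics are marginally standard normal in both settings, so the expected number of false positives is the same; only the spread of the realized FDP changes.

```python
import random
import statistics


def simulate_fdp(m=500, m1=50, effect=3.0, rho=0.0, cutoff=2.5, rng=None):
    """Return the realized FDP of one simulated experiment.

    Assumed model (illustrative only): each statistic is
    sqrt(rho) * shared + sqrt(1 - rho) * noise, so any two statistics
    have correlation rho. The first m1 statistics are true signals.
    """
    rng = rng or random.Random()
    shared = rng.gauss(0.0, 1.0)  # latent factor common to all genes
    a = rho ** 0.5
    b = (1.0 - rho) ** 0.5
    false_pos = 0
    true_pos = 0
    for i in range(m):
        z = a * shared + b * rng.gauss(0.0, 1.0)
        is_signal = i < m1
        if is_signal:
            z += effect
        if abs(z) > cutoff:
            if is_signal:
                true_pos += 1
            else:
                false_pos += 1
    rejected = false_pos + true_pos
    # Convention: FDP is 0 when nothing is rejected.
    return false_pos / rejected if rejected else 0.0


def fdp_summary(rho, n_rep=300, seed=1):
    """Mean and standard deviation of the FDP over n_rep experiments."""
    rng = random.Random(seed)
    fdps = [simulate_fdp(rho=rho, rng=rng) for _ in range(n_rep)]
    return statistics.mean(fdps), statistics.stdev(fdps)


mean_ind, sd_ind = fdp_summary(rho=0.0)  # independent statistics
mean_cor, sd_cor = fdp_summary(rho=0.4)  # correlated statistics
print(f"independent: mean FDP {mean_ind:.3f}, sd {sd_ind:.3f}")
print(f"correlated:  mean FDP {mean_cor:.3f}, sd {sd_cor:.3f}")
```

In this toy setting the standard deviation of the FDP is noticeably larger under correlation, which is the kind of variability the abstract refers to; the numbers themselves carry no meaning beyond the assumed model.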
Methods: We analyse the variation pattern of the distribution of test statistics under permutation using the singular value decomposition. The results suggest a latent FDR model that accounts for the effects of correlation, and is statistically closer to the FDP. We develop a procedure for estimating the latent FDR (ELF) based on a Poisson regression model.
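The first step described above, examining how the permutation distribution of test statistics varies by means of the singular value decomposition, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the ELF procedure: the simulated expression matrix, the Welch-type t-statistic in the hypothetical helper `t_statistics`, the bin grid and all sizes are assumptions, and the Poisson regression step is not attempted here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_per_group, n_perm = 1000, 10, 200
n_samples = 2 * n_per_group

# Hypothetical null data: one latent factor induces gene-gene correlation.
factor = rng.normal(size=(1, n_samples))
loadings = rng.normal(scale=0.7, size=(n_genes, 1))
expr = loadings @ factor + rng.normal(size=(n_genes, n_samples))


def t_statistics(x, labels):
    """Two-sample t-statistic per gene (rows of x); labels is a boolean mask."""
    a, b = x[:, labels], x[:, ~labels]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    return (a.mean(axis=1) - b.mean(axis=1)) / se


labels = np.zeros(n_samples, dtype=bool)
labels[:n_per_group] = True

# One histogram of t-statistics per permutation of the group labels.
bins = np.linspace(-6.0, 6.0, 41)
counts = np.empty((n_perm, len(bins) - 1))
for p in range(n_perm):
    perm_labels = rng.permutation(labels)
    # Clip so that extreme values fall in the outermost bins.
    t = np.clip(t_statistics(expr, perm_labels), bins[0], bins[-1])
    counts[p], _ = np.histogram(t, bins=bins)

# SVD of the centered count matrix: rows are permutations, columns are bins.
centered = counts - counts.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)
explained = s**2 / np.sum(s**2)
print("share of variation in the first three components:", explained[:3])
```

The rows of `vt` describe how the histogram shape deviates from its average across permutations. When a few components carry most of the variation, that pattern is the sort of structure a latent model could be built on.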
Results: For simulated data based on the correlation structure of real datasets, we find that ELF performs substantially better than the standard FDR approach in estimating the FDP. We illustrate the use of ELF in the analysis of breast cancer and lymphoma data.
Availability: R code to perform ELF is available at http://www.meb.ki.se/~yudpaw.