Comments on the analysis of unbalanced microarray data
- PMID: 19528084
- PMCID: PMC2732368
- DOI: 10.1093/bioinformatics/btp363
Abstract
Motivation: Permutation testing is widely used to analyze microarray data and identify differentially expressed (DE) genes, and estimating false discovery rates (FDRs) is a common way to address the inherent multiple testing problem. However, combining these approaches may be problematic when sample sizes are unequal.
Results: With unbalanced data, permutation tests may not be suitable because they do not test the hypothesis of interest. In addition, permutation tests can be biased. Using biased P-values to estimate the FDR can produce unacceptable bias in those estimates. Results also show that the approach of pooling permutation null distributions across genes can produce invalid P-values, since even non-DE genes can have different permutation null distributions. We encourage researchers to use statistics that have been shown to reliably discriminate DE genes, but caution that associated P-values may be either invalid, or a less-effective metric for discriminating DE genes.
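
The concern about unbalanced designs can be illustrated with a small simulation. The sketch below is not the authors' procedure: it assumes a mean-difference statistic, normally distributed expression values, and hypothetical function names and parameters (`permutation_p_value`, `rejection_rate`, the group sizes and standard deviations). Every simulated gene has equal group means, so it is non-DE, but the smaller group has the larger variance. In that setting the permutation test effectively tests equality of distributions rather than equality of means, so the fraction of P-values at or below the nominal level would typically be expected to exceed that level.

```python
import random
import statistics


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Monte Carlo two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(x) - statistics.fmean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        # Reassign group labels at random, keeping the group sizes fixed.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_x]) - statistics.fmean(pooled[n_x:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def rejection_rate(n_x, n_y, sd_x, sd_y, n_genes=200, alpha=0.05, seed=1):
    """Fraction of simulated non-DE genes with permutation p-value <= alpha.

    Both groups are drawn with mean 0 (an assumption of this sketch), so
    every rejection is a false positive for the hypothesis of equal means.
    """
    rng = random.Random(seed)
    rejected = 0
    for gene in range(n_genes):
        x = [rng.gauss(0.0, sd_x) for _ in range(n_x)]
        y = [rng.gauss(0.0, sd_y) for _ in range(n_y)]
        if permutation_p_value(x, y, n_perm=500, seed=gene) <= alpha:
            rejected += 1
    return rejected / n_genes


if __name__ == "__main__":
    # Balanced design with equal variances versus an unbalanced design in
    # which the smaller group is the more variable one (illustrative values).
    print("balanced:  ", rejection_rate(8, 8, 1.0, 1.0))
    print("unbalanced:", rejection_rate(3, 12, 3.0, 1.0))
```

P-values biased in this way would then feed directly into any FDR estimate computed from them, which is the kind of downstream bias the abstract warns about.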