Bioinformatics. 2009 Aug 15;25(16):2035-41.
doi: 10.1093/bioinformatics/btp363. Epub 2009 Jun 15.

Comments on the analysis of unbalanced microarray data

Kathleen F Kerr. Bioinformatics.

Abstract

Motivation: Permutation testing is very popular for analyzing microarray data to identify differentially expressed (DE) genes; estimating false discovery rates (FDRs) is a very popular way to address the inherent multiple testing problem. However, combining these approaches may be problematic when sample sizes are unequal.

Results: With unbalanced data, permutation tests may not be suitable because they do not test the hypothesis of interest. In addition, permutation tests can be biased. Using biased P-values to estimate the FDR can produce unacceptable bias in those estimates. Results also show that the approach of pooling permutation null distributions across genes can produce invalid P-values, since even non-DE genes can have different permutation null distributions. We encourage researchers to use statistics that have been shown to reliably discriminate DE genes, but caution that associated P-values may be either invalid, or a less-effective metric for discriminating DE genes.
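The bias described above can be illustrated with a small simulation. The following is a minimal sketch, not the paper's simulation design: it assumes normally distributed expression values, a plain difference in group means as the test statistic, and hypothetical settings (4000 non-DE genes, 400 relabelings, standard deviations of 1 and 3). Group sizes of 60 and 45 are borrowed from the Fig. 3 caption. Every gene has equal means in both groups, so any rejection is a false positive.

```python
import numpy as np

def permutation_pvalues(data, n1, n_perm, rng):
    """Two-sided permutation P-values for the difference in group means.

    data: (genes, samples) array whose first n1 columns are group 1.
    The same random relabelings of the samples are applied to every gene.
    """
    n = data.shape[1]
    n2 = n - n1
    total = data.sum(axis=1, keepdims=True)

    def diff_in_means(in_group1):
        # in_group1: (k, samples) 0/1 matrix; result has shape (genes, k).
        s1 = data @ in_group1.T
        return s1 / n1 - (total - s1) / n2

    observed = diff_in_means((np.arange(n) < n1)[None, :].astype(float))
    # Each row of `shuffled` is a random permutation of 0..n-1, so each row
    # of `relabel` marks a random subset of exactly n1 samples as group 1.
    shuffled = rng.random((n_perm, n)).argsort(axis=1)
    relabel = (shuffled < n1).astype(float)
    null = diff_in_means(relabel)
    exceed = (np.abs(null) >= np.abs(observed)).sum(axis=1)
    return (exceed + 1) / (n_perm + 1)

def null_rejection_rate(sd1, sd2, n1=60, n2=45, n_genes=4000,
                        n_perm=400, alpha=0.05, seed=0):
    """Fraction of non-DE genes (equal means) with P <= alpha."""
    rng = np.random.default_rng(seed)
    group1 = rng.normal(0.0, sd1, size=(n_genes, n1))
    group2 = rng.normal(0.0, sd2, size=(n_genes, n2))
    pvals = permutation_pvalues(np.hstack([group1, group2]), n1, n_perm, rng)
    return float((pvals <= alpha).mean())

# Equal variances: the sample labels are exchangeable under the null.
rate_equal_var = null_rejection_rate(sd1=1.0, sd2=1.0)
# Smaller group has the larger variance: the labels are not exchangeable.
rate_unequal_var = null_rejection_rate(sd1=1.0, sd2=3.0)
print(f"equal variances:   {rate_equal_var:.3f}")
print(f"unequal variances: {rate_unequal_var:.3f}")
```

Under these assumptions the equal-variance rate should sit near the nominal 0.05, while the unequal-variance rate tends to come out above it. The permutation test is really testing equality of the two distributions, not equality of the means, which is why its P-values can mislead when only the variances differ.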

Figures

Fig. 1.
QValue estimates of π1 for the Δμ-statistic. A reference line is drawn at π1μ, the proportion of genes DE in the mean. However, π1μ = π1 only for the EV simulations. For the UV simulations, π1μ < π1. CDE, an alternative MMM, gives similar results (Supplementary Fig. 3). See also Supplementary Figures 4 and 5 for the s-statistic.
Fig. 2.
ROC curves for the Δμ-statistic (dashed line), s-statistic (solid line) and t-statistic (dotted line) in a type EV simulation with 10 000 genes. The ROC curves for the corresponding permutation test P-values are the same as for the t-statistic and are not shown.
Fig. 3.
Histograms of ‘per gene’ permutation null distributions for nine genes in UV simulations. All nine genes have the same mean in both populations, sample sizes are 60 and 45, and histograms are based on 2000 random permutations. There is clearly gene-to-gene variation in the permutation null distributions of these test statistics.
Fig. 4.
A comparison of the P-values from a ‘per gene’ permutation test and a ‘pooled null’ permutation test. The inset magnifies the plot for small P-values. These results are for the real CEU/CHB data (not simulated data) with 10 000 randomly chosen genes to facilitate presentation. See Supplementary Figure 7 for the corresponding plots for other test statistics.
Fig. 5.
Maximum likelihood estimates of π1μ using locfdr. A reference line is drawn at π1μ, the proportion of genes DE in the mean. The s-statistics were scaled by 1/0.55 and then treated as a t-distribution with 8, 9 or 11 degrees of freedom for π1μ=0.01, 0.05, and 0.2, respectively.
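
The contrast drawn in Figs 3 and 4 between ‘per gene’ and ‘pooled null’ permutation P-values can be sketched on simulated data. This is an illustrative sketch only: the difference-in-means statistic, the 200 genes, the 500 relabelings and the uniform range of standard deviations are all hypothetical choices, not the paper's settings. All genes are non-DE in the mean, with sample sizes of 60 and 45 as in Fig. 3.

```python
import numpy as np

rng = np.random.default_rng(1)
n_genes, n1, n2, n_perm = 200, 60, 45, 500
n = n1 + n2

# Non-DE genes: mean 0 in both groups, but each gene gets its own standard
# deviation in each group (hypothetical range, chosen for illustration).
sd = np.hstack([
    np.repeat(rng.uniform(0.5, 3.0, size=(n_genes, 1)), n1, axis=1),
    np.repeat(rng.uniform(0.5, 3.0, size=(n_genes, 1)), n2, axis=1),
])
data = rng.normal(0.0, 1.0, size=(n_genes, n)) * sd
total = data.sum(axis=1, keepdims=True)

def diff_in_means(in_group1):
    # in_group1: (k, samples) 0/1 matrix; result has shape (genes, k).
    s1 = data @ in_group1.T
    return s1 / n1 - (total - s1) / n2

observed = diff_in_means((np.arange(n) < n1)[None, :].astype(float))[:, 0]
# Each row marks a random subset of n1 samples as group 1.
relabel = (rng.random((n_perm, n)).argsort(axis=1) < n1).astype(float)
null = diff_in_means(relabel)  # shape (genes, permutations)

# 'Per gene': compare each gene only with its own permutation null.
exceed = (np.abs(null) >= np.abs(observed)[:, None]).sum(axis=1)
per_gene_p = (exceed + 1) / (n_perm + 1)

# 'Pooled null': compare each gene with the null values of all genes.
pooled = np.sort(np.abs(null).ravel())
n_exceed = pooled.size - np.searchsorted(pooled, np.abs(observed), side="left")
pooled_p = (n_exceed + 1) / (pooled.size + 1)

# Spread of the per-gene null distributions, one value per gene.
null_sd = null.std(axis=1)
print("ratio of widest to narrowest null SD:", null_sd.max() / null_sd.min())
print("largest |per-gene P - pooled P|:", np.abs(per_gene_p - pooled_p).max())
```

Because the per-gene null distributions differ in spread, pooling them compares each gene against a mixture that does not match its own null, so the two sets of P-values can disagree substantially even though no gene is DE in the mean.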
