BMC Bioinformatics. 2006 Feb 22:7:84.

doi: 10.1186/1471-2105-7-84.

The PowerAtlas: a power and sample size atlas for microarray experimental design and research

Grier P Page¹, Jode W Edwards, Gary L Gadbury, Prashanth Yelisetti, Jelai Wang, Prinal Trivedi, David B Allison

PMID: 16504070
PMCID: PMC1395338
DOI: 10.1186/1471-2105-7-84

Grier P Page et al. BMC Bioinformatics. 2006.

Affiliation

¹ Department of Biostatistics, University of Alabama, Birmingham, AL, USA. gpage@uab.edu

Abstract

Background: Microarrays permit biologists to simultaneously measure the mRNA abundance of thousands of genes. An important issue facing investigators planning microarray experiments is how to estimate the sample size required for good statistical power. What is the projected sample size or number of replicate chips needed to address the multiple hypotheses with acceptable accuracy? Statistical methods exist for calculating power based upon a single hypothesis, using estimates of the variability in data from pilot studies. There is, however, a need for methods to estimate power and/or required sample sizes in situations where multiple hypotheses are being tested, such as in microarray experiments. In addition, investigators frequently do not have pilot data to estimate the sample sizes required for microarray studies.

Results: To address this challenge, we have developed a Microarray PowerAtlas. The atlas lets investigators estimate statistical power and plan studies appropriately by building upon previous studies that have similar experimental characteristics. Currently, it contains sample size and power estimates based on 632 experiments from the Gene Expression Omnibus (GEO). The PowerAtlas also permits investigators to upload their own pilot data and derive power and sample size estimates from these data. This resource will be updated regularly with new datasets from GEO and other databases such as The Nottingham Arabidopsis Stock Center (NASC).

Conclusion: This resource provides a valuable tool for investigators who are planning efficient microarray studies and estimating required sample sizes.
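The single-hypothesis calculation mentioned in the Background can be sketched with the textbook normal approximation for a two-sided, two-sample comparison. This is an illustration only, not the PowerAtlas method: the function name `n_per_group`, the effect size `delta`, the standard deviation `sigma`, the figure of 10,000 genes, and the use of a Bonferroni-adjusted alpha as a crude stand-in for multiple testing are all assumptions made for the example.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.8):
    """Approximate replicates per group for a two-sided, two-sample z-test.

    delta: difference in means to detect (hypothetical, e.g. in log2 units)
    sigma: per-group standard deviation (e.g. estimated from pilot data)
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


# One hypothesis: detect a 1-SD difference at alpha = 0.05 with 80% power.
single = n_per_group(delta=1.0, sigma=1.0)

# Assumed 10,000 genes with a Bonferroni-adjusted alpha (illustrative only).
bonferroni = n_per_group(delta=1.0, sigma=1.0, alpha=0.05 / 10_000)

print(single, bonferroni)
```

Under these assumptions the per-gene requirement rises sharply once the significance threshold is tightened for thousands of tests, which is the planning problem the abstract describes.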

Figures

**Figure 1**
Estimated PTP, PTN, and EDR for the GDS486 [17] dataset for a variety of sample sizes at an alpha level of 0.05.

**Figure 2**
For the GDS486 dataset [18] the EDR is presented across a variety of sample sizes and alpha levels.

**Figure 3**
For the GDS486 dataset [19] the PTP is presented across a variety of sample sizes and alpha levels.

**Figure 4**
Estimated PTP, PTN, and EDR for the GDS75 [20] dataset for a variety of sample sizes at an alpha level of 0.05.

**Figure 5**
For the GDS75 dataset [21] the EDR is presented across a variety of sample sizes and alpha levels.

**Figure 6**
For the GDS75 dataset [22] the PTP is presented across a variety of sample sizes and alpha levels.

**Figure 7**
Idealized representation of the distribution of p-values under the null hypothesis (no difference in gene expression between the two groups) for a valid test. The dotted Blue line is the expected distribution of p-values if the treatment has no effect and the solid Green line is the mixed-model [23] fit of the p-value constrained to be monotonically non-decreasing.

**Figure 8**
More realistic representation of the distribution of p-values under the null hypothesis (no difference in gene expression between the two groups) for a valid test. The dotted Blue line is the expected distribution of p-values if the treatment has no effect and the solid Green line is the mixed-model [23] fit of the p-value constrained to be monotonically non-decreasing.

**Figure 9**
Distribution of p-values when there is a difference in the gene expression between the two groups for some of the genes, but not all of the genes. This distribution is monotonically non-increasing from 0 to 1. The dotted Blue line is the expected distribution of p-values if the treatment has no effect and the solid Green line is the mixed-model [23] fit of the p-value constrained to be monotonically non-decreasing.

**Figure 10**
Distribution of p-values for a dataset in GEO that does not follow one of the possible distributions for p-values. The dotted Blue line is the expected distribution of p-values if the treatment has no effect and the solid Green line is the mixed-model [23] fit of the p-value constrained to be monotonically non-decreasing.

**Figure 11**
Distribution of p-values for a dataset in GEO that does not follow one of the possible distributions for p-values. The dotted Blue line is the expected distribution of p-values if the treatment has no effect and the solid Green line is the mixed-model [23] fit of the p-value constrained to be monotonically non-decreasing.

**Figure 12**
Distribution of p-values for a dataset in GEO that does not follow one of the possible distributions for p-values. The dotted Blue line is the expected distribution of p-values if the treatment has no effect and the solid Green line is the mixed-model [23] fit of the p-value constrained to be monotonically non-decreasing.

**Figure 13**
Distribution of p-values for the comparison of RNA from 3 mice with homozygous PKDH mutations and a high kidney length-to-width ratio and RNA from 3 mice with homozygous PKDH mutations and a low kidney length-to-width ratio. The dotted Blue line is the expected distribution of p-values if the treatment has no effect and the solid Green line is the mixed-model [23] fit of the p-value constrained to be monotonically non-decreasing.

**Figure 14**
Distribution of p-values for the comparison of RNA from 7 mice with PKDH mutations and a high kidney length to width ratio and RNA from 7 mice with PKDH mutations and a low kidney length to width ratio. The dotted blue line is the expected distribution of p-values if the treatment has no effect and the solid green line is the mixed-model [23] fit of the p-value constrained to be monotonically non-decreasing.
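The quantities plotted in Figures 1 to 6 can be illustrated with a small simulation. This is only a sketch under stated assumptions. It assumes the definitions commonly given for these quantities, which this page does not spell out: EDR as the proportion of truly differentially expressed genes that are declared significant, PTP as the proportion of significant genes that are true positives, and PTN as the proportion of non-significant genes that are true negatives. It also uses a known-variance z-test with hypothetical parameters (`frac_de`, `effect_sd`, `n_genes`) rather than the procedure the PowerAtlas applies to GEO datasets.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def simulate_rates(n_per_group, alpha=0.05, n_genes=5000,
                   frac_de=0.2, effect_sd=1.0, seed=0):
    """Simulate EDR, PTP and PTN for a two-group design (illustrative only)."""
    rng = random.Random(seed)
    # Mean of the z statistic for a truly differentially expressed (DE) gene.
    shift = effect_sd * (n_per_group / 2) ** 0.5
    a = b = c = d = 0  # a: true neg, b: false neg, c: false pos, d: true pos
    for _ in range(n_genes):
        is_de = rng.random() < frac_de
        z = rng.gauss(shift if is_de else 0.0, 1.0)
        p = 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))  # two-sided p-value
        significant = p < alpha
        if is_de and significant:
            d += 1
        elif is_de:
            b += 1
        elif significant:
            c += 1
        else:
            a += 1
    nan = float("nan")
    return {
        "EDR": d / (b + d) if (b + d) else nan,
        "PTP": d / (c + d) if (c + d) else nan,
        "PTN": a / (a + b) if (a + b) else nan,
    }


# Compare a small and a larger design at the same alpha level.
small = simulate_rates(n_per_group=3)
large = simulate_rates(n_per_group=20)
print(small)
print(large)
```

In this toy setting all three quantities improve as the number of replicates per group grows, which is the kind of trade-off the sample-size curves in the figures are meant to show.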
