The PowerAtlas: a power and sample size atlas for microarray experimental design and research

Grier P Page et al. BMC Bioinformatics. 2006 Feb 22;7:84.
doi: 10.1186/1471-2105-7-84.

Abstract

Background: Microarrays permit biologists to simultaneously measure the mRNA abundance of thousands of genes. An important issue facing investigators planning microarray experiments is how to estimate the sample size required for good statistical power. What is the projected sample size or number of replicate chips needed to address the multiple hypotheses with acceptable accuracy? Statistical methods exist for calculating power based upon a single hypothesis, using estimates of the variability in data from pilot studies. There is, however, a need for methods to estimate power and/or required sample sizes in situations where multiple hypotheses are being tested, such as in microarray experiments. In addition, investigators frequently do not have pilot data to estimate the sample sizes required for microarray studies.
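
The single-hypothesis calculation mentioned in the Background can be sketched as follows. This is a minimal illustration, not the PowerAtlas method: it assumes a two-group comparison with a known common standard deviation and uses the standard normal-approximation formula. The function name `n_per_group`, the effect size, and the Bonferroni adjustment over 10,000 genes are all hypothetical choices made for illustration, to show how the required sample size grows once many hypotheses are tested.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate samples per group for a two-sided, two-group comparison.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2.
    delta is the difference in means to detect; sigma is the common
    standard deviation (for example, estimated from pilot data).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_power = z.inv_cdf(power)
    return ceil(2.0 * ((z_alpha + z_power) * sigma / delta) ** 2)


# One hypothesis at alpha = 0.05.
single = n_per_group(delta=1.0, sigma=1.0)

# Assumed illustration: Bonferroni-adjusted alpha for 10,000 genes.
bonferroni = n_per_group(delta=1.0, sigma=1.0, alpha=0.05 / 10_000)

print(single, bonferroni)
```

The stricter per-gene threshold requires considerably more replicates for the same effect size, which is why single-hypothesis formulas alone are a poor guide for microarray studies.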

Results: To address this challenge, we have developed a Microarray PowerAtlas. The atlas lets investigators plan studies appropriately by estimating statistical power from previous studies that have similar experimental characteristics. Currently, it contains sample size and power estimates based on 632 experiments from the Gene Expression Omnibus (GEO). The PowerAtlas also permits investigators to upload their own pilot data and derive power and sample size estimates from these data. This resource will be updated regularly with new datasets from GEO and other databases such as the Nottingham Arabidopsis Stock Centre (NASC).

Conclusion: This resource provides a valuable tool for investigators who are planning efficient microarray studies and estimating required sample sizes.

Figures

Figure 1
Estimated PTP, PTN, and EDR for the GDS486 [17] dataset for a variety of sample sizes at an alpha level of 0.05.
Figure 2
For the GDS486 dataset [18] the EDR is presented across a variety of sample sizes and alpha levels.
Figure 3
For the GDS486 dataset [19] the PTP is presented across a variety of sample sizes and alpha levels.
Figure 4
Estimated PTP, PTN, and EDR for the GDS75 [20] dataset for a variety of sample sizes at an alpha level of 0.05.
Figure 5
For the GDS75 dataset [21] the EDR is presented across a variety of sample sizes and alpha levels.
Figure 6
For the GDS75 dataset [22] the PTP is presented across a variety of sample sizes and alpha levels.
Figure 7
Idealized representation of the distribution of p-values under the null hypothesis (no difference in gene expression between the two groups) for a valid test. The dotted blue line is the expected distribution of p-values if the treatment has no effect and the solid green line is the mixed-model [23] fit of the p-value constrained to be monotonically non-decreasing.
Figure 8
More realistic representation of the distribution of p-values under the null hypothesis (no difference in gene expression between the two groups) for a valid test. The dotted blue line is the expected distribution of p-values if the treatment has no effect and the solid green line is the mixed-model [23] fit of the p-value constrained to be monotonically non-decreasing.
Figure 9
Distribution of p-values when there is a difference in gene expression between the two groups for some, but not all, of the genes. This distribution is monotonically non-increasing from 0 to 1. The dotted blue line is the expected distribution of p-values if the treatment has no effect and the solid green line is the mixed-model [23] fit of the p-value constrained to be monotonically non-decreasing.
Figure 10
Distribution of p-values for a dataset in GEO that does not follow one of the possible distributions for p-values. The dotted blue line is the expected distribution of p-values if the treatment has no effect and the solid green line is the mixed-model [23] fit of the p-value constrained to be monotonically non-decreasing.
Figure 11
Distribution of p-values for a dataset in GEO that does not follow one of the possible distributions for p-values. The dotted blue line is the expected distribution of p-values if the treatment has no effect and the solid green line is the mixed-model [23] fit of the p-value constrained to be monotonically non-decreasing.
Figure 12
Distribution of p-values for a dataset in GEO that does not follow one of the possible distributions for p-values. The dotted blue line is the expected distribution of p-values if the treatment has no effect and the solid green line is the mixed-model [23] fit of the p-value constrained to be monotonically non-decreasing.
Figure 13
Distribution of p-values for the comparison of RNA from 3 mice with homozygous PKDH mutations and a high kidney length-to-width ratio and RNA from 3 mice with homozygous PKDH mutations and a low kidney length-to-width ratio. The dotted blue line is the expected distribution of p-values if the treatment has no effect and the solid green line is the mixed-model [23] fit of the p-value constrained to be monotonically non-decreasing.
Figure 14
Distribution of p-values for the comparison of RNA from 7 mice with PKDH mutations and a high kidney length-to-width ratio and RNA from 7 mice with PKDH mutations and a low kidney length-to-width ratio. The dotted blue line is the expected distribution of p-values if the treatment has no effect and the solid green line is the mixed-model [23] fit of the p-value constrained to be monotonically non-decreasing.
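
The p-value shapes described in the captions for Figures 7 and 9 can be reproduced with a small simulation. The sketch below is an assumption-laden illustration, not the PowerAtlas procedure or the mixture-model fit of [23]: it uses a two-sided z-test with known unit variance, and the gene count, fraction of changed genes, effect size, and group size are hypothetical values chosen only to make the two shapes visible.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2.0 * (1.0 - _STD_NORMAL.cdf(abs(z)))


def simulate_p_values(n_genes, frac_changed, effect, n_per_group, rng):
    """Simulate one p-value per gene for a two-group comparison.

    Each gene has unit variance in both groups (assumed known), so the
    difference in group means has standard error sqrt(2 / n_per_group).
    A fraction frac_changed of genes has a true mean difference of effect.
    """
    se = (2.0 / n_per_group) ** 0.5
    p_values = []
    for _ in range(n_genes):
        delta = effect if rng.random() < frac_changed else 0.0
        diff = rng.gauss(delta, se)  # observed difference in means
        p_values.append(two_sided_p(diff / se))
    return p_values


def histogram(p_values, bins=10):
    """Proportion of p-values falling in each equal-width bin on [0, 1]."""
    counts = [0] * bins
    for p in p_values:
        counts[min(int(p * bins), bins - 1)] += 1
    return [c / len(p_values) for c in counts]


rng = random.Random(0)
# No gene differs: p-values should be roughly uniform (as in Figure 7).
null_hist = histogram(simulate_p_values(20_000, 0.0, 1.0, 5, rng))
# 30% of genes differ: excess of small p-values (as in Figure 9).
mixed_hist = histogram(simulate_p_values(20_000, 0.3, 1.0, 5, rng))

print([round(x, 3) for x in null_hist])
print([round(x, 3) for x in mixed_hist])
```

Under the null every bin holds close to one tenth of the genes, whereas the mixed case piles up near zero and thins out towards one; a histogram that departs from both shapes, as in Figures 10 to 12, suggests the test assumptions are not met.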

References

    1. The PowerAtlas. 2006. http://www.poweratlas.org
    2. Pan W, Lin J, Le CT. How many replicates of arrays are required to detect gene expression changes in microarray experiments? A mixture model approach. Genome Biology. 2002;3:RESEARCH0022.
    3. Lee ML, Whitmore GA. Power and sample size for DNA microarray studies. Stat Med. 2002;21:3543–3570. doi: 10.1002/sim.1335.
    4. Wang SJ, Chen JJ. Sample size for identifying differentially expressed genes in microarray experiments. J Comput Biol. 2004;11:714–726. doi: 10.1089/cmb.2004.11.714.
    5. Gadbury GL, Page GP, Edwards J, Kayo T, Weindruch R, Permana PA, Mountz J, Allison DB. Power Analysis and Sample Size Estimation in the Age of High Dimensional Biology. Stat Meth Med Res. 2004;13:325–338.
