Proc Natl Acad Sci U S A. 2001 Jul 31;98(16):8961-5.
doi: 10.1073/pnas.161273698. Epub 2001 Jul 24.

Bootstrapping cluster analysis: assessing the reliability of conclusions from microarray experiments

M K Kerr et al. Proc Natl Acad Sci U S A. 2001.

Abstract

We introduce a general technique for making statistical inference from clustering tools applied to gene expression microarray data. The approach utilizes an analysis of variance model to achieve normalization and estimate differential expression of genes across multiple conditions. Statistical inference is based on the application of a randomization technique, bootstrapping. Bootstrapping has previously been used to obtain confidence intervals for estimates of differential expression for individual genes. Here we apply bootstrapping to assess the stability of results from a cluster analysis. We illustrate the technique with a publicly available data set and draw conclusions about the reliability of clustering results in light of variation in the data. The bootstrapping procedure relies on experimental replication. We discuss the implications of replication and good design in microarray experiments.
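The abstract's idea of bootstrapping a cluster analysis can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes replicated measurements stored in a hypothetical array of shape (genes, time points, replicates), uses the per-gene, per-time-point mean as a simple stand-in for the analysis of variance model fit, resamples pooled residuals with replacement, and clusters by assigning each gene to the model profile with which it correlates most strongly. The function names, the pooled-residual scheme, and the definition of stability as the fraction of bootstrap clusterings that reproduce the original assignment are all assumptions made for illustration.

```python
import numpy as np

def assign_clusters(profiles, model_profiles):
    """Assign each gene profile to the model profile it correlates with most."""
    p = profiles - profiles.mean(axis=1, keepdims=True)
    m = model_profiles - model_profiles.mean(axis=1, keepdims=True)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    # p @ m.T holds the Pearson correlation of every gene with every model profile.
    return np.argmax(p @ m.T, axis=1)

def bootstrap_stability(data, model_profiles, n_boot=500, seed=0):
    """Return the original cluster labels and, per gene, the fraction of
    bootstrap clusterings that reproduce the original label."""
    rng = np.random.default_rng(seed)
    # Assumption: the replicate mean stands in for the fitted model values.
    fitted = data.mean(axis=2, keepdims=True)      # shape (genes, times, 1)
    residuals = (data - fitted).ravel()            # pooled residuals
    original = assign_clusters(fitted[:, :, 0], model_profiles)
    matches = np.zeros(data.shape[0])
    for _ in range(n_boot):
        # Build a bootstrap data set: fitted values plus resampled residuals.
        resampled = rng.choice(residuals, size=data.shape, replace=True)
        boot_profiles = (fitted + resampled).mean(axis=2)
        matches += assign_clusters(boot_profiles, model_profiles) == original
    return original, matches / n_boot

# Synthetic example: three genes, four time points, two replicates.
rng = np.random.default_rng(1)
models = np.array([[0.0, 1.0, 2.0, 3.0], [3.0, 2.0, 1.0, 0.0]])
truth = models[[0, 0, 1]]
data = truth[:, :, None] + rng.normal(scale=0.1, size=(3, 4, 2))
labels, stability = bootstrap_stability(data, models, n_boot=200)
stable_95 = stability >= 0.95   # genes that keep their cluster in >= 95% of runs
```

Because the residuals come from replicate differences, this kind of procedure only works when the experiment includes replication, which is the dependence the abstract points out.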

Figures

Figure 1
Temporal profiles for select genes. The solid line gives the profile estimated by using Model 3. Error bars around the profiles are 99% bootstrap confidence intervals. The dotted line gives the temporal profile estimated with log ratios, rescaled to have the same standard deviation as the solid line.
Figure 2
The seven model profiles used for clustering. The profiles are rescaled to have a standard deviation of 1, which does not affect the clustering results because clustering is based on correlations.
Figure 3
Ninety-five-percent-stable genes for the seven model profiles based on 500 bootstrap clusterings. The plotted profiles have been rescaled to have a standard deviation of 1. See Fig. 5, an expanded version of this figure, which is published as supplemental data on the PNAS web site.
Figure 4
An alternative experimental design for the sporulation study, represented as a directed graph. The boxes represent RNA samples and the arrows represent microarrays. The tail of an arrow is, say, the “red” dye and the head of an arrow is the “green” dye. Thus, the arrow from the time-0 sample to the half-hour sample means to hybridize an array with red-labeled time-0 mRNA and green-labeled mRNA from the half-hour sample. Such a design has advantages over the plan used by Chu et al. in balance, precision of estimates, and distribution of residuals.
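The Figure 2 caption states that rescaling the model profiles to a standard deviation of 1 does not change correlation-based clustering. A short numerical check of that property is sketched below; the profile values and the helper name `rescale_to_unit_sd` are made up for illustration and are not taken from the paper.

```python
import numpy as np

def rescale_to_unit_sd(profile):
    """Divide a profile by its sample standard deviation."""
    profile = np.asarray(profile, dtype=float)
    return profile / profile.std(ddof=1)

rng = np.random.default_rng(0)
gene = rng.normal(size=7)                                  # hypothetical gene profile
model = np.array([0.0, 1.0, 4.0, 2.0, 1.0, 0.5, 0.0])      # hypothetical model profile

# Pearson correlation is invariant to multiplying either profile by a positive constant.
r_raw = np.corrcoef(gene, model)[0, 1]
r_scaled = np.corrcoef(gene, rescale_to_unit_sd(model))[0, 1]
```

The two correlations agree up to floating-point error, so any assignment rule that depends only on correlations gives the same clusters before and after rescaling.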

References

    1. Brown P O, Botstein D. Nat Genet Suppl. 1999;21:33–37.
    2. Chu S, DeRisi J, Eisen M, Mulholland J, Botstein D, Brown P O. Science. 1998;282:699–705.
    3. Eisen M, Spellman P T, Brown P O, Botstein D. Proc Natl Acad Sci USA. 1998;95:14863–14868.
    4. Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander E S, Golub T R. Proc Natl Acad Sci USA. 1999;96:2907–2912.
    5. Hastie T, Tibshirani R, Eisen M B, Alizadeh A, Levy R, Staudt L, Chan W C, Botstein D, Brown P. Genome Biol. August 4, 2000. http://genomebiology.com/2000/1/2/research/0003/
