BMC Bioinformatics. 2008 Jan 24;9:42.
doi: 10.1186/1471-2105-9-42.

New resampling method for evaluating stability of clusters

Irina M Gana Dresen et al. BMC Bioinformatics.

Abstract

Background: Hierarchical clustering is a widely applied tool in the analysis of microarray gene expression data. The assessment of cluster stability is a major challenge in clustering procedures. Statistical methods are required to distinguish between real and random clusters. Several methods for assessing cluster stability have been published, including resampling methods such as the bootstrap. We propose a new resampling method based on continuous weights to assess the stability of clusters in hierarchical clustering. In bootstrapping, approximately one third of the original items are lost from each resample. Continuous weights avoid zero elements and instead allow non-integer diagonal elements, which retains the full dimensionality of the space, i.e. each variable of the original data set is represented in every resample.

Results: Comparison of continuous weights and bootstrapping using real datasets and simulation studies reveals the advantage of continuous weights, especially when the dataset has only a few observations, there are few differentially expressed genes, and the fold change of the differentially expressed genes is low.

Conclusion: We recommend the use of continuous weights in small as well as in large datasets, because according to our results they produce at least the same results as conventional bootstrapping and in some cases they surpass it.
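The contrast drawn in the Background can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state how the continuous weights are drawn, so normalized exponential draws are used here purely as an assumption, and all function names are hypothetical. The bootstrap part only reproduces the well-known fact that a resample of size n leaves out a fraction (1 - 1/n)^n of the items, which approaches 1/e, roughly one third.

```python
import math
import random


def bootstrap_weights(n, rng):
    """Integer weights: how often each of n items is drawn in one bootstrap resample."""
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    return counts


def continuous_weights(n, rng):
    """Non-integer weights that sum to n.

    ASSUMPTION: normalized exponential draws stand in for the paper's scheme,
    which the abstract does not specify. The weights are positive with
    probability one, so no item is dropped.
    """
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [n * d / total for d in draws]


def weighted_euclidean(x, y, w):
    """Euclidean distance with one weight per coordinate (a diagonal weight matrix)."""
    return math.sqrt(sum(wi * (a - b) ** 2 for wi, a, b in zip(w, x, y)))


def mean_fraction_dropped(n, trials, rng):
    """Average share of items that receive weight zero across bootstrap resamples."""
    dropped = 0
    for _ in range(trials):
        dropped += bootstrap_weights(n, rng).count(0)
    return dropped / (n * trials)


rng = random.Random(0)
frac = mean_fraction_dropped(100, 2000, rng)
print(f"bootstrap: mean fraction of items with weight 0 = {frac:.3f}")
print(f"theory:    (1 - 1/n)^n for n = 100           = {(1 - 1 / 100) ** 100:.3f}")

w = continuous_weights(100, rng)
print(f"continuous weights: smallest weight = {min(w):.4f}, sum = {sum(w):.1f}")
```

In a stability analysis along these lines, each weight vector would be passed to `weighted_euclidean` to recompute the distance matrix before re-clustering; with bootstrap weights the zero entries remove coordinates from the distance entirely, whereas the continuous weights only rescale them.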


Figures

Figure 1. Hierarchical clustering of chromosome 6 (412 probe sets) from the uveal melanoma data. a) original dendrogram, b) consensus tree continuous weights, c) consensus tree bootstrap.
Figure 2. Hierarchical clustering of chromosome Y (8 probe sets) from the uveal melanoma data. a) original dendrogram, b) consensus tree continuous weights, c) consensus tree bootstrap.
Figure 3. Hierarchical clustering of primate data (7 features) [26,27]. a) original dendrogram, b) consensus tree continuous weights, c) consensus tree bootstrap.
Figure 4. Simulation study. Two groups with 10 variables each, fold change of 9; number of genes and number of differentially expressed genes vary. Symbols indicate the proportion of differentially expressed genes at which the groups are just no longer separated; □: bootstrap, ▲: continuous weights.
Figure 5. Simulation study. Two groups with 10 variables each, number of genes equals 100; fold change and number of differentially expressed genes vary. Symbols indicate the number of differentially expressed genes at which the groups are just no longer separated; □: bootstrap, ▲: continuous weights.
Figure 6. Simulation study. Two groups, number of genes equals 100, fold change of 9; number of variables and number of differentially expressed genes vary. Symbols indicate the number of differentially expressed genes at which the groups are just no longer separated; □: bootstrap, ▲: continuous weights.
Figure 7. Simulation study. Number of genes equals 100, fold change constant; number of groups and number of differentially expressed genes vary. Symbols indicate the number of differentially expressed genes at which the groups are just no longer separated; □: bootstrap, ▲: continuous weights.

References

    1. Quackenbush J. Computational analysis of microarray data. Nat Rev Genet. 2001;2:418–427. doi: 10.1038/35076576.
    2. Sørensen T. A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analyses of the vegetation on Danish commons. Biologiske Skrifter. 1948;5:1–34.
    3. Sneath PHA. The application of computers to taxonomy. J Gen Microbiol. 1957;17:201–226.
    4. Sokal RR, Michener CD. A statistical method for evaluating systematic relationships. Univ Kans Sci Bull. 1958;38:1409–1438.
    5. Ward JH. Hierarchical grouping to optimize an objective function. J Am Stat Assoc. 1963;58:236–244. doi: 10.2307/2282967.
