New resampling method for evaluating stability of clusters
- PMID: 18218074
- PMCID: PMC2265265
- DOI: 10.1186/1471-2105-9-42
Abstract
Background: Hierarchical clustering is a widely applied tool in the analysis of microarray gene expression data. Assessing cluster stability is a major challenge in clustering procedures, and statistical methods are required to distinguish real clusters from random ones. Several methods for assessing cluster stability have been published, including resampling methods such as the bootstrap. We propose a new resampling method based on continuous weights to assess the stability of clusters in hierarchical clustering. In bootstrapping, approximately one third of the original items are lost from each resample. Continuous weights avoid zero elements and instead allow non-integer diagonal elements, which retains the full dimensionality of the space: each variable of the original data set is represented in the resampled data set.
Results: Comparisons of continuous weights and bootstrapping on real datasets and in simulation studies show the advantage of continuous weights, especially when the dataset has only a few observations, few differentially expressed genes, and a low fold change among the differentially expressed genes.
Conclusion: We recommend the use of continuous weights in small as well as large datasets because, according to our results, they perform at least as well as conventional bootstrapping and in some cases surpass it.
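The contrast drawn in the Background can be illustrated with a small sketch. When n items are drawn with replacement, the chance that a given item is never drawn is (1 - 1/n)^n, which approaches 1/e (about 0.37) for large n, so roughly one third of the items receive weight zero. The continuous scheme below is an assumption made for illustration only: it rescales exponential draws so that the weights sum to n, and it is not necessarily the weight distribution used by the authors. The function names and the weighted distance are likewise hypothetical.

```python
import math
import random


def bootstrap_weights(n, rng):
    """Integer weights: how often each of n items is drawn with replacement."""
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    return counts


def continuous_weights(n, rng):
    """Non-integer weights rescaled to sum to n.

    Assumption: exponential draws (positive with probability one);
    the paper's own weighting scheme may differ.
    """
    raw = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(raw)
    return [n * r / total for r in raw]


def weighted_distance(x, y, w):
    """Euclidean distance with one weight per variable (a diagonal weighting)."""
    return math.sqrt(sum(wi * (a - b) ** 2 for wi, a, b in zip(w, x, y)))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 1000
boot = bootstrap_weights(n, rng)
cont = continuous_weights(n, rng)

# Share of items that drop out of the bootstrap resample entirely
frac_lost = sum(1 for c in boot if c == 0) / n
print(f"bootstrap: fraction of items with weight zero = {frac_lost:.3f}")
print(f"continuous: smallest weight = {min(cont):.6f}")
```

In a stability analysis, either weight vector would be passed to a weighted distance such as `weighted_distance` before each clustering run. With bootstrap weights, the zero entries remove variables from the distance, whereas the continuous weights keep every variable in play.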