Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm
- PMID: 16257984
- DOI: 10.1093/bioinformatics/bti746
Abstract
Motivation: Hierarchical and relocation clustering (e.g. K-means and self-organizing maps) have been successful tools in the display and analysis of whole genome DNA microarray expression data. However, the results of hierarchical clustering are sensitive to outliers, and most relocation methods give results that depend on the initialization of the algorithm. Therefore, it is difficult to assess the significance of the results. We have developed a consensus clustering algorithm, where the final result is averaged over multiple clustering runs, giving a robust and reproducible clustering, capable of capturing small signal variations. The algorithm preserves valuable properties of hierarchical clustering, which is useful for visualization and interpretation of the results.
Results: We show for the first time that one can take advantage of multiple clustering runs in DNA microarray analysis by collecting re-occurring clustering patterns in a co-occurrence matrix. The results show that consensus clustering obtained from clustering multiple times with Variational Bayes Mixtures of Gaussians or K-means significantly reduces the classification error rate for a simulated dataset. The method is flexible and it is possible to find consensus clusters from different clustering algorithms. Thus, the algorithm can be used as a framework to test in a quantitative manner the homogeneity of different clustering algorithms. We compare the method with a number of state-of-the-art clustering methods. It is shown that the method is robust and gives low classification error rates for a realistic, simulated dataset. The algorithm is also demonstrated for real datasets. It is shown that more biologically meaningful transcriptional patterns can be found without conservative statistical or fold-change exclusion of data.
Availability: Matlab source code for the clustering algorithm ClusterLustre, and the simulated dataset for testing are available upon request from T.G. and O.W.
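The co-occurrence idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' ClusterLustre code: it assumes plain K-means with random initialization as the base clusterer and average-linkage merging on the distance 1 - co-occurrence as the hierarchical step. The function names (`kmeans_labels`, `cooccurrence`, `consensus_clusters`) and the toy data are hypothetical, chosen only for illustration.

```python
import random

def kmeans_labels(points, k, rng, n_iter=20):
    """Plain Lloyd's K-means with random initial centres (assumed base clusterer)."""
    centres = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest centre (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])))
            for p in points
        ]
        # Move each non-empty centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def cooccurrence(points, k, n_runs, seed=0):
    """Fraction of runs in which each pair of points lands in the same cluster."""
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_runs):
        labels = kmeans_labels(points, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_runs for c in row] for row in counts]

def consensus_clusters(cooc, n_clusters):
    """Average-linkage agglomeration using 1 - co-occurrence as the distance."""
    clusters = [[i] for i in range(len(cooc))]

    def dist(a, b):
        return sum(1.0 - cooc[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of current clusters and merge them.
        a, b = min(
            ((x, y) for x in range(len(clusters))
             for y in range(x + 1, len(clusters))),
            key=lambda xy: dist(clusters[xy[0]], clusters[xy[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

# Toy data: two well-separated groups of "expression profiles".
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
          (5.0, 5.1), (5.2, 5.0), (5.1, 5.2)]
cooc = cooccurrence(points, k=2, n_runs=25)
clusters = consensus_clusters(cooc, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
```

Because the consensus step only needs a label vector per run, the K-means call could in principle be swapped for any other clustering routine, which mirrors the abstract's point that consensus clusters can be formed across different algorithms.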