Graph-based consensus clustering for class discovery from gene expression data
- PMID: 17872912
- DOI: 10.1093/bioinformatics/btm463
Abstract
Motivation: Consensus clustering, also known as cluster ensemble, is an important technique for microarray data analysis and is particularly useful for class discovery from microarray data. Compared with traditional clustering algorithms, consensus clustering approaches can integrate multiple partitions produced by different clustering solutions, improving the robustness, stability, scalability and parallelizability of clustering. Using consensus clustering, one can discover the underlying classes of the samples in gene expression data.
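The idea of integrating multiple partitions can be illustrated with a co-association matrix: for every pair of samples, record the fraction of clustering runs that place them in the same cluster, then link pairs that co-occur in more than half of the runs and read off the connected components as consensus classes. This is a minimal Python sketch of a generic co-association approach, assuming plain label lists as input; it is not the authors' GCC algorithm, whose details are not given in this abstract, and the names `co_association` and `consensus_partition` and the 0.5 threshold are illustrative assumptions.

```python
def co_association(partitions):
    """Return an n x n matrix whose (i, j) entry is the fraction of
    partitions that assign samples i and j to the same cluster."""
    n = len(partitions[0])
    if any(len(p) != n for p in partitions):
        raise ValueError("all partitions must label the same n samples")
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def consensus_partition(partitions, threshold=0.5):
    """Hypothetical helper: link samples whose co-association exceeds
    `threshold` and label the connected components 0, 1, 2, ..."""
    matrix = co_association(partitions)
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue  # already assigned to an earlier component
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over the thresholded graph
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


runs = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping, different label names
    [0, 0, 1, 1, 1, 1],  # sample 2 moves in one run
]
print(consensus_partition(runs))  # [0, 0, 0, 1, 1, 1]
```

Because co-association only asks whether two samples share a cluster, it ignores the arbitrary label names each run uses, which is what lets partitions from different clustering solutions be combined directly.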
Results: In addition to exploring a graph-based consensus clustering (GCC) algorithm to estimate the underlying classes of the samples in microarray data, we also design a new validation index to determine the number of classes in microarray data. To our knowledge, this is the first time GCC has been applied to class discovery for microarray data. Given a prespecified maximum number of classes (denoted K(max) in this article), our algorithm can discover the true number of classes for the samples in microarray data according to a new cluster validation index called the Modified Rand Index. Experiments on gene expression data indicate that our new algorithm can (i) outperform most of the existing algorithms, (ii) identify the number of classes correctly in real cancer datasets, and (iii) discover classes of samples with biological meaning.
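The abstract does not define the Modified Rand Index, so the sketch below uses the classic Rand index (the fraction of sample pairs on which two partitions agree) as a stand-in, and selects the number of classes K in 2..K(max) whose repeated clustering runs agree best with one another. The helpers `rand_index` and `choose_k` and this stability criterion are hypothetical illustrations under those assumptions, not the paper's validation index.

```python
from itertools import combinations


def rand_index(a, b):
    """Classic Rand index: fraction of sample pairs on which partitions a and b
    agree (both place the pair in one cluster, or both separate it)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("partitions must have equal length >= 2")
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def choose_k(runs_by_k, k_max):
    """Return (K, score) for the K in 2..k_max whose runs agree most on average.

    runs_by_k maps K -> list of label lists from repeated (e.g. perturbed) runs.
    Hypothetical stability criterion; NOT the paper's Modified Rand Index.
    """
    best_k, best_score = None, -1.0
    for k in sorted(runs_by_k):
        if not 2 <= k <= k_max:
            continue  # only consider candidate class counts up to K(max)
        runs = runs_by_k[k]
        if len(runs) < 2:
            raise ValueError(f"need at least two runs for K={k}")
        pairs = list(combinations(runs, 2))
        score = sum(rand_index(x, y) for x, y in pairs) / len(pairs)
        if score > best_score:  # ties keep the smaller K
            best_k, best_score = k, score
    return best_k, best_score


runs_by_k = {
    2: [[0, 0, 1, 1], [1, 1, 0, 0]],  # runs agree on every pair
    3: [[0, 0, 1, 2], [0, 1, 1, 2]],  # runs disagree on some pairs
}
print(choose_k(runs_by_k, k_max=3))  # (2, 1.0)
```

The classic Rand index rewards agreement on separated pairs as well as grouped pairs, so it tends to grow with K even for random partitions; this is one reason corrected or modified variants are used in practice.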
Availability: Matlab source code for the GCC algorithm is available upon request from Zhiwen Yu.
Similar articles
- A mixture model with random-effects components for clustering correlated gene-expression profiles. Bioinformatics. 2006 Jul 15;22(14):1745-52. doi: 10.1093/bioinformatics/btl165. Epub 2006 May 3. PMID: 16675467
- Class discovery from gene expression data based on perturbation and cluster ensemble. IEEE Trans Nanobioscience. 2009 Jun;8(2):147-60. doi: 10.1109/TNB.2009.2023321. Epub 2009 Jun 2. PMID: 19497836
- Clustering of change patterns using Fourier coefficients. Bioinformatics. 2008 Jan 15;24(2):184-91. doi: 10.1093/bioinformatics/btm568. Epub 2007 Nov 19. PMID: 18025003
- How does gene expression clustering work? Nat Biotechnol. 2005 Dec;23(12):1499-501. doi: 10.1038/nbt1205-1499. PMID: 16333293. Review.
- Classification based upon gene expression data: bias and precision of error rates. Bioinformatics. 2007 Jun 1;23(11):1363-70. doi: 10.1093/bioinformatics/btm117. Epub 2007 Mar 28. PMID: 17392326. Review.
Cited by
- Coral: an integrated suite of visualizations for comparing clusterings. BMC Bioinformatics. 2012 Oct 29;13:276. doi: 10.1186/1471-2105-13-276. PMID: 23102108
- Assisted gene expression-based clustering with AWNCut. Stat Med. 2018 Dec 20;37(29):4386-4403. doi: 10.1002/sim.7928. Epub 2018 Aug 9. PMID: 30094873
- Overlapping clustering of gene expression data using penalized weighted normalized cut. Genet Epidemiol. 2018 Dec;42(8):796-811. doi: 10.1002/gepi.22164. Epub 2018 Oct 9. PMID: 30302823
- A unified computational model for revealing and predicting subtle subtypes of cancers. BMC Bioinformatics. 2012 May 1;13:70. doi: 10.1186/1471-2105-13-70. PMID: 22548981
- Critical limitations of consensus clustering in class discovery. Sci Rep. 2014 Aug 27;4:6207. doi: 10.1038/srep06207. PMID: 25158761