Validating clustering for gene expression data
- PMID: 11301299
- DOI: 10.1093/bioinformatics/17.4.309
Abstract
Motivation: Many clustering algorithms have been proposed for the analysis of gene expression data, but little guidance is available to help choose among them. We provide a systematic framework for assessing the results of clustering algorithms. Clustering algorithms attempt to partition the genes into groups exhibiting similar patterns of variation in expression level. Our methodology is to apply a clustering algorithm to the data from all but one experimental condition. The remaining condition is used to assess the predictive power of the resulting clusters: meaningful clusters should exhibit less variation in the remaining condition than clusters formed by chance.
Results: We successfully applied our methodology to compare six clustering algorithms on four gene expression data sets. We found our quantitative measures of cluster quality to be positively correlated with external standards of cluster quality.
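The leave-one-condition-out idea described under Motivation can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm or which measure of variation was used, so several things here are assumptions made purely for illustration: a tiny NumPy k-means stands in for "a clustering algorithm", root-mean-square deviation from the cluster mean stands in for "variation", a random permutation of the labels stands in for "clusters formed by chance", and all function names (`kmeans_labels`, `held_out_rmsd`, `leave_one_condition_out`) are hypothetical.

```python
import numpy as np


def kmeans_labels(data, k, rng, n_iter=50):
    """Tiny k-means (assumed stand-in for any clustering algorithm)."""
    centers = data[rng.choice(len(data), size=k, replace=False)]
    labels = np.zeros(len(data), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every gene to every center.
        dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def held_out_rmsd(values, labels):
    """RMS deviation of held-out values from their cluster means (assumed measure)."""
    squared = 0.0
    for j in np.unique(labels):
        v = values[labels == j]
        squared += ((v - v.mean()) ** 2).sum()
    return float(np.sqrt(squared / len(values)))


def leave_one_condition_out(expr, k, seed=0):
    """Cluster on all but one condition, score on the one left out.

    expr has shape (genes, conditions). Returns the held-out variation
    summed over conditions for (a) the real clusters and (b) the same
    cluster sizes with genes assigned at random.
    """
    rng = np.random.default_rng(seed)
    clustered = 0.0
    shuffled = 0.0
    for e in range(expr.shape[1]):
        training = np.delete(expr, e, axis=1)  # drop condition e
        labels = kmeans_labels(training, k, rng)
        clustered += held_out_rmsd(expr[:, e], labels)
        shuffled += held_out_rmsd(expr[:, e], rng.permutation(labels))
    return clustered, shuffled


# Synthetic example: two groups of 20 genes with opposite expression patterns.
noise = np.random.default_rng(1).normal(0.0, 0.1, size=(40, 5))
pattern = np.array([2.0, -2.0, 2.0, -2.0, 2.0])
expr = np.vstack([np.tile(pattern, (20, 1)), np.tile(-pattern, (20, 1))]) + noise

clustered, shuffled = leave_one_condition_out(expr, k=2)
print("held-out variation, real clusters:  ", clustered)
print("held-out variation, random clusters:", shuffled)
```

If the clusters capture real structure, the first number should be noticeably smaller than the second; on data without structure the two should be close. Comparing algorithms then amounts to running the same loop with a different clustering function in place of `kmeans_labels`.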