Modified fuzzy gap statistic for estimating preferable number of clusters in fuzzy k-means clustering
- PMID: 18397779
- DOI: 10.1263/jbb.105.273
Abstract
In clustering, estimating the optimal number of clusters is important for subsequent analysis. Without detailed biological information on the genes involved, evaluating the number of clusters is difficult, and one has to rely on an internal measure based on the distribution of the data in the clustering result. The Gap statistic has been proposed as a superior method for estimating the number of clusters in crisp clustering. In this study, we propose a modified fuzzy Gap statistic (MFGS) and apply it to fuzzy k-means clustering. To estimate the number of clusters, fuzzy k-means clustering with the MFGS was applied to two artificial data sets with noise and to two experimentally observed gene expression data sets. For the artificial data sets, the MFGS was more robust against noise than other internal measures in estimating the optimal number of clusters. It could also be used to estimate the optimal number of clusters in the experimental data sets. These results confirm that the proposed MFGS is a useful method for estimating the number of clusters in microarray data sets.
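The abstract does not give the MFGS formula, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes the standard Gap statistic recipe (compare the log dispersion of the data with that of uniform reference data, then pick the smallest k whose gap is within one standard error of the next) and, as a further assumption, uses the fuzzy k-means objective as the dispersion measure. The function names `fuzzy_kmeans` and `fuzzy_gap`, the fuzzifier `m = 2`, the number of restarts, and the bounding-box reference distribution are all hypothetical choices made for this example.

```python
import numpy as np


def fuzzy_kmeans(X, k, m=2.0, n_iter=100, tol=1e-6, rng=None):
    """Minimal fuzzy k-means; returns (centers, memberships, objective)."""
    rng = np.random.default_rng(rng)
    # Random initial memberships, normalised so each row sums to 1.
    U = rng.random((X.shape[0], k))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centers are membership-weighted means of the data.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared Euclidean distances, shape (n, k); floored to avoid 0 ** negative.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)
        # Standard membership update: u_ij proportional to d_ij^(-2/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        shift = np.abs(U_new - U).max()
        U = U_new
        if shift < tol:
            break
    objective = float(((U ** m) * d2).sum())
    return centers, U, objective


def _best_objective(X, k, rng, n_restarts=3):
    """Lowest fuzzy objective over a few random restarts (assumed heuristic)."""
    return min(fuzzy_kmeans(X, k, rng=rng)[2] for _ in range(n_restarts))


def fuzzy_gap(X, k_max=6, n_ref=10, seed=0):
    """Gap-style estimate of the number of clusters; returns (k_hat, gaps)."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        log_w = np.log(_best_objective(X, k, rng))
        # Reference data: uniform over the bounding box of X (an assumption).
        ref = np.array([
            np.log(_best_objective(rng.uniform(lo, hi, size=X.shape), k, rng))
            for _ in range(n_ref)
        ])
        gaps.append(ref.mean() - log_w)
        errs.append(ref.std() * np.sqrt(1.0 + 1.0 / n_ref))
    # Smallest k with Gap(k) >= Gap(k+1) - s_(k+1); fall back to k_max.
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - errs[k]:
            return k, np.array(gaps)
    return k_max, np.array(gaps)


if __name__ == "__main__":
    demo_rng = np.random.default_rng(1)
    means = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 9.0]])
    X_demo = np.vstack([mu + 0.5 * demo_rng.standard_normal((50, 2)) for mu in means])
    k_hat, gaps = fuzzy_gap(X_demo, k_max=5, n_ref=5)
    print("estimated number of clusters:", k_hat)
```

Calling `fuzzy_gap(X)` on an array of shape (samples, features) returns the selected k and the gap value for each candidate k. Because fuzzy k-means can converge to local optima, the result may depend on the seed and on the number of restarts.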