Fuzzy ensemble clustering based on random projections for DNA microarray data analysis
- PMID: 18801650
- DOI: 10.1016/j.artmed.2008.07.014
Abstract
Objective: Two major problems in the unsupervised analysis of gene expression data are the accuracy and reliability of the discovered clusters, and the biological fact that the boundaries between classes of patients or classes of functionally related genes are sometimes not clearly defined. The main goal of this work is to explore new strategies and develop new clustering methods that improve the accuracy and robustness of clustering results, taking into account the uncertainty underlying the assignment of examples to clusters in gene expression data analysis.
Methodology: We propose a fuzzy ensemble clustering approach both to improve the accuracy of clustering results and to take into account the inherent fuzziness of biological and biomedical gene expression data. We apply random projections that obey the Johnson-Lindenstrauss lemma to obtain several lower-dimensional instances of the original high-dimensional gene expression data, approximately preserving the information and the metric structure of the original data. We then adopt a double fuzzy approach to obtain a consensus ensemble clustering: first we apply a fuzzy k-means algorithm to the different instances of the projected low-dimensional data, and then we use a fuzzy t-norm to combine the multiple clusterings. Several variants of the fuzzy ensemble clustering algorithm are proposed, which differ in the technique used to combine the base clusterings and to obtain the final consensus clustering.
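The three steps named in the methodology (random projection, fuzzy k-means on each projected instance, t-norm combination) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the Gaussian projection matrix, the fuzzifier m = 2, the choice of the algebraic product or minimum as t-norm, and the final step of running fuzzy k-means on the rows of the averaged similarity matrix are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np

def random_projection(X, out_dim, rng):
    """Project the rows of X to out_dim dimensions with a Gaussian random
    matrix scaled so that squared norms are preserved in expectation."""
    d = X.shape[1]
    R = rng.normal(0.0, 1.0 / np.sqrt(out_dim), size=(d, out_dim))
    return X @ R

def fuzzy_kmeans(X, k, rng, m=2.0, n_iter=100, tol=1e-6):
    """Standard fuzzy k-means (fuzzy c-means). Returns an (n, k) membership
    matrix whose rows sum to 1. The fuzzifier m = 2 is an assumed default."""
    n = X.shape[0]
    U = rng.random((n, k))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U

def tnorm_similarity(U, tnorm="product"):
    """S[i, j] = sum over clusters c of T(U[i, c], U[j, c]), where T is a
    t-norm: the algebraic product or the minimum (illustrative choices)."""
    if tnorm == "product":
        return U @ U.T
    if tnorm == "min":
        return np.minimum(U[:, None, :], U[None, :, :]).sum(axis=2)
    raise ValueError(f"unknown t-norm: {tnorm}")

def fuzzy_ensemble(X, k, n_base=10, out_dim=20, tnorm="product", seed=0):
    """Average the t-norm similarity matrices of n_base fuzzy clusterings,
    each computed on a different random projection of X."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    S = np.zeros((n, n))
    for _ in range(n_base):
        U = fuzzy_kmeans(random_projection(X, out_dim, rng), k, rng)
        S += tnorm_similarity(U, tnorm)
    S /= n_base
    # Consensus step (an assumption of this sketch): cluster the rows of the
    # averaged similarity matrix with fuzzy k-means once more.
    U_consensus = fuzzy_kmeans(S, k, rng)
    return U_consensus, S
```

A call such as `fuzzy_ensemble(X, k=2)` on a samples-by-genes matrix returns the consensus memberships and the averaged similarity matrix. Because the similarity matrix depends only on pairs of membership rows, it is unaffected by the arbitrary ordering of cluster labels in each base clustering, which is one reason a pairwise combination is convenient here.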
Results and conclusion: We applied the proposed fuzzy ensemble methods to the gene expression analysis of leukemia, lymphoma, adenocarcinoma and melanoma patients, and compared the results with other state-of-the-art ensemble methods. The results show that in some cases, by taking into account the natural fuzziness of the data, we can improve the discovery of classes of patients defined at the bio-molecular level. The reduction of the dimension of the data, achieved through random projection techniques, is well suited to the characteristics of high-dimensional gene expression data, and results in improved performance with respect to single fuzzy k-means and to ensemble methods based on resampling techniques. Moreover, we show that analyzing the accuracy and diversity of the base fuzzy clusterings can help explain the advantages and the limitations of the proposed fuzzy ensemble approach.
Similar articles
- Randomized maps for assessing the reliability of patients clusters in DNA microarray data analyses. Artif Intell Med. 2006 Jun;37(2):85-109. doi: 10.1016/j.artmed.2006.03.005. Epub 2006 May 23. PMID: 16720093
- Modified fuzzy gap statistic for estimating preferable number of clusters in fuzzy k-means clustering. J Biosci Bioeng. 2008 Mar;105(3):273-81. doi: 10.1263/jbb.105.273. PMID: 18397779
- Clustering of high-dimensional gene expression data with feature filtering methods and diffusion maps. Artif Intell Med. 2010 Feb-Mar;48(2-3):91-8. doi: 10.1016/j.artmed.2009.06.001. Epub 2009 Dec 4. PMID: 19962867
- Techniques for clustering gene expression data. Comput Biol Med. 2008 Mar;38(3):283-93. doi: 10.1016/j.compbiomed.2007.11.001. Epub 2007 Dec 3. PMID: 18061589 Review.
- [Gene clustering analysis of DNA microarray data]. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2008 Jun;25(3):729-33. PMID: 18693466 Review. Chinese.
Cited by
- Phenotype clustering in health care: A narrative review for clinicians. Front Artif Intell. 2022 Aug 12;5:842306. doi: 10.3389/frai.2022.842306. eCollection 2022. PMID: 36034597 Free PMC article. Review.
- Prediction of slaughter age in pigs and assessment of the predictive value of phenotypic and genetic information using random forest. J Anim Sci. 2018 Dec 3;96(12):4935-4943. doi: 10.1093/jas/sky359. PMID: 30239725 Free PMC article.
- Unsupervised Algorithms for Microarray Sample Stratification. Methods Mol Biol. 2022;2401:121-146. doi: 10.1007/978-1-0716-1839-4_9. PMID: 34902126
- Interpolation based consensus clustering for gene expression time series. BMC Bioinformatics. 2015 Apr 16;16:117. doi: 10.1186/s12859-015-0541-0. PMID: 25888019 Free PMC article.
- Clustering cancer gene expression data by projective clustering ensemble. PLoS One. 2017 Feb 24;12(2):e0171429. doi: 10.1371/journal.pone.0171429. eCollection 2017. PMID: 28234920 Free PMC article.