Clustering of high-dimensional gene expression data with feature filtering methods and diffusion maps
- PMID: 19962867
- DOI: 10.1016/j.artmed.2009.06.001
Abstract
Objective: The importance of gene expression data in cancer diagnosis and treatment has become widely recognized among cancer researchers in recent years. However, one of the major challenges in the computational analysis of such data is the curse of dimensionality: the number of measured variables (genes) far exceeds the number of samples. Here, we address this problem with a two-step method for reducing the dimensionality of gene expression data.
Methods: First, we extract a subset of genes based on statistical characteristics of their corresponding gene expression levels. Then, for further dimensionality reduction, we apply diffusion maps, which interpret the eigenfunctions of Markov matrices as a system of coordinates on the original data set, to obtain an efficient representation of the geometric structure of the data. Finally, fuzzy ART, a neural network clustering algorithm based on adaptive resonance theory, is applied to the resulting data to generate clusters of cancer samples.
Results: Experimental results on the small round blue-cell tumor data set, compared with other widely used clustering algorithms, such as the hierarchical clustering algorithm and K-means, show that our proposed method can effectively identify different cancer types and generate high-quality cancer sample clusters.
Conclusion: The proposed feature selection methods and diffusion maps can extract useful information from multidimensional gene expression data and prove effective at addressing the problem of high dimensionality inherent in gene expression data analysis.
2009 Elsevier B.V. All rights reserved.
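
The pipeline described in the Methods paragraph (gene filtering, diffusion maps, fuzzy ART) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not say which statistical criterion is used for filtering, so the variance filter here is an assumption, as are the Gaussian kernel, the median bandwidth heuristic, the min-max scaling step, and all parameter values (`n_keep`, `epsilon`, `rho`, `alpha`, `beta`). The function names are hypothetical, and the data are synthetic rather than the small round blue-cell tumor data set.

```python
import numpy as np


def variance_filter(X, n_keep):
    """Keep the n_keep genes (columns) with the largest variance (assumed criterion)."""
    idx = np.argsort(X.var(axis=0))[::-1][:n_keep]
    return X[:, idx], idx


def diffusion_map(X, n_components=2, epsilon=None, t=1):
    """Embed the rows of X using the leading non-trivial diffusion coordinates."""
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if epsilon is None:
        epsilon = np.median(d2[d2 > 0])   # heuristic bandwidth (assumption)
    K = np.exp(-d2 / epsilon)             # Gaussian affinity kernel (assumption)
    d = K.sum(axis=1)                     # row sums (degrees)
    # Symmetric conjugate of the Markov matrix P = D^-1 K; same eigenvalues as P.
    A = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]      # right eigenvectors of P
    # Skip the trivial constant eigenvector, whose eigenvalue is 1.
    return psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t, vals


def minmax_scale(Y):
    """Scale each column to [0, 1], as fuzzy ART expects."""
    span = Y.max(axis=0) - Y.min(axis=0)
    return (Y - Y.min(axis=0)) / np.where(span > 0, span, 1.0)


def fuzzy_art(X, rho=0.75, alpha=0.001, beta=1.0, n_epochs=5):
    """Cluster the rows of X (values in [0, 1]) with a basic fuzzy ART loop."""
    I = np.hstack([X, 1.0 - X])           # complement coding
    W = []                                # one weight vector per category
    labels = np.zeros(len(I), dtype=int)
    for _ in range(n_epochs):
        for i, x in enumerate(I):
            # Rank existing categories by the choice function T_j.
            T = [np.minimum(x, w).sum() / (alpha + w.sum()) for w in W]
            for j in np.argsort(T)[::-1]:
                match = np.minimum(x, W[j]).sum() / x.sum()
                if match >= rho:          # vigilance test passed: update winner
                    W[j] = beta * np.minimum(x, W[j]) + (1 - beta) * W[j]
                    labels[i] = j
                    break
            else:                         # no category passed: create a new one
                W.append(x.copy())
                labels[i] = len(W) - 1
    return labels, np.array(W)


# Usage on synthetic data: 20 samples x 500 "genes".
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 500))
X[:10, :50] += 3.0                        # first ten samples over-express 50 genes
X_f, kept = variance_filter(X, n_keep=100)
Y, eigvals = diffusion_map(X_f, n_components=2)
labels, W = fuzzy_art(minmax_scale(Y), rho=0.6)
```

The vigilance parameter `rho` controls how many clusters fuzzy ART creates: higher values demand a closer match before a sample joins an existing category, so more categories form. The number of diffusion coordinates retained and the kernel bandwidth would likewise need tuning on real data.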