Clustering of high-dimensional gene expression data with feature filtering methods and diffusion maps

Rui Xu¹, Steven Damelin, Boaz Nadler, Donald C Wunsch 2nd

Affiliation

¹ Applied Computational Intelligence Laboratory, Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO 65409-0249, USA. rxu@mst.edu

PMID: 19962867
DOI: 10.1016/j.artmed.2009.06.001

Comparative Study

Artif Intell Med. 2010 Feb-Mar;48(2-3):91-8. Epub 2009 Dec 4.

Abstract

Objective: The importance of gene expression data in cancer diagnosis and treatment has become widely recognized by cancer researchers in recent years. However, one of the major challenges in the computational analysis of such data is the curse of dimensionality, which arises because the number of measured variables (genes) is overwhelming compared with the small number of samples. Here, we use a two-step method to reduce the dimensionality of gene expression data and thereby address this problem.

Methods: First, we extract a subset of genes based on the statistical characteristics of their expression levels. Then, for further dimensionality reduction, we apply diffusion maps, which interpret the eigenfunctions of Markov matrices as a system of coordinates on the original data set, to obtain an efficient representation of the geometry of the data. Finally, fuzzy ART (adaptive resonance theory), a neural network clustering method, is applied to the resulting data to generate clusters of cancer samples.
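
The two dimensionality-reduction steps described in the Methods can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not name the filtering statistic, so per-gene variance is used here as a stand-in, and the Gaussian kernel, the median-distance bandwidth, the function names, and the synthetic data are all assumptions made for illustration. The fuzzy ART clustering step is not shown.

```python
import numpy as np


def filter_genes_by_variance(X, n_keep):
    """Keep the n_keep genes (columns) with the largest variance across samples.

    Variance is an assumed filtering statistic; the abstract does not specify one.
    """
    variances = X.var(axis=0)
    keep = np.sort(np.argsort(variances)[::-1][:n_keep])
    return X[:, keep], keep


def diffusion_map(X, n_components=2, epsilon=None, t=1):
    """Embed samples (rows of X) in diffusion-map coordinates.

    Returns the coordinates and all eigenvalues of the Markov matrix,
    sorted in decreasing order.
    """
    # Pairwise squared Euclidean distances between samples.
    sq_norms = (X ** 2).sum(axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off

    if epsilon is None:
        # Assumed heuristic: median of the off-diagonal squared distances.
        epsilon = np.median(d2[~np.eye(len(X), dtype=bool)])

    K = np.exp(-d2 / epsilon)  # Gaussian affinities (assumed kernel)
    deg = K.sum(axis=1)

    # The Markov matrix P = D^-1 K shares its eigenvalues with the
    # symmetric matrix S = D^-1/2 K D^-1/2, which is easier to decompose.
    S = K / np.sqrt(np.outer(deg, deg))
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Right eigenvectors of P are recovered as D^-1/2 v.
    psi = eigvecs / np.sqrt(deg)[:, None]

    # Drop the trivial constant eigenvector (eigenvalue 1) and scale the
    # remaining ones by eigenvalue**t to get the coordinates at time t.
    coords = psi[:, 1:n_components + 1] * eigvals[1:n_components + 1] ** t
    return coords, eigvals


# Synthetic stand-in for an expression matrix: 20 samples x 200 genes,
# where only the first 20 genes differ between two groups of 10 samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 200))
X[10:, :20] += 5.0

X_filtered, kept_genes = filter_genes_by_variance(X, n_keep=20)
coords, eigvals = diffusion_map(X_filtered, n_components=2)
print("kept genes:", kept_genes)
print("embedding shape:", coords.shape)
```

In a pipeline of this shape, the low-dimensional `coords` array would be the input to the clustering step. The bandwidth `epsilon` strongly affects the embedding, so the median heuristic should be treated as a starting point only.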

Results: Experimental results on the small round blue-cell tumor data set, compared with other widely used clustering algorithms, such as the hierarchical clustering algorithm and K-means, show that our proposed method can effectively identify different cancer types and generate high-quality cancer sample clusters.

Conclusion: The proposed feature selection methods and diffusion maps can extract useful information from multidimensional gene expression data and prove effective at addressing the problem of high dimensionality inherent in gene expression data analysis.

Cited by

Molecular phenotyping using networks, diffusion, and topology: soft tissue sarcoma.
Mathews JC, Pouryahya M, Moosmüller C, Kevrekidis YG, Deasy JO, Tannenbaum A. Sci Rep. 2019 Sep 27;9(1):13982. doi: 10.1038/s41598-019-50300-2. PMID: 31562358. Free PMC article.
A novel dimension reduction algorithm based on weighted kernel principal analysis for gene expression data.
Liu WB, Liang SN, Qin XW. PLoS One. 2021 Oct 13;16(10):e0258326. doi: 10.1371/journal.pone.0258326. eCollection 2021. PMID: 34644329. Free PMC article.
Ensemble learning for microbiome-based caries diagnosis: multi-group modeling and biological interpretation from salivary and plaque metagenomic data.
Wei F, Wu Z, Li G, Sun X, Shi X, Tan L, Ai T, Qu L, Zheng S. BMC Oral Health. 2025 Jul 17;25(1):1188. doi: 10.1186/s12903-025-06590-2. PMID: 40676575. Free PMC article.
A new avenue for classification and prediction of olive cultivars using supervised and unsupervised algorithms.
Beiki AH, Saboor S, Ebrahimi M. PLoS One. 2012;7(9):e44164. doi: 10.1371/journal.pone.0044164. Epub 2012 Sep 5. PMID: 22957050. Free PMC article.
