A framework for feature selection in clustering
- PMID: 20811510
- PMCID: PMC2930825
- DOI: 10.1198/jasa.2010.tm09415
Abstract
We consider the problem of clustering observations using a potentially large set of features. One might expect that the true underlying clusters present in the data differ only with respect to a small fraction of the features, and will be missed if one clusters the observations using the full set of features. We propose a novel framework for sparse clustering, in which one clusters the observations using an adaptively chosen subset of the features. The method uses a lasso-type penalty to select the features. We use this framework to develop simple methods for sparse K-means and sparse hierarchical clustering. A single criterion governs both the selection of the features and the resulting clusters. These approaches are demonstrated on simulated data and on genomic data sets.
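As a rough illustration of how one criterion can drive both feature selection and clustering, here is a minimal pure-Python sketch of sparse K-means under stated assumptions. It assumes the criterion: choose clusters and nonnegative feature weights w_j to maximize Σ_j w_j a_j, where a_j is feature j's between-cluster sum of squares, subject to ‖w‖₂ ≤ 1 and the lasso-type bound ‖w‖₁ ≤ s (with s ≥ 1). The sketch alternates two steps. First, it runs K-means with the weights held fixed. Second, it updates the weights in closed form with the clusters held fixed, by soft-thresholding a_j and rescaling to unit L2 norm, with the threshold found by bisection when the L1 bound is active. The function names, the deterministic farthest-first initialization, and the toy data are hypothetical choices for this sketch, not details taken from the paper.

```python
import math
import random

# Hypothetical sketch of sparse K-means; names, initialization and toy data are
# illustrative assumptions, not the paper's reference implementation.


def wdist(x, c, w):
    """Feature-weighted squared Euclidean distance: sum_j w_j * (x_j - c_j)^2."""
    return sum(wj * (xj - cj) ** 2 for wj, xj, cj in zip(w, x, c))


def weighted_kmeans(X, k, w, n_iter=20):
    """Lloyd's algorithm under the weighted distance (deterministic farthest-first init)."""
    centers = [X[0][:]]
    while len(centers) < k:
        far = max(X, key=lambda x: min(wdist(x, c, w) for c in centers))
        centers.append(far[:])
    labels = [0] * len(X)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: wdist(x, centers[c], w)) for x in X]
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def between_ss(X, labels, k):
    """Per-feature between-cluster sum of squares: total SS minus within-cluster SS."""
    a = []
    for j in range(len(X[0])):
        col = [x[j] for x in X]
        mean = sum(col) / len(col)
        total = sum((v - mean) ** 2 for v in col)
        within = 0.0
        for c in range(k):
            vals = [x[j] for x, lab in zip(X, labels) if lab == c]
            if vals:
                m = sum(vals) / len(vals)
                within += sum((v - m) ** 2 for v in vals)
        a.append(total - within)
    return a


def update_weights(a, s, n_bisect=60):
    """Maximize sum_j w_j a_j s.t. ||w||_2 <= 1, ||w||_1 <= s, w_j >= 0 (assumes s >= 1)."""
    a_pos = [max(v, 0.0) for v in a]

    def w_at(delta):
        # Soft-threshold the nonnegative scores by delta, then rescale to unit L2 norm.
        st = [max(v - delta, 0.0) for v in a_pos]
        norm = math.sqrt(sum(v * v for v in st))
        return [v / norm for v in st] if norm > 0 else st

    w = w_at(0.0)
    if sum(w) <= s:  # L1 bound inactive: no thresholding needed
        return w
    lo, hi = 0.0, max(a_pos)
    for _ in range(n_bisect):  # bisect for the threshold where ||w||_1 = s
        mid = (lo + hi) / 2
        if sum(w_at(mid)) > s:
            lo = mid
        else:
            hi = mid
    return w_at(hi)


def sparse_kmeans(X, k, s, n_outer=10):
    """Alternate cluster updates (weights fixed) and weight updates (clusters fixed)."""
    p = len(X[0])
    w = [1 / math.sqrt(p)] * p  # start from equal weights
    labels = []
    for _ in range(n_outer):
        labels = weighted_kmeans(X, k, w)
        w = update_weights(between_ss(X, labels, k), s)
    return labels, w


# Toy data: features 0-1 separate two groups; features 2-9 are pure noise.
rng = random.Random(1)
X = [
    [rng.gauss(10.0 * (i % 2), 1.0) for _ in range(2)]
    + [rng.gauss(0.0, 1.0) for _ in range(8)]
    for i in range(40)
]
labels, w = sparse_kmeans(X, k=2, s=1.2)
print([round(v, 3) for v in w])
```

With s = 1.2 the L1 bound is active, so the weight update sets the eight noise features to exactly zero while both informative features keep positive weight; larger values of s admit more features. A fuller treatment would typically standardize the features and tune s; this sketch does neither.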
Similar articles
- Detecting Meaningful Clusters From High-Dimensional Data: A Strongly Consistent Sparse Center-Based Clustering Approach. IEEE Trans Pattern Anal Mach Intell. 2022 Jun;44(6):2894-2908. doi: 10.1109/TPAMI.2020.3047489. Epub 2022 May 5. PMID: 33360985
- A Practical Guide to Sparse k-Means Clustering for Studying Molecular Development of the Human Brain. Front Neurosci. 2021 Nov 16;15:668293. doi: 10.3389/fnins.2021.668293. eCollection 2021. PMID: 34867140. Free PMC article.
- Feature selection and semi-supervised clustering using multiobjective optimization. Springerplus. 2014 Aug 26;3:465. doi: 10.1186/2193-1801-3-465. eCollection 2014. PMID: 25279282. Free PMC article.
- Integrative Generalized Convex Clustering Optimization and Feature Selection for Mixed Multi-View Data. J Mach Learn Res. 2021 Jan;22:55. PMID: 34744522. Free PMC article.
- Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas. Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217. PMID: 36321557. Free PMC article.
Cited by
- Simultaneous cluster structure learning and estimation of heterogeneous graphs for matrix-variate fMRI data. Biometrics. 2023 Sep;79(3):2246-2259. doi: 10.1111/biom.13753. Epub 2022 Sep 13. PMID: 36017603. Free PMC article.
- Phenotypic clusters within sepsis-associated multiple organ dysfunction syndrome. Intensive Care Med. 2015 May;41(5):814-22. doi: 10.1007/s00134-015-3764-7. Epub 2015 Apr 8. PMID: 25851384. Free PMC article.
- MicroRNA-16 suppresses metastasis in an orthotopic, but not autochthonous, mouse model of soft tissue sarcoma. Dis Model Mech. 2015 Aug 1;8(8):867-75. doi: 10.1242/dmm.017897. Epub 2015 Jun 4. PMID: 26044957. Free PMC article.
- Prognostic Immunity and Therapeutic Sensitivity Analyses Based on Differential Genomic Instability-Associated LncRNAs in Left- and Right-Sided Colon Adenocarcinoma. Front Mol Biosci. 2021 Aug 31;8:668888. doi: 10.3389/fmolb.2021.668888. eCollection 2021. PMID: 34532341. Free PMC article.
- Higher-Order Disease Interactions in Multimorbidity Measurement: Marginal Benefit Over Additive Disease Summation. J Gerontol A Biol Sci Med Sci. 2024 Dec 11;80(1):glae282. doi: 10.1093/gerona/glae282. PMID: 39565288