Variable selection for model-based high-dimensional clustering and its application to microarray data
- PMID: 17970821
- DOI: 10.1111/j.1541-0420.2007.00922.x
Abstract
Variable selection in high-dimensional clustering analysis is an important yet challenging problem. In this article, we propose two methods that simultaneously separate data points into similar clusters and select informative variables that contribute to the clustering. Our methods are in the framework of penalized model-based clustering. Unlike the classical L1-norm penalization, the penalty terms that we propose make use of the fact that parameters belonging to one variable should be treated as a natural "group." Numerical results indicate that the two new methods tend to remove noninformative variables more effectively and provide better clustering results than the L1-norm approach.
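The contrast between penalizing each parameter separately and treating one variable's parameters as a group can be illustrated with a small sketch. The abstract does not state the form of the two proposed penalties, so this is not the authors' method: a group-lasso-style L2 norm is swapped in as a stand-in for "a group penalty", and the function names, the toy cluster means, and the value of `lam` are all hypothetical. The data are assumed to be centered, so a variable whose mean is zero in every cluster carries no clustering information.

```python
import math


def soft_threshold(x, lam):
    """L1-style shrinkage of one value: move x toward zero by lam, stop at zero."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def l1_shrink(means, lam):
    """Shrink every cluster mean independently of the others.

    means[k][j] is the mean of variable j in cluster k (hypothetical layout).
    """
    return [[soft_threshold(m, lam) for m in row] for row in means]


def group_shrink(means, lam):
    """Shrink all cluster means of one variable together.

    Assumption: an L2-norm group penalty. The column of variable j is scaled
    by max(0, 1 - lam / ||column||_2), so a variable is either kept in every
    cluster or set to zero in every cluster.
    """
    n_clusters, n_vars = len(means), len(means[0])
    out = [[0.0] * n_vars for _ in range(n_clusters)]
    for j in range(n_vars):
        norm = math.sqrt(sum(means[k][j] ** 2 for k in range(n_clusters)))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for k in range(n_clusters):
            out[k][j] = scale * means[k][j]
    return out


def selected_variables(means, tol=1e-12):
    """Indices of variables whose mean is nonzero in at least one cluster."""
    return [j for j in range(len(means[0]))
            if any(abs(row[j]) > tol for row in means)]


# Toy input: 2 clusters, 3 variables. The numbers are made up for illustration.
means = [[1.5, 0.3, 0.6],
         [-1.5, -0.3, 0.1]]
lam = 0.5

l1_result = l1_shrink(means, lam)
group_result = group_shrink(means, lam)
print("L1 shrinkage:   ", l1_result, selected_variables(l1_result))
print("group shrinkage:", group_result, selected_variables(group_result))
```

In this toy input, the elementwise operator zeroes the third variable in one cluster but not the other, whereas the group operator always zeroes or keeps a variable's whole column. That all-or-nothing behavior per variable is what makes a group penalty a variable selector rather than a parameter selector.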