Mixtures of common t-factor analyzers for clustering high-dimensional microarray data
- PMID: 21372081
- DOI: 10.1093/bioinformatics/btr112
Abstract
Motivation: Mixtures of factor analyzers enable model-based clustering to be undertaken for high-dimensional microarray data, where the number of observations n is small relative to the number of genes p. Moreover, when the number of clusters is not small, for example, when there are several different types of cancer, it may be necessary to reduce the number of parameters in the specification of the component-covariance matrices even further. This further reduction can be achieved by using mixtures of factor analyzers with common component-factor loadings (MCFA), a more parsimonious model. However, this approach is sensitive to both non-normality and outliers, which are commonly observed in microarray experiments. The sensitivity arises because MCFA is based on a mixture model that assumes the multivariate normal family of distributions for the component-error and factor distributions.
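To make the parsimony argument concrete, the sketch below counts free parameters for three normal-mixture covariance specifications. It assumes the standard counts: an unrestricted g-component mixture; MFA with component-specific means, loadings and diagonal error covariances, minus q(q-1)/2 rotational constraints per component; and MCFA with the parameterization μ_i = Aξ_i, Σ_i = AΩ_iAᵀ + D, where the p × q loading matrix A and diagonal D are shared across components, minus q² for the invertible transformation of the factors that leaves the model unchanged. The function names and the sizes g, p and q in the usage example are hypothetical, not taken from the paper, and the t-version would add degrees-of-freedom parameters that are not counted here.

```python
def n_params_full(g: int, p: int) -> int:
    """Unrestricted normal mixture: g means, g full p x p covariances, g - 1 mixing proportions."""
    return (g - 1) + g * p + g * p * (p + 1) // 2


def n_params_mfa(g: int, p: int, q: int) -> int:
    """Mixture of factor analyzers: per-component mean (p), loadings (pq minus
    q(q-1)/2 rotational constraints) and diagonal error covariance (p)."""
    return (g - 1) + 2 * g * p + g * (p * q - q * (q - 1) // 2)


def n_params_mcfa(g: int, p: int, q: int) -> int:
    """Mixture of common factor analyzers (assumed parameterization):
    shared loadings A (p x q) and diagonal D (p), per-component factor mean
    xi_i (q) and factor covariance Omega_i (q(q+1)/2), minus q^2 for the
    invertible transformation of the factors that leaves the model unchanged."""
    return (g - 1) + p + p * q + g * q + g * q * (q + 1) // 2 - q * q


if __name__ == "__main__":
    g, p, q = 4, 2000, 4  # hypothetical: 4 tumour classes, 2000 genes, 4 factors
    for name, k in [
        ("full covariance", n_params_full(g, p)),
        ("MFA", n_params_mfa(g, p, q)),
        ("MCFA", n_params_mcfa(g, p, q)),
    ]:
        print(f"{name:>15}: {k:,}")
```

With these hypothetical sizes the unrestricted model needs millions of parameters, MFA tens of thousands, and MCFA about ten thousand, most of which are the pq entries of the single shared loading matrix A.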
Results: An extension to mixtures of t-factor analyzers with common component-factor loadings is considered, whereby the multivariate t-family is adopted for the component-error and factor distributions. An EM algorithm is developed for fitting mixtures of common t-factor analyzers. The model can handle data with tails longer than those of the normal distribution, is robust against outliers and allows the data to be displayed in low-dimensional plots. It is applied here to both synthetic data and some microarray gene expression data for clustering, and it performs better than several existing methods.
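The robustness claim rests on a general property of EM for multivariate t distributions: in the E-step each observation y_j receives a weight u_j = (ν + p)/(ν + δ_j), where δ_j is its squared Mahalanobis distance from the current location, so distant points contribute less to the M-step estimates. The sketch below shows this mechanism for a single multivariate t with a fixed, assumed ν = 4. It is not the authors' mixture of common t-factor analyzers algorithm (their implementation is in Matlab), which additionally estimates mixing proportions, shared loadings, factor parameters and degrees of freedom; the function name, ν and the simulated data are illustrative choices.

```python
import numpy as np


def t_em_location_scatter(Y, nu=4.0, n_iter=200, tol=1e-8):
    """EM for the location and scale matrix of one multivariate t with fixed nu.

    Illustrative only: not the mixture of common t-factor analyzers algorithm.
    Returns (mu, Sigma, u), where u holds the E-step weights of the last iteration.
    """
    n, p = Y.shape
    mu = Y.mean(axis=0)
    Sigma = np.cov(Y, rowvar=False, bias=True)
    for _ in range(n_iter):
        # E-step: squared Mahalanobis distances and the t weights.
        diff = Y - mu
        delta = np.sum(diff * np.linalg.solve(Sigma, diff.T).T, axis=1)
        u = (nu + p) / (nu + delta)
        # M-step: weighted mean, then weighted scatter about the new mean.
        mu_new = u @ Y / u.sum()
        diff = Y - mu_new
        Sigma = (u[:, None] * diff).T @ diff / n
        shift = np.max(np.abs(mu_new - mu))
        mu = mu_new
        if shift < tol:
            break
    return mu, Sigma, u


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Y = rng.normal(size=(200, 3))
    Y[:5] += 15.0  # five gross outliers
    mu_t, Sigma_t, u = t_em_location_scatter(Y)
    print("sample mean:          ", np.round(Y.mean(axis=0), 3))
    print("t location:           ", np.round(mu_t, 3))
    print("mean weight, outliers:", round(u[:5].mean(), 4))
    print("mean weight, others:  ", round(u[5:].mean(), 4))
```

The five shifted rows receive much smaller weights than the rest, so the t location stays close to the origin while the ordinary sample mean is pulled toward the outliers; a normal-based fit has no such downweighting, which is the sensitivity the Motivation attributes to MCFA.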
Availability: The algorithms were implemented in Matlab. The Matlab code is available at http://blog.naver.com/aggie100.
Similar articles
- Segmentation and intensity estimation of microarray images using a gamma-t mixture model. Bioinformatics. 2007 Feb 15;23(4):458-65. doi: 10.1093/bioinformatics/btl630. Epub 2006 Dec 12. PMID: 17166856
- Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualization of high-dimensional data. IEEE Trans Pattern Anal Mach Intell. 2010 Jul;32(7):1298-309. doi: 10.1109/TPAMI.2009.149. PMID: 20489231
- Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm. Bioinformatics. 2006 Jan 1;22(1):58-67. doi: 10.1093/bioinformatics/bti746. Epub 2005 Oct 27. PMID: 16257984
- Biclustering of gene expression data by an extension of mixtures of factor analyzers. Int J Biostat. 2008;4(1):Article 3. doi: 10.2202/1557-4679.1078. PMID: 22462105
- Model-based clustering with gene ranking using penalized mixtures of heavy-tailed distributions. J Bioinform Comput Biol. 2013 Jun;11(3):1341007. doi: 10.1142/S0219720013410072. Epub 2013 Mar 21. PMID: 23796184
Cited by
- densityCut: an efficient and versatile topological approach for automatic clustering of biological data. Bioinformatics. 2016 Sep 1;32(17):2567-76. doi: 10.1093/bioinformatics/btw227. Epub 2016 Apr 23. PMID: 27153661
- SMART: unique splitting-while-merging framework for gene clustering. PLoS One. 2014 Apr 8;9(4):e94141. doi: 10.1371/journal.pone.0094141. eCollection 2014. PMID: 24714159
- Distributed Density Estimation Based on a Mixture of Factor Analyzers in a Sensor Network. Sensors (Basel). 2015 Aug 5;15(8):19047-68. doi: 10.3390/s150819047. PMID: 26251903
- Cancer subtype discovery and biomarker identification via a new robust network clustering algorithm. PLoS One. 2013 Jun 17;8(6):e66256. doi: 10.1371/journal.pone.0066256. Print 2013. PMID: 23799085
- Statistical Significance of Clustering using Soft Thresholding. J Comput Graph Stat. 2015;24(4):975-993. doi: 10.1080/10618600.2014.948179. Epub 2015 Dec 10. PMID: 26755893