Mixtures of common t-factor analyzers for clustering high-dimensional microarray data

Jangsun Baek¹, Geoffrey J McLachlan


PMID: 21372081
DOI: 10.1093/bioinformatics/btr112


Bioinformatics. 2011 May 1;27(9):1269-76.

doi: 10.1093/bioinformatics/btr112. Epub 2011 Mar 3.


Affiliation

¹ Department of Statistics, Chonnam National University, Gwangju, South Korea. jbaek@jnu.ac.kr


Abstract

Motivation: Mixtures of factor analyzers enable model-based clustering of high-dimensional microarray data, where the number of observations n is small relative to the number of genes p. When the number of clusters is not small, for example where there are several different types of cancer, it may be necessary to reduce further the number of parameters in the specification of the component-covariance matrices. Such a reduction can be achieved with mixtures of factor analyzers with common component-factor loadings (MCFA), a more parsimonious model. However, this approach is sensitive to both non-normality and outliers, which are commonly observed in microarray experiments. The sensitivity arises because MCFA is based on a mixture model in which the multivariate normal family of distributions is assumed for the component-error and factor distributions.
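As a rough illustration of the parsimony argument above, the sketch below counts covariance parameters under three structures. It assumes the usual forms Sigma_i = B_i B_i' + D_i for mixtures of factor analyzers and Sigma_i = A Omega_i A' + D for MCFA, with q factors and g components. The function name and the example sizes are hypothetical, and the counts are naive: they ignore identifiability constraints on the loadings, so they are not the exact figures from the paper.

```python
def covariance_param_counts(p, q, g):
    """Naive covariance-parameter counts for g components in p dimensions.

    Assumed structures (illustrative, not taken from the paper's text):
      full : one unrestricted symmetric p x p matrix per component
      mfa  : Sigma_i = B_i B_i' + D_i, loadings B_i (p x q), diagonal D_i
      mcfa : Sigma_i = A Omega_i A' + D, shared A (p x q), shared diagonal D,
             component-specific symmetric Omega_i (q x q)
    """
    full = g * p * (p + 1) // 2
    mfa = g * (p * q + p)
    mcfa = p * q + p + g * q * (q + 1) // 2
    return {"full": full, "mfa": mfa, "mcfa": mcfa}


if __name__ == "__main__":
    # Hypothetical sizes: 1000 genes, 2 factors, 5 clusters.
    for name, count in covariance_param_counts(1000, 2, 5).items():
        print(f"{name:>4}: {count}")
```

Under these assumptions the MCFA count grows with g only through the small q x q matrices, which is why sharing the loadings matters most when the number of clusters is large.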

Results: An extension to mixtures of t-factor analyzers with common component-factor loadings is considered, whereby the multivariate t-family is adopted for the component-error and factor distributions. An EM algorithm is developed for fitting mixtures of common t-factor analyzers. The model can handle data with tails longer than those of the normal distribution, is robust against outliers and allows the data to be displayed in low-dimensional plots. It is applied here to both synthetic data and some microarray gene expression data for clustering, where it performs better than several existing methods.
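To make the robustness claim concrete, here is a minimal sketch of a single t-component with common factor loadings. It is not the authors' Matlab implementation. It assumes the component mean and covariance take the form mu_i = A xi_i and Sigma_i = A Omega_i A' + D, and it uses the standard weight (nu + p) / (nu + delta) that appears in EM algorithms for t-mixtures, where delta is the squared Mahalanobis distance. All function and variable names are hypothetical.

```python
import math

import numpy as np


def mctfa_component(A, xi, Omega, d_diag):
    """Mean and covariance implied by shared loadings A (assumed form)."""
    mu = A @ xi
    Sigma = A @ Omega @ A.T + np.diag(d_diag)
    return mu, Sigma


def t_logpdf(y, mu, Sigma, nu):
    """Log-density of a multivariate t and the squared Mahalanobis distance."""
    p = y.shape[0]
    L = np.linalg.cholesky(Sigma)          # Sigma = L L'
    z = np.linalg.solve(L, y - mu)         # whitened residual
    delta = float(z @ z)
    logdet = 2.0 * float(np.sum(np.log(np.diag(L))))
    log_norm = (math.lgamma((nu + p) / 2.0) - math.lgamma(nu / 2.0)
                - 0.5 * p * math.log(nu * math.pi) - 0.5 * logdet)
    return log_norm - 0.5 * (nu + p) * math.log1p(delta / nu), delta


def outlier_weight(delta, nu, p):
    """Weight used in t-mixture EM updates; small for distant points."""
    return (nu + p) / (nu + delta)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, q, nu = 6, 2, 4.0
    A = rng.normal(size=(p, q))            # shared loadings (made-up values)
    mu, Sigma = mctfa_component(A, np.array([1.0, -1.0]), np.eye(q), np.full(p, 0.5))
    for label, y in [("typical", mu + 0.1), ("outlier", mu + 10.0)]:
        logp, delta = t_logpdf(y, mu, Sigma, nu)
        print(label, round(logp, 3), round(outlier_weight(delta, nu, p), 3))
```

A point far from the component centre receives a weight close to zero, so it contributes little to the parameter updates; under a normal model every point would carry the same weight.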

Availability: The algorithms were implemented in Matlab. The Matlab code is available at http://blog.naver.com/aggie100.


