Penalized mixtures of factor analyzers with application to clustering high-dimensional microarray data

Benhuai Xie¹, Wei Pan, Xiaotong Shen

PMID: 20031967
PMCID: PMC2852217
DOI: 10.1093/bioinformatics/btp707

Benhuai Xie et al. Bioinformatics. 2010 Feb 15;26(4):501-8. doi: 10.1093/bioinformatics/btp707. Epub 2009 Dec 23.

Affiliation

¹ Division of Biostatistics, School of Public Health and School of Statistics, University of Minnesota, Minneapolis, MN, USA.

Abstract

Motivation: Model-based clustering has been widely used, e.g. in microarray data analysis. Since for high-dimensional data variable selection is necessary, several penalized model-based clustering methods have been proposed to realize simultaneous variable selection and clustering. However, the existing methods all assume that the variables are independent with the use of diagonal covariance matrices.

Results: To model non-independence of variables (e.g. correlated gene expressions) while alleviating the problem with the large number of unknown parameters associated with a general non-diagonal covariance matrix, we generalize the mixture of factor analyzers to that with penalization, which, among others, can effectively realize variable selection. We use simulated data and real microarray data to illustrate the utility and advantages of the proposed method over several existing ones.
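As a rough illustration of the parameter-count problem mentioned above, the sketch below compares an unrestricted covariance matrix with a factor-analytic one. It assumes the standard factor-analyzer form Sigma = Lambda Lambda^T + Psi with a diagonal Psi, which the abstract does not spell out; the function names, the dimensions `p` and `q`, and the random values are hypothetical and are not taken from the paper.

```python
import numpy as np


def full_cov_params(p):
    """Free parameters in an unrestricted symmetric p x p covariance matrix."""
    return p * (p + 1) // 2


def factor_cov_params(p, q):
    """Free parameters in a p x q loading matrix plus p diagonal variances,
    less q(q-1)/2 for the rotational indeterminacy of the loadings."""
    return p * q + p - q * (q - 1) // 2


def factor_covariance(loadings, uniquenesses):
    """Assumed factor-analytic covariance: Sigma = Lambda Lambda^T + Psi."""
    return loadings @ loadings.T + np.diag(uniquenesses)


# Hypothetical sizes: 1000 variables (e.g. genes) and 2 latent factors.
rng = np.random.default_rng(0)
p, q = 1000, 2
loadings = rng.normal(size=(p, q))            # illustrative Lambda
uniquenesses = rng.uniform(0.5, 1.5, size=p)  # illustrative diagonal of Psi
sigma = factor_covariance(loadings, uniquenesses)

print(full_cov_params(p))       # 500500
print(factor_cov_params(p, q))  # 2999
```

Under these assumptions the factor-analytic covariance is non-diagonal, so it can represent correlated variables, yet it needs only a few thousand parameters per cluster rather than roughly half a million. The penalization step itself is not sketched here because the abstract does not specify the penalty.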

Figures

**Fig. 1.**
Clusters identified by PMFA (right panels) and PMND (left panels) from a dataset in simulation set-up 4. Three informative variables (X3, X6 and X16) are plotted. In the left panels, the five symbol types mark membership in the five clusters identified by PMND; in the right panels, the two symbol types mark the true cluster memberships.

**Fig. 2.**
Survival curves for the clusters identified by PMFA and PMND for the lung cancer data.

Cited by

Cancer subtype discovery and biomarker identification via a new robust network clustering algorithm.
Wu MY, Dai DQ, Zhang XF, Zhu Y. PLoS One. 2013 Jun 17;8(6):e66256. doi: 10.1371/journal.pone.0066256. PMID: 23799085.
Clustering High-Dimensional Landmark-based Two-dimensional Shape Data.
Huang C, Styner M, Zhu H. J Am Stat Assoc. 2015 Nov 7;110(115):946-961. doi: 10.1080/01621459.2015.1034802. Epub 2015 Apr 16. PMID: 26604425.
Penalized model-based clustering with unconstrained covariance matrices.
Zhou H, Pan W, Shen X. Electron J Stat. 2009 Jan 1;3:1473-1496. doi: 10.1214/09-EJS487. PMID: 20463857.
Mixture of regressions with multivariate responses for discovering subtypes in Alzheimer's biomarkers with detection limits.
Tian G, Hanfelt J, Lah J, Risk BB. Data Sci Sci. 2024;3(1):2309403. doi: 10.1080/26941899.2024.2309403. Epub 2024 Mar 6. PMID: 38680829.
