Multi-way overlapping clustering by Bayesian tensor decomposition
- PMID: 39713480
- PMCID: PMC11661849
- DOI: 10.4310/23-sii790
Abstract
The development of modern sequencing technologies provides great opportunities to measure gene expression of multiple tissues from different individuals. The three-way variation across genes, tissues, and individuals makes statistical inference a challenging task. In this paper, we propose a Bayesian multi-way clustering approach to cluster genes, tissues, and individuals simultaneously. The proposed model adaptively trichotomizes the observed data into three latent categories and uses a Bayesian hierarchical construction to further decompose the latent variables into lower-dimensional features, which can be interpreted as overlapping clusters. With a Bayesian nonparametric prior, i.e., the Indian buffet process, our method determines the number of clusters automatically. The utility of our approach is demonstrated through simulation studies and an application to the Genotype-Tissue Expression (GTEx) RNA-seq data. The clustering results reveal findings about depression-related genes in the human brain that are consistent with biological domain knowledge. The detailed algorithm and some numerical results are available in the online Supplementary Material, http://intlpress.com/site/pub/files/-supp/sii/2024/0017/0002/sii-2024-0017-0002-s001.pdf.
Keywords: Bayesian nonparametric prior; Gene expression data; Indian buffet process; Low-rank tensor; Mixture model.
Mathematics Subject Classification: Primary 62H30; secondary 62F15.
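To illustrate how an Indian buffet process prior lets the number of overlapping clusters be determined by the model rather than fixed in advance, the following is a minimal sketch of the standard sequential ("restaurant") generative scheme for a binary membership matrix. It is not the authors' sampler, which is described in the Supplementary Material; the function names `sample_poisson` and `sample_ibp` and the parameters `num_items`, `alpha`, and `seed` are hypothetical and chosen only for this illustration.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw from Poisson(rate) using Knuth's multiplication method.

    Adequate for the small rates used here; an illustrative helper only.
    """
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_items, alpha, seed=0):
    """Sample a binary item-by-cluster matrix from an Indian buffet process.

    Item i (1-based) joins each existing cluster k with probability m_k / i,
    where m_k is the number of earlier items in cluster k, and then opens
    Poisson(alpha / i) new clusters. Items may belong to several clusters,
    so the resulting clusters can overlap.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of items currently in cluster k
    rows = []    # rows[i] = list of cluster indices for item i
    for i in range(1, num_items + 1):
        # Decide membership in existing clusters before updating any counts.
        members = [k for k, m_k in enumerate(counts) if rng.random() < m_k / i]
        for k in members:
            counts[k] += 1
        # Open new clusters; each starts with this item as its only member.
        for _ in range(sample_poisson(alpha / i, rng)):
            members.append(len(counts))
            counts.append(1)
        rows.append(members)
    num_clusters = len(counts)
    return [[1 if k in row else 0 for k in range(num_clusters)] for row in rows]


if __name__ == "__main__":
    Z = sample_ibp(num_items=10, alpha=2.0, seed=1)
    for row in Z:
        print("".join(str(v) for v in row))
```

Each row of the returned matrix is one item (for example a gene, tissue, or individual) and each column is one cluster. The number of columns is random and tends to grow with `alpha`, which is how a prior of this kind avoids specifying the cluster number beforehand.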