Multi-way overlapping clustering by Bayesian tensor decomposition
- PMID: 39713480
- PMCID: PMC11661849
- DOI: 10.4310/23-sii790
Abstract
The development of modern sequencing technologies provides great opportunities to measure gene expression in multiple tissues from different individuals. The three-way variation across genes, tissues, and individuals makes statistical inference a challenging task. In this paper, we propose a Bayesian multi-way clustering approach that clusters genes, tissues, and individuals simultaneously. The proposed model adaptively trichotomizes the observed data into three latent categories and uses a Bayesian hierarchical construction to further decompose the latent variables into lower-dimensional features, which can be interpreted as overlapping clusters. With a Bayesian nonparametric prior, namely the Indian buffet process, our method determines the number of clusters automatically. The utility of our approach is demonstrated through simulation studies and an application to the Genotype-Tissue Expression (GTEx) RNA-seq data. The clustering result reveals findings about depression-related genes in the human brain that are consistent with biological domain knowledge. The detailed algorithm and some numerical results are available in the online Supplementary Material, http://intlpress.com/site/pub/files/-supp/sii/2024/0017/0002/sii-2024-0017-0002-s001.pdf.
Keywords: Bayesian nonparametric prior; Gene expression data; Indian buffet process; Low-rank tensor; Mixture model.
Mathematics Subject Classification: Primary 62H30; secondary 62F15.
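As an illustration of why the Indian buffet process lets the number of overlapping clusters be inferred rather than fixed, the following minimal sketch draws a binary membership matrix from the standard IBP generative scheme. It is not the paper's model or sampler: the function names `poisson_draw` and `sample_ibp`, the concentration value `alpha=2.0`, and the choice of 10 items are assumptions made for this example only.

```python
import math
import random


def poisson_draw(lam, rng):
    """One Poisson(lam) draw by Knuth's method; adequate for small lam."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_ibp(n_items, alpha, rng):
    """Draw a binary item-by-feature matrix from the Indian buffet process.

    Row i marks which features (clusters) item i belongs to. An item may
    belong to several features, so the clusters can overlap, and the
    number of columns is random rather than chosen in advance.
    """
    counts = []  # counts[k] = number of items so far that own feature k
    rows = []
    for i in range(1, n_items + 1):
        row = []
        # Existing feature k is taken with probability counts[k] / i.
        for k in range(len(counts)):
            take = 1 if rng.random() < counts[k] / i else 0
            row.append(take)
            counts[k] += take
        # Item i also opens Poisson(alpha / i) brand-new features.
        n_new = poisson_draw(alpha / i, rng)
        row.extend([1] * n_new)
        counts.extend([1] * n_new)
        rows.append(row)
    # Pad earlier rows with zeros for features created by later items.
    n_features = len(counts)
    return [row + [0] * (n_features - len(row)) for row in rows]


# Hypothetical settings chosen only for illustration.
Z = sample_ibp(n_items=10, alpha=2.0, rng=random.Random(0))
for row in Z:
    print("".join(str(v) for v in row))
```

Each printed row is one item's membership pattern. Larger values of `alpha` tend to produce more columns, which is the sense in which the prior, rather than the user, governs the number of clusters.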