Subject clustering by IF-PCA and several recent methods
- PMID: 37287536
- PMCID: PMC10242062
- DOI: 10.3389/fgene.2023.1166404
Abstract
Subject clustering (i.e., the use of measured features to cluster subjects, such as patients or cells, into multiple groups) is a problem of significant interest. In recent years, many approaches have been proposed, among which unsupervised deep learning (UDL) has received much attention. Two interesting questions are 1) how to combine the strengths of UDL and other approaches and 2) how these approaches compare to each other. We combine the variational auto-encoder (VAE), a popular UDL approach, with the recent idea of influential feature-principal component analysis (IF-PCA) and propose IF-VAE as a new method for subject clustering. We study IF-VAE and compare it with several other methods (including IF-PCA, VAE, Seurat, and SC3) on 10 gene microarray data sets and eight single-cell RNA-seq data sets. We find that IF-VAE shows significant improvement over VAE, but still underperforms compared to IF-PCA. We also find that IF-PCA is quite competitive, slightly outperforming Seurat and SC3 over the eight single-cell data sets. IF-PCA is conceptually simple and permits delicate analysis. We demonstrate that IF-PCA is capable of achieving phase transition in a rare/weak model. Comparatively, Seurat and SC3 are more complex and theoretically difficult to analyze (for these reasons, their optimality remains unclear).
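The abstract calls IF-PCA conceptually simple but does not spell out its steps. As a rough illustration of the general idea (screen features first, then apply PCA and cluster the subjects), here is a minimal Python sketch under stated assumptions. The Kolmogorov-Smirnov ranking, the fixed `n_keep` cutoff (used in place of the higher criticism threshold named in the keywords), the helper names `ks_scores`, `kmeans`, and `if_pca_sketch`, and the simulated data are all illustrative assumptions, not the authors' implementation. The sketch also assumes no feature column is constant.

```python
import numpy as np
from scipy.stats import norm


def ks_scores(X):
    """Scaled KS distance between each standardized feature and N(0, 1)."""
    n = X.shape[0]
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    cdf = norm.cdf(np.sort(Z, axis=0))          # n x p, sorted within each column
    upper = np.arange(1, n + 1)[:, None] / n    # empirical CDF at each sorted point
    lower = np.arange(0, n)[:, None] / n        # empirical CDF just below each point
    return np.sqrt(n) * np.maximum(upper - cdf, cdf - lower).max(axis=0)


def kmeans(U, k, n_iter=50):
    """Plain Lloyd iterations with deterministic farthest-point seeding."""
    centers = [U[0]]
    for _ in range(1, k):
        d = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((U[:, None, :] - centers[None]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels


def if_pca_sketch(X, k, n_keep):
    """Keep the n_keep highest-scoring features, then cluster on k-1 PCs."""
    keep = np.argsort(ks_scores(X))[-n_keep:]   # assumption: fixed cutoff, not HCT
    Z = (X[:, keep] - X[:, keep].mean(axis=0)) / X[:, keep].std(axis=0, ddof=1)
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    return kmeans(U[:, : k - 1], k), keep


# Simulated data (hypothetical): 200 subjects, 500 features, 20 of them useful.
rng = np.random.default_rng(0)
n, p, s = 200, 500, 20
truth = np.repeat([0, 1], n // 2)
X = rng.standard_normal((n, p))
X[:, :s] += np.where(truth[:, None] == 1, 3.0, -3.0)

labels, keep = if_pca_sketch(X, k=2, n_keep=s)
agreement = max(np.mean(labels == truth), np.mean(labels != truth))
print(f"kept {np.sum(keep < s)} of {s} useful features; agreement {agreement:.2f}")
```

The screening step matters because most features carry no cluster signal; running PCA on all of them would let noise dominate the leading singular vectors. A data-driven threshold would replace `n_keep` in practice.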
Keywords: PCA; scRNA-seq; feature selection; gene microarray; higher criticism threshold; sparsity; subject clustering; variational.
Copyright © 2023 Chen, Jin and Ke.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.