This is a preprint.
CHOIR improves significance-based detection of cell types and states from single-cell data
- PMID: 38328105
- PMCID: PMC10849522
- DOI: 10.1101/2024.01.18.576317
Update in: CHOIR improves significance-based detection of cell types and states from single-cell data. Nat Genet. 2025 May;57(5):1309-1319. doi: 10.1038/s41588-025-02148-8. Epub 2025 Apr 7. PMID: 40195561
Abstract
Clustering is a critical step in the analysis of single-cell data, as it enables the discovery and characterization of putative cell types and states. However, most popular clustering tools do not subject clustering results to statistical inference testing, leading to risks of overclustering or underclustering data and often resulting in ineffective identification of cell types with widely differing prevalence. To address these challenges, we present CHOIR (clustering hierarchy optimization by iterative random forests), which applies a framework of random forest classifiers and permutation tests across a hierarchical clustering tree to statistically determine which clusters represent distinct populations. We demonstrate the enhanced performance of CHOIR through extensive benchmarking against 14 existing clustering methods across 100 simulated and 4 real single-cell RNA-seq, ATAC-seq, spatial transcriptomic, and multi-omic datasets. CHOIR can be applied to any single-cell data type and provides a flexible, scalable, and robust solution to the important challenge of identifying biologically relevant cell groupings within heterogeneous single-cell data.
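The abstract's core statistical idea is to call two candidate clusters distinct only when a classifier separates them better than it separates randomly relabeled cells. A small permutation test can illustrate it. The sketch below is not CHOIR's implementation. To stay dependency-free, it replaces the random forest with a nearest-centroid classifier. The function names (`holdout_accuracy`, `split_is_significant`) and the defaults (`n_perm=200`, `alpha=0.05`, five random 50/50 train/test splits) are hypothetical choices made for illustration.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def _centroid(points: List[Vector]) -> List[float]:
    """Coordinate-wise mean of equal-length feature vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def holdout_accuracy(X: List[Vector], y: List[int], rng: random.Random,
                     n_splits: int = 5, train_frac: float = 0.5) -> float:
    """Mean test accuracy of a nearest-centroid classifier over random splits.

    Hypothetical stand-in for the random forest classifier named in the abstract.
    """
    scores = []
    for _ in range(n_splits):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        n_train = int(len(idx) * train_frac)
        train, test = idx[:n_train], idx[n_train:]
        centroids: Dict[int, List[float]] = {}
        for label in (0, 1):
            pts = [X[i] for i in train if y[i] == label]
            if not pts:  # a class is absent from this training split
                break
            centroids[label] = _centroid(pts)
        if len(centroids) < 2:
            scores.append(0.5)  # treat a degenerate split as chance level
            continue
        correct = 0
        for i in test:
            pred = min(centroids, key=lambda c: _sq_dist(X[i], centroids[c]))
            correct += int(pred == y[i])
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def split_is_significant(cluster_a: List[Vector], cluster_b: List[Vector],
                         n_perm: int = 200, alpha: float = 0.05,
                         seed: int = 0) -> Tuple[bool, float]:
    """Permutation test: are the two clusters more separable than shuffled labels?"""
    rng = random.Random(seed)
    X = list(cluster_a) + list(cluster_b)
    y = [0] * len(cluster_a) + [1] * len(cluster_b)
    observed = holdout_accuracy(X, y, rng)
    null_hits = 0
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)  # break any link between features and labels
        if holdout_accuracy(X, y_perm, rng) >= observed:
            null_hits += 1
    p_value = (null_hits + 1) / (n_perm + 1)  # add-one correction keeps p > 0
    return p_value < alpha, p_value


if __name__ == "__main__":
    gen = random.Random(1)
    a = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(40)]
    b = [(gen.gauss(5, 1), gen.gauss(5, 1)) for _ in range(40)]
    keep, p = split_is_significant(a, b)
    print(keep, round(p, 4))
```

In a hierarchical setting, a test like this would be applied to sibling clusters in the tree. The split is kept when the test is significant; otherwise the siblings are merged. CHOIR's actual classifier settings, tree traversal, and multiple-testing handling are not reproduced in this sketch.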
Conflict of interest statement
The authors declare no competing interests.