Recovery of biological signals lost in single-cell batch integration with CellANOVA
- PMID: 39592777
- DOI: 10.1038/s41587-024-02463-1
Abstract
Data integration to align cells across batches has become a cornerstone of single-cell data analysis, critically affecting downstream results. Currently, there are no guidelines for when the biological differences between samples are separable from batch effects. Here we show that current paradigms for single-cell data integration remove biologically meaningful variation and introduce distortion. We present a statistical model and computationally scalable algorithm, CellANOVA (cell state space analysis of variance), that harnesses experimental design to explicitly recover biological signals that are erased during single-cell data integration. CellANOVA uses a 'pool-of-controls' design concept, applicable across diverse settings, to separate unwanted variation from biological variation of interest and allow the recovery of subtle biological signals. We apply CellANOVA to diverse contexts and validate the recovered biological signals by orthogonal assays. In particular, we show that CellANOVA is effective in the challenging case of single-cell and single-nucleus data integration, where it recovers subtle biological signals that can be validated and replicated by external data.
© 2024. The Author(s), under exclusive licence to Springer Nature America, Inc.
Conflict of interest statement
Competing interests: The authors declare no competing interests.
Update of
- Signal recovery in single cell batch integration. bioRxiv [Preprint]. 2023 Sep 23:2023.05.05.539614. doi: 10.1101/2023.05.05.539614. Update in: Nat Biotechnol. 2024 Nov 26. doi: 10.1038/s41587-024-02463-1. PMID: 37215021.
Grants and funding
- DMS/NIGMS 2245575/National Science Foundation (NSF)
- 5R01GM125301/U.S. Department of Health & Human Services | NIH | Center for Information Technology (Center for Information Technology, National Institutes of Health)
- R01-HG006137/U.S. Department of Health & Human Services | NIH | Center for Information Technology (Center for Information Technology, National Institutes of Health)
- U2C1335 CA233285/U.S. Department of Health & Human Services | NIH | Center for Information Technology (Center for Information Technology, National Institutes of Health)
- 5R01DK087635/U.S. Department of Health & Human Services | NIH | Center for Information Technology (Center for Information Technology, National Institutes of Health)
- CSGF/U.S. Department of Energy (DOE)