This is a preprint.
Signal recovery in single cell batch integration
- PMID: 37215021
- PMCID: PMC10197537
- DOI: 10.1101/2023.05.05.539614
Update in
- Recovery of biological signals lost in single-cell batch integration with CellANOVA. Nat Biotechnol. 2024 Nov 26. doi: 10.1038/s41587-024-02463-1. Online ahead of print. PMID: 39592777
Abstract
Data integration to align cells across batches has become a cornerstone of single cell data analysis, critically affecting downstream results. Yet how much biological signal is erased during integration? Currently, there are no guidelines for determining when the biological differences between samples are separable from batch effects; thus, data integration usually involves considerable guesswork: cells across batches should be "appropriately" mixed while the "main cell type clusters" are preserved. We present evidence that current paradigms for single cell data integration are unnecessarily aggressive, removing biologically meaningful variation. To remedy this, we present a novel statistical model and computationally scalable algorithm, CellANOVA, to recover biological signal that is lost during single cell data integration. CellANOVA uses a "pool-of-controls" design concept, applicable across diverse settings, to separate unwanted variation from biological variation of interest. When applied with existing integration methods, CellANOVA allows the recovery of subtle biological signals and corrects, to a large extent, the data distortion introduced by integration. Further, CellANOVA explicitly estimates cell- and gene-specific batch effect terms, which can be used to identify the cell types and pathways exhibiting the largest batch variations, clarifying which biological signals can be recovered. These concepts are illustrated on studies of diverse designs, where the biological signals recovered by CellANOVA are validated by orthogonal assays. In particular, we show that CellANOVA is effective in the challenging case of single-cell and single-nucleus data integration, where the recovered biological signals are replicated in an independent study.
Keywords: Batch effect; Data alignment; Data integration; Experimental design; RNA; Removing unwanted variation; Single cell.
Grants and funding
- UH3 DK114907/DK/NIDDK NIH HHS/United States
- UH3 DK114908/DK/NIDDK NIH HHS/United States
- U01 DK133090/DK/NIDDK NIH HHS/United States
- UH3 DK114920/DK/NIDDK NIH HHS/United States
- UG3 DK114907/DK/NIDDK NIH HHS/United States
- U01 DK114933/DK/NIDDK NIH HHS/United States
- U01 DK114908/DK/NIDDK NIH HHS/United States
- U01 DK133095/DK/NIDDK NIH HHS/United States
- UH3 DK114866/DK/NIDDK NIH HHS/United States
- U01 DK133081/DK/NIDDK NIH HHS/United States
- U01 DK114907/DK/NIDDK NIH HHS/United States
- UG3 DK114923/DK/NIDDK NIH HHS/United States
- U2C CA233285/CA/NCI NIH HHS/United States
- U01 DK114920/DK/NIDDK NIH HHS/United States
- UH3 DK114933/DK/NIDDK NIH HHS/United States
- U24 DK114886/DK/NIDDK NIH HHS/United States
- U01 DK133766/DK/NIDDK NIH HHS/United States
- U01 DK114923/DK/NIDDK NIH HHS/United States
- U01 DK133113/DK/NIDDK NIH HHS/United States
- UG3 DK114908/DK/NIDDK NIH HHS/United States
- R01 HG006137/HG/NHGRI NIH HHS/United States
- UG3 DK114866/DK/NIDDK NIH HHS/United States
- U01 DK114866/DK/NIDDK NIH HHS/United States
- UG3 DK114933/DK/NIDDK NIH HHS/United States
- UH3 DK114923/DK/NIDDK NIH HHS/United States
- U01 DK133768/DK/NIDDK NIH HHS/United States
- U01 DK133092/DK/NIDDK NIH HHS/United States
- UG3 DK114920/DK/NIDDK NIH HHS/United States
- U01 DK133091/DK/NIDDK NIH HHS/United States
- U01 DK133093/DK/NIDDK NIH HHS/United States