Integrating single-cell data with biological variables
- PMID: 40294274
- PMCID: PMC12067276
- DOI: 10.1073/pnas.2416516122
Abstract
Constructing single-cell atlases requires preserving differences attributable to biological variables, such as cell types, tissue origins, and disease states, while eliminating batch effects. However, existing methods are inadequate in explicitly modeling these biological variables. Here, we introduce SIGNAL, a general framework that leverages biological variables to disentangle biological and technical effects, thereby linking these metadata to data integration. SIGNAL employs a variant of principal component analysis to align multiple batches, enabling the integration of 1 million cells in approximately 2 min. SIGNAL, despite its computational simplicity, surpasses state-of-the-art methods across multiple integration scenarios: 1) heterogeneous datasets, 2) cross-species datasets, 3) simulated datasets, 4) integration on low-quality cell annotations, and 5) reference-based integration. Furthermore, we demonstrate that SIGNAL accurately transfers knowledge from reference to query datasets. Notably, we propose a self-adjustment strategy to restore annotated cell labels potentially distorted during integration. Finally, we apply SIGNAL to multiple large-scale atlases, including a human heart cell atlas containing 2.7 million cells, identifying tissue- and developmental stage-specific subtypes, as well as condition-specific cell states. This underscores SIGNAL's exceptional capability in multiscale analysis.
Keywords: data integration; knowledge transfer; principal component analysis; single-cell data; technical variation.
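The abstract does not spell out the algorithm, so the following is only a generic illustration of the idea it describes: using annotated biological variables to separate technical from biological variation with a PCA-style step. It is not the SIGNAL implementation. The function names (`remove_label_aware_batch_directions`, `mean_gap`), the toy data, and the specific choice of estimating batch offsets within each label and projecting out their leading principal directions are all assumptions made for this sketch.

```python
import numpy as np


def remove_label_aware_batch_directions(X, batch, label, n_dirs=1):
    """Illustrative only (hypothetical helper, not the SIGNAL algorithm).

    Within each biological label, measure how each batch's mean deviates
    from the label's overall mean. Those deviations cannot be explained by
    the label, so they are treated here as technical. The leading principal
    directions of the deviations are then projected out of every cell.
    """
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    label = np.asarray(label)

    offsets = []
    for lab in np.unique(label):
        in_lab = label == lab
        lab_mean = X[in_lab].mean(axis=0)
        for b in np.unique(batch[in_lab]):
            cells = in_lab & (batch == b)
            offsets.append(X[cells].mean(axis=0) - lab_mean)
    D = np.vstack(offsets)

    # Uncentered PCA of the offsets via SVD: rows of vt are directions in
    # feature space, ordered by how much offset variation they capture.
    _, _, vt = np.linalg.svd(D, full_matrices=False)
    V = vt[:n_dirs].T  # shape: (n_features, n_dirs)

    # Remove the component of each cell that lies along those directions.
    X_corrected = X - (X @ V) @ V.T
    return X_corrected, V


def mean_gap(X, mask_a, mask_b):
    """Euclidean distance between the mean profiles of two groups of cells."""
    return float(np.linalg.norm(X[mask_a].mean(axis=0) - X[mask_b].mean(axis=0)))


# Toy data (assumed): two cell types, two batches, five features.
rng = np.random.default_rng(0)
label = np.repeat(["A", "B"], 100)
batch = np.tile(np.repeat(["b1", "b2"], 50), 2)
X = rng.normal(scale=0.1, size=(200, 5))
X[label == "A", 0] += 3.0   # cell type A signal on feature 0
X[label == "B", 1] += 3.0   # cell type B signal on feature 1
X[batch == "b2", 2] += 2.0  # batch shift on feature 2

X_corr, V = remove_label_aware_batch_directions(X, batch, label)

a_b1 = (label == "A") & (batch == "b1")
a_b2 = (label == "A") & (batch == "b2")
print("batch gap within type A, before:", mean_gap(X, a_b1, a_b2))
print("batch gap within type A, after: ", mean_gap(X_corr, a_b1, a_b2))
print("type A vs. type B gap, after:   ", mean_gap(X_corr, label == "A", label == "B"))
```

In this toy setting the batch shift is a single direction shared by both cell types, so removing one direction is enough. Real data would need more directions, and a choice of how many, which this sketch does not address.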
Conflict of interest statement
Competing interests statement: The authors declare no competing interest.
Grants and funding
- No. 62271173 and No. 62172122/MOST | National Natural Science Foundation of China (NSFC)
- No. 2022ZX01A19/Key Research and Development Program of Heilongjiang
- No. JQ2023A003/Natural Science Foundation of Heilongjiang Province (Heilongjiang Natural Science Foundation)
- No. 124B2027/MOST | National Natural Science Foundation of China (NSFC)
- HIT.DZJJ.2024043/MOE | Fundamental Research Funds for the Central Universities (Fundamental Research Fund for the Central Universities)