Proc Natl Acad Sci U S A. 2025 May 6;122(18):e2416516122.
doi: 10.1073/pnas.2416516122. Epub 2025 Apr 28.

Integrating single-cell data with biological variables


Yang Zhou et al. Proc Natl Acad Sci U S A.

Abstract

Constructing single-cell atlases requires preserving differences attributable to biological variables, such as cell types, tissue origins, and disease states, while eliminating batch effects. However, existing methods do not adequately model these biological variables explicitly. Here, we introduce SIGNAL, a general framework that leverages biological variables to disentangle biological and technical effects, thereby linking these metadata to data integration. SIGNAL employs a variant of principal component analysis to align multiple batches, enabling the integration of 1 million cells in approximately 2 min. Despite its computational simplicity, SIGNAL surpasses state-of-the-art methods across multiple integration scenarios: 1) heterogeneous datasets, 2) cross-species datasets, 3) simulated datasets, 4) integration with low-quality cell annotations, and 5) reference-based integration. Furthermore, we demonstrate that SIGNAL accurately transfers knowledge from reference to query datasets. Notably, we propose a self-adjustment strategy to restore annotated cell labels that may be distorted during integration. Finally, we apply SIGNAL to multiple large-scale atlases, including a human heart cell atlas containing 2.7 million cells, identifying tissue- and developmental stage-specific subtypes as well as condition-specific cell states. These results underscore SIGNAL's exceptional capability for multiscale analysis.
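The abstract states only that SIGNAL uses biological variables to separate biological from technical effects and aligns batches with a variant of principal component analysis. It does not give the algorithm. As a rough illustration of the general idea, and explicitly not SIGNAL's published method, the sketch below assumes that batch effects are additive shifts within each annotated cell type. It re-centers each (batch, label) group on that label's pooled mean and then embeds the corrected matrix with ordinary PCA. The function names `label_guided_correction` and `pca_embed`, the toy data, and the additive-shift assumption are all hypothetical.

```python
import numpy as np


def label_guided_correction(X, batches, labels):
    """Hypothetical sketch, not SIGNAL: remove per-(batch, label) mean shifts.

    Within each biological label, every batch is shifted so that its mean for
    that label equals the label's pooled mean across batches. Differences
    between labels are kept; within-label batch offsets are removed.
    """
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    labels = np.asarray(labels)
    corrected = X.copy()
    for lab in np.unique(labels):
        in_lab = labels == lab
        pooled_mean = X[in_lab].mean(axis=0)
        for b in np.unique(batches[in_lab]):
            mask = in_lab & (batches == b)
            # Each cell belongs to exactly one (label, batch) group, so the
            # shifts computed from the original X never interact.
            corrected[mask] += pooled_mean - X[mask].mean(axis=0)
    return corrected


def pca_embed(X, n_components):
    """Project mean-centered data onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


# Toy data (assumed): two cell types, two batches; batch "b2" carries an
# additive offset that the correction should remove.
rng = np.random.default_rng(0)
centers = {"A": np.zeros(3), "B": np.full(3, 5.0)}
offset = np.array([3.0, -2.0, 1.0])
rows, batch_list, label_list = [], [], []
for b in ["b1", "b2"]:
    for lab, center in centers.items():
        for _ in range(20):
            x = center + rng.normal(scale=0.1, size=3)
            rows.append(x + offset if b == "b2" else x)
            batch_list.append(b)
            label_list.append(lab)
X = np.array(rows)
batches = np.array(batch_list)
labels = np.array(label_list)

Xcorr = label_guided_correction(X, batches, labels)
emb = pca_embed(Xcorr, 1)  # PC1 should now separate type A from type B
```

Because each shift is estimated within a label, differences between cell types survive the correction. A label observed in only one batch is left unchanged. Real batch effects are rarely purely additive, and this kind of correction is only as good as the annotations it is given.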

Keywords: data integration; knowledge transfer; principal component analysis; single-cell data; technical variation.


Conflict of interest statement

Competing interests statement: The authors declare no competing interest.


