Review
Cell. 2024 May 9;187(10):2343-2358.
doi: 10.1016/j.cell.2024.03.009.

The future of rapid and automated single-cell data analysis using reference mapping

Mohammad Lotfollahi et al. Cell. 2024.

Abstract

As the number of single-cell datasets continues to grow rapidly, workflows that map new data to well-curated reference atlases offer enormous promise for the biological community. In this perspective, we discuss key computational challenges and opportunities for single-cell reference-mapping algorithms. We discuss how mapping algorithms will enable the integration of diverse datasets across disease states, molecular modalities, genetic perturbations, and diverse species and will eventually replace manual and laborious unsupervised clustering pipelines.

Keywords: cross-species comparisons; machine learning; multimodal analysis; reference mapping; single-cell analysis.

Conflict of interest statement

Declaration of interests: M.L. consults for Santa Ana Bio, worked as a part-time employee at Relation Therapeutics, owns interests in Relation Therapeutics, and is a part-time employee at AIVIVO. F.J.T. consults for Immunai Inc., CytoReason Ltd, Cellarity Inc, and Omniscope Ltd, and owns interests in Dermagnostix GmbH and Cellarity Inc. In the past 3 years, R.S. has worked as a consultant for Bristol-Myers Squibb, Regeneron, and Kallyope and served as a scientific advisory board member for ImmunAI, Resolve Biosciences, Nanostring, and the NYC Pandemic Response Lab. R.S. and Y.H. are co-founders and equity holders of Neptune Bio. As of August 1, 2023, Y.H. is an employee of Neptune Bio.

Figures

Fig. 1. Automated analysis of single-cell data using reference mapping.
(a) Mapping RNA or DNA short reads to the reference genome using reference mappers as an alternative to computationally expensive de novo reference assembly. (b) Assembly of a single-cell reference, similar to a reference genome, enables automated analysis of newly generated query datasets by mapping them into the reference using a reference-mapping algorithm. (c) Applications of single-cell reference mapping include automated cell-type annotation of query data (first row); analysis of single-cell perturbations, such as disease states or missing perturbations to be imputed in the query data (second row); and imputation of continuous information for the query data, including spatial location for scRNA-seq using a spatial atlas or chromatin accessibility using a multimodal reference that includes scRNA-seq and scATAC-seq (third row).
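
The automated cell-type annotation described in panel (c) can be illustrated with a minimal sketch. It assumes that reference and query cells have already been embedded in a shared latent space by some mapping algorithm, and it uses a plain k-nearest-neighbor majority vote for label transfer. The function name `transfer_labels`, the toy coordinates, and the choice of k are illustrative assumptions, not the method of any specific tool discussed in the review.

```python
import numpy as np
from collections import Counter

def transfer_labels(ref_embedding, ref_labels, query_embedding, k=3):
    """Label each query cell by majority vote over its k nearest reference cells.

    Hypothetical helper: assumes both embeddings live in the same latent space.
    """
    ref = np.asarray(ref_embedding, dtype=float)
    query = np.asarray(query_embedding, dtype=float)
    # Pairwise Euclidean distances, shape (n_query, n_ref).
    dists = np.linalg.norm(query[:, None, :] - ref[None, :, :], axis=2)
    # Indices of the k closest reference cells for every query cell.
    nearest = np.argsort(dists, axis=1)[:, :k]
    labels, confidence = [], []
    for row in nearest:
        votes = Counter(ref_labels[i] for i in row)
        label, count = votes.most_common(1)[0]
        labels.append(label)
        # Fraction of neighbors agreeing with the winning label.
        confidence.append(count / k)
    return labels, confidence

# Toy reference: two well-separated cell types in a 2-D latent space.
ref_embedding = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                          [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
ref_labels = ["T cell"] * 3 + ["B cell"] * 3
query_embedding = np.array([[0.05, 0.1], [5.0, 5.1]])

predicted, confidence = transfer_labels(ref_embedding, ref_labels, query_embedding)
print(predicted, confidence)
```

The vote fraction gives a rough per-cell confidence; a low value could flag query cells that do not resemble any annotated reference population.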
Fig. 2. Reference mapping at population scale.
(a) The availability of cohort-level single-cell references enables the assembly of resources composed of many samples (or patients) to learn heterogeneity across populations and cells (b). (c) Query samples are mapped to both cell-level and sample-level representations. (d) After the new samples are mapped using the cell embedding and supervised analysis, the disease phenotype of the query samples can be classified (e.g., tumor type). (e) Sample-level representations can be used to infer sample-sample similarity maps between reference and query that are directly linked to the cell-level representation. The circle represents a group of donors in the query with different cellular compositions, as reflected in the reference embedding.
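
A rough sketch of the sample-level workflow in panels (c) to (e) follows. It assumes each sample is summarized by the mean of its cell embeddings and that a query sample inherits the phenotype of its nearest reference sample. Both choices, along with the names `sample_representation` and `classify_samples` and the toy data, are simplifying assumptions made for illustration; the caption does not prescribe a particular sample representation or classifier.

```python
import numpy as np

def sample_representation(cell_embedding, sample_ids):
    """Summarize each sample as the mean embedding of its cells (an assumption)."""
    cells = np.asarray(cell_embedding, dtype=float)
    ids = np.asarray(sample_ids)
    names = sorted(set(ids.tolist()))
    reps = np.stack([cells[ids == name].mean(axis=0) for name in names])
    return names, reps

def classify_samples(ref_names, ref_reps, ref_phenotypes, query_reps):
    """Assign each query sample the phenotype of its closest reference sample."""
    # Sample-sample distance map, shape (n_query_samples, n_ref_samples).
    dists = np.linalg.norm(query_reps[:, None, :] - ref_reps[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    return [ref_phenotypes[ref_names[i]] for i in nearest], dists

# Toy reference cohort: two donors with known phenotypes.
ref_cells = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]])
ref_sample_ids = ["r1", "r1", "r2", "r2"]
ref_phenotypes = {"r1": "healthy", "r2": "tumor"}

# Toy query donor whose cells were mapped into the same embedding.
query_cells = np.array([[5.0, 5.0], [5.0, 3.0]])
query_sample_ids = ["q1", "q1"]

ref_names, ref_reps = sample_representation(ref_cells, ref_sample_ids)
query_names, query_reps = sample_representation(query_cells, query_sample_ids)
predicted, similarity_map = classify_samples(ref_names, ref_reps, ref_phenotypes, query_reps)
print(dict(zip(query_names, predicted)))
```

The returned distance matrix plays the role of the sample-sample similarity map in panel (e); richer sample representations, such as cell-type composition vectors, could be substituted without changing the overall structure.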
Fig. 3. Single-cell data reference mapping across molecular modalities.
(a) Two frameworks to build cross-modality feature correspondence. Feature conversion: transforming one type of measurement into another. For example, ATAC-seq peaks within gene bodies can be converted into gene activity scores, the same set of features measured by scRNA-seq. Multi-omics bridge: leveraging multi-omic datasets to establish connections between different modalities. For example, bridging ATAC-seq peaks and RNA-seq genes using datasets that measure both ATAC peaks and gene expression. (b) Expanding RNA reference to other query modalities using single-cell multi-omics datasets. By using single-cell multi-omics technologies as molecular bridges, RNA references can be expanded to include additional modalities such as DNA methylation (DNA met), ATAC peaks, surface proteins, CUT&Tag (cleavage under targets and tagmentation), and spatial data. snmC2T-seq, single-nucleus methylCytosine, Chromatin accessibility and Transcriptome sequencing; SNARE-seq, single-nucleus chromatin accessibility and mRNA expression sequencing; ASAP-seq, ATAC with select antigen profiling by sequencing; CITE-seq, cellular indexing of transcriptomes and epitopes by sequencing; Paired-Tag, parallel analysis of individual cells for RNA expression and DNA from targeted tagmentation by sequencing; CUT&Tag-Pro, single-cell cleavage under targets and tagmentation with cell surface proteins; Spatial-CUT&Tag, spatial cleavage under targets and tagmentation.
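
The feature-conversion idea in panel (a) can be sketched as follows: for each gene, the counts of all ATAC-seq peaks that overlap the gene body are summed to give a gene activity score per cell. The function name `gene_activity`, the half-open interval convention, and the toy coordinates are assumptions made for illustration; this is a simplified overlap sum, not the scoring scheme of any particular package.

```python
import numpy as np

def gene_activity(peak_counts, peaks, genes):
    """Sum per-cell peak counts over peaks that overlap each gene body.

    peak_counts: (n_cells, n_peaks) array of ATAC-seq counts.
    peaks: list of (chrom, start, end) tuples, one per peak column.
    genes: dict mapping gene name -> (chrom, start, end) of the gene body.
    Intervals are treated as half-open [start, end) (an assumption).
    """
    counts = np.asarray(peak_counts)
    gene_names = list(genes)
    scores = np.zeros((counts.shape[0], len(gene_names)), dtype=counts.dtype)
    for g, name in enumerate(gene_names):
        g_chrom, g_start, g_end = genes[name]
        for p, (p_chrom, p_start, p_end) in enumerate(peaks):
            # Two intervals overlap when each starts before the other ends.
            if p_chrom == g_chrom and p_start < g_end and g_start < p_end:
                scores[:, g] += counts[:, p]
    return gene_names, scores

# Toy data: three peaks, two hypothetical genes, two cells.
peaks = [("chr1", 100, 200), ("chr1", 900, 1000), ("chr2", 50, 150)]
genes = {"GENE_A": ("chr1", 0, 500), "GENE_B": ("chr2", 0, 300)}
peak_counts = np.array([[2, 1, 0],
                        [0, 3, 4]])

gene_names, scores = gene_activity(peak_counts, peaks, genes)
print(gene_names)
print(scores)
```

The resulting cells-by-genes matrix shares its feature space with scRNA-seq data, which is what allows an ATAC-seq query to be compared against an RNA reference under this framework.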
