Review
Nat Protoc. 2024 Aug;19(8):2283-2297.
doi: 10.1038/s41596-024-00991-3. Epub 2024 Jun 6.

Scanorama: integrating large and diverse single-cell transcriptomic datasets

Brian L Hie et al. Nat Protoc. 2024 Aug.

Abstract

Merging diverse single-cell RNA sequencing (scRNA-seq) data from numerous experiments, laboratories and technologies can uncover important biological insights. Nonetheless, integrating scRNA-seq data poses particular challenges when the datasets differ in cell type composition. Scanorama offers a robust solution for improving the quality and interpretation of heterogeneous scRNA-seq data by effectively merging information from diverse sources. It is designed to address the technical variation introduced by differences in sample preparation, sequencing depth and experimental batches, which can confound the analysis of multiple scRNA-seq datasets. Here we provide a detailed protocol for using Scanorama within a Scanpy-based single-cell analysis workflow coupled with Google Colaboratory, a free cloud-based Jupyter notebook service. The Scanorama integration step typically takes 0.5-3 h and requires a basic understanding of cellular biology, transcriptomic technologies and bioinformatics. Our protocol and new Scanorama-Colaboratory resource should make scRNA-seq integration more widely accessible to researchers.

Conflict of interest statement

Competing interests

The authors declare no competing interests.

Figures

Figure 1. The Scanorama-Colab workflow.
a, Integrating multiple single-cell RNA sequencing (scRNA-seq) datasets in a Colaboratory Python environment. Input datasets can be from public sources or user data. Scanorama conducts integration and batch correction, identifying shared cell types among batches by searching for nearest neighbors and aligning them into a shared space. b, Small dataset example: three datasets consisting of Jurkat cells, 293T cells and a mixed population of these two cell types were used as inputs and visualized using t-distributed stochastic neighbor embedding (t-SNE) before and after Scanorama correction. (ii-iii) Scanorama accurately resolved Jurkat cells and 293T cells originating from different batches (indicated by orange, green and blue) into distinct clusters (indicated by orange and blue). c, Large dataset 1 example: 26 Scanorama-integrated single-cell datasets from 9 different technologies are visualized using t-SNE and clustered by cell type.
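The nearest-neighbor matching mentioned in panel a can be illustrated with a toy sketch. This is not Scanorama's implementation: it is a brute-force mutual nearest neighbors search in pure Python, and the function names, the toy coordinates and the choice k=1 are all assumptions made for illustration.

```python
import math

def nearest(query, reference, k):
    """For each query point, return the set of indices of its k nearest reference points."""
    result = []
    for q in query:
        order = sorted(range(len(reference)),
                       key=lambda j: math.dist(q, reference[j]))
        result.append(set(order[:k]))
    return result

def mutual_nearest_neighbors(batch_a, batch_b, k=1):
    """Return sorted (i, j) pairs where a[i] and b[j] are in each other's k nearest neighbors."""
    a_to_b = nearest(batch_a, batch_b, k)
    b_to_a = nearest(batch_b, batch_a, k)
    return sorted((i, j)
                  for i, neighbors in enumerate(a_to_b)
                  for j in neighbors
                  if i in b_to_a[j])

# Hypothetical toy data: batch A holds two cells of a shared type near the origin
# plus one cell of a type that is absent from batch B.
batch_a = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0)]
batch_b = [(0.2, 0.1), (0.3, 0.0)]

pairs = mutual_nearest_neighbors(batch_a, batch_b, k=1)
print(pairs)
```

In this toy case the cell at (5.0, 5.0) finds a nearest neighbor in batch B, but no cell in batch B selects it in return, so it stays unmatched. Requiring matches to be mutual is what keeps a cell type present in only one batch from being forced onto an unrelated population.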
Figure 2. Panoramic integration of three organ single-cell datasets across seven donors and two different technologies.
a–d, The scRNA-seq datasets of three organs (a), seven donors (b) and two different technologies (c) from Tabula Sapiens were visualized using uniform manifold approximation and projection (UMAP), both before and after Scanorama correction. Cell clusters were grouped by cell type rather than by batch factors such as organ, donor and method (d). e, Mutual information score between Leiden cluster labels and metadata (organ, donor, method and cell type) after Scanorama correction.
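The score in panel e can be sketched as the mutual information between two label vectors. The caption does not say which implementation or normalization was used, so the version below is an assumption: plain mutual information in nats, computed from label counts, with hypothetical toy labels.

```python
import math
from collections import Counter

def mutual_information(labels_x, labels_y):
    """Mutual information (in nats) between two equal-length label sequences."""
    if len(labels_x) != len(labels_y):
        raise ValueError("label sequences must have the same length")
    n = len(labels_x)
    joint = Counter(zip(labels_x, labels_y))
    count_x = Counter(labels_x)
    count_y = Counter(labels_y)
    mi = 0.0
    for (x, y), n_xy in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written in terms of counts
        mi += (n_xy / n) * math.log(n_xy * n / (count_x[x] * count_y[y]))
    return mi

# Hypothetical toy labels: clusters track cell type exactly and ignore donor.
clusters = ["0", "0", "1", "1"]
cell_type = ["T", "T", "B", "B"]
donor = ["d1", "d2", "d1", "d2"]

mi_cell_type = mutual_information(clusters, cell_type)
mi_donor = mutual_information(clusters, donor)
print(mi_cell_type, mi_donor)
```

Under these assumptions, a high score against cell type combined with a score near zero against donor is the pattern expected when clusters reflect biology rather than batch.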
