Nat Methods. 2019 Aug;16(8):695-698.
doi: 10.1038/s41592-019-0466-z. Epub 2019 Jul 15.

Joint analysis of heterogeneous single-cell RNA-seq dataset collections

Nikolas Barkas et al. Nat Methods. 2019 Aug.

Abstract

Single-cell RNA sequencing is often applied in study designs that include multiple individuals, conditions or tissues. To identify recurrent cell subpopulations in such heterogeneous collections, we developed Conos, an approach that relies on multiple plausible inter-sample mappings to construct a global graph connecting all measured cells. The graph enables identification of recurrent cell clusters and propagation of information between datasets in multi-sample or atlas-scale collections.

Figures

Figure 1. Joint graph is an effective strategy for assembling diverse scRNA-seq dataset collections.
a. Conos builds a joint graph by comparing all pairs of datasets. A reduced space (e.g. CPCA) is determined for each pair, and the putative inter-sample edges are established using mutual nearest-neighbor mapping. Low-weight within-sample edges are also included in the graph. Subpopulations of cells recurrent within the dataset collection form clique-like communities of inter-sample edges within the joint graph. b. Joint graph combining eight human bone marrow and eight cord blood datasets, visualized using largeVis embedding. c. Visualization of each individual sample on the joint embedding. d. Adjusted Rand index (y-axis) is shown as a function of the fraction of cells omitted from the datasets (x-axis) relative to the full dataset for different joint clustering approaches. Conos shows improved stability of subpopulation detection even for small numbers of cells. e. Stability of the subpopulation detection is shown for increasing amounts of heterogeneity between datasets. Adjusted Rand index is shown for increasing probability of random subpopulation omission from individual datasets (x-axis, see Methods). f,g. Mixing of different bone marrow (f) and cord blood (g) datasets within the identified subpopulations is quantified using normalized average cluster entropy (see Methods). h. The power to detect cell subpopulations increases with the size of the collection. The number of stable clusters (y-axis, see Methods) detected in a collection of human bone marrow samples (red curve) increases as more samples are added to the collection (x-axis), while maintaining a high level of sample mixing (high average cluster entropy) within each cluster. In contrast, addition of randomized expression datasets (grey) does not result in such an increase. d-h: Mean across n=10 random replicates is shown for each point, with shading marking the 95% confidence band.
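
The mutual nearest-neighbor mapping mentioned in panel a can be illustrated with a minimal Python sketch. This is not the Conos implementation: the function name `mutual_nearest_neighbors`, the toy coordinates, and the use of plain Euclidean distance are all assumptions made for illustration. It assumes both samples have already been projected into a shared reduced space, and it keeps an inter-sample edge only when two cells appear in each other's k nearest neighbors.

```python
import numpy as np

def mutual_nearest_neighbors(a, b, k=1):
    """Return sorted (i, j) pairs where a[i] and b[j] are mutual k-nearest neighbors.

    a, b: arrays of shape (n_cells, n_dims) in a shared reduced space (assumed).
    """
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # For each cell in a, the indices of its k nearest cells in b.
    a_to_b = np.argsort(dist, axis=1)[:, :k]
    # For each cell in b, the indices of its k nearest cells in a.
    b_to_a = np.argsort(dist, axis=0)[:k, :].T
    forward = {(i, int(j)) for i, row in enumerate(a_to_b) for j in row}
    backward = {(int(i), j) for j, row in enumerate(b_to_a) for i in row}
    # Keep only pairs that are nearest neighbors in both directions.
    return sorted(forward & backward)

# Hypothetical reduced-space coordinates for two small samples.
sample_a = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
sample_b = np.array([[0.02, 0.0], [5.1, 5.0], [10.0, 10.0]])

edges = mutual_nearest_neighbors(sample_a, sample_b, k=1)
print(edges)
```

In this toy case the second cell of `sample_a` and the third cell of `sample_b` receive no inter-sample edge, because neither is the closest match of any cell that it selects itself.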
Figure 2. Examples of analyses using joint graphs.
a-e. Trade-off between cluster resolution and sample breadth. Joint graph is shown for n=15 samples from eight breast cancer patients (a). The distribution of source tissues (b). A fragment of the subpopulation hierarchy is shown for T cell subsets (d), with the color of the branches showing tissue composition and the width showing normalized sample entropy (higher entropy corresponds to more samples contributing to the branch). Depending on the level, a cut of the cluster hierarchy can yield more granular but tissue-specific clusters (c) or less granular clusters that incorporate more tissues and samples (e). f-i. Propagation of cell annotation labels. Joint embedding of bone marrow samples from n=8 patients is shown (f). The annotations were erased from all but one sample and propagated back to the entire dataset. Positions of the incorrectly propagated labels (g). Uncertainty of propagation, reported by Conos (h). Reported uncertainty of correctly and incorrectly propagated labels (i). j-k. Conos integration of the Tabula Muris and Han et al. mouse atlases. Joint graph of the 127 datasets is shown, with colors and numbers marking top-level joint clusters (j) or scRNA-seq platforms (k).
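
The idea behind panels f-i, spreading labels from annotated cells across a graph and reporting how uncertain each assignment is, can be sketched as follows. The text above does not describe how Conos performs propagation, so this is a generic diffusion-style sketch rather than the authors' method: `propagate_labels`, the chain-shaped toy graph, and the uncertainty measure (one minus the largest class probability) are assumptions made for illustration.

```python
import numpy as np

def propagate_labels(adjacency, labels, n_classes, n_iter=50):
    """Spread labels over a weighted graph by repeated neighbor averaging.

    adjacency: symmetric (n, n) array of non-negative edge weights.
    labels: integer array, with -1 marking unlabeled cells.
    Returns (predicted class per cell, uncertainty per cell).
    """
    n = adjacency.shape[0]
    known = labels >= 0
    # Unlabeled cells start from a uniform distribution over classes.
    probs = np.full((n, n_classes), 1.0 / n_classes)
    probs[known] = np.eye(n_classes)[labels[known]]
    clamped = probs[known].copy()
    # Row-normalize the weights so each cell averages over its neighbors.
    degree = adjacency.sum(axis=1, keepdims=True)
    transition = adjacency / np.where(degree > 0, degree, 1.0)
    for _ in range(n_iter):
        probs = transition @ probs
        probs[known] = clamped  # annotated cells keep their labels
    predicted = probs.argmax(axis=1)
    uncertainty = 1.0 - probs.max(axis=1)
    return predicted, uncertainty

# Hypothetical toy graph: five cells in a chain, annotated only at the two ends.
adjacency = np.zeros((5, 5))
for i in range(4):
    adjacency[i, i + 1] = adjacency[i + 1, i] = 1.0
labels = np.array([0, -1, -1, -1, 1])

predicted, uncertainty = propagate_labels(adjacency, labels, n_classes=2)
print(predicted, np.round(uncertainty, 2))
```

Cells that sit midway between differently labeled neighbors end up with higher uncertainty than cells adjacent to an annotated cell, which is the qualitative behavior that an uncertainty readout of this kind is meant to expose.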

References

    1. Tabula Muris C et al. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 562, 367–372 (2018).
    2. Regev A et al. The Human Cell Atlas. Elife 6 (2017).
    3. Hicks SC, Townes FW, Teng M & Irizarry RA. Missing data and technical variability in single-cell RNA-sequencing experiments. Biostatistics 19, 562–578 (2018).
    4. McCarthy DJ, Campbell KR, Lun AT & Wills QF. Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics 33, 1179–1186 (2017).
    5. Butler A, Hoffman P, Smibert P, Papalexi E & Satija R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol 36, 411–420 (2018).

Methods-only References

    1. Xin Y et al. Pseudotime Ordering of Single Human beta-Cells Reveals States of Insulin Production and Unfolded Protein Response. Diabetes 67, 1783–1794 (2018).
    1. Baron M et al. A Single-Cell Transcriptomic Map of the Human and Mouse Pancreas Reveals Inter- and Intra-cell Population Structure. Cell Syst 3, 346–360 e344 (2016).
    1. Segerstolpe A et al. Single-Cell Transcriptome Profiling of Human Pancreatic Islets in Health and Type 2 Diabetes. Cell Metab 24, 593–607 (2016).
