iScience. 2020 Mar 27;23(3):100914.
doi: 10.1016/j.isci.2020.100914. Epub 2020 Feb 14.

scID Uses Discriminant Analysis to Identify Transcriptionally Equivalent Cell Types across Single-Cell RNA-Seq Data with Batch Effect

Katerina Boufea et al. iScience. 2020.

Abstract

The power of single-cell RNA sequencing (scRNA-seq) stems from its ability to uncover cell type-dependent phenotypes, which rests on the accuracy of cell type identification. However, resolving cell types within a dataset and, consequently, comparing scRNA-seq data across conditions is challenging owing to technical factors such as sparsity, low cell numbers, and batch effects. To address these challenges, we developed scID (Single Cell IDentification), which uses a Fisher's Linear Discriminant Analysis-like framework to identify transcriptionally related cell types between scRNA-seq datasets. We demonstrate the accuracy and performance of scID relative to existing methods on several published datasets. By increasing the power to identify transcriptionally similar cell types across datasets with batch effects, scID enhances investigators' ability to integrate scRNA-seq data and uncover development-, disease-, and perturbation-associated changes.

Keywords: Bioinformatics; Biological Sciences; Mathematical Biosciences; Omics; Transcriptomics.

Conflict of interest statement

Declaration of Interests: The authors declare that they have no competing interests.

Figures

Graphical abstract
Figure 1
Overview and Assessment of scID (A) The three main stages involved in mapping cells across scRNA-seq data with scID are as follows: In stage 1, gene signatures are extracted from the reference data (shown as clustered groups on a reduced dimension). In stage 2, discriminative weights are estimated from the target data for each reference cluster-specific gene signature. In stage 3, every target cell is scored for each feature and is assigned to the corresponding reference cluster. (B) Quantification of accuracy of DPR classification (stage 2 of scID). Boxplot shows interquartile range for TPR (black) or FPR (white) for all the cell types in each published dataset listed in the x axis. See also Figure S1. (C) Quantification of TPR and FPR of stage 2 (black) and stage 3 (white) of scID. Significance was computed using two-sided paired Kruskal-Wallis test for difference in TPR or FPR between stage 2 and stage 3. (D) Assessment of accuracy of scID via self-mapping of published datasets. The indicated published data (x axis labels) were self-mapped, i.e., used as both reference and target, by scID and the assigned labels were compared with the published cell labels. (E) Assessment of classification accuracy of scRNA-seq data integration. Human pancreas Smart-seq2 data (Segerstolpe et al.) were used as reference and CEL-seq1 as target (white; Grun et al., 2016) or CEL-seq2 as target (black; Muraro et al., 2016). See also Figures S2 and S3.
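Stages 2 and 3 in panel (A) can be illustrated with a minimal sketch of a Fisher-style discriminant in which genes are treated independently. The weight formula used here, (mean_in - mean_out) / (var_in + var_out), is the textbook diagonal form of Fisher's criterion and is an assumption for illustration; the function names `fisher_weights` and `score_cell` and the toy numbers are hypothetical and are not taken from the scID package.

```python
from statistics import mean, pvariance


def fisher_weights(in_cells, out_cells, eps=1e-9):
    """Per-gene weights for one reference-cluster gene signature.

    in_cells / out_cells: lists of expression vectors (one list of floats
    per cell) for target cells presumed inside / outside the population.
    Assumes independent genes (diagonal Fisher criterion); eps avoids
    division by zero when both variances are zero.
    """
    n_genes = len(in_cells[0])
    weights = []
    for g in range(n_genes):
        x_in = [cell[g] for cell in in_cells]
        x_out = [cell[g] for cell in out_cells]
        separation = mean(x_in) - mean(x_out)
        spread = pvariance(x_in) + pvariance(x_out) + eps
        weights.append(separation / spread)
    return weights


def score_cell(cell, weights):
    """Weighted sum of signature-gene expression for one target cell."""
    return sum(w * x for w, x in zip(weights, cell))


# Toy data (invented): two signature genes, two cells per group.
in_cells = [[5.0, 1.0], [7.0, 1.0]]
out_cells = [[1.0, 1.0], [1.0, 3.0]]
weights = fisher_weights(in_cells, out_cells)
```

Genes that separate the two groups well relative to their variance receive large positive weights, so a target cell resembling the reference cluster scores higher than one that does not.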
Figure 2
Reference-Based Identification of Equivalent Cells across the Mouse Retinal Bipolar Neurons scRNA-Seq Datasets (A) t-SNE plot showing clusters in Drop-seq (reference) and Smart-seq2 (target) data of mouse retinal bipolar cells from Shekhar et al. Cluster membership of reference cells (∼26,800 cells) was taken from the publication. Smart-seq2 data (288 cells) were clustered using Seurat, and cluster names were assigned arbitrarily. (B) Heatmap showing Z score normalized average expression of gene signatures (rows) in the clusters (columns) of the reference Drop-seq data (left) and in the target Smart-seq2 data (right). Red (khaki) indicates enrichment and blue (turquoise) indicates depletion of the reference gene signature levels relative to average expression of gene signatures across all clusters of reference (target) data. (C) Identification of target (Smart-seq2) cells that are equivalent to reference (Drop-seq) clusters using a marker-based approach. The top two differentially enriched (or marker) genes in each reference (Drop-seq) cluster were used to identify equivalent cells in the target (Smart-seq2) data using a thresholding approach. Bars represent the percentage of classified and unassigned cells at the thresholds for normalized expression of the marker genes indicated on the x axis (see Methods). Gray represents the percentage of cells that express markers of multiple clusters, yellow represents the percentage of cells that can be unambiguously classified to a single cluster, and blue represents the percentage of cells that do not express markers of any of the clusters; these cells are referred to as orphans. (D) Assessment of accuracy of various methods for classifying target cells using Adjusted Rand Index. (E) Assessment of accuracy of various methods for classifying target cells using Variation of Information.
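The marker-based thresholding in panel (C) can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: it assumes that a cell "expresses markers" of a cluster when any of that cluster's marker genes exceeds the threshold, and that expression has already been normalized to the range 0 to 1. The function name `classify_by_markers`, the gene names, and the toy values are hypothetical.

```python
def classify_by_markers(cells, markers, threshold):
    """Label each cell as Classified, Ambiguous, or Orphan.

    cells:   dict mapping cell id -> dict of gene -> normalized expression
    markers: dict mapping cluster name -> list of marker genes
    A cluster matches a cell if ANY of its markers exceeds the threshold
    (an assumption; the caption does not state any-vs-all).
    """
    labels = {}
    for cell_id, expr in cells.items():
        matched = [
            cluster
            for cluster, genes in markers.items()
            if any(expr.get(g, 0.0) > threshold for g in genes)
        ]
        if len(matched) == 1:
            labels[cell_id] = ("Classified", matched[0])
        elif matched:
            labels[cell_id] = ("Ambiguous", None)  # markers of several clusters
        else:
            labels[cell_id] = ("Orphan", None)  # no markers above threshold
    return labels


# Toy data (invented): two clusters, one marker gene each.
markers = {"cluster_1": ["geneA"], "cluster_2": ["geneB"]}
cells = {
    "cell_1": {"geneA": 0.9, "geneB": 0.0},
    "cell_2": {"geneA": 0.8, "geneB": 0.7},
    "cell_3": {"geneA": 0.1, "geneB": 0.1},
}
labels = classify_by_markers(cells, markers, threshold=0.5)
```

Raising the threshold moves cells from Ambiguous toward Classified and then toward Orphan, which is the trade-off the bars in panel (C) display.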
Figure 3
Reference-Based Identification of Equivalent Cells in Single Cell and Nuclei RNA-seq from Mouse Brain (A) t-SNE plot showing clusters in the mouse brain scRNA-seq data with ∼9,000 cells (left) and t-SNE plot of the mouse brain single nuclei RNA-seq (snRNA-seq) data with ∼1,000 cells (right). Data were clustered with Seurat (v3). (B) Heatmap showing Z score normalized average expression of gene signatures (rows) in the reference (left) and in the target (right) clusters (columns). Red (khaki) indicates enrichment and blue (turquoise) indicates depletion of the gene signature levels relative to average expression of that gene signature across all clusters of reference (target) data. (C) Marker-based identification of cell types in target data. Data were binarized using different thresholds (see Methods) that represent the expression value of each marker gene relative to the maximum. The two most differentially expressed markers from each reference cluster were used. Cells that express markers of multiple clusters (gray) are labeled as Ambiguous. Cells that only express markers of a single cluster (yellow) are labeled as Classified. Cells that do not express markers of any clusters (blue) are labeled as Orphans. (D) Assessment of accuracy of various methods for classifying target cells using Adjusted Rand Index. (E) Assessment of accuracy of various methods for classifying target cells using Variation of Information.
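The two agreement measures named in panels (D) and (E) can be computed from a pair of label vectors as in the sketch below. It uses the standard definitions (Adjusted Rand Index from pair counts in the contingency table; Variation of Information as 2H(A,B) - H(A) - H(B) with natural logarithms). The function names and toy labels are hypothetical, and how unassigned cells were handled in the paper is not stated here, so that is left out.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(a, b):
    """Adjusted Rand Index between two labelings of the same cells."""
    n = len(a)
    if n < 2:
        return 1.0
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def _entropy(labels):
    """Shannon entropy (nats) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def variation_of_information(a, b):
    """VI(A, B) = 2*H(A, B) - H(A) - H(B); 0 means identical partitions."""
    return 2 * _entropy(list(zip(a, b))) - _entropy(a) - _entropy(b)


# Toy labels (invented): same partition with renamed clusters, then a
# partition that cuts across the first one.
same_ari = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
cross_ari = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
same_vi = variation_of_information([0, 0, 1, 1], [1, 1, 0, 0])
cross_vi = variation_of_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Higher Adjusted Rand Index and lower Variation of Information both indicate closer agreement with the reference labels, and neither depends on how the clusters are named.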

