NAR Genom Bioinform. 2023 Jun 16;5(2):lqad058.
doi: 10.1093/nargab/lqad058. eCollection 2023 Jun.

scROSHI: robust supervised hierarchical identification of single cells

Michael Prummer et al. NAR Genom Bioinform.

Abstract

Identifying cell types based on expression profiles is a pillar of single-cell analysis. Existing machine-learning methods identify predictive features from annotated training data, which are often not available in early-stage studies. Scarce or unrepresentative training data can lead to overfitting and inferior performance when a model is applied to new data. To address these challenges, we present scROSHI, which uses previously obtained cell type-specific gene lists and requires neither training nor annotated data. By respecting the hierarchical nature of cell type relationships and assigning cells consecutively to more specialized identities, scROSHI achieves excellent prediction performance. In a benchmark based on publicly available PBMC data sets, scROSHI outperforms competing methods when training data are limited or the diversity between experiments is large.
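The idea of assigning cells consecutively to more specialized identities can be illustrated with a minimal Python sketch. This is not the scROSHI implementation: the function name `assign_hierarchically`, the toy tree, the example scores, and the `min_score` cutoff used to stop the descent are all hypothetical choices made for illustration.

```python
def assign_hierarchically(scores, tree, root="root", min_score=0.5):
    """Walk a cell type tree from major types down to subtypes.

    scores: dict mapping cell type name -> score for ONE cell
            (higher means better support); hypothetical values.
    tree:   dict mapping a parent type -> list of child types.
    min_score: hypothetical cutoff; below it the descent stops.
    """
    label = "unknown"  # returned if no major type is supported
    node = root
    while tree.get(node):  # stop at a leaf (no children listed)
        # Pick the best-scoring child of the current node only,
        # so subtypes compete just with their siblings.
        best = max(tree[node], key=lambda c: scores.get(c, 0.0))
        if scores.get(best, 0.0) < min_score:
            break  # keep the most specific label reached so far
        label = best
        node = best
    return label


# Toy hierarchy: two major types, one of which has subtypes.
tree = {
    "root": ["T cell", "B cell"],
    "T cell": ["CD4 T", "CD8 T"],
}

cell_scores = {"T cell": 0.9, "B cell": 0.2, "CD4 T": 0.3, "CD8 T": 0.8}
print(assign_hierarchically(cell_scores, tree))  # CD8 T
```

With this design, a cell whose subtype scores are all weak keeps its major type label rather than being forced into a subtype, and a cell with no supported major type stays "unknown".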


Figures

Figure 1.
Schematic of the scROSHI workflow. The gene × cell (row × column) normalized expression matrix (A) is combined with the binary gene × cell type membership matrix (B) to define genes specific for a cell type (black) and genes specific for other cell types (white). (C) The one-sided Mann-Whitney rank sum test provides a cell type × cell score matrix (top), which is normalized (bottom). (D) UMAP representation of all cells from a melanoma patient biopsy using the most highly variable genes, colored by phenograph clusters. (E) Developmental ‘family tree’ defining cell type hierarchies. (F) The representation in (D) is colored by scROSHI predicted cell types.
Figure 2.
Benchmark results for scROSHI (left) and the three competing machine learning methods. Each panel corresponds to a combination of training data (column) and test data (row). The cross-validation accuracy (in %) is shown separately for major cell types (orange) and all fine-grained cell types (green). The black bar shows the percentage of cases where the cell type is ‘unknown’.
Figure 3.
UMAP representation of cells in gene expression and CNV space. Cells from three melanoma biopsy samples (A, n = 3928 cells; B, n = 2967 cells; C, n = 2326 cells) were annotated using scROSHI. The first row shows the UMAP embedding of the normalized and log-transformed gene expression and the second row shows the UMAP embedding of CNV profiles. The colors represent the cell type annotations. The greyscale in the insets represents the CNV status.
Figure 4.
Cell type classification where the list of expected cell types was modified. (A) All known cell types are included. (B) Plasmacytoid dendritic cells are excluded. (C) T cells are excluded. (D) Melanoma cells are excluded. In panels B–D, the cluster of cells for which the label was excluded in the classification is marked by an arrow.
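
The scoring step outlined in Figure 1 (panels A to C) can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the published code: the function names, the toy expression values, and the marker lists are hypothetical, and the score shown is the Mann-Whitney U statistic scaled to the range 0 to 1 rather than a test p-value. The normalization step of panel C is not reproduced because the caption does not specify it.

```python
def mann_whitney_auc(in_set, out_set):
    """Mann-Whitney U statistic divided by the number of pairs.

    Returns the fraction of (in, out) pairs where the in-set value
    is larger; ties count as half. 1.0 means every marker gene is
    expressed above every non-marker gene in this cell.
    """
    wins = 0.0
    for x in in_set:
        for y in out_set:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(in_set) * len(out_set))


def score_cells(expr, membership):
    """Build a cell type x cell score table.

    expr:       dict gene -> list of normalized expression, one per cell.
    membership: dict cell type -> set of marker genes (hypothetical lists).
    """
    n_cells = len(next(iter(expr.values())))
    scores = {}
    for cell_type, markers in membership.items():
        row = []
        for j in range(n_cells):
            # Genes specific for this type vs. genes specific for others.
            inside = [expr[g][j] for g in expr if g in markers]
            outside = [expr[g][j] for g in expr if g not in markers]
            row.append(mann_whitney_auc(inside, outside))
        scores[cell_type] = row
    return scores


# Toy data: four marker genes measured in two cells (made-up values).
expr = {
    "CD3E": [5.0, 0.0],
    "CD2": [4.0, 0.0],
    "MS4A1": [0.0, 6.0],
    "CD79A": [0.0, 3.0],
}
membership = {"T cell": {"CD3E", "CD2"}, "B cell": {"MS4A1", "CD79A"}}

scores = score_cells(expr, membership)
labels = [max(scores, key=lambda t: scores[t][j]) for j in range(2)]
print(labels)  # ['T cell', 'B cell']
```

Because the comparison uses ranks within each cell, the score does not depend on the absolute expression scale. A one-sided p-value for the same comparison could be obtained from a statistics library instead of the scaled U statistic used here.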
