Nucleic Acids Res. 2019 Sep 19;47(16):e95.
doi: 10.1093/nar/gkz543.

CHETAH: a selective, hierarchical cell type identification method for single-cell RNA sequencing

Jurrian K de Kanter et al. Nucleic Acids Res. 2019.

Abstract

Cell type identification is essential for single-cell RNA sequencing (scRNA-seq) studies, which are currently transforming the life sciences. CHETAH (CHaracterization of cEll Types Aided by Hierarchical classification) is an accurate cell type identification algorithm that is rapid and selective, including the possibility of intermediate or unassigned categories. Evidence for assignment is based on a classification tree of previously available scRNA-seq reference data and includes a confidence score based on the variance in gene expression per cell type. For cell types represented in the reference data, CHETAH's accuracy is as good as that of existing methods. Its specificity is superior when cells of an unknown type are encountered, such as malignant cells in tumor samples, which it pinpoints as intermediate or unassigned. Although designed for tumor samples in particular, the use of unassigned and intermediate types is also valuable in other exploratory studies. This is exemplified in pancreas datasets, where CHETAH highlights cell populations not well represented in the reference dataset, including cells with profiles that lie on a continuum between those of acinar and ductal cell types. Having the possibility of unassigned and intermediate cell types is pivotal for preventing misclassification and can yield important biological information for previously unexplored tissues.
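The idea of a selective, hierarchical classifier can be illustrated with a minimal Python sketch. This is not CHETAH's implementation: the Pearson-correlation score, the "best minus second-best" confidence, the toy profiles, the threshold value of 0.1 and all names (`Node`, `pearson`, `classify`) are assumptions made for illustration only. The sketch also assumes that stopping at the root corresponds to "unassigned" and stopping at an internal node corresponds to an intermediate type.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import List, Optional, Sequence


@dataclass
class Node:
    """One node of a hypothetical classification tree built from reference data."""
    name: str
    profile: Optional[Sequence[float]] = None  # toy mean expression profile
    children: List["Node"] = field(default_factory=list)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector has zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    norm = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / norm if norm else 0.0


def classify(cell: Sequence[float], root: Node, threshold: float = 0.1) -> str:
    """Walk down the tree and stop as soon as the evidence is too weak.

    Confidence here is the gap between the best and second-best branch
    score (an assumed stand-in, not CHETAH's confidence score).
    """
    node = root
    while node.children:
        scored = sorted(
            ((pearson(cell, child.profile), child) for child in node.children),
            key=lambda pair: pair[0],
            reverse=True,
        )
        confidence = scored[0][0] - scored[1][0]
        if confidence < threshold:
            return node.name  # root: unassigned; internal node: intermediate
        node = scored[0][1]
    return node.name  # reached a leaf, i.e. a final cell type


# Toy reference tree over four genes (all values are made up).
tree = Node("Unassigned", children=[
    Node("Node1", profile=[5, 5, 0, 0], children=[
        Node("T cell", profile=[6, 1, 0, 0]),
        Node("B cell", profile=[1, 6, 0, 0]),
    ]),
    Node("Fibroblast", profile=[0, 0, 5, 5]),
])

print(classify([7, 1, 0, 0], tree))  # T cell (clear at every step)
print(classify([4, 4, 0, 0], tree))  # Node1 (cannot separate T cell from B cell)
print(classify([5, 0, 5, 0], tree))  # Unassigned (no evidence at the root)
```

Setting `threshold=0` in this sketch forces every cell down to a leaf, which mirrors the idea of running a classifier with a zero confidence threshold so that all cells receive a final type.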

Figures

Figure 1.
The CHETAH algorithm.
Figure 2.
CHETAH’s classification of two tumor sample datasets is nearly identical to the published manual classification. The t-SNE plots depict each cell as a dot, with the colors representing the inferred cell type shown in the legend. Gray colors indicate intermediate cell types which are labeled automatically as Node1, Node2, etc. For the corresponding classification trees see Supplementary Figure S2. Rows of panels: datasets classified (A-C: Melanoma, Tirosh et al. (24); D-F: Ovarian, Schelker et al. (18)); columns: classification method. For an overview of the datasets see Table 1.
Figure 3.
CHETAH compared with other methods (bottom labels), across six combinations of input and reference datasets (top labels, including the corresponding scRNA-seq platform: ss2: Smart-seq2; iD: inDrops; cs2: CEL-seq2. Microfluidics methods in blue, well-plate methods in orange). For scmap, both the ‘cell’ mode (scmap_cell) and ‘cluster’ mode (scmap_cl.) were evaluated. CHETAH was run with default settings, but also with a zero confidence score threshold (CHETAH_0), thus forcing it to classify all cells to a final type. (A) Percentages of cells per classification result category as shown in (B). (B) Classification result categories used in (A). (C) The influence of the number of cells per reference cell type on CHETAH’s classification performance was investigated as follows. The 7830 cells of the (Drop-seq protocol) CITE-seq study (39) were classified with reference cells from the PBMC dataset (27), generated with the 10× Genomics platform. This dataset contained a total of 68 579 cells. The numbers on the y-axis are the number of (randomly sampled) cells per reference cell type taken to classify the input dataset. Classification results were divided into the six categories depicted in (B). Besides investigating the influence of the number of cells in a reference type, this analysis also serves as an example of cross-platform performance, as well as an example using datasets with large numbers of cells. More details of the datasets used can be found in Table 1. Note that in the other analyses reported throughout, no limitation is placed on the number of cells per reference type.
Figure 4.
CHETAH identifies opposing gradients of duct and acinar cell marker genes in the Pancreas2 dataset (41). (A) t-SNE plot of the Pancreas2 dataset as classified by CHETAH, with colors representing the inferred cell types. The arrowhead indicates a population that was labeled as acinar cell in the publication, but is classified to a mixture of duct cell (blue), acinar cell (green) and intermediate Node 6 (gray) by CHETAH. (B) The classification tree used for (A), based on the Pancreas1 dataset. The arrow indicates the acinar/ductal intermediate node (Node 6) for which the profile score of duct cells is shown in (C). (C) As (B), but with all cells colored by the profile score for ductal cell in Node 6. The cells in the cluster of interest show a gradient of the profile score. (D) Heatmap showing the normalized expression of the genes (rows) used by CHETAH to calculate the profile score plotted in (C), for the cells (columns) in the cluster indicated by the arrowhead in panel (A). Only genes (rows) having an absolute correlation >0.5 with the profile score are shown. The cells are sorted by the duct cell profile score in Node 6 which is shown above the heatmap. Well-known acinar (top) and ductal marker genes (bottom) are labeled (see main text). For the heatmap with all genes annotated see Supplementary Figure S7.
