Bioinformatics. 2024 Mar 4;40(3):btae128.
doi: 10.1093/bioinformatics/btae128.

Uncertainty-aware single-cell annotation with a hierarchical reject option

Lauren Theunissen et al. Bioinformatics. 2024.

Abstract

Motivation: Automatic cell type annotation methods assign cell type labels to new datasets by extracting relationships from a reference RNA-seq dataset. However, due to the limited resolution of gene expression features, there is always uncertainty present in the label assignment. To enhance the reliability and robustness of annotation, most machine learning methods address this uncertainty by providing a full reject option, i.e. when the predicted confidence score of a cell type label falls below a user-defined threshold, no label is assigned and no prediction is made. As a better alternative, some methods deploy hierarchical models and consider a so-called partial rejection by returning internal nodes of the hierarchy as label assignment. However, because a detailed experimental analysis of various rejection approaches is missing in the literature, there is currently no consensus on best practices.
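The full reject option described above can be illustrated with a minimal sketch. It assumes a classifier has already produced one probability per cell type for each cell; the function name, the toy probabilities, the threshold of 0.6, and the "Rejected" placeholder label are all hypothetical and are not taken from the paper's code.

```python
def annotate_with_full_rejection(probabilities, threshold, rejected_label="Rejected"):
    """Return the most probable label, or rejected_label if its score is below threshold."""
    label, score = max(probabilities.items(), key=lambda item: item[1])
    return label if score >= threshold else rejected_label


# Toy per-cell probability vectors (illustrative values only).
cell_a = {"T cell": 0.85, "B cell": 0.10, "NK cell": 0.05}
cell_b = {"T cell": 0.45, "B cell": 0.15, "NK cell": 0.40}

# The confident cell keeps its label; the ambiguous cell receives no prediction.
labels = [annotate_with_full_rejection(cell, threshold=0.6) for cell in (cell_a, cell_b)]
```

With full rejection the ambiguous cell loses all label information, which is the limitation that partial rejection along a hierarchy is meant to reduce.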

Results: We evaluate three annotation approaches (i) full rejection, (ii) partial rejection, and (iii) no rejection for both flat and hierarchical probabilistic classifiers. Our findings indicate that hierarchical classifiers are superior when rejection is applied, with partial rejection being the preferred rejection approach, as it preserves a significant amount of label information. For optimal rejection implementation, the rejection threshold should be determined through careful examination of a method's rejection behavior. Without rejection, flat and hierarchical annotation perform equally well, as long as the cell type hierarchy accurately captures transcriptomic relationships.
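One way to examine a method's rejection behavior before fixing a threshold is to sweep candidate thresholds and record accuracy and rejection rate at each one. The sketch below does this for flat annotation with full rejection. It assumes accuracy is computed only over the cells that keep a label, which may differ from the exact metric used in the paper; the function name and the toy data are hypothetical.

```python
def accuracy_rejection_curve(scores, predictions, truths, thresholds):
    """For each threshold, return (threshold, accuracy on kept cells, % rejected)."""
    curve = []
    for threshold in thresholds:
        # A cell keeps its predicted label only if its confidence reaches the threshold.
        kept = [(p, y) for s, p, y in zip(scores, predictions, truths) if s >= threshold]
        rejected_pct = 100.0 * (1 - len(kept) / len(scores))
        # Accuracy is undefined when every cell is rejected.
        accuracy = sum(p == y for p, y in kept) / len(kept) if kept else None
        curve.append((threshold, accuracy, rejected_pct))
    return curve


# Toy confidence scores, predicted labels, and true labels (illustrative only).
scores = [0.9, 0.8, 0.55, 0.4]
predictions = ["T", "B", "T", "NK"]
truths = ["T", "B", "NK", "NK"]

curve = accuracy_rejection_curve(scores, predictions, truths, thresholds=[0.0, 0.5, 0.7])
```

Plotting accuracy and percentage rejected against the threshold shows the trade-off: raising the threshold tends to increase accuracy on the remaining cells while discarding more of them.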

Availability and implementation: Code is freely available at https://github.com/Latheuni/Hierarchical_reject and https://doi.org/10.5281/zenodo.10697468.

Conflict of interest statement

None declared.

Figures

Figure 1.
(A) Schematic overview of the training of a hierarchical classifier and the division of the training data during this task. (B) Example illustrating the difference between greedy versus nongreedy label assignment with hierarchical annotation. With greedy label assignment, only the path with the highest probability scores in the hierarchy is followed. With nongreedy label assignment, all possible prediction paths are traversed and only the end score is considered for the final label assignment. (C) Illustration of how intermediate node probabilities for a cell type hierarchy are reconstructed for bottom-up label assignment. The intermediate node probabilities are calculated by summing all the probabilities of the children nodes. This information can then be used to perform label rejection along the hierarchy and to construct accuracy-rejection curves.
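The greedy versus nongreedy distinction in panel B can be sketched as follows. The sketch assumes each internal node has a local classifier that outputs conditional probabilities over its children, and that the "end score" of a path is the product of those conditional probabilities; that scoring rule, the toy hierarchy, and all names are assumptions made for illustration.

```python
# Hypothetical conditional probabilities P(child | parent) for one cell.
CONDITIONAL = {
    "root": {"Lymphocyte": 0.55, "Myeloid": 0.45},
    "Lymphocyte": {"T cell": 0.6, "B cell": 0.4},
    "Myeloid": {"Monocyte": 0.9, "Dendritic cell": 0.1},
}


def greedy_label(conditional, node="root"):
    """Follow only the highest-scoring child at each level until a leaf is reached."""
    while node in conditional:
        children = conditional[node]
        node = max(children, key=children.get)
    return node


def path_scores(conditional, node="root", score=1.0):
    """Traverse every root-to-leaf path and return the end score of each leaf."""
    if node not in conditional:
        return {node: score}
    scores = {}
    for child, p in conditional[node].items():
        scores.update(path_scores(conditional, child, score * p))
    return scores


def nongreedy_label(conditional):
    """Pick the leaf whose complete path has the highest end score."""
    scores = path_scores(conditional)
    return max(scores, key=scores.get)
```

In this toy example the greedy walk commits to "Lymphocyte" at the first split and ends at "T cell" (path score 0.33), whereas the nongreedy search selects "Monocyte" (path score 0.405) because it compares complete paths.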
Figure 2.
Accuracy-rejection curves of flat and hierarchical annotation with nongreedy prediction of the AMB, Azimuth PBMC, COVID, Flybody, and Flyhead datasets with logistic regression (LR), random forests (RF), and linear SVM (SVM) classifiers. The curves in the top row represent the accuracy score given a rejection threshold value. The curves on the bottom row represent the percentage of rejected labels for all rejection threshold values.
Figure 3.
A comparison of the percentage of unknown labels assigned during the annotation task when partial and full rejection are implemented during nongreedy hierarchical and flat annotation. Evaluation is performed with three classifiers (logistic regression, random forests, and linear SVM) across five datasets (AMB, Azimuth PBMC, COVID, Flyhead, and Flybody).
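Partial rejection with bottom-up reconstructed node probabilities (Figure 1C) can be sketched as below. Internal node probabilities are obtained by summing the probabilities of the children, as the caption states. The walk-up rule used here (start at the most probable leaf and move to the parent until the node probability reaches the threshold) is one plausible reading and an assumption, as are the toy hierarchy, the function names, and the probabilities.

```python
# Hypothetical cell type hierarchy, stored as child -> parent.
PARENT = {
    "CD4 T": "T cell", "CD8 T": "T cell",
    "T cell": "Lymphocyte", "B cell": "Lymphocyte",
    "Lymphocyte": "root", "Monocyte": "root",
}


def node_probabilities(leaf_probs, parent):
    """Add each leaf probability to all of its ancestors.

    This is equivalent to setting every internal node to the sum of its children.
    """
    probs = dict(leaf_probs)
    for leaf, p in leaf_probs.items():
        node = parent.get(leaf)
        while node is not None:
            probs[node] = probs.get(node, 0.0) + p
            node = parent.get(node)
    return probs


def partial_reject(leaf_probs, parent, threshold):
    """Return the deepest node on the best leaf's path whose probability reaches threshold."""
    probs = node_probabilities(leaf_probs, parent)
    node = max(leaf_probs, key=leaf_probs.get)
    while probs[node] < threshold and node in parent:
        node = parent[node]
    return node  # returning "root" corresponds to a full rejection


# Toy leaf probabilities from a flat classifier (illustrative only).
leaf_probs = {"CD4 T": 0.45, "CD8 T": 0.40, "B cell": 0.10, "Monocyte": 0.05}
```

With a threshold of 0.7, the cell is labeled "T cell" rather than being left unlabeled: the classifier cannot separate the two T cell subtypes, but it is confident at the parent level, so that part of the label information is preserved.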
