Bioinformatics. 2024 Mar 4;40(3):btae128.

doi: 10.1093/bioinformatics/btae128.

Uncertainty-aware single-cell annotation with a hierarchical reject option

Lauren Theunissen¹ ² ³, Thomas Mortier¹, Yvan Saeys² ³, Willem Waegeman¹

Affiliations

¹ Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium.
² Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium.
³ Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium.

PMID: 38441258
PMCID: PMC10957513
DOI: 10.1093/bioinformatics/btae128

Lauren Theunissen et al. Bioinformatics. 2024.

Abstract

Motivation: Automatic cell type annotation methods assign cell type labels to new datasets by extracting relationships from a reference RNA-seq dataset. However, due to the limited resolution of gene expression features, there is always uncertainty present in the label assignment. To enhance the reliability and robustness of annotation, most machine learning methods address this uncertainty by providing a full reject option, i.e. when the predicted confidence score of a cell type label falls below a user-defined threshold, no label is assigned and no prediction is made. As a better alternative, some methods deploy hierarchical models and consider a so-called partial rejection by returning internal nodes of the hierarchy as label assignment. However, because a detailed experimental analysis of various rejection approaches is missing in the literature, there is currently no consensus on best practices.
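The full reject option described above can be illustrated with a minimal sketch. It assumes a flat probabilistic classifier has already produced one probability per cell type for each cell; the function name `annotate_with_full_rejection`, the `"Rejected"` placeholder, and the example probabilities are hypothetical and not taken from the authors' code.

```python
def annotate_with_full_rejection(probs, labels, threshold):
    """Return the most probable label per cell, or "Rejected" below the threshold."""
    annotations = []
    for row in probs:
        # Confidence score = highest predicted probability for this cell.
        best = max(range(len(row)), key=row.__getitem__)
        annotations.append(labels[best] if row[best] >= threshold else "Rejected")
    return annotations


# Hypothetical predicted probabilities for two cells (rows) and three cell types.
labels = ["B cell", "CD4 T cell", "CD8 T cell"]
probs = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.10, 0.48, 0.42],  # ambiguous between the two T cell subtypes
]
print(annotate_with_full_rejection(probs, labels, threshold=0.6))
# → ['B cell', 'Rejected']
```

With full rejection the second cell receives no label at all, even though the classifier is fairly sure it is some kind of T cell.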

Results: We evaluate three annotation approaches (i) full rejection, (ii) partial rejection, and (iii) no rejection for both flat and hierarchical probabilistic classifiers. Our findings indicate that hierarchical classifiers are superior when rejection is applied, with partial rejection being the preferred rejection approach, as it preserves a significant amount of label information. For optimal rejection implementation, the rejection threshold should be determined through careful examination of a method's rejection behavior. Without rejection, flat and hierarchical annotation perform equally well, as long as the cell type hierarchy accurately captures transcriptomic relationships.
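To illustrate why partial rejection preserves label information, the sketch below climbs a toy cell type hierarchy until a node is confident enough. It assumes leaf probabilities are available and that an internal node's probability is the sum over its children (as described for Figure 1C); the hierarchy, the names `HIERARCHY` and `annotate_with_partial_rejection`, and the numbers are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical toy hierarchy: parent -> children. Nodes without an entry are leaves.
HIERARCHY = {
    "root": ["T cell", "B cell"],
    "T cell": ["CD4 T cell", "CD8 T cell"],
}
PARENT = {child: parent for parent, children in HIERARCHY.items() for child in children}


def node_probability(node, leaf_probs):
    """Probability of a node: its leaf score, or the sum over its children."""
    if node not in HIERARCHY:
        return leaf_probs[node]
    return sum(node_probability(child, leaf_probs) for child in HIERARCHY[node])


def annotate_with_partial_rejection(leaf_probs, threshold):
    """Start at the most probable leaf and move up until the threshold is met."""
    node = max(leaf_probs, key=leaf_probs.get)
    while node != "root" and node_probability(node, leaf_probs) < threshold:
        node = PARENT[node]
    return node  # "root" means no informative label could be assigned


leaf_probs = {"CD4 T cell": 0.48, "CD8 T cell": 0.42, "B cell": 0.10}
print(annotate_with_partial_rejection(leaf_probs, threshold=0.6))
# → T cell
```

The same cell that a full reject option would leave unlabeled at a threshold of 0.6 is here assigned the internal node "T cell", because the two subtype probabilities together exceed the threshold.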

Availability and implementation: Code is freely available at https://github.com/Latheuni/Hierarchical_reject and https://doi.org/10.5281/zenodo.10697468.

Conflict of interest statement

None declared.

Figures

**Figure 1.**
(A) Schematic overview of the training of a hierarchical classifier and the division of the training data during this task. (B) Example illustrating the difference between greedy and nongreedy label assignment with hierarchical annotation. With greedy label assignment, only the path with the highest probability scores in the hierarchy is followed. With nongreedy label assignment, all possible prediction paths are traversed and only the end score is considered for the final label assignment. (C) Illustration of how intermediate node probabilities for a cell type hierarchy are reconstructed for bottom-up label assignment. The intermediate node probabilities are calculated by summing the probabilities of all child nodes. This information can then be used to perform label rejection along the hierarchy and to construct accuracy-rejection curves.
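The difference between greedy and nongreedy label assignment in panel (B) can be made concrete with a small sketch. It assumes each internal node holds a local classifier that outputs conditional probabilities over its children, and that a path's end score is the product of the conditional probabilities along it; the hierarchy `COND`, the function names, and the numbers are hypothetical.

```python
# Hypothetical conditional probabilities P(child | parent) for a single cell.
COND = {
    "root": {"Lymphoid": 0.55, "Myeloid": 0.45},
    "Lymphoid": {"T cell": 0.60, "B cell": 0.40},
    "Myeloid": {"Monocyte": 0.90, "Dendritic cell": 0.10},
}


def greedy_label(cond, node="root"):
    """Follow only the most probable child at every level."""
    while node in cond:
        node = max(cond[node], key=cond[node].get)
    return node


def leaf_scores(cond, node="root", score=1.0):
    """Traverse every path and return the end score (path product) of each leaf."""
    if node not in cond:
        return {node: score}
    scores = {}
    for child, p in cond[node].items():
        scores.update(leaf_scores(cond, child, score * p))
    return scores


def nongreedy_label(cond):
    """Pick the leaf with the highest end score over all paths."""
    scores = leaf_scores(cond)
    return max(scores, key=scores.get)


print(greedy_label(COND))     # → T cell
print(nongreedy_label(COND))  # → Monocyte
```

In this example the greedy path commits to "Lymphoid" at the first split and ends at "T cell" with an end score of 0.33, whereas the nongreedy traversal finds that "Monocyte" has the higher end score of 0.405. Summing the end scores of the leaves under a node recovers that node's probability, which is the bottom-up reconstruction described in panel (C).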

**Figure 2.**
Accuracy-rejection curves of flat and hierarchical annotation with nongreedy prediction of the AMB, Azimuth PBMC, COVID, Flybody, and Flyhead datasets with logistic regression (LR), random forests (RF), and linear SVM (SVM) classifiers. The curves in the top row represent the accuracy score at each rejection threshold value. The curves in the bottom row represent the percentage of rejected labels at each rejection threshold value.
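An accuracy-rejection curve of this kind can be sketched as follows. The sketch assumes one confidence score per cell and a flag stating whether the predicted label was correct, and it computes accuracy only over the cells that are not rejected; the paper's exact accuracy definition may differ, and the function name and data are hypothetical.

```python
def accuracy_rejection_curve(confidence, correct, thresholds):
    """Return (threshold, accuracy on kept cells, % rejected) for each threshold."""
    n = len(confidence)
    curve = []
    for t in thresholds:
        # Keep only the predictions whose confidence reaches the threshold.
        kept = [ok for c, ok in zip(confidence, correct) if c >= t]
        rejected_pct = 100.0 * (n - len(kept)) / n
        accuracy = sum(kept) / len(kept) if kept else float("nan")
        curve.append((t, accuracy, rejected_pct))
    return curve


# Hypothetical confidence scores and correctness flags for four cells.
confidence = [0.95, 0.80, 0.55, 0.40]
correct = [True, True, False, True]
for t, acc, rej in accuracy_rejection_curve(confidence, correct, [0.0, 0.5, 0.9]):
    print(f"threshold={t:.1f}  accuracy={acc:.2f}  rejected={rej:.0f}%")
```

Plotting accuracy and the rejected percentage against the threshold gives the two rows of curves described in the caption, and inspecting them is one way to examine a method's rejection behavior before fixing a threshold.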

**Figure 3.**
A comparison of the percentage of unknown labels assigned during the annotation task when partial and full rejection are implemented during nongreedy hierarchical and flat annotation. Evaluation is performed with three classifiers (logistic regression, random forests, and linear SVM) across five datasets (AMB, Azimuth PBMC, COVID, Flyhead, and Flybody).

Grants and funding

Flanders AI Research Program
