Review
Comput Struct Biotechnol J. 2021 Jan 19:19:961-969.
doi: 10.1016/j.csbj.2021.01.015. eCollection 2021.

Automated methods for cell type annotation on scRNA-seq data

Giovanni Pasquini et al. Comput Struct Biotechnol J.

Abstract

The advent of single-cell sequencing started a new era of transcriptomic and genomic research, advancing our knowledge of cellular heterogeneity and dynamics. Cell type annotation is a crucial step in analyzing single-cell RNA sequencing (scRNA-seq) data, yet manual annotation is time-consuming and partially subjective. As an alternative, tools have been developed for automatic cell type identification. Different strategies have emerged to associate the gene expression profiles of single cells with a cell type: using curated marker gene databases, correlating query data with reference expression data, or transferring labels by supervised classification. In this review, we present an overview of the available tools and the underlying approaches for automated cell type annotation of scRNA-seq data.

Keywords: Automatic annotation; Cell state; Cell type; scRNA-seq.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Fig 1
Approaches for cell type annotation of scRNA-seq datasets. scRNA-seq datasets can be automatically annotated by tools implementing one of three approaches: annotation by marker gene databases; correlation-based methods; and annotation by supervised classification. The task of annotating a query scRNA-seq dataset consists of assigning a cell type identity to each query cell, or to a group of cells at once, i.e., a cluster computed in an unbiased manner.

(A) Marker gene database-based annotation takes advantage of cell type atlases. Markers derived from the literature and from scRNA-seq analyses have been assembled into reference cell type hierarchies and marker lists. In this approach, basic scoring systems are used to ascribe cell types at the cluster level in the query dataset.

(B) Correlation-based methods use multiple correlation measures to compare gene expression profiles between a reference and a query dataset, at either the single-cell or the cluster level. Cluster-level comparison uses centroids (pseudo-cells obtained by averaging the single-cell gene expression levels of an entire cluster). Some of these tools assemble a reference of cell type gene expression profiles from an ensemble of published studies and bulk RNA data repositories. The annotation step in this approach consists of finding the reference cell type that best correlates with the query cell or cluster, and every tool uses multiple steps to find the best match accurately.

(C) Annotation by supervised classification uses machine learning techniques to train a classifier on labeled reference scRNA-seq datasets. The classifier is subsequently applied to the query. Supervised learning is a powerful tool for modeling the distribution of training labels as a function of features. Machine learning techniques offer a variety of alternatives in the training step and allow for hierarchical classification, which permits a more biologically relevant identification of cell types.
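The centroid comparison described in panel (B) can be sketched as follows. This is a minimal illustration and not the procedure of any tool covered in the review: the function names, the `min_corr` threshold, the "unassigned" fallback, and the toy expression values are all assumptions made for the example, and Pearson correlation stands in for the several correlation measures that real tools use.

```python
import numpy as np


def cluster_centroids(expr, clusters):
    """Average the expression of all cells in each cluster (pseudo-cells).

    expr: array of shape (n_cells, n_genes); clusters: one label per cell.
    """
    clusters = np.asarray(clusters)
    return {
        label: expr[clusters == label].mean(axis=0)
        for label in np.unique(clusters).tolist()
    }


def pearson(a, b):
    """Pearson correlation of two vectors; 0.0 if either one is constant."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def annotate_clusters(query_expr, query_clusters, reference_profiles, min_corr=0.5):
    """Assign each query cluster the best-correlated reference cell type.

    reference_profiles: dict mapping cell type name -> expression vector over
    the same genes as query_expr. min_corr is a hypothetical cutoff below
    which a cluster is left "unassigned".
    """
    annotations = {}
    for label, centroid in cluster_centroids(query_expr, query_clusters).items():
        scores = {
            cell_type: pearson(centroid, np.asarray(profile, dtype=float))
            for cell_type, profile in reference_profiles.items()
        }
        best = max(scores, key=scores.get)
        annotations[label] = best if scores[best] >= min_corr else "unassigned"
    return annotations


# Toy data (invented for illustration): 4 cells x 4 genes, two clusters.
reference = {"T cell": [5, 0, 1, 0], "B cell": [0, 5, 0, 1]}
query = np.array(
    [[4, 0, 1, 0], [6, 1, 1, 0], [0, 4, 0, 2], [1, 6, 0, 0]], dtype=float
)
print(annotate_clusters(query, [0, 0, 1, 1], reference))
```

On real data the query and reference would first need to be restricted to a shared, normalized gene set, a step this sketch leaves out.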
