Heliyon. 2024 Sep 27;10(19):e38567.
doi: 10.1016/j.heliyon.2024.e38567. eCollection 2024 Oct 15.

Identification of kidney cell types in scRNA-seq and snRNA-seq data using machine learning algorithms

Adam Tisch et al. Heliyon.

Abstract

Introduction: Single-cell RNA sequencing (scRNA-seq) and single-nucleus RNA sequencing (snRNA-seq) provide valuable insights into the cellular states of kidney cells. However, the annotation of cell types often requires extensive domain expertise and time-consuming manual curation, limiting scalability and generalizability. To facilitate this process, we tested the performance of five supervised classification methods for automatic cell type annotation.

Results: We analyzed publicly available sc/snRNA-seq datasets from five expert-annotated studies, comprising 62,120 cells from 79 kidney biopsy samples. Datasets were integrated by harmonizing cell type annotations across studies. Five different supervised machine learning algorithms (support vector machines, random forests, multilayer perceptrons, k-nearest neighbors, and extreme gradient boosting) were applied to automatically annotate cell types using four training datasets and one testing dataset. Performance metrics, including accuracy (F1 score) and rejection rates, were evaluated. All five machine learning algorithms demonstrated high accuracies, with a median F1 score of 0.94 and a median rejection rate of 1.8 %. The algorithms performed equally well across different datasets and successfully rejected cell types that were not present in the training data. However, F1 scores were lower when models trained primarily on scRNA-seq data were tested on snRNA-seq data.
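The two reported metrics can be illustrated with a minimal sketch. It assumes each classifier returns a probability per cell type, that a cell is rejected (labeled "Unknown") when its top probability falls below a threshold, and that a rejected cell counts as a miss for its true class when computing F1. The threshold of 0.7, the label names, the function names, and the toy probabilities are all hypothetical; the abstract does not specify these details.

```python
from statistics import median

UNKNOWN = "Unknown"  # hypothetical label assigned to rejected cells


def reject_low_confidence(probabilities, threshold=0.7):
    """Return the most probable label per cell, or UNKNOWN if it is below threshold."""
    predictions = []
    for cell_probs in probabilities:
        label, prob = max(cell_probs.items(), key=lambda item: item[1])
        predictions.append(label if prob >= threshold else UNKNOWN)
    return predictions


def per_class_f1(true_labels, predicted_labels):
    """Compute F1 = 2TP / (2TP + FP + FN) for every class present in true_labels."""
    pairs = list(zip(true_labels, predicted_labels))
    scores = {}
    for cls in sorted(set(true_labels)):
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)  # includes rejected cells
        denom = 2 * tp + fp + fn
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores


def rejection_rate(predicted_labels):
    """Fraction of cells labeled UNKNOWN."""
    return sum(p == UNKNOWN for p in predicted_labels) / len(predicted_labels)


# Toy data (made up for illustration, not taken from the study).
true_labels = ["Podocyte", "Podocyte", "Proximal tubule", "Proximal tubule", "Endothelial"]
probabilities = [
    {"Podocyte": 0.95, "Proximal tubule": 0.03, "Endothelial": 0.02},
    {"Podocyte": 0.55, "Proximal tubule": 0.30, "Endothelial": 0.15},  # below threshold
    {"Podocyte": 0.05, "Proximal tubule": 0.90, "Endothelial": 0.05},
    {"Podocyte": 0.10, "Proximal tubule": 0.85, "Endothelial": 0.05},
    {"Podocyte": 0.02, "Proximal tubule": 0.08, "Endothelial": 0.90},
]

predicted_labels = reject_low_confidence(probabilities)
f1_by_class = per_class_f1(true_labels, predicted_labels)
median_f1 = median(f1_by_class.values())
rejected = rejection_rate(predicted_labels)
```

Taking the median of the per-class scores, rather than a cell-weighted average, keeps abundant cell types from dominating the summary, which matters when rare kidney cell types are of interest.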

Conclusions: Despite limitations including the number of biopsy samples, our findings demonstrate that machine learning algorithms can accurately annotate a wide range of adult kidney cell types in scRNA-seq/snRNA-seq data. This approach has the potential to standardize cell type annotation and facilitate further research on cellular mechanisms underlying kidney disease.

Keywords: Annotation; Cell identity; Classification; Kidney; Machine learning; RNA-seq.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Fig. 1
Heatmap of correlations between cell types across all 5 cohorts, per the authors' original annotations. Both axes list the same annotations, which are grouped according to the dendrogram and boxed by harmonized cell type. A harmonized cell type is defined as a group of annotations with a high degree of correlation.
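The idea of defining harmonized cell types as groups of highly correlated annotations can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it replaces the dendrogram with a simple rule that links any two annotations whose Pearson correlation is at least a threshold, then takes connected groups (equivalent to cutting a single-linkage clustering at that threshold). The threshold of 0.9, the annotation names, and the mean expression profiles are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def group_by_correlation(profiles, threshold=0.9):
    """Group annotations connected by correlations >= threshold (union-find)."""
    names = list(profiles)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical mean expression of four marker genes per original annotation.
profiles = {
    "StudyA: podocyte": [9.0, 1.0, 0.5, 0.2],
    "StudyB: POD": [8.5, 1.2, 0.4, 0.3],
    "StudyA: proximal tubule": [0.5, 9.0, 1.0, 0.3],
    "StudyB: PT": [0.4, 8.0, 1.5, 0.2],
    "StudyB: endothelial": [0.3, 0.5, 0.8, 9.0],
}

harmonized = group_by_correlation(profiles)
```

In this toy input the two podocyte annotations and the two proximal tubule annotations merge into one group each, while the endothelial annotation stays on its own.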
Fig. 2
Heatmaps showing, for each testing dataset, the performance of each classification algorithm as measured by (a) median F1 score or (b) median percentage of cells classified as unknown. (c) Heatmap of each classification algorithm's rejection rate when Young et al. was used as the testing dataset.
Fig. 3
Heatmap of each classifier's F1 score on (a) Menon et al., (b) Lake et al., (c) Wu et al., and (d) Young et al. with respect to each harmonized cell type.

References

    1. Ju W., Greene C.S., Eichinger F., Nair V., Hodgin J.B., Bitzer M., et al. Defining cell-type specificity at the transcriptional level in human disease. Genome Res. 2013;23(11):1862–1873.
    2. Shen-Orr S.S., Tibshirani R., Khatri P., Bodian D.L., Staedtler F., Perry N.M., et al. Cell type-specific gene expression differences in complex tissues. Nat. Methods. 2010;7(4):287–289.
    3. Gawel D.R., Serra-Musach J., Lilja S., Aagesen J., Arenas A., Asking B., et al. Correction to: a validated single-cell-based strategy to identify diagnostic and therapeutic targets in complex diseases. Genome Med. 2020;12(1):37.
    4. Abdelaal T., Michielsen L., Cats D., Hoogduin D., Mei H., Reinders M.J.T., et al. A comparison of automatic cell identification methods for single-cell RNA sequencing data. Genome Biol. 2019;20(1):194.
    5. Young M.D., Mitchell T.J., Vieira Braga F.A., Tran M.G.B., Stewart B.J., Ferdinand J.R., et al. Single-cell transcriptomes from human kidneys reveal the cellular identity of renal tumors. Science. 2018;361(6402):594–599.