IET Syst Biol. 2025 Jan-Dec;19(1):e12107.
doi: 10.1049/syb2.12107. Epub 2025 Apr 22.

scRSSL: Residual semi-supervised learning with deep generative models to automatically identify cell types

Yanru Gao et al. IET Syst Biol. 2025 Jan-Dec.

Abstract

Single-cell RNA sequencing (scRNA-seq) allows researchers to study cellular heterogeneity at the resolution of individual cells. In single-cell transcriptomics analysis, identifying the type of each cell is a key task. Current single-cell datasets are often high-dimensional, contain large numbers of cells, are highly sparse, and have imbalanced cell-type proportions, all of which challenge traditional cell type identification methods. The authors propose scRSSL, a deep residual generative model based on semi-supervised learning, to address these challenges. scRSSL introduces residual networks into a semi-supervised generative model, and the authors use its semi-supervised learning to mitigate sample imbalance. During training, a residual neural network infers cell types, which allows local features of the single-cell data to be extracted. Because the approach is semi-supervised, it can automatically and accurately predict the types of individual cells in a dataset even when only a small number of cells are labelled. In experiments, the method outperformed the other methods it was compared with.
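The abstract states that a residual neural network performs the cell type inference. As a minimal illustration of that idea, the sketch below implements a fully connected residual block, `relu(x + F(x))` with the ReLU after the addition as in the original ResNet, followed by a softmax over cell types. The layer sizes, weights, and function names (`residual_block`, `classify`) are hypothetical; the abstract does not specify scRSSL's actual architecture.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]  # one row per output unit


def linear(W: Matrix, b: Vector, x: Sequence[float]) -> Vector:
    """Dense layer: W @ x + b."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(x: Sequence[float]) -> Vector:
    return [max(0.0, v) for v in x]


def residual_block(x: Vector, W1: Matrix, b1: Vector, W2: Matrix, b2: Vector) -> Vector:
    """Return relu(x + F(x)), where F is two dense layers with a ReLU between.

    The identity shortcut requires F(x) to have the same length as x.
    """
    fx = linear(W2, b2, relu(linear(W1, b1, x)))
    return relu([xi + fi for xi, fi in zip(x, fx)])


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(z1: Vector,
             blocks: Sequence[Tuple[Matrix, Vector, Matrix, Vector]],
             W_out: Matrix, b_out: Vector) -> Vector:
    """Hypothetical classifier: residual blocks, then a softmax over cell types."""
    h = z1
    for W1, b1, W2, b2 in blocks:
        h = residual_block(h, W1, b1, W2, b2)
    return softmax(linear(W_out, b_out, h))


# With F's weights at zero, the block reduces to relu(x): the shortcut passes x through.
zero = [[0.0, 0.0], [0.0, 0.0]]
print(residual_block([0.5, -1.0], zero, [0.0, 0.0], zero, [0.0, 0.0]))  # [0.5, 0.0]
```

The identity shortcut is the design point: if the learned residual `F` contributes nothing useful, the block can fall back to passing its input through, which is what makes deeper stacks easier to train.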

Keywords: bioinformatics; deep generative model; deep learning; semi‐supervised learning; single cell.

Conflict of interest statement

The authors declare no potential conflicts of interest.

Figures

FIGURE 1
Model framework summary figure. First, the partially labelled data are pre-processed before entering the neural network model. Then, the first hidden layer compresses the input into an initial latent representation, designated z1. After separating the labelled cells from the unlabelled cells in z1, we treat the cell type of the unlabelled cells as a latent variable and introduce the variable y for the labelled cells. After the residual neural network infers the cell types, the unlabelled cells are combined with the labelled data to form the second latent representation, z2. Finally, the decoder neural network maps this representation to a negative binomial distribution over the original dataset.
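The caption names two ingredients: a decoder that outputs a negative binomial distribution, and a cell type that is treated as a latent variable for unlabelled cells. The sketch below illustrates both under stated assumptions. It assumes the mean/inverse-dispersion parameterisation (`mu`, `theta`) common in scVI-style models, and an unlabelled-cell objective in the style of the M2 model of Kingma et al. (2014), which marginalises the unknown label. The caption does not give scRSSL's exact loss, and the function names are hypothetical.

```python
import math
from typing import Sequence


def nb_log_likelihood(x: Sequence[int], mu: Sequence[float], theta: Sequence[float]) -> float:
    """Sum over genes of log NB(x_g | mean mu_g, inverse dispersion theta_g).

    log NB = lgamma(x+theta) - lgamma(theta) - lgamma(x+1)
             + theta*log(theta/(theta+mu)) + x*log(mu/(theta+mu))
    mu and theta are assumed strictly positive.
    """
    total = 0.0
    for x_g, mu_g, th_g in zip(x, mu, theta):
        log_denom = math.log(th_g + mu_g)
        total += (math.lgamma(x_g + th_g) - math.lgamma(th_g) - math.lgamma(x_g + 1)
                  + th_g * (math.log(th_g) - log_denom)
                  + x_g * (math.log(mu_g) - log_denom))
    return total


def unlabelled_loss(q_y: Sequence[float], loss_given_y: Sequence[float]) -> float:
    """M2-style loss for one unlabelled cell.

    q_y[k]          -- classifier probability that the cell is of type k
    loss_given_y[k] -- negative ELBO of the cell if its type were k
    Returns sum_k q_k * L_k - H(q): the expected loss minus the classifier entropy.
    """
    expected = sum(q * l for q, l in zip(q_y, loss_given_y))
    neg_entropy = sum(q * math.log(q) for q in q_y if q > 0.0)
    return expected + neg_entropy


# With mu = theta = 1 the NB reduces to a geometric distribution: P(0) = 0.5, P(1) = 0.25.
print(math.exp(nb_log_likelihood([0, 1], [1.0, 1.0], [1.0, 1.0])))  # ~0.125
```

In this formulation, labelled cells contribute `loss_given_y` at their known label only, while unlabelled cells average over all candidate types weighted by the classifier. This is how unlabelled data can still shape the model.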
FIGURE 2
Inter-dataset experiment result figure. Box plots comparing scRSSL with seven baseline methods, using the F1-score as the evaluation metric. The first row shows the results obtained with the Xin dataset as the reference dataset and each of the remaining four datasets as the query dataset for prediction.
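Figures 2, 3 and 7 report the F1-score. For each cell type c, F1 = 2TP / (2TP + FP + FN), where TP, FP and FN count cells correctly assigned to c, wrongly assigned to c, and of type c but assigned elsewhere. The captions do not say how per-type scores are aggregated. The sketch below assumes an unweighted (macro) average over cell types and uses toy labels. The function names are illustrative.

```python
from typing import Dict, Hashable, Sequence


def per_class_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """F1 = 2TP / (2TP + FP + FN) for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred), key=str)
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # denominator is > 0 because c occurs in y_true or y_pred
        scores[c] = 2 * tp / (2 * tp + fp + fn)
    return scores


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1, so rare cell types count as much as common ones."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


y_true = ["alpha", "alpha", "beta", "beta"]
y_pred = ["alpha", "beta", "beta", "beta"]
print(per_class_f1(y_true, y_pred))  # alpha: 2/3, beta: 0.8
print(macro_f1(y_true, y_pred))      # ~0.733
```

Macro averaging is a natural fit for the imbalance problem the abstract raises, because a classifier that ignores a rare cell type is penalised even when overall accuracy stays high.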
FIGURE 3
Intra-dataset experiment result figure. Box plots comparing scRSSL with seven baseline methods on four datasets: Zeisel, Baron, Klein, and Romanov.
FIGURE 4
Confusion matrix figure of experimental results. The confusion matrix obtained by each model when predicting the hECA dataset, where labels 0–7 denote the eight cell types: Adipocyte, Cardiomyocyte cell, Endothelial cell, Fibroblast, Lymphoid cell, Myeloid cell, Pericyte, and Smooth muscle cell.
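A confusion matrix like the one in Figure 4 counts how often cells of each true type receive each predicted label. The sketch below assumes integer labels 0–7 as in the caption, with rows for true types and columns for predicted types. The caption does not state whether the plotted matrices are raw counts or row-normalised, so both are shown. The function names are illustrative.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> List[List[int]]:
    """M[i][j] = number of cells of true type i predicted as type j."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m


def row_normalise(m: List[List[int]]) -> List[List[float]]:
    """Fraction of each true type assigned to each predicted type (empty rows stay 0)."""
    out = []
    for row in m:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


# Toy example with 3 of the 8 labels: one type-0 cell is misclassified as type 1.
m = confusion_matrix([0, 0, 1, 2], [0, 1, 1, 2], n_classes=3)
print(m)                  # [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
print(row_normalise(m))   # [[0.5, 0.5, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

Correct predictions lie on the diagonal, so off-diagonal mass shows which cell types a model tends to confuse.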
FIGURE 5
Visualisation comparing the prediction performance for different cell types of four models: scRSSL, random forest (RF), k-nearest neighbours (KNN), and AdaBoost. In the ROC plot, the closer a curve is to the top-left corner, the better the model performs.
FIGURE 6
Visualisation comparing the prediction performance of the four models scRSSL, RF, KNN, and AdaBoost. In the ROC plot, the closer a curve is to the top-left corner, the better the model performs.
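The ROC plots in Figures 5 and 6 trace the true-positive rate against the false-positive rate as the decision threshold varies. The captions do not say how multi-class curves were built. One common approach, assumed here, is one-vs-rest: a single cell type is the positive class and the model's predicted probability for that type is the score. The sketch computes the curve points and the area under the curve with the trapezoidal rule. The function names are illustrative.

```python
from typing import List, Sequence, Tuple


def roc_curve(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points for one cell type versus the rest.

    y_true[i] is 1 if cell i is of the chosen type, else 0; scores[i] is the
    predicted probability of that type. Both classes must be present.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for k, i in enumerate(order):
        if y_true[i]:
            tp += 1
        else:
            fp += 1
        # tied scores share one threshold, so emit a point only after the last tie
        if k == len(order) - 1 or scores[order[k + 1]] != scores[i]:
            points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


print(auc(roc_curve([1, 1, 0, 0], [0.9, 0.8, 0.7, 0.1])))  # 1.0: every positive ranked first
print(auc(roc_curve([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])))  # 0.75
```

An AUC of 1.0 corresponds to a curve that hugs the top-left corner, while 0.5 is the diagonal of a random ranking. This matches the captions' reading of the plots.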
FIGURE 7
Ablation experiment. F1-scores of scRSSL and of a scRSSL variant without the residual network module on four datasets: Zeisel, Baron, Klein, and Romanov.
