Brief Bioinform. 2025 May 1;26(3):bbaf298.
doi: 10.1093/bib/bbaf298.

scTsI: an effective two-stage imputation method for single-cell RNA-seq data


Hongyu Zhang et al. Brief Bioinform. 2025.

Abstract

Single-cell RNA-seq facilitates the understanding of cell types and states and reveals cellular heterogeneity in developmental processes and disease mechanisms. However, dropout events in single-cell RNA-seq data, in which genes go undetected because of technical noise or limited sequencing depth, seriously affect downstream analyses. Imputation is an effective way to relieve the impact of dropout events, but current methods may introduce new noise or modify high expression values during imputation, and their performance may fall short of expectations when the dropout rate is high, when they are applied to different types of data, or when they serve different downstream analyses. We propose scTsI, a two-stage imputation algorithm for single-cell RNA-seq data. In the first stage, scTsI imputes the zero values using information from neighboring cells and genes. In the second stage, scTsI transforms the expression matrix into a vector, performs a row transformation, and adjusts the imputed values through ridge regression, using bulk RNA-seq data as a constraint. scTsI leaves the original highly expressed values unchanged, avoids introducing new noise, and accepts sparse matrix input to accelerate imputation. We conduct experiments on a variety of simulated and real datasets with different dropout rates and compare scTsI with commonly used imputation methods. The results show that scTsI can restore gene expression and maintain cell-cell similarity across different data dimensions and dropout rates. scTsI also improves the performance of data visualization, clustering, and cell trajectory inference.

Keywords: bulk RNA-seq data; imputation; ridge regression; single-cell gene expression; vector transformation.
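The two stages described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the matrix is genes x cells and already normalized, neighbours are found by Euclidean distance, the cell-based and gene-based estimates are averaged with equal weight, and the bulk profile `bulk` is taken to be each gene's expected mean expression on the same scale. The function names `knn_indices`, `stage1_knn`, and `stage2_ridge` and the penalty `lam` are hypothetical. Stage 2 flattens the imputed entries into a vector, uses a row-aggregation matrix `A` to link them to per-gene totals, and solves a ridge problem that nudges those totals toward the bulk constraint while leaving observed nonzero values untouched.

```python
import numpy as np


def knn_indices(X, k):
    """Return the indices of the k nearest rows of X (Euclidean), excluding each row itself."""
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared pairwise distances
    np.fill_diagonal(d2, np.inf)                      # a row is never its own neighbour
    return np.argsort(d2, axis=1)[:, :k]


def stage1_knn(X, k_cells=5, k_genes=5):
    """Stage 1 (hypothetical sketch): fill zeros of a genes x cells matrix from neighbours."""
    cell_nn = knn_indices(X.T, k_cells)          # (cells, k): nearest cells of each cell
    gene_nn = knn_indices(X, k_genes)            # (genes, k): nearest genes of each gene
    from_cells = X[:, cell_nn].mean(axis=2)      # [g, c] = mean of gene g over c's neighbour cells
    from_genes = X[gene_nn, :].mean(axis=1)      # [g, c] = mean over g's neighbour genes in cell c
    estimate = 0.5 * (from_cells + from_genes)   # assumed equal weighting of the two sources
    return np.where(X == 0, estimate, X)         # observed nonzero values are left untouched


def stage2_ridge(X, Y, bulk, lam=1.0):
    """Stage 2 (hypothetical sketch): ridge-adjust imputed entries toward a bulk mean profile.

    Assumes bulk[g] is the expected per-cell mean of gene g and lam > 0.
    """
    n_genes, n_cells = X.shape
    rows, cols = np.nonzero(X == 0)              # positions imputed in stage 1
    x = Y[rows, cols]                            # imputed values flattened into a vector
    A = np.zeros((n_genes, x.size))
    A[rows, np.arange(x.size)] = 1.0             # (A @ v)[g] sums gene g's imputed entries
    observed = X.sum(axis=1)                     # zeros contribute nothing to these sums
    r = n_cells * bulk - observed - A @ x        # per-gene gap to the bulk-implied row total
    # Ridge solution of min_d ||A d - r||^2 + lam * ||d||^2, written in its dual form
    # d = A^T (A A^T + lam I)^-1 r, which only needs a genes x genes solve.
    delta = A.T @ np.linalg.solve(A @ A.T + lam * np.eye(n_genes), r)
    out = Y.copy()
    out[rows, cols] = np.clip(x + delta, 0.0, None)  # expression cannot be negative
    return out
```

A small `lam` pulls each gene's mean close to the bulk value, while a large `lam` keeps the stage-1 values almost unchanged. This sketch builds `A` as a dense array for readability; on real data one would store it as a sparse matrix, in the spirit of the sparse input the abstract mentions.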


Figures

Figure 1
Flow chart of scTsI. In the first stage, scTsI uses neighboring cells and genes for initial imputation through k-nearest neighbors (KNN). In the second stage, scTsI leverages bulk RNA-seq data as a constraint and uses ridge regression to adjust the initial imputed values.
Figure 2
(A) Pearson correlation coefficient (PCC) and (B) root mean square error (RMSE) between the flattened gene expression data imputed by each method and the flattened true data, for all simulated datasets and all dropout rates.
Figure 3
(A) PCC between the flattened cell–cell similarity matrices computed from the true single-cell gene expression and from the expression imputed by each method. (B) t-SNE visualization of all imputation methods on simulated data of 1000 genes × 3000 cells with a 60% dropout rate. (C) t-SNE visualization of all imputation methods on the real dataset sc_10x.
Figure 4
(A) Boxplots of adjusted Rand index (ARI) and normalized mutual information (NMI) values on data imputed by each method for each simulated dataset. Each dot represents one dropout rate. (B) ARI and NMI averaged across all dropout rates for each simulated dataset. (C) ARI and NMI on data imputed by each method for the five experimental datasets tested.
Figure 5
Trajectory inferred from the imputed RNAmix_celseq2 dataset, showing (A) pseudotime and (B) cell types. (C) Trajectory inference metrics (correlation, overlap, and percentage) calculated on the five real datasets.
Figure 6
Violin plots of eight evaluation metrics (ARI, NMI, PCC, RMSE, cell–cell similarity, correlation, overlap, and percentage) across all simulated and experimental datasets. One-sided paired t-tests were conducted to evaluate whether each method significantly underperforms scTsI. Significance levels are indicated as follows: *: P < .1, **: <.01, ***: <.001, ****: <1e-4, *****: <1e-5.
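The expression-level metrics in the captions (PCC and RMSE on flattened matrices, and PCC between flattened cell–cell similarity matrices) can be sketched in a few lines of NumPy. This is an illustrative reading of the captions, not the authors' evaluation code: the helper names `flat_pcc`, `flat_rmse`, and `cell_similarity_pcc` are hypothetical, matrices are assumed to be genes x cells, and cell–cell similarity is assumed here to be the Pearson correlation between cell columns, which may differ from the measure used in the paper.

```python
import numpy as np


def flat_pcc(a, b):
    """Pearson correlation coefficient between two arrays after flattening both."""
    return float(np.corrcoef(np.ravel(a), np.ravel(b))[0, 1])


def flat_rmse(a, b):
    """Root mean square error between two arrays after flattening both."""
    diff = np.ravel(a) - np.ravel(b)
    return float(np.sqrt(np.mean(diff ** 2)))


def cell_similarity_pcc(true_x, imputed_x):
    """PCC between the flattened cell-cell similarity matrices of two genes x cells matrices.

    Assumption: similarity is the Pearson correlation between cell columns.
    """
    sim_true = np.corrcoef(true_x.T)   # rows of true_x.T are cells -> cells x cells matrix
    sim_imp = np.corrcoef(imputed_x.T)
    return flat_pcc(sim_true, sim_imp)
```

For the clustering metrics, scikit-learn provides `sklearn.metrics.adjusted_rand_score` and `sklearn.metrics.normalized_mutual_info_score`. A one-sided paired t-test such as the one described for Figure 6 can be run with `scipy.stats.ttest_rel(a, b, alternative=...)`, choosing `"greater"` or `"less"` depending on whether higher or lower values of the metric are better.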
