Brief Bioinform. 2024 Mar 27;25(3):bbae171.

doi: 10.1093/bib/bbae171.

stDiff: a diffusion model for imputing spatial transcriptomics through single-cell transcriptomics

Kongming Li¹,², Jiahao Li¹,², Yuhao Tao¹,², Fei Wang¹,²

Affiliations

¹ Shanghai Key Lab of Intelligent Information Processing, Handan Street, 200433 Shanghai, China.
² School of Computer Science and Technology, Fudan University, Handan Street, 200433 Shanghai, China.

PMID: 38628114
PMCID: PMC11021815
DOI: 10.1093/bib/bbae171

Kongming Li et al. Brief Bioinform. 2024.

Abstract

Spatial transcriptomics (ST) has become a powerful tool for exploring the spatial organization of gene expression in tissues. Imaging-based methods, though offering superior spatial resolutions at the single-cell level, are limited in either the number of imaged genes or the sensitivity of gene detection. Existing approaches for enhancing ST rely on the similarity between ST cells and reference single-cell RNA sequencing (scRNA-seq) cells. In contrast, we introduce stDiff, which leverages relationships between gene expression abundance in scRNA-seq data to enhance ST. stDiff employs a conditional diffusion model, capturing gene expression abundance relationships in scRNA-seq data through two Markov processes: one introducing noise to transcriptomics data and the other denoising to recover them. The missing portion of ST is predicted by incorporating the original ST data into the denoising process. In our comprehensive performance evaluation across 16 datasets, utilizing multiple clustering and similarity metrics, stDiff stands out for its exceptional ability to preserve topological structures among cells, positioning itself as a robust solution for cell population identification. Moreover, stDiff's enhancement outcomes closely mirror the actual ST data within the batch space. Across diverse spatial expression patterns, our model accurately reconstructs them, delineating distinct spatial boundaries. This highlights stDiff's capability to unify the observed and predicted segments of ST data for subsequent analysis. We anticipate that stDiff, with its innovative approach, will contribute to advancing ST imputation methodologies.

Keywords: diffusion model; imputation; scRNA-seq data; spatial transcriptomics data.

Figures

**Figure 1**
Framework of stDiff. (A) Brief framework of DDPM. The forward diffusion process (left to right) gradually adds Gaussian noise to the target data. The reverse process (right to left) iteratively denoises the target data. (B) Training process of stDiff. ScRNA-seq data undergo noise perturbation to give […]. Noise dependent on the time step […] is then added, resulting in […]. The shared part of […] and the unique part of […] are concatenated to form […]. Finally, a denoising network is trained to predict the added noise. The training process is guided by the shared gene part of […]. (C) Inference process of stDiff. ST data serve as the condition that guides the learned denoising network to denoise step by step from random noise. The final result, after the added noise has been removed, is the predicted imputation for the ST data.
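The forward noising step and the concatenation described in panel (B) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the standard DDPM closed form x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε with a linear β schedule, and it assumes one reading of the caption in which the shared genes stay un-noised as the condition while only the unique genes are noised. The initial perturbation of the scRNA-seq data is omitted. All function names, the schedule values, and the toy gene vector are hypothetical.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (illustrative values, not from the paper)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def forward_noise(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) in closed form; returns (x_t, eps)."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps

def build_training_input(cell, shared_idx, unique_idx, t, alpha_bars, rng):
    """Assumed layout: clean shared genes (condition) followed by noised unique genes."""
    unique_clean = [cell[i] for i in unique_idx]
    unique_noisy, eps = forward_noise(unique_clean, t, alpha_bars, rng)
    condition = [cell[i] for i in shared_idx]
    # A denoising network would take this vector and t, and be trained to predict eps.
    return condition + unique_noisy, eps

rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
cell = [1.0, 0.0, 2.5, 0.3, 1.2]  # toy expression vector: 3 shared genes, 2 unique genes
net_input, target_eps = build_training_input(
    cell, shared_idx=[0, 1, 2], unique_idx=[3, 4], t=500, alpha_bars=alpha_bars, rng=rng
)
```

At inference time the same layout would apply in reverse: the measured ST genes fill the condition slots and the unique-gene slots start from random noise, which the trained network removes step by step.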

**Figure 2**
UMAP plots illustrating scRNA-seq data, real ST data and imputed ST data generated by Tangram, gimVI, stPlus, SpaGE, uniPort, SpatialScope and stDiff. (A) and (B) correspond to Dataset2_osmFISH and Dataset3_ExSeq in Table 1, respectively.

**Figure 3**
Clustering metrics (ARI, AMI, Homogeneity, NMI) demonstrating the topological consistency among cells between authentic ST data and predicted data generated by Tangram, gimVI, stPlus, SpaGE, uniPort, SpatialScope and stDiff across different platforms of ST data.

**Figure 4**
Evaluation metrics (1-SPCC, 1-SSIM, RMSE, JS) to assess gene expression similarity between authentic ST data and predicted data generated by Tangram (Tan), gimVI (gim), SpaGE (Spa), stPlus (stP), uniPort (uni), SpatialScope (SpS) and stDiff (stD) across different platforms of ST data. (A)–(D) correspond to Dataset2_osmFISH, Dataset5_MERFISH, Dataset6_MERFISH and Dataset10_seqFISH in Table 1, respectively.
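Two of the similarity metrics named above, RMSE and JS, can be computed per gene as in the sketch below. It assumes that JS denotes the Jensen-Shannon divergence between the measured and predicted expression of one gene, each normalized to sum to 1 across cells, and that expression values are non-negative with a positive total. The paper's exact preprocessing may differ, and the function names are hypothetical.

```python
import math

def rmse(truth, pred):
    """Root mean squared error between two equal-length expression vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(truth, pred)) / len(truth))

def _normalize(values):
    """Scale a non-negative vector so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]

def _kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(truth, pred):
    """Jensen-Shannon divergence (natural log) between two normalized vectors."""
    p, q = _normalize(truth), _normalize(pred)
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)

measured = [0.0, 1.0, 3.0, 2.0]   # toy values for one gene across four cells
imputed = [0.5, 1.0, 2.5, 2.0]
print(rmse(measured, imputed), js_divergence(measured, imputed))
```

Both quantities are 0 for a perfect prediction, and with the natural logarithm the JS divergence is bounded above by ln 2.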

**Figure 5**
The predicted expression abundance of known spatially patterned genes in Dataset8_FISH. Each column corresponds to a single gene with a clear spatial pattern. The first row from the top displays the ground truth of spatial gene expression in Dataset8_FISH, while the subsequent rows show the corresponding predicted expression patterns through 5-fold cross-validation experiments using stDiff, Tangram, gimVI, SpaGE, stPlus, uniPort and SpatialScope.

**Figure 6**
Boxplots and scatter plots of the AS for the data generated by the seven methods across all 15 paired datasets. The central line represents the median, the box depicts the interquartile range, whiskers extend to 1.5 times the interquartile range, and dots represent the AS of individual datasets. (A) The AS for clustering metrics. (B) The AS for gene similarity metrics. (C) The overall AS for all eight metrics.

Cited by

eMCI: An Explainable Multimodal Correlation Integration Model for Unveiling Spatial Transcriptomics and Intercellular Signaling.
Hong R, Tong Y, Tang H, Zeng T, Liu R. Research (Wash D C). 2024 Nov 1;7:0522. doi: 10.34133/research.0522. PMID: 39494219.
SpaDiT: diffusion transformer for spatial gene expression prediction using scRNA-seq.
Li X, Zhu F, Min W. Brief Bioinform. 2024 Sep 23;25(6):bbae571. doi: 10.1093/bib/bbae571. PMID: 39508444.
Building a learnable universal coordinate system for single-cell atlas with a joint-VAE model.
Gao H, Hua K, Wu X, Wei L, Chen S, Yin Q, Jiang R, Zhang X. Commun Biol. 2024 Aug 12;7(1):977. doi: 10.1038/s42003-024-06564-0. PMID: 39134617.
SpaIM: Single-cell Spatial Transcriptomics Imputation via Style Transfer.
Li B, Tang Z, Budhkar A, Liu X, Zhang T, Yang B, Su J, Song Q. bioRxiv [Preprint]. 2025 Jan 27:2025.01.24.634756. doi: 10.1101/2025.01.24.634756. Update in: Nat Commun. 2025 Aug 23;16(1):7861. doi: 10.1038/s41467-025-63185-9. PMID: 39975319.
GEMDiff: a diffusion workflow bridges between normal and tumor gene expression states: a breast cancer case study.
Ai X, Smith MC, Feltus FA. Brief Bioinform. 2025 Mar 4;26(2):bbaf093. doi: 10.1093/bib/bbaf093. PMID: 40067113.

Grants and funding

61472086/National Natural Science Foundation of China