Nat Methods. 2025 Sep;22(9):1900-1910.
doi: 10.1038/s41592-025-02795-z. Epub 2025 Sep 15.

Spatial gene expression at single-cell resolution from histology using deep learning with GHIST

Xiaohang Fu et al. Nat Methods. 2025 Sep.

Abstract

The increased use of spatially resolved transcriptomics provides new biological insights into disease mechanisms. However, the high cost and complexity of these methods are barriers to broader application. Consequently, methods have been created to predict spot-based gene expression from routinely collected histology images. Recent benchmarking showed that current methodologies have limited accuracy and spatial resolution, constraining translational capacity. Here, we introduce GHIST, a deep learning-based framework that predicts spatial gene expression at single-cell resolution by leveraging subcellular spatial transcriptomics and synergistic relationships between multiple layers of biological information. We validated GHIST using public datasets and The Cancer Genome Atlas data, demonstrating its flexibility across different spatial resolutions and superior performance. Our results underscore the utility of in silico generation of single-cell spatial gene expression measurements and the capacity to enrich existing datasets with a spatially resolved omics modality, paving the way for scalable multi-omics analysis and biomarker identification.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. GHIST framework.
GHIST maps H&E images to spatially resolved single-cell gene expression by leveraging paired subcellular spatial transcriptomics (SST) data during training. The multitask deep learning-based model integrates relationships between multiple levels of information related to gene expression within cells. Once trained, GHIST enables the in silico estimation of spatially resolved single-cell gene expression from existing H&E data alone, which can be used to enrich further downstream analysis. The cell segmentation and cell-type classification preprocessing steps are performed during training only. During inference, stain normalization is applied to query H&E slides. ^ denotes training only and * denotes optional input or step. WSI, whole-slide image.
Fig. 2. GHIST predictions on two breast cancer H&E images.
a,b, Comparison between cell types (a) and cell-type compositions (b) obtained from BreastCancer1 paired SST (Xenium) data and cell types inferred from the expression predicted from H&E. c,d, Comparison between cell types (c) and cell-type compositions (d) obtained from BreastCancer2 paired SST (Xenium) data and cell types inferred from the expression predicted from H&E. Scale bar, 1 mm. e, Predicted gene expression in individual cells from GHIST and SST data (for the gene SCD) on BreastCancer1. f, Box plot of the Pearson correlation coefficient (PCC) computed between predicted and measured (ground truth) expression for the top 20 and 50 predicted spatially variable genes (SVGs) and non-SVGs in both BreastCancer1 and BreastCancer2. Each box plot ranges from the first to third quartile with the median as the horizontal line. The lower whisker extends to 1.5 times the interquartile range below the first quartile, while the upper whisker extends to 1.5 times the interquartile range above the third quartile. The sample size corresponds to the number of genes included (either 20 or 50). g, Scatterplots showing the predicted spatial gene expression (SGE) of SCD, FASN, FOXA1 and EPCAM compared to measured SST expression.
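The per-gene correlation summarized in panel f can be illustrated with a minimal sketch. It assumes, for illustration only, that predicted and measured expression are stored as one list of per-cell values per gene; the function names `pearson` and `per_gene_pcc` and the toy numbers are hypothetical and are not taken from the GHIST code.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined when a gene is constant across cells.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def per_gene_pcc(predicted, measured):
    """Map each gene present in both inputs to its PCC across cells.

    Both arguments are dicts of gene name -> list of per-cell values,
    with cells in the same order in both dicts (an assumed layout).
    """
    return {
        gene: pearson(predicted[gene], measured[gene])
        for gene in predicted
        if gene in measured
    }


# Toy values for four cells; not data from the paper.
predicted = {"SCD": [0.0, 1.0, 2.0, 3.0], "FASN": [1.0, 1.0, 2.0, 2.0]}
measured = {"SCD": [0.0, 2.0, 4.0, 6.0], "FASN": [2.0, 2.0, 1.0, 1.0]}
scores = per_gene_pcc(predicted, measured)
```

The resulting per-gene scores are the kind of values a box plot such as panel f would summarize, one value per gene.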
Fig. 3. Comparison of spot-based gene prediction and survival analysis performance among state-of-the-art methods and GHIST using the HER2ST dataset.
a,b, Violin and box plots of the average PCC (a) and structural similarity index measure (SSIM) (b) between ground-truth gene expression and predicted gene expression. Metrics were measured on the test fold of a fourfold cross-validation and averaged over each gene (n = 785) across the dataset. c, Top five correlated genes. d,e, PCC (d) and SSIM (e) violin and box plots for each method for selected SVGs (n = 20 per image sample). f, C-indices of multivariate Cox regression models predicting survival of HER2+ subtype from TCGA-BRCA patients (n = 92), using bulk RNA-seq, bulk RNA-seq restricted to genes present in the HER2ST dataset, and the predicted pseudobulk from each method. C-indices were calculated from the test sets of a threefold cross-validation with 100 repeats. g, Cross-validated Kaplan–Meier curves for patients split into high-risk and low-risk groups by the median risk prediction of the multivariate Cox regression models for each method and HER2+ breast cancer subtypes. The P value represents the result of the two-sided log-rank test for assessing the statistical significance of differences in survival between the groups. In a, b and d–f, each box plot ranges from the first to third quartile with the median as the horizontal line. The lower whisker extends to 1.5 times the interquartile range below the first quartile, while the upper whisker extends to 1.5 times the interquartile range above the third quartile.
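For readers unfamiliar with the C-index reported in panel f, the following is a minimal sketch of Harrell's concordance index for right-censored survival data. It is a generic textbook formulation, not the authors' evaluation code; the function name `concordance_index` and the toy inputs are hypothetical, and pairs with tied survival times are simply skipped as a simplification.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data.

    times  : observed follow-up time per patient
    events : 1 if the event was observed, 0 if censored
    risks  : predicted risk score (higher means shorter expected survival)
    """
    if not (len(times) == len(events) == len(risks)):
        raise ValueError("inputs must have the same length")
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example: risk ordering perfectly matches event ordering.
perfect = concordance_index([1, 2, 3], [1, 1, 1], [3.0, 2.0, 1.0])
# Reversed risk ordering gives the worst possible score.
reversed_order = concordance_index([1, 2, 3], [1, 1, 1], [1.0, 2.0, 3.0])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the methods in panel f are compared.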
Fig. 4. Application of GHIST to TCGA breast cancer H&E images.
a, Illustration of the capacity of GHIST to predict gene expression for each cell in an H&E WSI (example from two TCGA-BRCA samples, TCGA-AN-A0XP and TCGA-C8-A1HF). b, Visualization of the predicted cell types in the two selected TCGA samples. A UMAP visualization is used to project the cells onto a two-dimensional plot. The malignant cell-type marker EPCAM and the stromal cell-type marker SFRP4 are used to visualize the GHIST-predicted locations of the two cell types. c, Predicted cell-type proportions for the selected TCGA HER2+ individuals.
Fig. 5. Potential of GHIST to create an in silico modality for multi-view analysis.
a, Cross-validated Kaplan–Meier curves for TCGA HER2+ individuals split into high-risk and low-risk groups by the median risk predictions from cell-type-specific gene proportion, nearest-neighbor correlation and RNA-seq data downloaded from TCGA. Shaded regions represent 95% confidence intervals. A two-sided log-rank test was used to calculate the χ² (chi-squared) test statistic and P values of the survival difference between the two groups (n = 92). b, Cell-type-specific differential state genes in macrophage and stromal cells between ER+/PR+ and ER−/PR− patients. c, The ER+/PR+ group exhibited heterogeneity in expression of LPL, CAVIN2, TIMP4 and ADIPOQ. The clustering method grouped them into two clusters. d, We used the various feature types, that is, cell-type proportion, cell-type-specific expression and spatial features extracted from the predicted gene expression, to build a patient outcome prediction model with the two clusters of ER+/PR+ status refined in b as the patient outcome. Higher accuracy indicates better ability to distinguish the two clusters of ER+/PR+ individuals (n = 54). Each box plot ranges from the first to third quartile with the median as the horizontal line. The lower whisker extends to 1.5 times the interquartile range below the first quartile, while the upper whisker extends to 1.5 times the interquartile range above the third quartile. e, Differential SGE affected by CNA was calculated using a two-sided t-test. Volcano plots display three selected hotspots (1, 8, 17q) that affect a number of spatial expression patterns of genes (top). The sum of −log10(P) for each genomic region across the associations between CNAs and SGE (bottom). The P value was reported without multiple-comparison adjustment. FC, fold change.
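The per-region score in the bottom part of panel e, a sum of −log10(P) over associations, can be sketched as follows. This assumes the associations are available as (region, P value) pairs; the function name `region_scores`, the region labels and the P values below are illustrative placeholders rather than results from the paper.

```python
import math
from collections import defaultdict


def region_scores(associations):
    """Sum -log10(P) per genomic region.

    associations: iterable of (region_label, p_value) pairs, one per
    tested association between a CNA region and a gene's SGE feature.
    """
    totals = defaultdict(float)
    for region, p_value in associations:
        if not 0.0 < p_value <= 1.0:
            raise ValueError(f"invalid P value for {region}: {p_value}")
        totals[region] += -math.log10(p_value)
    return dict(totals)


# Illustrative placeholder values, unadjusted as in the caption.
toy = [("17q", 0.01), ("17q", 0.001), ("8q", 0.1)]
scores_by_region = region_scores(toy)
```

Regions with many small P values accumulate large totals, which is what makes hotspots stand out in a plot of this kind.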
