Nat Methods. 2025 Jul;22(7):1568-1582.
doi: 10.1038/s41592-025-02707-1. Epub 2025 May 29.

A visual-omics foundation model to bridge histopathology with spatial transcriptomics

Weiqing Chen et al. Nat Methods. 2025 Jul.

Abstract

Artificial intelligence has revolutionized computational biology. Recent developments in omics technologies, including single-cell RNA sequencing and spatial transcriptomics, provide detailed genomic data alongside tissue histology. However, current computational models focus on either omics or image analysis, lacking their integration. To address this, we developed OmiCLIP, a visual-omics foundation model linking hematoxylin and eosin images and transcriptomics using tissue patches from Visium data. We transformed transcriptomic data into 'sentences' by concatenating top-expressed gene symbols from each patch. We curated a dataset of 2.2 million paired tissue images and transcriptomic data across 32 organs to train OmiCLIP integrating histology and transcriptomics. Building on OmiCLIP, our Loki platform offers five key functions: tissue alignment, annotation via bulk RNA sequencing or marker genes, cell-type decomposition, image-transcriptomics retrieval and spatial transcriptomics gene expression prediction from hematoxylin and eosin-stained images. Compared with 22 state-of-the-art models on 5 simulations, and 19 public and 4 in-house experimental datasets, Loki demonstrated consistent accuracy and robustness.
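
The 'sentence' construction described in the abstract can be sketched in a few lines. This is a minimal illustration, assuming each patch's expression is available as a mapping from gene symbol to expression value; the function name `patch_to_sentence`, the cutoff `top_k` and the space separator are hypothetical choices for illustration, not values taken from the paper.

```python
def patch_to_sentence(expression, top_k=5):
    """Build a gene 'sentence' by concatenating the top-expressed gene symbols.

    `expression` maps gene symbol -> expression value for one tissue patch.
    `top_k` and the space separator are illustrative assumptions.
    """
    # Sort by descending expression; break ties alphabetically for determinism.
    ranked = sorted(expression.items(), key=lambda kv: (-kv[1], kv[0]))
    return " ".join(symbol for symbol, _ in ranked[:top_k])


# Toy patch with made-up expression values.
patch = {"KRT7": 12.0, "EPCAM": 30.5, "COL1A1": 8.2, "PTPRC": 0.0, "ACTA2": 8.2}
sentence = patch_to_sentence(patch, top_k=3)
print(sentence)
```

The resulting string can then be passed to a text encoder in the same way as a natural-language caption.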

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Overview of the study.
a, The workflow of pretraining the OmiCLIP model with paired image–transcriptomics dataset via contrastive learning. b, Workflow of the Loki platform using the OmiCLIP foundation model as an engine. Left diagram illustrates the size of the training data in different organs. Right diagram lists the existing modules of the Loki platform, including tissue alignment, cell-type decomposition, tissue annotation, ST gene expression prediction and histology image–transcriptomics retrieval. Created in BioRender.com. c, The heat map represents image embeddings and transcriptomic embeddings similarity across various organs and disease conditions. The color of the heat map reflects the OmiCLIP’s embedding similarities, with red indicating high similarity and blue indicating low similarity. HCM, hypertrophic cardiomyopathy; HBV, hepatitis B virus infection. d, Schematic illustration of Loki platform with transfer learning for 3D tissue analysis. Created in BioRender.com.
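
The caption states only that OmiCLIP is pretrained on paired image and transcriptomic data via contrastive learning. As a rough sketch of what such an objective can look like, the block below implements the symmetric cross-entropy loss popularized by CLIP. That OmiCLIP uses exactly this form is an assumption here, and the temperature value and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric cross-entropy over a batch of paired embeddings (assumed form)."""
    n = len(image_embs)
    logits = [[cosine(i, t) / temperature for t in text_embs] for i in image_embs]

    def cross_entropy(rows):
        # The correct "class" for row k is column k, i.e. its own pair.
        total = 0.0
        for k, row in enumerate(rows):
            m = max(row)
            log_sum = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum - row[k]
        return total / n

    columns = [list(col) for col in zip(*logits)]
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(columns))


# Toy embeddings: correctly paired versus swapped transcriptomic embeddings.
images = [[1.0, 0.0], [0.0, 1.0]]
matched = [[0.9, 0.1], [0.1, 0.9]]
shuffled = [matched[1], matched[0]]
loss_matched = clip_style_loss(images, matched)
loss_shuffled = clip_style_loss(images, shuffled)
print(loss_matched, loss_shuffled)
```

In this toy batch the correctly paired embeddings yield a much lower loss than the swapped ones, which is the signal that pulls matching image and transcriptomic embeddings together during training.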
Fig. 2. Tissue alignment.
a, Schematic illustration of tissue alignment using ST and histology image with Loki Align. Created in BioRender.com. b, Performance comparison of tissue alignment on 100 low-noise and 100 high-noise simulated datasets, represented by the distance between the ground truth and the aligned simulated sample, using Loki (ST-to-ST and image-to-ST) and the baseline methods PASTE (ST-to-ST) and GPSA (ST-to-ST). P values were calculated using a one-sided Wilcoxon test. c, Alignment results on eight adjacent normal human small intestine samples using Loki (ST-to-ST and image-to-ST) and the baseline methods PASTE (ST-to-ST), GPSA (ST-to-ST) and CPD (ST-to-ST). We colored the samples using the top three PCA components of OmiCLIP transcriptomic embeddings, mapped to the red, green and blue color channels, respectively. For visualization, we stacked the eight samples along the perpendicular axis before and after each alignment method and show them from the side view. Source 2, for which GPSA selected no spatially variable genes and therefore could not be run, is marked as ‘not applicable’ (NA). Box plots compare tissue alignment performance on the seven source samples, individually and combined, represented by the PCC (and Kendall’s tau coefficient in Supplementary Fig. 1) of highly variable gene expression between target and source samples at the same location after alignment, using Loki and the baseline methods (PASTE, GPSA and CPD, with PCA embeddings as input). In the box plots, the middle line represents the median, the box boundaries indicate the interquartile range, and the whiskers extend to data points within 1.5 times the interquartile range. d, Tissue alignment of two adjacent human ovarian carcinosarcoma samples using Loki (ST-to-ST and image-to-ST) and the baseline methods PASTE (ST-to-ST), GPSA (ST-to-ST) and CAST (ST-to-ST). We colored the samples as described in c. e, Alignment performance comparison using the PCC and Kendall’s tau coefficient of the expression of highly expressed genes between the target sample and the source sample at aligned locations, using Loki (ST-to-ST and image-to-ST) and the baseline methods PASTE (ST-to-ST), GPSA (ST-to-ST) and CAST (ST-to-ST). In the box plots, the middle line represents the median, the box boundaries indicate the interquartile range, and the whiskers extend to data points within 1.5 times the interquartile range; n = 147.
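
The two agreement measures named in this caption, PCC and Kendall's tau, can be sketched in plain Python. This is an illustration only: the expression values are made up, and the tau-a variant (ties counted as neither concordant nor discordant) is an assumption, since the caption does not say which variant was used.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient (PCC) between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical expression of one gene at four aligned locations.
target = [1.0, 2.0, 3.0, 4.0]
source_aligned = [1.1, 1.9, 3.2, 3.9]
pcc = pearson(target, source_aligned)
tau = kendall_tau_a(target, source_aligned)
print(pcc, tau)
```

A well-aligned source sample should give values close to 1 for both measures, while a misaligned one drifts toward 0 or below.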
Fig. 3. Tissue annotation using bulk RNA-seq data.
a, Schematic illustration of tissue annotation using H&E image and reference bulk RNA-seq data from different sources, with OmiCLIP paired image and transcriptomic embeddings. b, Histology WSIs of breast cancer, heart failure and normal breast samples. The major tumor regions, fibroblast cell-enriched regions and adipose regions are annotated by pathology experts in black lines. Heat map shows the similarity of WSIs to the corresponding reference bulk RNA-seq of tumor, fibroblast and adipose, respectively. The color of the heat map reflects the similarities between WSIs and reference bulk RNA-seq data, with red indicating high similarity and blue indicating low similarity. CLAM attention heat maps were generated using CLAM with default parameters.
Fig. 4. Tissue annotation using marker genes.
a, Schematic illustration of tissue annotation using H&E image and reference marker genes. The annotation result is decided by choosing the candidate texts with the highest similarity score to the input image query. For Loki, we used the text content of marker gene symbols of each tissue type. For the PLIP model, we used the text content of natural language description of each tissue type. b, Examples of similarity scores of images and texts calculated by Loki and OpenAI CLIP model, respectively. c, Comparison of zero-shot performances, represented by weighted F1 scores, across four datasets using Loki and OpenAI CLIP, respectively. Number of test samples for each dataset: CRC7K (n = 6,333); WSSS4LUAD (n = 10,091); LC25000 (n = 15,000); and PatchCamelyon (n = 32,768). d, Comparison of zero-shot performances, represented by weighted F1 scores, across four datasets using Loki, PLIP and incorporating Loki and PLIP models by average similarity (shown in a; Methods), respectively. e, Comparison of zero-shot performances, represented by weighted F1 scores of each tissue type in the CRC7K dataset using OpenAI CLIP model, Loki, PLIP model and incorporating Loki and PLIP models, respectively. f, Confusion matrix of the CRC7K dataset using Loki (left), PLIP model (middle) and incorporating Loki and PLIP models (right), respectively. The ground-truth labels are presented in rows and the predicted labels are presented in columns. ADI, adipose tissue; NOR, normal colon mucosa; TUM, colorectal carcinoma epithelium; LYM, lymphocytes; MUC, mucus; DEB, debris; MUS, smooth muscle; STR, cancer-associated stroma.
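
The zero-shot decision rule in panel a (pick the candidate text with the highest similarity to the image) and the combination of two models by average similarity can be sketched as follows. The embeddings and the second model's scores are toy values, and the function names are hypothetical; in practice the vectors would come from the image and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_label(image_emb, text_embs):
    """Return the label whose text embedding is most similar to the image."""
    scores = {label: cosine(image_emb, emb) for label, emb in text_embs.items()}
    return max(scores, key=scores.get), scores


def ensemble_label(scores_a, scores_b):
    """Average per-label similarities from two models, then take the best label."""
    avg = {label: 0.5 * (scores_a[label] + scores_b[label]) for label in scores_a}
    return max(avg, key=avg.get), avg


# Toy embeddings standing in for marker-gene texts of three tissue types.
image_emb = [0.9, 0.1, 0.0]
marker_gene_embs = {"TUM": [1.0, 0.0, 0.0], "LYM": [0.0, 1.0, 0.0], "ADI": [0.0, 0.0, 1.0]}
label_a, scores_a = zero_shot_label(image_emb, marker_gene_embs)

# Made-up similarities from a second model that uses natural-language descriptions.
scores_b = {"TUM": 0.2, "LYM": 0.6, "ADI": 0.1}
label_avg, _ = ensemble_label(scores_a, scores_b)
print(label_a, label_avg)
```

Averaging lets a confident score from one model outweigh a weaker, conflicting score from the other, as happens in this toy case.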
Fig. 5. Cell-type decomposition.
a, Schematic illustration of tissue alignment using ST, reference scRNA-seq data and histology images with OmiCLIP paired transcriptomic and image embeddings after fine-tuning. b, H&E image of our in-house TNBC sample, characterized by Xenium into three major cell types: cancer epithelial, immune and stromal cells. c, Performance comparison of 12 decomposition methods using JS divergence, SSIM and impact scores. z-scores of JS divergence (or SSIM) across methods were calculated based on the average JS divergence (or SSIM) among cell types. The impact score of each method is the average of the z-score of JS divergence and SSIM (Methods). The green color indicates decomposition tools. The blue color indicates the performance of replacing OmiCLIP embeddings with other transcriptomic foundation models’ embeddings. d, Cell-type decomposition results on three major cell types of the TNBC sample using the image by Loki and using ST by Tangram, with Xenium data as ground truth. The color of the heat map reflects the z-score, calculated by the probability distribution of each cell type. e, H&E image of the human colorectal cancer sample and cell-type distribution within the Visium-HD capture area. f, Bar plot shows the accuracy of decomposition on four major cell types by Loki using ST or image mode, and by Tangram using ST. Error bars indicate the standard deviation and the center values represent the mean. For both JS divergence and SSIM, adjusted P value > 0.1 using a two-sided Wilcoxon test. g, Whole-slide (20 mm × 13 mm) human colorectal cancer cell-type decomposition. Different tissue regions are annotated by the pathologist as ground truth. Heat map shows the cell-type distribution of fibroblast, tumor, intestinal epithelial, smooth muscle and immune/inflammatory cells, with color reflecting the density of each cell type. CLAM attention heat maps were generated using CLAM with default parameters. h, Cell-type decomposition results on the brain sample. Left, brain anatomic references with zoom-in H&E image patches of L1 (VLMCs, astrocytes), L2/3, L4/5, L6 and white matter (WM; oligodendrocytes), respectively. Created in BioRender.com. Right, heat map shows the cell-type distribution of VLMCs, astrocytes, L2/3, L4/5, L6 and oligodendrocytes, with color reflecting the distribution of each cell type.
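
The scoring in panel c combines z-scores of JS divergence and SSIM into an impact score. The sketch below shows one way this could be computed from per-method averages. Several details are assumptions because the caption defers them to the Methods: the JS divergence is negated so that higher always means better, the population standard deviation is used for the z-scores, and the method names and numbers are placeholders. SSIM values are taken as given rather than computed.

```python
import math
from statistics import mean, pstdev


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def z_scores(values):
    """Standardize values across methods (population standard deviation)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def impact_scores(js_by_method, ssim_by_method):
    """Average the z-scores of (negated) JS divergence and SSIM per method."""
    methods = list(js_by_method)
    # Assumption: negate JS divergence so that a higher z-score is better.
    z_js = z_scores([-js_by_method[m] for m in methods])
    z_ssim = z_scores([ssim_by_method[m] for m in methods])
    return {m: (a + b) / 2 for m, a, b in zip(methods, z_js, z_ssim)}


# Placeholder per-method averages over cell types.
js = {"method_A": 0.1, "method_B": 0.2, "method_C": 0.3}
ssim = {"method_A": 0.8, "method_B": 0.6, "method_C": 0.4}
impact = impact_scores(js, ssim)
best = max(impact, key=impact.get)
print(best, impact)
```

Standardizing each metric before averaging keeps a metric with a wider numeric range from dominating the combined score.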
Fig. 6. Image-to-transcriptomics retrieval.
a, Schematic illustration of image-to-transcriptomics retrieval on the ST-bank dataset. b, Example image-to-transcriptomics retrieval results. For each example image from adipose tissue, colorectal adenocarcinoma epithelium, lymphocytes, smooth muscle and normal colon mucosa, the retrieved top 50 most similar transcriptomics are shown by the paired image from the ST-bank dataset. c, Image-to-transcriptomics retrieval similarity scores across the four validation datasets—CRC7K, WSSS4LUAD, LC25000 and PatchCamelyon—using Loki, OpenAI CLIP and PLIP. In the box plots, the middle line represents the median, the box boundaries indicate the interquartile range, and the whiskers extend to data points within 1.5 times the interquartile range. d, Image-to-transcriptomics retrieval similarity scores across the eight in-house human tissues: heart failure (HF), Alzheimer’s disease (AD), metaplastic breast cancer (MPBC) and TNBC, using Loki, OpenAI CLIP and PLIP. In the box plots, the middle line represents the median, the box boundaries indicate the interquartile range, and the whiskers extend to data points within 1.5 times the interquartile range. e, Image-to-transcriptomics retrieval evaluation across four validation datasets and one test dataset using Loki, OpenAI CLIP and PLIP, with random baseline. The top-K quantile most similar transcriptomics were retrieved. We report Recall@K for K ∈ {5%, 10%} (Methods). f, Example image-to-transcriptomics retrieval results. The retrieved transcriptomics are shown by the paired image.
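
Recall@K with K expressed as a quantile of the candidate pool, as in panel e, can be sketched as follows. The exact definition is given in the paper's Methods; this version assumes each image query has exactly one true transcriptomic match (candidate i for query i) and counts a hit when that match ranks within the top K fraction. The function name and toy similarity matrix are hypothetical.

```python
import math


def recall_at_quantile(similarity, quantile):
    """Fraction of queries whose true match ranks in the top `quantile` of candidates.

    similarity[i][j] is the score between image query i and transcriptome j;
    the true match for query i is assumed to be candidate i.
    """
    n_candidates = len(similarity[0])
    k = max(1, math.ceil(quantile * n_candidates))
    hits = 0
    for i, row in enumerate(similarity):
        # Rank = number of candidates scoring strictly higher than the true match.
        rank = sum(1 for s in row if s > row[i])
        if rank < k:
            hits += 1
    return hits / len(similarity)


# Toy matrix: 20 queries and 20 candidates; the true match scores highest
# except for the first five queries, where it scores lowest.
n = 20
sim = [[1.0 if i == j else 0.5 for j in range(n)] for i in range(n)]
for i in range(5):
    sim[i][i] = 0.0

recall_5 = recall_at_quantile(sim, 0.05)
recall_10 = recall_at_quantile(sim, 0.10)
print(recall_5, recall_10)
```

Under a uniformly random ranking, the expected value of this measure is roughly the quantile itself, which is what makes a random baseline a useful reference point.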
Extended Data Fig. 1. Image and transcriptomic representations.
a, Clustering performance on ST-bank data with cell type annotation. Left: clustering performance using transcriptomic embeddings generated from the OmiCLIP model before and after training. Right: clustering performance using image embeddings from the OmiCLIP model before and after training. The Calinski-Harabasz scores were calculated on the embeddings (Methods) using the pretrained OmiCLIP transcriptomic (left) and image (right) encoders, evaluated for each organ type. Higher Calinski-Harabasz scores indicate better separation between clusters of the embeddings. In the box plots, the middle line represents the median, the box boundaries indicate the interquartile range, and the whiskers extend to data points within 1.5× the interquartile range. b, Image and transcriptomic embeddings of the lung, kidney cancer, healthy heart, and myocardial infarction (MI) heart samples. Each row corresponds to a WSI and showcases information from two modalities. The first column shows H&E images of tissue morphology; the second column shows heatmaps of ST data, with colors indicating the cell types; the third column shows UMAPs of image embeddings colored by cell type before and after contrastive learning; the fourth column shows UMAPs of transcriptomic embeddings colored by cell type before and after contrastive learning.
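
For readers unfamiliar with the Calinski-Harabasz score, the following sketch computes it from its standard definition: the ratio of between-cluster to within-cluster dispersion, each normalized by its degrees of freedom. The toy points and labels are made up; how the paper obtained the embeddings and cluster labels is described in its Methods.

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz score: [B / (k - 1)] / [W / (n - k)].

    B is the between-cluster dispersion and W the within-cluster dispersion
    for n points in k clusters.
    """
    n, dim = len(points), len(points[0])
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = within = 0.0
    for members in clusters.values():
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * sum((c - o) ** 2 for c, o in zip(centroid, overall))
        within += sum(sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members)
    return (between / (k - 1)) / (within / (n - k))


# Two tight, well-separated groups of toy 2D embeddings.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
ch_good = calinski_harabasz(points, [0, 0, 1, 1])  # labels match the groups
ch_bad = calinski_harabasz(points, [0, 1, 0, 1])   # labels cut across the groups
print(ch_good, ch_bad)
```

Labels that follow the geometric grouping score far higher than labels that cut across it, which is why the score is read as a measure of cluster separation.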
Extended Data Fig. 2. Image and transcriptomic representations analysis.
a, Clustering performance on all ST-bank data. Top: clustering performance using transcriptomic embeddings generated from the OmiCLIP model before and after training. Bottom: clustering performance using image embeddings from the OmiCLIP model before and after training, and image embeddings generated from UNI and Pro-GigaPath, respectively. The Calinski-Harabasz scores were calculated on the embeddings using the pre-trained OmiCLIP transcriptomic (top) and image (bottom) encoders, evaluated for each organ type. Higher Calinski-Harabasz scores indicate better separation between clusters of the embeddings. In the box plots, the middle line represents the median, the box boundaries indicate the interquartile range, and the whiskers extend to data points within 1.5× the interquartile range. Sample sizes are skin: 163, brain: 119, breast: 97, heart: 73, kidney: 73, embryo: 73, others: 64, liver: 57, prostate: 49, spinal cord: 44, ovary: 32, colon: 29, pancreas: 25, lung: 22, tonsil: 18, uterus: 17, adipose: 15, small intestine: 14, and stomach: 12. b, Image and transcriptomic embeddings of the spinal cord, liver cancer, brain cancer, kidney cancer and skin cancer samples. Each row corresponds to a WSI and showcases information from two modalities. The first column shows H&E images of tissue morphology; the second column shows heatmaps of ST data, with colors indicating the ST data clusters obtained with the Leiden algorithm (Methods); the third column shows UMAPs of image embeddings colored by ST Leiden cluster before and after contrastive learning; the fourth column shows UMAPs of transcriptomic embeddings colored by ST Leiden cluster before and after contrastive learning.
Extended Data Fig. 3. OmiCLIP’s robustness to image quality and sequencing depth.
a, Example image with a low-quality region outlined in red and a simulated low-quality image generated by adding Gaussian noise. b, Cosine similarity of paired transcriptomic and image embeddings using OmiCLIP (original image and simulated low-quality image), PLIP (original image), and OpenAI CLIP (original image). In the box plots, the middle line represents the median, the box boundaries indicate the interquartile range, and the whiskers extend to data points within 1.5× the interquartile range. Sample sizes are 10 for each simulated condition. c, Cosine similarity of paired image and transcriptomic embeddings using OmiCLIP (original transcriptomes and transcriptomes downsampled from high to middle sequencing depth, from middle to low sequencing depth, and from high to low sequencing depth, respectively), PLIP (original transcriptome), and OpenAI CLIP (original transcriptome). In the box plots, the middle line represents the median, the box boundaries indicate the interquartile range, and the whiskers extend to data points within 1.5× the interquartile range, n = 500.
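
The two perturbations in this figure, added Gaussian noise and reduced sequencing depth, can be simulated along the following lines. This is a sketch under stated assumptions: the caption does not give the noise level or the downsampling procedure, so `sigma`, `keep_fraction`, the use of binomial thinning, and the placeholder gene names are all hypothetical.

```python
import random


def add_gaussian_noise(pixels, sigma, rng):
    """Add zero-mean Gaussian noise to 8-bit pixel values, clipping to [0, 255]."""
    return [min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in pixels]


def downsample_counts(counts, keep_fraction, rng):
    """Binomial thinning: keep each read independently with probability keep_fraction."""
    return {
        gene: sum(1 for _ in range(count) if rng.random() < keep_fraction)
        for gene, count in counts.items()
    }


rng = random.Random(0)  # fixed seed so the simulation is repeatable

# Flattened toy image patch and toy read counts with placeholder gene names.
pixels = [0, 128, 255, 64]
noisy = add_gaussian_noise(pixels, sigma=25.0, rng=rng)

counts = {"GENE_A": 100, "GENE_B": 10, "GENE_C": 0}
thinned = downsample_counts(counts, keep_fraction=0.5, rng=rng)
print(noisy, thinned)
```

Embedding the perturbed inputs and comparing the cosine similarity of the paired embeddings against the unperturbed case then gives a robustness readout of the kind shown in panels b and c.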
Extended Data Fig. 4. Visium and Xenium tissue alignment.
Tissue alignment results on breast cancer sample using Loki Align. a, Source Xenium ST data. b, Target Visium ST data. c, Xenium ST data after Loki alignment.
Extended Data Fig. 5. Cell type decomposition of TNBC case study.
a, Xenium data from our in-house TNBC patient sample, colored by Louvain clusters and cell types, respectively. b, H&E image, marker gene expression (KRT7, ATCG2, RORC), and cell type distribution in an example zoom-in region of the TNBC sample. c, Cell type decomposition results on 3 major cell types of the TNBC sample using ST by RCTD, CARD, scGPT, Spatial Seurat, scFoundation, GeneFormer, CytoSPACE, Cell2location, and SpatialDWLS, respectively. The color of the heatmap reflects the z-score, calculated by the enrichment of each cell type.
Extended Data Fig. 6. Cell type decomposition of colorectal case study.
a, UMAP representation of the OmiCLIP transcriptomic embeddings colored by cell types, where each dot represents a spot. b, Cell type decomposition result using Loki ST decomposition and Loki image decomposition respectively on human colorectal sample within the Visium HD capture area, and ground truth. Heatmap shows the cell type distribution of tumor, fibroblast, smooth muscle, and intestinal epithelial, respectively, with color reflecting the probability of each cell type.
Extended Data Fig. 7. Cell type decomposition with fine-tuning, pre-training, and training from scratch.
a, Cell type decomposition results on 3 major cell types of the TNBC sample using Loki Decompose Image-to-ST (fine-tuning, pre-training, and training from scratch). The color of the heatmap reflects the z-score, calculated by the enrichment of each cell type. b, Bar plot shows the accuracy of decomposition of 3 major cell types by Loki Decompose Image-to-ST (fine-tuning, pre-training, and training from scratch). Error bars indicate the standard deviation and the center values represent the mean.
Extended Data Fig. 8. Examples of ST gene expression prediction.
H&E images, ground truth ST gene expression, and ST gene expression predicted by Loki, Hist2ST, HisToGene, BLEEP, and mclSTExp, respectively.
Extended Data Fig. 9. Comparison of ST gene expression prediction performances.
a, Comparison of ST gene expression prediction performances, represented by MSE and PCC respectively on 39 normal heart tissues using Loki, Hist2ST, HisToGene, BLEEP, and mclSTExp, respectively. In the box plots, the middle line represents the median, the box boundaries indicate the interquartile range, and the whiskers extend to data points within 1.5× the interquartile range. b, Summarized comparison of ST gene expression prediction performances, represented by MSE and PCC respectively across all samples using Loki, HisToGene, mclSTExp, BLEEP, and Hist2ST respectively. In the box plots, the middle line represents the median, the box boundaries indicate the interquartile range, and the whiskers extend to data points within 1.5× the interquartile range.
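
The MSE metric and the box-plot convention repeated throughout these captions (median line, interquartile-range box, whiskers reaching the most extreme points within 1.5× the interquartile range) can be made concrete with a short sketch. The quartile interpolation method is an assumption, since plotting libraries differ on it, and the numbers are made up.

```python
from statistics import median, quantiles


def mse(truth, pred):
    """Mean squared error between measured and predicted expression values."""
    return sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth)


def box_plot_summary(values):
    """Median, quartiles and whisker ends under the 1.5 x IQR rule."""
    # Assumption: 'inclusive' quartiles; other interpolation rules shift q1 and q3 slightly.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    inside = [v for v in values if q1 - 1.5 * iqr <= v <= q3 + 1.5 * iqr]
    return {
        "median": median(values),
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }


error = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])

# Made-up per-sample scores with one extreme value.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 100]
summary = box_plot_summary(scores)
print(error, summary)
```

In this toy data the value 100 lies beyond the upper fence, so the upper whisker stops at 8 and the extreme point would be drawn separately as an outlier.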
Extended Data Fig. 10. Summary of the fine-tuning settings for downstream tasks.
Recommended settings for downstream tasks.
