Brief Bioinform. 2024 Sep 23;25(6):bbae551.
doi: 10.1093/bib/bbae551.

Multimodal contrastive learning for spatial gene expression prediction using histology images

Wenwen Min et al. Brief Bioinform.

Abstract

In recent years, the advent of spatial transcriptomics (ST) technology has unlocked unprecedented opportunities for delving into the complexities of gene expression patterns within intricate biological systems. Despite its transformative potential, the prohibitive cost of ST technology remains a significant barrier to its widespread adoption in large-scale studies. An alternative, more cost-effective strategy involves employing artificial intelligence to predict gene expression levels using readily accessible whole-slide images stained with Hematoxylin and Eosin (H&E). However, existing methods have yet to fully capitalize on multimodal information provided by H&E images and ST data with spatial location. In this paper, we propose mclSTExp, a multimodal contrastive learning model with a Transformer and DenseNet-121 encoder for Spatial Transcriptomics Expression prediction. We conceptualize each spot as a "word", integrating its intrinsic features with spatial context through the self-attention mechanism of a Transformer encoder. This integration is further enriched by incorporating image features via contrastive learning, thereby enhancing the predictive capability of our model. We conducted an extensive evaluation of highly variable genes in two breast cancer datasets and a skin squamous cell carcinoma dataset, and the results demonstrate that mclSTExp exhibits superior performance in predicting spatial gene expression. Moreover, mclSTExp has shown promise in interpreting cancer-specific overexpressed genes, elucidating immune-related genes, and identifying specialized spatial domains annotated by pathologists. Our source code is available at https://github.com/shizhiceng/mclSTExp.
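The abstract says only that image features are incorporated "via contrastive learning"; it does not give the loss. As a rough illustration, the sketch below assumes a CLIP-style symmetric cross-entropy over cosine similarities, in which the image patch and the spot at the same index form the positive pair. The function names, the `temperature` value, and the toy embeddings are all hypothetical and are not taken from the mclSTExp code.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row maximum for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(image_emb, spot_emb, temperature=0.1):
    """Assumed CLIP-style loss: image i and spot i are the matched pair."""
    img = [l2_normalize(v) for v in image_emb]
    spt = [l2_normalize(v) for v in spot_emb]
    # logits[i][j] = cosine similarity of image i and spot j, divided by temperature
    logits = [[sum(a * b for a, b in zip(u, w)) / temperature for w in spt]
              for u in img]
    logits_t = [list(col) for col in zip(*logits)]  # spot-to-image direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))

# Toy embeddings: matched pairs give a low loss, mismatched pairs a high one.
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
loss_aligned = symmetric_contrastive_loss(aligned, aligned)
loss_swapped = symmetric_contrastive_loss(aligned, swapped)
```

Minimizing a loss of this kind pulls each image embedding toward its own spot embedding and away from the other spots in the batch, which is what makes a nearest-neighbour lookup in the shared space meaningful at inference time.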

Keywords: histology images; multimodal contrastive learning; spatial transcriptomics; transformer encoder.

Figures

Figure 1
The architecture of the proposed mclSTExp model. Step 1: mclSTExp integrates spot features with their positional information using the self-attention mechanism of a Transformer. It then fuses H&E image information through contrastive learning, learning a multimodal embedding space that combines both modalities. Step 2: Image patches are projected into the learned multimodal embedding space to query the expressions of the nearest k spots; the gene expression of the test image is inferred by weighted aggregation of these queried spot expressions.
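Step 2 of the caption can be sketched as a k-nearest-neighbour lookup followed by a weighted average. The caption does not state the distance metric or the weighting scheme, so the sketch below assumes Euclidean distance and inverse-distance weights; `predict_expression`, `eps`, and the toy data are hypothetical names and values chosen for illustration.

```python
import math

def predict_expression(query_emb, ref_embs, ref_exprs, k=2, eps=1e-8):
    """Weighted average of the expression of the k nearest reference spots.

    Assumptions: Euclidean distance in the embedding space and
    inverse-distance weights (the source only says "weighted aggregation").
    """
    dists = [math.dist(query_emb, r) for r in ref_embs]
    nearest = sorted(range(len(ref_embs)), key=lambda i: dists[i])[:k]
    weights = [1.0 / (dists[i] + eps) for i in nearest]  # eps avoids division by zero
    total = sum(weights)
    n_genes = len(ref_exprs[0])
    return [
        sum(w * ref_exprs[i][g] for w, i in zip(weights, nearest)) / total
        for g in range(n_genes)
    ]

# Three reference spots (embedding, expression of two genes) and one query patch.
ref_embs = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
ref_exprs = [[1.0, 4.0], [3.0, 0.0], [100.0, 100.0]]
pred = predict_expression([1.0, 0.0], ref_embs, ref_exprs, k=2)
```

In this toy case the query is equidistant from the first two spots, so the prediction is their plain average and the distant third spot is ignored.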
Figure 2
Evaluation of gene expression prediction on the HER2+ datasets by the PCC (ACGs) between the observed and predicted gene expression by STnet [21], HisToGene [15], His2ST [22], THItoGene [23], BLEEP [24], and mclSTExp.
Figure 3
Evaluation of gene expression prediction on the cSCC datasets by the PCC (ACGs) between the observed and predicted gene expression by STnet [21], HisToGene [15], His2ST [22], THItoGene [23], BLEEP [24], and mclSTExp.
Figure 4
Evaluation of gene expression prediction on the Alex+10x datasets by the PCC (ACGs) between the observed and predicted gene expression by STnet [21], HisToGene [15], His2ST [22], THItoGene [23], BLEEP [24], and mclSTExp.
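The metric in Figures 2 to 4 is the Pearson correlation coefficient (PCC) between observed and predicted expression. A minimal sketch of a per-gene PCC across spots, followed by an average over genes, is given below. The spots-by-genes layout, the function names, and the toy matrices are assumptions made for illustration and may differ from the authors' evaluation code.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when either vector is constant
    return cov / (sx * sy)

def per_gene_pcc(observed, predicted):
    """observed, predicted: spots x genes matrices; returns one PCC per gene."""
    obs_by_gene = list(zip(*observed))   # transpose to genes x spots
    pred_by_gene = list(zip(*predicted))
    return [pearson(o, p) for o, p in zip(obs_by_gene, pred_by_gene)]

# Toy data: three spots, two genes.
observed = [[1.0, 5.0], [2.0, 3.0], [3.0, 1.0]]
predicted = [[2.0, 1.0], [4.0, 2.0], [6.0, 3.0]]
pccs = per_gene_pcc(observed, predicted)
mean_pcc = sum(pccs) / len(pccs)
```

Here the first gene is predicted with a perfect positive correlation and the second with a perfect negative one, so the per-gene values are 1 and -1 and their mean is 0.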
Figure 5
Visualization of the top seven predicted genes in the HER2+ dataset, ranked by the highest average $-\log_{10}$ P-values calculated across all tissue sections. The P-values are determined from the correlation between predicted and observed gene expression. For each of these seven genes, the tissue section for which our model's prediction has the smallest P-value is shown.
Figure 6
Spatial clustering analysis on six H&E images from the HER2+ dataset that were annotated by pathologists. “Observed Exp” denotes clustering performed directly on the sequenced gene expression.

References

    1. Rao A, Barkley D, França GS, et al. Exploring tissue architecture using spatial transcriptomics. Nature 2021;596:211–20. 10.1038/s41586-021-03634-9.
    2. Alon S, Goodwin DR, Sinha A, et al. Expansion sequencing: spatially precise in situ transcriptomics in intact biological systems. Science 2021;371:2656–69.
    3. Chen A, Liao S, Cheng M, et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell 2022;185:1777–1792.e21. 10.1016/j.cell.2022.04.003.
    4. Longo SK, Guo MG, Ji AL, et al. Integrating single-cell and spatial transcriptomics to elucidate intercellular tissue dynamics. Nat Rev Genet 2021;22:627–44. 10.1038/s41576-021-00370-8.
    5. Zhao E, Stone MR, Ren X, et al. Spatial transcriptomics at subspot resolution with BayesSpace. Nat Biotechnol 2021;39:1375–84. 10.1038/s41587-021-00935-2.