Multimodal contrastive learning for spatial gene expression prediction using histology images

Wenwen Min¹, Zhiceng Shi¹, Jun Zhang¹, Jun Wan², Changmiao Wang³

Affiliations

¹ School of Information Science and Engineering, Yunnan University, East Outer Ring Road, Chenggong District, Kunming 650500, Yunnan, China.
² School of Information and Engineering, Zhongnan University of Economics and Law, 182 South Lake Avenue, East Lake New Technology Development Zone, Wuhan 430073, Hubei, China.
³ Medical Big Data, Shenzhen Research Institute of Big Data, Longxiang Boulevard, Longgang District, Shenzhen 518172, Guangdong, China.

PMID: 39471412
PMCID: PMC11952928
DOI: 10.1093/bib/bbae551

Wenwen Min et al. Brief Bioinform. 2024 Sep 23;25(6):bbae551.

Abstract

In recent years, the advent of spatial transcriptomics (ST) technology has unlocked unprecedented opportunities for delving into the complexities of gene expression patterns within intricate biological systems. Despite its transformative potential, the prohibitive cost of ST technology remains a significant barrier to its widespread adoption in large-scale studies. An alternative, more cost-effective strategy involves employing artificial intelligence to predict gene expression levels using readily accessible whole-slide images stained with Hematoxylin and Eosin (H&E). However, existing methods have yet to fully capitalize on the multimodal information provided by H&E images and spatially resolved ST data. In this paper, we propose mclSTExp, a multimodal contrastive learning framework with Transformer and DenseNet-121 encoders for Spatial Transcriptomics Expression prediction. We conceptualize each spot as a "word", integrating its intrinsic features with spatial context through the self-attention mechanism of a Transformer encoder. This integration is further enriched by incorporating image features via contrastive learning, thereby enhancing the predictive capability of our model. We conducted an extensive evaluation of highly variable genes in two breast cancer datasets and a skin squamous cell carcinoma dataset, and the results show that mclSTExp outperforms the compared methods in predicting spatial gene expression. Moreover, mclSTExp has shown promise in interpreting cancer-specific overexpressed genes, elucidating immune-related genes, and identifying specialized spatial domains annotated by pathologists. Our source code is available at https://github.com/shizhiceng/mclSTExp.

Keywords: histology images; multimodal contrastive learning; spatial transcriptomics; transformer encoder.
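The sketch below illustrates the two ideas from the abstract in NumPy. First, each spot is treated as a token ("word") whose expression features are combined with an encoding of its (x, y) position and then mixed by self-attention. Second, the resulting spot embeddings are aligned with image-patch embeddings through a contrastive loss. This is not the mclSTExp code. The following are all choices made here for clarity, and the authors' implementation is in the linked repository:

- the sinusoidal coordinate encoding
- the single attention head
- the random, untrained weight matrices
- the 64-dimensional shared space
- the temperature `tau`
- the CLIP-style symmetric InfoNCE loss

The 1024-dimensional image features stand in for pooled DenseNet-121 outputs.

```python
import numpy as np

def coord_encoding(coords, dim):
    """Assumed sinusoidal encoding of 2-D spot coordinates:
    dim/2 features for x and dim/2 for y; dim must be divisible by 4."""
    half = dim // 2
    freqs = 1.0 / (10000 ** (np.arange(0, half, 2) / half))  # dim/4 frequencies
    parts = []
    for axis in range(2):
        angles = coords[:, axis:axis + 1] * freqs            # (n_spots, dim/4)
        parts += [np.sin(angles), np.cos(angles)]
    return np.concatenate(parts, axis=1)                     # (n_spots, dim)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention: every spot attends to every
    other spot, so each embedding mixes in context from the whole section."""
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]), axis=1)    # (n_spots, n_spots)
    return x + attn @ v                                      # residual connection

def l2_normalize(z):
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def symmetric_infonce(spot_emb, img_emb, tau=0.07):
    """CLIP-style contrastive loss: spot i and image patch i are the positive
    pair; all other pairs in the batch act as negatives."""
    logits = l2_normalize(spot_emb) @ l2_normalize(img_emb).T / tau
    idx = np.arange(len(logits))
    loss_spot_to_img = -np.log(softmax(logits, axis=1)[idx, idx]).mean()
    loss_img_to_spot = -np.log(softmax(logits, axis=0)[idx, idx]).mean()
    return (loss_spot_to_img + loss_img_to_spot) / 2

# Toy usage with random (untrained) weights.
rng = np.random.default_rng(0)
n_spots, n_genes, dim = 8, 50, 64
expr = rng.random((n_spots, n_genes))                      # spot expression profiles
coords = rng.integers(0, 30, size=(n_spots, 2)).astype(float)
img_feat = rng.random((n_spots, 1024))                     # stand-in for DenseNet-121 features

w_in = rng.normal(scale=0.1, size=(n_genes, dim))          # genes -> token space
wq, wk, wv = (rng.normal(scale=0.1, size=(dim, dim)) for _ in range(3))
w_img = rng.normal(scale=0.1, size=(1024, dim))            # image -> shared space

tokens = expr @ w_in + coord_encoding(coords, dim)         # spot "words" + position
spot_emb = self_attention(tokens, wq, wk, wv)
loss = symmetric_infonce(spot_emb, img_feat @ w_img)
print(spot_emb.shape, float(loss))
```

In a trained model, the weight matrices would be learned by minimizing this loss with gradient descent. With the random weights above, the printed loss only confirms that the shapes line up.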

Figures

**Figure 1**
The architecture of the proposed mclSTExp model. Step 1: mclSTExp integrates spot features with their positional information using the self-attention mechanism of a Transformer. It then fuses H&E image information through contrastive learning, thereby learning a multimodal embedding space enriched with diverse features. Step 2: Image patches are projected into the learned multimodal embedding space to query the expression of the k nearest spots. The gene expression of the test image is then inferred by weighted aggregation of these queried spot expressions.
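Step 2 retrieves the k nearest reference spots in the shared embedding space and aggregates their expression. It could be sketched as follows. Several details are assumptions: cosine similarity, softmax weights over the retrieved similarities, and the default values of `k` and `tau`. The caption states only that the prediction is a weighted aggregation over the k nearest spots. `predict_expression` is a hypothetical name.

```python
import numpy as np

def predict_expression(query_emb, ref_emb, ref_expr, k=3, tau=0.1):
    """Predict expression for each query image patch from its k most similar
    reference spots (hypothetical similarity-softmax weighting)."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    r = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = q @ r.T                                     # cosine similarity (n_query, n_ref)
    nn_idx = np.argsort(-sim, axis=1)[:, :k]          # k most similar reference spots
    nn_sim = np.take_along_axis(sim, nn_idx, axis=1)  # their similarities (n_query, k)
    w = np.exp(nn_sim / tau)
    w /= w.sum(axis=1, keepdims=True)                 # weights sum to 1 per query
    # ref_expr[nn_idx] has shape (n_query, k, n_genes); weighted sum over k.
    return np.einsum("qk,qkg->qg", w, ref_expr[nn_idx])

# Toy reference set: 3 spots in a 2-D embedding space, 2 genes each.
ref_emb = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
ref_expr = np.array([[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]])
query = np.array([[1.0, 0.05]])                       # closest to the first spot
print(predict_expression(query, ref_emb, ref_expr, k=1))
print(predict_expression(query, ref_emb, ref_expr, k=3))
```

Weighting by similarity lets close matches dominate while still smoothing over several spots. With `k=1`, the scheme reduces to copying the expression of the single best-matching spot.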

**Figure 2**
Evaluation of gene expression prediction on the HER2+ datasets by the PCC (ACGs) between the observed and predicted gene expression by STnet [21], HisToGene [15], His2ST [22], THItoGene [23], BLEEP [24], and mclSTExp.

**Figure 3**
Evaluation of gene expression prediction on the cSCC datasets by the PCC (ACGs) between the observed and predicted gene expression by STnet [21], HisToGene [15], His2ST [22], THItoGene [23], BLEEP [24], and mclSTExp.

**Figure 4**
Evaluation of gene expression prediction on the Alex+10x datasets by the PCC (ACGs) between the observed and predicted gene expression by STnet [21], HisToGene [15], His2ST [22], THItoGene [23], BLEEP [24], and mclSTExp.
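Figures 2 to 4 report the Pearson correlation coefficient (PCC) between observed and predicted expression. The captions do not define "ACGs", so the following is an assumption: one common way to compute such a summary is to correlate each gene across spots and then average over the genes considered. Genes with constant expression are skipped, since their PCC is undefined. The function names are hypothetical.

```python
import numpy as np

def per_gene_pcc(observed, predicted):
    """Pearson correlation for each gene (column) across spots (rows).
    Returns NaN for genes whose observed or predicted values are constant."""
    o = observed - observed.mean(axis=0)
    p = predicted - predicted.mean(axis=0)
    denom = np.sqrt((o ** 2).sum(axis=0) * (p ** 2).sum(axis=0))
    with np.errstate(invalid="ignore", divide="ignore"):
        return (o * p).sum(axis=0) / denom

def mean_pcc(observed, predicted):
    """Average PCC over all genes with a defined correlation."""
    return float(np.nanmean(per_gene_pcc(observed, predicted)))

# Gene 0 is predicted in reverse order (PCC = -1); gene 1 exactly (PCC = 1).
obs = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 1.0]])
pred = np.array([[3.0, 2.0], [2.0, 4.0], [1.0, 1.0]])
print(per_gene_pcc(obs, pred), mean_pcc(obs, pred))
```

Because PCC is invariant to shifts and positive scaling, a model can score well even if its predictions are offset from the observed counts. This is why correlation is usually reported alongside, not instead of, error-based metrics.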

**Figure 5**
Visualization of the top seven predicted genes in the HER2+ dataset, ranked by their highest average significance across all tissue sections. Significance is given by the P-value of the correlation between predicted and observed gene expression. For each of these seven genes, the tissue section with the smallest P-value for our model's prediction is shown.
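The selection rule in this caption can be sketched as follows. Each gene gets one P-value per tissue section for its observed-versus-predicted correlation. Genes are ranked by average significance across sections, and for each top gene the section with the smallest P-value is selected. Two assumptions are made. First, the P-value is approximated with the Fisher z-transform and a normal tail, rather than the exact t-based test that, for example, `scipy.stats.pearsonr` reports. Second, "average significance" is taken as the mean of −log10(P) across sections. The function names and gene labels are hypothetical.

```python
import math
import numpy as np

def pcc_pvalue(r, n):
    """Approximate two-sided P-value for a Pearson r computed from n spots,
    via the Fisher z-transform and a standard-normal tail (an approximation)."""
    r = min(max(r, -0.999999), 0.999999)     # avoid atanh(+-1) = infinity
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

def top_genes(pvals, genes, top=7):
    """pvals: (n_sections, n_genes) array of P-values.
    Assumption: rank genes by mean -log10(P) across sections (higher means
    more significant), then pick each top gene's smallest-P section."""
    score = (-np.log10(np.clip(pvals, 1e-300, 1.0))).mean(axis=0)
    order = np.argsort(-score)[:top]
    return [(genes[g], int(np.argmin(pvals[:, g]))) for g in order]

# Toy example: 2 sections x 3 hypothetical genes.
pvals = np.array([[1e-8, 0.2, 1e-3],
                  [1e-6, 0.5, 1e-4]])
print(top_genes(pvals, ["GENE_A", "GENE_B", "GENE_C"], top=2))
print(pcc_pvalue(0.5, 100))
```

Averaging −log10(P) rather than raw P-values keeps one very small P-value from being swamped by a few large ones. This matches the intuition of ranking genes by how consistently significant they are.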

**Figure 6**
Spatial clustering analysis of six pathologist-annotated H&E images from the HER2+ dataset. “Observed Exp” denotes clustering performed directly on the sequenced gene expression.
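The comparison in this figure clusters expression (predicted or observed) and checks the clusters against pathologist annotations. It can be sketched as below. The caption does not name the clustering algorithm or the agreement score, so both choices here are assumptions picked because they are standard: k-means with a simple farthest-point initialization, and the adjusted Rand index (ARI). The helper names are hypothetical.

```python
import numpy as np
from math import comb

def kmeans(x, k, n_iter=100, seed=0):
    """Lloyd's k-means; returns one cluster label per spot."""
    rng = np.random.default_rng(seed)
    # Farthest-point init: start at a random spot, then repeatedly add the
    # spot farthest from every center chosen so far.
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d2 = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d2.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([x[labels == c].mean(axis=0) if np.any(labels == c)
                                else centers[c] for c in range(k)])
        if np.allclose(new_centers, centers):  # converged
            break
        centers = new_centers
    return labels

def adjusted_rand_index(a, b):
    """ARI between two labelings of the same spots (1.0 = same partition)."""
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    table = np.zeros((a_idx.max() + 1, b_idx.max() + 1), dtype=int)
    np.add.at(table, (a_idx, b_idx), 1)        # contingency table
    sum_cells = sum(comb(int(v), 2) for v in table.ravel())
    sum_rows = sum(comb(int(v), 2) for v in table.sum(axis=1))
    sum_cols = sum(comb(int(v), 2) for v in table.sum(axis=0))
    expected = sum_rows * sum_cols / comb(len(a), 2)
    max_index = (sum_rows + sum_cols) / 2
    return (sum_cells - expected) / (max_index - expected)

# Toy data: three well-separated "regions" standing in for pathologist labels.
rng = np.random.default_rng(1)
annot = np.repeat([0, 1, 2], 20)
region_means = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
expr = region_means[annot] + rng.normal(scale=0.3, size=(60, 2))
labels = kmeans(expr, k=3)
print(adjusted_rand_index(annot, labels))
```

ARI ignores how clusters are numbered and corrects for chance agreement, so it suits comparing unsupervised clusters with annotated regions. Running the same comparison on observed expression would give the “Observed Exp” reference.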

References

1. Rao A, Barkley D, França GS, et al. Exploring tissue architecture using spatial transcriptomics. Nature 2021;596:211–20. doi:10.1038/s41586-021-03634-9.
2. Alon S, Goodwin DR, Sinha A, et al. Expansion sequencing: spatially precise in situ transcriptomics in intact biological systems. Science 2021;371:2656–69.
3. Chen A, Liao S, Cheng M, et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell 2022;185:1777–1792.e21. doi:10.1016/j.cell.2022.04.003.
4. Longo SK, Guo MG, Ji AL, et al. Integrating single-cell and spatial transcriptomics to elucidate intercellular tissue dynamics. Nat Rev Genet 2021;22:627–44. doi:10.1038/s41576-021-00370-8.
5. Zhao E, Stone MR, Ren X, et al. Spatial transcriptomics at subspot resolution with BayesSpace. Nat Biotechnol 2021;39:1375–84. doi:10.1038/s41587-021-00935-2.
