IEEE Trans Med Imaging. 2024 Sep;43(9):3085-3097.
doi: 10.1109/TMI.2024.3386108. Epub 2024 Sep 4.

Graph Attention-Based Fusion of Pathology Images and Gene Expression for Prediction of Cancer Survival


Yi Zheng et al. IEEE Trans Med Imaging. 2024 Sep.

Abstract

Multimodal machine learning models are being developed to analyze pathology images and other modalities, such as gene expression, to gain clinical and biological insights. However, most frameworks for multimodal data fusion do not fully account for the interactions between different modalities. Here, we present an attention-based fusion architecture that integrates a graph representation of pathology images with gene expression data and concomitantly learns from the fused information to predict patient-specific survival. In our approach, pathology images are represented as undirected graphs, and their embeddings are combined with embeddings of gene expression signatures using an attention mechanism to stratify tumors by patient survival. We show that our framework improves the survival prediction of human non-small cell lung cancers, outperforming existing state-of-the-art approaches that leverage multimodal data. Our framework can facilitate spatial molecular profiling to identify tumor heterogeneity using pathology images and gene expression data, complementing results obtained from more expensive spatial transcriptomic and proteomic technologies.
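The core fusion idea of the abstract, combining graph-node embeddings of the pathology image with gene expression signature embeddings via attention, can be illustrated with a minimal cross-attention sketch. This is an assumption-laden toy in numpy, not the authors' implementation; the dimensions, weight matrices, and function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(sig, nodes, wq, wk, wv):
    """Gene-signature embeddings attend over graph-node embeddings:
    each signature pools a weighted summary of the image graph."""
    q, k, v = sig @ wq, nodes @ wk, nodes @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (signatures, nodes)
    return attn @ v                                 # one fused vector per signature

rng = np.random.default_rng(0)
nodes = rng.standard_normal((100, 64))   # 100 patch-level node embeddings
sigs = rng.standard_normal((5, 64))      # 5 gene-expression signature embeddings
wq = rng.standard_normal((64, 32))
wk = rng.standard_normal((64, 32))
wv = rng.standard_normal((64, 32))
fused = cross_attention(sigs, nodes, wq, wk, wv)
print(fused.shape)  # (5, 32)
```

The fused per-signature vectors could then feed a survival prediction head; the attention weights themselves indicate which image regions each signature attends to.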


Figures

Fig. 1:
Fig. 1:. Graph attention-based fusion framework.
The mixer framework (left) uses the graph node embeddings and gene expression signature embeddings and jointly learns a spatial fingerprint of the WSI-transcriptomic relationship via an attention-based framework to predict survival. The graph mixer (right) comprises consecutive node-mixing and channel-mixing layers for learning relationships between adjacent (blue and purple) nodes and more representative node features (blue to green) on the graph. The per-node encoding, per-signature encoding, and prediction modules consist of fully-connected layers. The details of the graph mixer, genomic and image fusion, and global attention pooling modules are described in Section II-B.
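The alternation of node-mixing (information flow along graph edges) and channel-mixing (a per-node MLP shared across nodes) described in the caption can be sketched as follows. This is a simplified numpy illustration under assumed shapes, not the paper's architecture; the averaging scheme and weight shapes are placeholders.

```python
import numpy as np

def node_mixing(x, adj):
    """Mix features across adjacent nodes: each node averages its own
    features with those of its graph neighbours (self-loop included)."""
    deg = adj.sum(axis=1, keepdims=True) + 1.0
    return (x + adj @ x) / deg

def channel_mixing(x, w1, w2):
    """Per-node two-layer MLP over feature channels, weights shared
    across all nodes (ReLU nonlinearity)."""
    return np.maximum(x @ w1, 0.0) @ w2

rng = np.random.default_rng(0)
n_nodes, d = 4, 8
x = rng.standard_normal((n_nodes, d))          # node features
adj = np.array([[0, 1, 0, 0],                  # path graph 0-1-2-3
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
w1, w2 = rng.standard_normal((d, 16)), rng.standard_normal((16, d))

h = channel_mixing(node_mixing(x, adj), w1, w2)  # one mixer block
print(h.shape)  # (4, 8)
```

Stacking several such blocks lets node representations incorporate progressively larger graph neighborhoods.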
Fig. 2:
Fig. 2:. Whole slide image (WSI) processing and graph construction.
WSIs were processed using a pipeline involving foreground-background separation and tessellation into image patches, followed by construction of an undirected graph. Patch embeddings were generated using a contrastive learning framework (Fig. 3) and used as node features in the graph.
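The undirected graph over tessellated patches can be built by connecting patches whose centers are spatially close. A minimal sketch, assuming a simple distance-threshold adjacency (the actual connectivity rule used in the paper may differ):

```python
import numpy as np

def build_patch_graph(coords, radius):
    """Connect patches whose centres lie within `radius` of each other
    (undirected adjacency matrix, no self-edges)."""
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    adj = (d <= radius) & (d > 0)
    return adj.astype(float)

# 2x2 grid of patch centres with unit spacing
coords = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
adj = build_patch_graph(coords, radius=1.0)
n_edges = int(adj.sum()) // 2  # adjacency is symmetric
print(n_edges)  # 4 undirected edges (grid 4-connectivity)
```

Each row of the contrastive patch-embedding matrix then becomes the feature vector of the corresponding graph node.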
Fig. 3:
Fig. 3:. Feature generation and contrastive learning.
We applied three distinct augmentation functions: random color distortion, random Gaussian blur, and random cropping followed by resizing back to the original size. The encoder, θ, received an augmented image and generated an embedding vector as output. These vectors were used to compute the contrastive learning loss that trains the encoder. After training, we used the embedding vectors for graph construction.
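Contrastive training of this kind typically uses an NT-Xent (SimCLR-style) objective: embeddings of two augmented views of the same patch are pulled together while all other pairs in the batch are pushed apart. A hedged numpy sketch of that loss, assuming paired batches of embeddings (the paper's exact loss and hyperparameters are not specified here):

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss for paired embeddings: z1[i] and z2[i] are the
    encoder outputs for two augmented views of the same image."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    n = len(z1)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logits = sim - sim.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

rng = np.random.default_rng(0)
z1 = rng.standard_normal((8, 32))
loss_random = nt_xent(z1, rng.standard_normal((8, 32)))     # unrelated pairs
loss_aligned = nt_xent(z1, z1 + 0.01 * rng.standard_normal((8, 32)))
print(loss_aligned < loss_random)  # True: matched views give lower loss
```

Lower loss for near-identical view pairs confirms the objective rewards view-invariant embeddings, which are then reused as graph node features.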
Fig. 4:
Fig. 4:. Survival activation map (SAM) on human NSCLC samples.
The first column shows the H&E WSIs, the second column shows the pathologist annotations of the tissue, and the third and fourth columns show the SAMs based on the ISM and FSM models, respectively. Top row: low-risk LUAD case, where annotations are low-risk-related lepidic (dark green) and high-risk-related tumor (light gray) histologic patterns. Second row: high-risk LUAD case, where annotations are high-risk-related solid histologic pattern (lavender), high-risk-related vascular invasion (light green), and low-risk-related tumor (light gray) histologic patterns. Third and fourth rows: low-risk and high-risk LUSC cases, respectively, where annotations are tumor tissue (peach). The colorbar applies to the heatmaps shown in the last two columns.
Fig. 5:
Fig. 5:. Visualization of salient regions using various interpretable methods.
WSI-level heatmaps highlighting WSI regions associated with survival on four (low- and high-risk LUAD as well as low- and high-risk LUSC) cases are shown (see Fig. 4 for more info). The first column shows the traditional attention-based heatmaps (TAH) generated on the ISM model and the second column shows the ones generated on a multiple instance learning model (Attention MIL [21]). The remaining columns show the co-attention (CoAttn) heatmaps generated on the FSM model for the gene signatures, sig#1-sig#5 (see Section II-A), respectively.
Fig. 6:
Fig. 6:. Quantitative comparison of different model interpretability methods with expert annotations.
The plots display the performance of the FSM SAM (blue), ISM SAM (red), ISM TAH (green), MIL TAH (purple), CoAttn sig#1 (orange), CoAttn sig#2 (brown), CoAttn sig#3 (pink), CoAttn sig#4 (gray), and CoAttn sig#5 (cyan) interpretability methods, in terms of the Dice coefficient across different threshold levels. For each case, the Dice coefficient was computed by binarizing the heatmaps at different thresholds and comparing them with the pathologist annotations. ISM TAH, MIL TAH, and CoAttn heatmaps are shown in Fig. 5. The four cases correspond to the low- and high-risk LUAD and low- and high-risk LUSC cases presented in Fig. 4, which also includes the pathologist annotations and the SAMs. In the LUAD cases, the light gray annotations indicating tumor regions were excluded from the Dice coefficient calculation, because we focused solely on pathologic tumor features or patterns known to be associated with either favorable prognosis in low-risk LUAD or unfavorable prognosis in high-risk LUAD.
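The threshold-sweep evaluation described above can be sketched directly: binarize a continuous heatmap at each threshold and score the overlap with the annotation mask via the Dice coefficient. This is a minimal illustration with a toy 2x2 heatmap, not the paper's evaluation code.

```python
import numpy as np

def dice_at_threshold(heatmap, annotation, t):
    """Binarize a [0, 1] heatmap at threshold t and compute the Dice
    coefficient against a binary annotation mask."""
    pred = heatmap >= t
    inter = np.logical_and(pred, annotation).sum()
    denom = pred.sum() + annotation.sum()
    return 2.0 * inter / denom if denom else 1.0

heat = np.array([[0.9, 0.8],
                 [0.2, 0.1]])
ann = np.array([[1, 1],
                [0, 0]], dtype=bool)
print(dice_at_threshold(heat, ann, 0.5))  # 1.0: perfect overlap at t=0.5
```

Sweeping t over a range and plotting Dice against threshold yields curves like those in Fig. 6, one per interpretability method.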

References

    1. He B, Bergenstråhle L, Stenbeck L, Abid A, Andersson A, Borg A, Maaskola J, Lundeberg J, and Zou J, “Integrating spatial gene expression and breast tumour morphology via deep learning,” Nat Biomed Eng, vol. 4, no. 8, pp. 827–834, 2020. - PubMed
    2. Tan X, Su A, Tran M, and Nguyen Q, “SpaCell: integrating tissue morphology and spatial gene expression to predict disease cells,” Bioinformatics, vol. 36, no. 7, pp. 2293–2294, 2020. - PubMed
    3. Velten B, Braunger JM, Argelaguet R, Arnol D, Wirbel J, Bredikhin D, Zeller G, and Stegle O, “Identifying temporal and spatial patterns of variation from multimodal data using MEFISTO,” Nat Methods, vol. 19, no. 2, pp. 179–186, 2022. - PMC - PubMed
    4. TCGA Research Network, “The Cancer Genome Atlas Program,” Available: https://portal.gdc.cancer.gov/.
    5. Chen RJ et al., “Pan-cancer Integrative Histology-Genomic Analysis via Multimodal Deep Learning,” Cancer Cell, vol. 40, no. 8, pp. 865–878.e6, 2022. - PMC - PubMed
