IEEE Trans Med Imaging. 2024 Sep;43(9):3085-3097.
doi: 10.1109/TMI.2024.3386108. Epub 2024 Sep 4.

Graph Attention-Based Fusion of Pathology Images and Gene Expression for Prediction of Cancer Survival


Yi Zheng et al. IEEE Trans Med Imaging. 2024 Sep.

Abstract

Multimodal machine learning models are being developed to analyze pathology images and other modalities, such as gene expression, to gain clinical and biological insights. However, most frameworks for multimodal data fusion do not fully account for the interactions between different modalities. Here, we present an attention-based fusion architecture that integrates a graph representation of pathology images with gene expression data and concomitantly learns from the fused information to predict patient-specific survival. In our approach, pathology images are represented as undirected graphs, and their embeddings are combined with embeddings of gene expression signatures using an attention mechanism to stratify tumors by patient survival. We show that our framework improves the survival prediction of human non-small cell lung cancers, outperforming existing state-of-the-art approaches that leverage multimodal data. Our framework can facilitate spatial molecular profiling to identify tumor heterogeneity using pathology images and gene expression data, complementing results obtained from more expensive spatial transcriptomic and proteomic technologies.
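The core fusion idea of the abstract, combining graph-node embeddings of the pathology image with gene expression signature embeddings via attention, can be illustrated with a minimal cross-attention sketch. This is an assumption-laden toy in numpy, not the authors' implementation; the dimensions, weight matrices, and function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(sig, nodes, wq, wk, wv):
    """Gene-signature embeddings attend over graph-node embeddings:
    each signature pools a weighted summary of the image graph."""
    q, k, v = sig @ wq, nodes @ wk, nodes @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (signatures, nodes)
    return attn @ v                                 # one fused vector per signature

rng = np.random.default_rng(0)
nodes = rng.standard_normal((100, 64))   # 100 patch-level node embeddings
sigs = rng.standard_normal((5, 64))      # 5 gene-expression signature embeddings
wq = rng.standard_normal((64, 32))
wk = rng.standard_normal((64, 32))
wv = rng.standard_normal((64, 32))
fused = cross_attention(sigs, nodes, wq, wk, wv)
print(fused.shape)  # (5, 32)
```

The fused per-signature vectors could then feed a survival prediction head; the attention weights themselves indicate which image regions each signature attends to.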


Figures

Fig. 1:
Fig. 1:. Graph attention-based fusion framework.
The mixer framework (left) uses the graph node embeddings and gene expression signature embeddings and jointly learns a spatial fingerprint of the WSI-transcriptomic relationship via an attention-based framework to predict survival. The graph mixer (right) comprises consecutive node-mixing and channel-mixing layers for learning relationships between adjacent (blue and purple) nodes and more representative node features (blue to green) on the graph. The per-node encoding, per-signature encoding, and prediction modules consist of fully-connected layers. The details of the graph mixer, genomic and image fusion, and global attention pooling modules are described in Section II-B.
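The alternation of node-mixing (information flow along graph edges) and channel-mixing (a per-node MLP shared across nodes) described in the caption can be sketched as follows. This is a simplified numpy illustration under assumed shapes, not the paper's architecture; the averaging scheme and weight shapes are placeholders.

```python
import numpy as np

def node_mixing(x, adj):
    """Mix features across adjacent nodes: each node averages its own
    features with those of its graph neighbours (self-loop included)."""
    deg = adj.sum(axis=1, keepdims=True) + 1.0
    return (x + adj @ x) / deg

def channel_mixing(x, w1, w2):
    """Per-node two-layer MLP over feature channels, weights shared
    across all nodes (ReLU nonlinearity)."""
    return np.maximum(x @ w1, 0.0) @ w2

rng = np.random.default_rng(0)
n_nodes, d = 4, 8
x = rng.standard_normal((n_nodes, d))          # node features
adj = np.array([[0, 1, 0, 0],                  # path graph 0-1-2-3
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
w1, w2 = rng.standard_normal((d, 16)), rng.standard_normal((16, d))

h = channel_mixing(node_mixing(x, adj), w1, w2)  # one mixer block
print(h.shape)  # (4, 8)
```

Stacking several such blocks lets node representations incorporate progressively larger graph neighborhoods.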
Fig. 2:
Fig. 2:. Whole slide image (WSI) processing and graph construction.
WSIs were processed using a pipeline involving foreground-background separation and tessellation into image patches, followed by construction of an undirected graph. Patch embeddings were generated using a contrastive learning framework (Fig. 3) and used as node features in the graph.
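The undirected graph over tessellated patches can be built by connecting patches whose centers are spatially close. A minimal sketch, assuming a simple distance-threshold adjacency (the actual connectivity rule used in the paper may differ):

```python
import numpy as np

def build_patch_graph(coords, radius):
    """Connect patches whose centres lie within `radius` of each other
    (undirected adjacency matrix, no self-edges)."""
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    adj = (d <= radius) & (d > 0)
    return adj.astype(float)

# 2x2 grid of patch centres with unit spacing
coords = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
adj = build_patch_graph(coords, radius=1.0)
n_edges = int(adj.sum()) // 2  # adjacency is symmetric
print(n_edges)  # 4 undirected edges (grid 4-connectivity)
```

Each row of the contrastive patch-embedding matrix then becomes the feature vector of the corresponding graph node.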
Fig. 3:
Fig. 3:. Feature generation and contrastive learning.
We applied three distinct augmentation functions: random color distortion, random Gaussian blur, and random cropping followed by resizing back to the original size. The encoder, θ, received an augmented image and generated an embedding vector as output. These vectors were used to compute the contrastive learning loss that trains the encoder. After training, we used the embedding vectors for graph construction.
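Contrastive training of this kind typically uses an NT-Xent (SimCLR-style) objective: embeddings of two augmented views of the same patch are pulled together while all other pairs in the batch are pushed apart. A hedged numpy sketch of that loss, assuming paired batches of embeddings (the paper's exact loss and hyperparameters are not specified here):

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss for paired embeddings: z1[i] and z2[i] are the
    encoder outputs for two augmented views of the same image."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    n = len(z1)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logits = sim - sim.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

rng = np.random.default_rng(0)
z1 = rng.standard_normal((8, 32))
loss_random = nt_xent(z1, rng.standard_normal((8, 32)))     # unrelated pairs
loss_aligned = nt_xent(z1, z1 + 0.01 * rng.standard_normal((8, 32)))
print(loss_aligned < loss_random)  # True: matched views give lower loss
```

Lower loss for near-identical view pairs confirms the objective rewards view-invariant embeddings, which are then reused as graph node features.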
Fig. 4:
Fig. 4:. Survival activation map (SAM) on human NSCLC samples.
The first column shows the H&E WSIs, the second column shows the pathologist annotations of the tissue, and the third and fourth columns show the SAMs based on the ISM and FSM models, respectively. Top row: low-risk LUAD case, where annotations are low-risk-related lepidic (dark green) and high-risk-related tumor (light gray) histologic patterns. Second row: high-risk LUAD case, where annotations are high-risk-related solid histologic pattern (lavender), high-risk-related vascular invasion (light green), and low-risk-related tumor (light gray) histologic patterns. Third and fourth rows: low-risk and high-risk LUSC cases, respectively, where annotations are tumor tissue (peach). The colorbar applies to the heatmaps shown in the last two columns.
Fig. 5:
Fig. 5:. Visualization of salient regions using various interpretable methods.
WSI-level heatmaps highlighting WSI regions associated with survival on four (low- and high-risk LUAD as well as low- and high-risk LUSC) cases are shown (see Fig. 4 for more info). The first column shows the traditional attention-based heatmaps (TAH) generated on the ISM model and the second column shows the ones generated on a multiple instance learning model (Attention MIL [21]). The remaining columns show the co-attention (CoAttn) heatmaps generated on the FSM model for the gene signatures, sig#1-sig#5 (see Section II-A), respectively.
Fig. 6:
Fig. 6:. Quantitative comparison of different model interpretability methods with expert annotations.
The plots display the performance of the FSM SAM (blue), ISM SAM (red), ISM TAH (green), MIL TAH (purple), CoAttn sig#1 (orange), CoAttn sig#2 (brown), CoAttn sig#3 (pink), CoAttn sig#4 (gray), and CoAttn sig#5 (cyan) interpretability methods, in terms of the Dice coefficient across different threshold levels. For each case, the Dice coefficient was computed by binarizing the heatmaps at different thresholds and comparing them with the pathologist annotations. ISM TAH, MIL TAH, and CoAttn heatmaps are shown in Fig. 5. The four cases correspond to the low- and high-risk LUAD and low- and high-risk LUSC cases presented in Fig. 4, which also includes the pathologist annotations and the SAMs. In the LUAD cases, the light gray annotations indicating tumor regions were excluded from the Dice coefficient calculation, because we focused solely on pathologic tumor features or patterns known to be associated with either favorable prognosis in low-risk LUAD or unfavorable prognosis in high-risk LUAD.
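The threshold-sweep evaluation described above can be sketched directly: binarize a continuous heatmap at each threshold and score the overlap with the annotation mask via the Dice coefficient. This is a minimal illustration with a toy 2x2 heatmap, not the paper's evaluation code.

```python
import numpy as np

def dice_at_threshold(heatmap, annotation, t):
    """Binarize a [0, 1] heatmap at threshold t and compute the Dice
    coefficient against a binary annotation mask."""
    pred = heatmap >= t
    inter = np.logical_and(pred, annotation).sum()
    denom = pred.sum() + annotation.sum()
    return 2.0 * inter / denom if denom else 1.0

heat = np.array([[0.9, 0.8],
                 [0.2, 0.1]])
ann = np.array([[1, 1],
                [0, 0]], dtype=bool)
print(dice_at_threshold(heat, ann, 0.5))  # 1.0: perfect overlap at t=0.5
```

Sweeping t over a range and plotting Dice against threshold yields curves like those in Fig. 6, one per interpretability method.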

References

    1. He B, Bergenstråhle L, Stenbeck L, Abid A, Andersson A, Borg A, Maaskola J, Lundeberg J, and Zou J, “Integrating spatial gene expression and breast tumour morphology via deep learning,” Nat Biomed Eng, vol. 4, no. 8, pp. 827–834, 2020. - PubMed
    2. Tan X, Su A, Tran M, and Nguyen Q, “SpaCell: integrating tissue morphology and spatial gene expression to predict disease cells,” Bioinformatics, vol. 36, no. 7, pp. 2293–2294, 2020. - PubMed
    3. Velten B, Braunger JM, Argelaguet R, Arnol D, Wirbel J, Bredikhin D, Zeller G, and Stegle O, “Identifying temporal and spatial patterns of variation from multimodal data using MEFISTO,” Nat Methods, vol. 19, no. 2, pp. 179–186, 2022. - PMC - PubMed
    4. TCGA Research Network, “The Cancer Genome Atlas Program,” Available: https://portal.gdc.cancer.gov/.
    5. Chen RJ et al., “Pan-cancer Integrative Histology-Genomic Analysis via Multimodal Deep Learning,” Cancer Cell, vol. 40, no. 8, pp. 865–878.e6, 2022. - PMC - PubMed
