EBioMedicine. 2025 Apr:114:105663.

doi: 10.1016/j.ebiom.2025.105663. Epub 2025 Mar 22.

Deep learning informed multimodal fusion of radiology and pathology to predict outcomes in HPV-associated oropharyngeal squamous cell carcinoma

Bolin Song¹, Amaury Leroy², Kailin Yang³, Tanmoy Dam¹, Xiangxue Wang⁴, Himanshu Maurya¹, Tilak Pathak¹, Jonathan Lee⁵, Sarah Stock⁵, Xiao T Li⁶, Pingfu Fu⁷, Cheng Lu⁸, Paula Toro⁹, Deborah J Chute⁹, Shlomo Koyfman¹⁰, Nabil F Saba¹¹, Mihir R Patel¹², Anant Madabhushi¹³

Affiliations

¹ Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA.
² Therapanacea, Paris, France.
³ Department of Radiation Oncology, Holden Comprehensive Cancer Center, Iowa Neuroscience Institute, University of Iowa, Iowa City, IA, USA.
⁴ Institute of Artificial Intelligence in Medicine, School of Artificial Intelligence in Medicine, Nanjing University of Information Science and Technology, Nanjing, Jiangsu, China.
⁵ Diagnostics Institute, Cleveland Clinic, Cleveland, OH, USA.
⁶ Department of Radiology and Imaging Sciences, Emory University Hospital, Atlanta, GA, USA.
⁷ Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA.
⁸ Department of Radiology, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Guangzhou, China.
⁹ Department of Pathology, Cleveland Clinic, Cleveland, OH, USA.
¹⁰ Department of Radiation Oncology, Taussig Cancer Center, Cleveland Clinic, Cleveland, OH, USA.
¹¹ Department of Hematology and Medical Oncology, Winship Cancer Institute, Atlanta, GA, USA.
¹² Department of Otolaryngology, Winship Cancer Institute, Atlanta, GA, USA.
¹³ Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA; Atlanta Veterans Administration Medical Center, Atlanta, GA, USA. Electronic address: anantm@emory.edu.

PMID: 40121941
PMCID: PMC11979917
DOI: 10.1016/j.ebiom.2025.105663

Bolin Song et al. EBioMedicine. 2025 Apr.

Abstract

Background: We aim to predict outcomes of human papillomavirus (HPV)-associated oropharyngeal squamous cell carcinoma (OPSCC), a subtype of head and neck cancer characterized by improved clinical outcomes and better response to therapy. Pathology- and radiology-focused AI-based prognostic models have been developed independently for OPSCC, but their integration across both the primary tumour (PT) and metastatic cervical lymph nodes (LN) remains unexamined.

Methods: We investigate the prognostic value of an AI approach termed the Swin Transformer-based multimodal and multi-region data fusion framework (SMuRF). SMuRF integrates features from CT of the PT and LN with features from whole slide pathology images of the PT to predict survival and tumour grade in HPV-associated OPSCC. SMuRF employs cross-modality and cross-region window-based multi-head self-attention mechanisms to capture interactions between features across tumour habitats and image scales.
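To make the attention step concrete, the following is a minimal sketch of scaled dot-product self-attention applied to one window of tokens drawn from several sources. It is not the SMuRF implementation: it assumes a single head, identity query/key/value projections, and no window shifting, whereas a Swin Transformer learns its projections and uses multiple heads. The toy embeddings and the names `ct_primary`, `ct_node`, and `wsi` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys and values are the tokens themselves
    (identity projections), purely for illustration.
    """
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        # Similarity of this token to every token in the window.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # Weighted average of all tokens, channel by channel.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, tokens)) for c in range(d)])
    return outputs, weights


# Hypothetical 4-dimensional embeddings, one list per source.
ct_primary = [[1.0, 0.0, 0.0, 0.5]]                        # CT, primary tumour
ct_node = [[0.0, 1.0, 0.0, 0.5]]                           # CT, lymph node
wsi = [[0.9, 0.1, 0.0, 0.4], [0.0, 0.0, 1.0, 0.2]]         # pathology patches

# Placing tokens from all sources in one window lets each token attend
# across modalities and regions.
window = ct_primary + ct_node + wsi
fused, weights = self_attention(window)
```

In this toy window the CT primary-tumour token places more weight on the first pathology patch than on the lymph-node token, because their embeddings are more similar. That is the kind of cross-modality interaction the attention mechanism is meant to capture.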

Findings: Developed and tested on a cohort of 277 patients with OPSCC with matched radiology and pathology images, SMuRF demonstrated strong performance (C-index = 0.81 for disease-free survival [DFS] prediction and AUC = 0.75 for tumour grade classification) and emerged as an independent prognostic biomarker for DFS (hazard ratio [HR] = 17, 95% confidence interval [CI], 4.9-58, p < 0.0001) and tumour grade (odds ratio [OR] = 3.7, 95% CI, 1.4-10.5, p = 0.01) after controlling for other clinical variables (i.e., T-, N-stage, age, smoking, sex and treatment modalities). Importantly, SMuRF outperformed unimodal models derived from radiology or pathology alone.
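For readers unfamiliar with the C-index reported above, the sketch below computes Harrell's concordance index on invented data. It assumes higher risk scores should correspond to earlier events and, as a simplification, skips pairs with tied follow-up times. The function name and the toy values are illustrative only and are not taken from the study.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs ranked correctly.

    times  : follow-up time per patient
    events : 1 if the event was observed, 0 if the patient was censored
    risks  : model risk score (higher is assumed to mean earlier event)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the earlier time is an observed event;
        # tied times are skipped in this simplified version.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0      # earlier event has the higher risk
        elif risks[a] == risks[b]:
            concordant += 0.5      # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented example: four patients, two of them censored.
times = [5, 8, 12, 20]
events = [1, 0, 1, 0]
risks = [0.9, 0.4, 0.1, 0.6]

# 3 of the 4 comparable pairs are ranked correctly -> 0.75
c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.81 should be read.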

Interpretation: Our findings underscore the potential of multimodal deep learning in accurately stratifying OPSCC risk, informing tailored treatment strategies and potentially refining existing treatment algorithms.

Funding: The National Institutes of Health, the U.S. Department of Veterans Affairs and National Institute of Biomedical Imaging and Bioengineering.

Keywords: Deep learning; Multimodal biomarker; Oropharyngeal cancer; Pathology; Radiology.

Published by Elsevier B.V.

Conflict of interest statement

Declaration of interests: Dr. Madabhushi is an equity holder in Picture Health, Elucid Bioimaging, and Inspirata Inc. He currently serves on the advisory boards of Picture Health and SimBioSys, and consults for Takeda Inc. He also has sponsored research agreements with AstraZeneca and Bristol Myers-Squibb. His technology has been licenced to Picture Health and Elucid Bioimaging. He is also involved in two different R01 grants with Inspirata Inc. He also serves as a member of the Frederick National Laboratory Advisory Committee. Dr. Kailin Yang was supported by an RSNA Research Fellow Grant and an ASTRO-LUNGevity Foundation Radiation Oncology Seed Grant.

Figures

**Fig. 1**
Flowchart of this study: a) multimodal data curation and annotation; b) preprocessing: on pathology WSI, fragment contours (green) are generated using the CLAM toolbox and tumour annotations (red) are provided by pathologists; on radiology CT, primary tumour (yellow) and metastatic cervical lymph node (blue) annotations are provided by radiologists; c) multi-region and multiscale fusion with SwinT; d) model inference: survival and grade predictions. Red regions on CT and WSI indicate the prognostically relevant regions that the model focuses on. W-MSA: window-based multi-head self-attention; SW-MSA: shifted window multi-head self-attention; SwinT: swin-transformer; HIPT: Hierarchical Image Pyramid Transformer.

**Fig. 2**
Kaplan–Meier survival analysis of SMuRF for OPSCC DFS stratification on the training set (a), validation set (b) and test set (c), and for grade classification on the training set (d), validation set (e) and test set (f). Seven models, differing in input modalities and cancer habitat regions, are compared on the test set (g). Four different fusion schemes are compared on the test set (h).

**Fig. 3**
Multivariable Cox regression analysis using DFS as endpoint and multivariable logistic regression using grade as endpoint on the test set (a). Beeswarm plot of SHAP variable importance in the multivariable Cox regression analysis (b) and the multivariable logistic regression analysis (d). Mean SHAP values were converted to proportion for each variable, quantifying their contributions to the DFS predictions (c) and the grade classification (e).

**Fig. 4**
SMuRF histogram distributions on the validation and test sets (a), and four representative patient examples with clinical information (b) and cropped CT scans with primary tumour (c) and metastatic cervical lymph node annotations (e). Corresponding integrated gradient (IG) attention maps (d, f) highlight the regions important for predictions. The IG results showed that the deep learning model focused on regions within the primary tumour and metastatic cervical lymph nodes.

**Fig. 5**
One example high-SMuRF (a) and one example low-SMuRF (b) pathology WSI with primary tumour annotations (red boundaries) and IG overlaid to highlight the prognostically relevant regions (bottom left) detected by SMuRF. For each pathology WSI, two example 2048 × 2048 regions within the highlighted prognostically relevant regions are provided with their corresponding attention heatmaps (c, d). The attention heatmaps of the HIPT model at a patch size of 256 (macro-scale) differed between the high- and low-SMuRF patients: there are more condensed high-attention areas related to the tumour-collagen fibre interface (c) in the high-SMuRF WSI than in the low-SMuRF WSI, which contains primarily tumour cell clusters (d). The attention heatmaps at a patch size of 16 (micro-scale) highlighted individual collagen fibres (e) for the high-SMuRF WSI while emphasizing mainly the tumour cells (f) for the low-SMuRF WSI. The presence of individual morphologic hallmarks (i.e., tumour-collagen fibre interface, tumour cell clusters) was evaluated and confirmed by a pathologist (T.P.).

Grants and funding

R01 CA220581/CA/NCI NIH HHS/United States
