J Healthc Inform Res. 2025 Sep 3;9(4):513-532.
doi: 10.1007/s41666-025-00212-w. eCollection 2025 Dec.

Multimodal Data Fusion for Whole-Slide Histopathology Image Classification

Yiran Song et al. J Healthc Inform Res.

Abstract

Whole slide images (WSIs) are critical for cancer diagnosis but pose computational challenges due to their gigapixel resolution. While automated AI tools can accelerate diagnostic workflows, they often rely on precise annotations and require substantial training data. Integrating multimodal data, such as WSIs and their corresponding pathology reports, offers a promising solution to improve classification accuracy and reduce diagnostic variability. In this study, we introduce MPath-Net, an end-to-end multimodal framework that combines WSIs and pathology reports for enhanced cancer subtype classification. Using the TCGA dataset (1684 cases: 916 kidney, 768 lung), we applied multiple-instance learning (MIL) for WSI feature extraction and Sentence-BERT for report encoding, followed by joint fine-tuning for tumor classification. MPath-Net achieved 94.65% accuracy, 0.9553 precision, 0.9472 recall, and 0.9473 F1-score, significantly outperforming baseline models (P < 0.05). In addition, attention heatmaps provided interpretable localization of tumor tissue, demonstrating the clinical utility of our approach. These findings suggest that MPath-Net can support pathologists by improving diagnostic accuracy, reducing inter-reader variability, and advancing precision medicine through multimodal AI integration.
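The abstract does not specify how the slide and report representations are combined. As a rough illustration only, the sketch below assumes attention-based MIL pooling (in the style of ABMIL, one of the baselines in Fig. 2) over precomputed patch features, concatenation with a precomputed report embedding standing in for a Sentence-BERT vector, and a linear softmax classifier. All function names, sizes, and weights are hypothetical, and this is not MPath-Net's actual architecture. In joint fine-tuning, the pooling and classifier weights (and possibly the encoders) would be trained together by backpropagation; here only the forward pass is shown, with random weights.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def attention_mil_pool(patch_feats, V, w):
    """Hypothetical attention-based MIL pooling (non-gated).

    patch_feats: (N, D) features for the N patches of one slide.
    V: (D, H) projection; w: (H,) scoring vector.
    Returns the (D,) slide embedding and the (N,) attention weights.
    """
    scores = np.tanh(patch_feats @ V) @ w   # (N,) one score per patch
    attn = softmax(scores)                  # weights sum to 1 over patches
    slide_emb = attn @ patch_feats          # (D,) attention-weighted average
    return slide_emb, attn

def fuse_and_classify(slide_emb, report_emb, W_cls, b_cls):
    # Assumed late fusion by concatenation, then a linear classifier.
    fused = np.concatenate([slide_emb, report_emb])  # (D + T,)
    return softmax(fused @ W_cls + b_cls)            # (C,) class probabilities

rng = np.random.default_rng(0)
N, D, H, T, C = 200, 512, 128, 384, 2   # hypothetical sizes
patch_feats = rng.normal(size=(N, D))   # stand-in for per-patch image features
report_emb = rng.normal(size=T)         # stand-in for a report embedding
V = rng.normal(scale=0.05, size=(D, H))
w = rng.normal(scale=0.05, size=H)
W_cls = rng.normal(scale=0.05, size=(D + T, C))
b_cls = np.zeros(C)

slide_emb, attn = attention_mil_pool(patch_feats, V, w)
probs = fuse_and_classify(slide_emb, report_emb, W_cls, b_cls)
print(slide_emb.shape, attn.shape, probs.round(3))
```

Concatenation is only the simplest fusion choice; the same attention weights `attn` are what an attention heatmap would visualize.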

Keywords: Classification; Clinical report; Multimodal learning; Pathology; Whole-slide image.


Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Block diagram of our proposed end-to-end multimodal WSI classification model, MPath-Net: (A) a typical WSI; (B) patch extraction from the WSI; (C) image feature extraction; (D) a typical pathology report in PDF format; (E) preprocessing of the pathology report; (F) text feature embedding generation using a Sentence-BERT model.
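Step (B), patch extraction, is commonly done by tiling the slide into fixed-size patches and discarding mostly-background ones. The sketch below only computes tile coordinates and applies a crude near-white background filter to an RGB array; the patch size, stride, and thresholds are assumptions, the function names are hypothetical, and actually reading pixels from a WSI would be done with a slide library such as OpenSlide. It is not the authors' preprocessing code.

```python
import numpy as np

def tile_coordinates(width, height, patch_size=256, stride=None):
    """Top-left (x, y) coordinates of patches tiling a width x height image.

    Only full patches are returned; a partial strip at the right or bottom
    edge is dropped.
    """
    stride = stride or patch_size  # default: non-overlapping tiles
    return [
        (x, y)
        for y in range(0, height - patch_size + 1, stride)
        for x in range(0, width - patch_size + 1, stride)
    ]

def is_tissue(patch_rgb, white_thresh=220, min_tissue_frac=0.5):
    """Crude background filter: keep a patch if enough pixels are not near-white."""
    # A pixel counts as background when all three RGB channels are >= white_thresh.
    background = np.all(patch_rgb >= white_thresh, axis=-1)
    return bool(1.0 - background.mean() >= min_tissue_frac)

# Hypothetical 1000 x 600 slide level tiled into 256-pixel patches.
coords = tile_coordinates(1000, 600, patch_size=256)
print(coords)  # [(0, 0), (256, 0), (512, 0), (0, 256), (256, 256), (512, 256)]

white = np.full((256, 256, 3), 255, dtype=np.uint8)
stained = np.full((256, 256, 3), [200, 120, 180], dtype=np.uint8)
print(is_tissue(white), is_tissue(stained))  # False True
```

The patches that pass the filter would then go to the image feature extractor of step (C).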
Fig. 2
Heatmap visualization of a WSI: (A) original WSI; (B) ACMIL and (C) ABMIL, two baseline methods; (D) our proposed model, MPath-Net. Column 3 shows a zoomed view of the region marked by the blue box in Column 2.
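The caption does not describe how the heatmaps are rendered. Assuming, as is typical for attention-based MIL models, that each patch's attention weight is painted back at that patch's location, a minimal sketch looks like this. The function name and example values are hypothetical, and the sketch assumes non-overlapping patches aligned to a grid of size `patch_size`.

```python
import numpy as np

def attention_heatmap(coords, attn, width, height, patch_size):
    """Place per-patch attention weights on a grid with one cell per patch.

    Assumes non-overlapping patches whose top-left corners are multiples of
    patch_size. Cells with no patch (e.g. filtered background) stay NaN.
    """
    grid = np.full((height // patch_size, width // patch_size), np.nan)
    for (x, y), a in zip(coords, attn):
        grid[y // patch_size, x // patch_size] = a
    lo, hi = np.nanmin(grid), np.nanmax(grid)
    if hi > lo:
        grid = (grid - lo) / (hi - lo)  # min-max scale to [0, 1]; NaN stays NaN
    return grid

coords = [(0, 0), (256, 0), (0, 256)]   # the (256, 256) patch was filtered out
attn = [0.2, 0.5, 0.3]                  # hypothetical attention weights
heat = attention_heatmap(coords, attn, width=512, height=512, patch_size=256)
print(heat)
```

To overlay the result on a slide thumbnail, each cell can be expanded to pixel resolution (for example with `np.repeat` along both axes) and drawn with a colormap, leaving NaN cells transparent.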
