Front Pharmacol. 2025 Jun 6;16:1592950.
doi: 10.3389/fphar.2025.1592950. eCollection 2025.

A pathology-attention multi-instance learning framework for multimodal classification of colorectal lesions


Fanglei Fu et al. Front Pharmacol. 2025.

Erratum in

Abstract

Introduction: Colorectal cancer is the third most common cancer worldwide, and accurate pathological diagnosis is crucial for clinical intervention and prognosis assessment. Although deep learning has shown promise in classifying whole slide images (WSIs) in digital pathology, existing weakly supervised methods struggle to fully model the multimodal diagnostic process, which involves both visual feature analysis and pathological knowledge. Additionally, staining variability and tissue heterogeneity hinder model generalization.

Methods: We propose a multimodal weakly supervised learning framework named PAT-MIL (Pathology-Attention-MIL), which performs five-class WSI-level classification. The model integrates dynamic attention mechanisms with expert-defined text prototypes. It includes: (1) the construction of pathology knowledge-driven text prototypes for semantic guidance, (2) a refinement strategy that gradually adjusts category centers to adaptively improve prototype distribution, and (3) a loss balancing method that dynamically adjusts training weights based on gradient feedback to optimize both visual clustering and semantic alignment.
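The Methods paragraph names its components but gives no formulas. As a rough illustration only, the sketch below shows one common way to combine attention-based MIL pooling with class text prototypes. It pools patch embeddings into a slide embedding with ABMIL-style attention, scores that embedding against one prototype per class by cosine similarity, and nudges each prototype toward its class's slide embeddings with an exponential moving average. The function names, the non-gated attention form, the temperature of 0.07, the momentum of 0.9 and the random stand-in features are all assumptions made for illustration. They are not details taken from PAT-MIL.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def attention_pool(patches, V, w):
    """ABMIL-style (non-gated) attention pooling over one slide's patches.

    patches: (n_patches, d) patch embeddings; V: (d, h) and w: (h,) are
    hypothetical learned parameters. Returns the slide embedding and weights.
    """
    scores = np.tanh(patches @ V) @ w
    scores = scores - scores.max()  # subtract max for a numerically stable softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ patches, weights


def prototype_logits(slide_emb, prototypes, temperature=0.07):
    """Temperature-scaled cosine similarity to each class text prototype."""
    return l2_normalize(prototypes) @ l2_normalize(slide_emb) / temperature


def refine_prototypes(prototypes, slide_embs, labels, momentum=0.9):
    """EMA update moving each class prototype toward its class's mean slide embedding."""
    refined = prototypes.copy()
    for c in range(prototypes.shape[0]):
        members = slide_embs[labels == c]
        if len(members) > 0:  # classes absent from this batch keep their prototype
            center = l2_normalize(members).mean(axis=0)
            refined[c] = momentum * prototypes[c] + (1 - momentum) * center
    return l2_normalize(refined)


# Toy usage with random stand-ins for encoded patches and expert-written text.
rng = np.random.default_rng(0)
d, h, n_classes = 16, 8, 5
prototypes = l2_normalize(rng.normal(size=(n_classes, d)))
V, w = rng.normal(size=(d, h)), rng.normal(size=h)
slides = [rng.normal(size=(int(rng.integers(20, 50)), d)) for _ in range(10)]
labels = rng.integers(0, n_classes, size=10)

slide_embs = np.stack([attention_pool(s, V, w)[0] for s in slides])
logits = prototype_logits(slide_embs[0], prototypes)
predicted_class = int(np.argmax(logits))
prototypes = refine_prototypes(prototypes, slide_embs, labels)
```

In a trained model, the prototypes would come from a text encoder applied to expert-written class descriptions, and `V` and `w` would be learned. Here they are random placeholders, so the predicted class carries no meaning.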

Results: PAT-MIL achieves an accuracy of 86.45% (AUC = 0.9624) on an internal five-class dataset, outperforming ABMIL and DSMIL by 2.96% and 2.19%, respectively. On the external datasets CRS-2024 and UniToPatho, the model reaches 95.78% and 84.09% accuracy, exceeding the best baselines by 2.22% and 5.68%, respectively.

Discussion: These results demonstrate that PAT-MIL effectively mitigates staining variability and enhances cross-center generalization through the collaborative modeling of visual and textual modalities. It provides a robust solution for colorectal lesion classification without relying on pixel-level annotations, advancing the field of multimodal pathological image analysis.

Keywords: colorectal cancer; multimodal learning; pathology attention; weakly supervised learning; whole slide image classification.


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The reviewer YJ declared a shared parent affiliation with the authors FF, MF, JP, YH, TG, JL, and LZ to the handling editor at the time of review.

Figures

FIGURE 1
Workflow of the deep learning model. (A) Data Source and Division: This study used 5,062 H&E-stained WSIs from four centers. Data from Liuzhou Hospital served as the internal dataset for model training, and data from Xijing Hospital served as an external test set. Two publicly available datasets were also used to build additional external datasets for evaluating the model's generalization. (B) Construction and Optimization of Encoders: The image and text encoders were trained by contrastive learning on large-scale pathology image-text pairs. Pathology experts refined the text content so that it captures more robust pathological representations, improving the model's performance in practical applications. (C) Data Preprocessing: After the slides were digitized, tissue regions were segmented and each WSI was divided into patches for subsequent feature extraction and analysis. (D) Model Computation Process: The core computation has three stages: (1) image-based slide-level feature generation and prediction; (2) text-based slide-level feature generation and prediction; (3) loss calculation with weights adjusted dynamically according to the loss gradients, balancing the contributions of image and text features to optimize the final classification.
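Stage (3) of the workflow, like item (3) of the Methods, describes loss weights that respond to gradient feedback, but the exact rule is not given on this page. One simple heuristic in that spirit, loosely related to GradNorm and not claimed to be the PAT-MIL rule, weights each loss inversely to the norm of its gradient with respect to the shared parameters. Under that weighting, the image-based and text-based terms push the shared weights with equal strength, and the weights are smoothed across steps. `balance_weights`, the smoothing factor and the toy gradients are hypothetical.

```python
import numpy as np


def balance_weights(grads, prev_weights=None, smoothing=0.9, eps=1e-12):
    """Hypothetical gradient-feedback weighting for several losses.

    grads: list of 1-D arrays, each the gradient of one loss (e.g. image-based
    and text-based) w.r.t. the same shared parameters.
    Returns one weight per loss; the weights sum to len(grads).
    """
    norms = np.array([np.linalg.norm(g) for g in grads])
    raw = 1.0 / (norms + eps)           # a loss with a larger gradient gets a smaller weight
    raw = raw * len(grads) / raw.sum()  # rescale so the weights sum to the number of losses
    if prev_weights is None:
        return raw
    # Exponential smoothing avoids abrupt weight jumps between training steps.
    return smoothing * np.asarray(prev_weights, dtype=float) + (1 - smoothing) * raw


# Toy usage: gradients of an image loss and a text loss on two shared parameters.
g_img = np.array([3.0, 4.0])  # norm 5
g_txt = np.array([0.6, 0.8])  # norm 1
weights = balance_weights([g_img, g_txt])
combined_grad = weights[0] * g_img + weights[1] * g_txt

# On a later step, blend with the previous weights.
weights_next = balance_weights([g_img, g_txt], prev_weights=[1.0, 1.0], smoothing=0.5)
```

With these toy gradients, the weights come out as 1/3 and 5/3, so both weighted gradients have norm 5/3. In a real training loop, the gradients would be recomputed at every step, and the weights would be treated as constants when back-propagating the combined loss.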
FIGURE 2
Selection of baseline feature extractors.
FIGURE 3
Confusion matrices for different datasets.
FIGURE 4
Confusion matrix for the external test set.
FIGURE 5
t-SNE dimensionality reduction plots. Left: proposed method; right: ABMIL.
FIGURE 6
Heatmap visualizations on the colorectal five-class dataset.
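The caption does not say how the heatmaps are computed. In attention-based MIL, a common approach is to paint each patch's attention weight back onto its position in the tiled slide produced during preprocessing (Figure 1C). A minimal sketch under that assumption follows. `attention_heatmap`, the (row, col) grid coordinates and the min-max scaling are illustrative choices, not the authors' visualization code.

```python
import numpy as np


def attention_heatmap(coords, scores, grid_shape):
    """Place per-patch scores back onto the slide's patch grid.

    coords: (n, 2) integer array of (row, col) grid positions of tissue patches.
    scores: (n,) per-patch attention (or class) scores.
    grid_shape: (rows, cols) of the full tiling; background cells stay NaN.
    """
    heat = np.full(grid_shape, np.nan)
    s = np.asarray(scores, dtype=float)
    lo, hi = s.min(), s.max()
    # Min-max scale to [0, 1] so slides with different score ranges are comparable.
    s = (s - lo) / (hi - lo) if hi > lo else np.zeros_like(s)
    heat[coords[:, 0], coords[:, 1]] = s
    return heat


# Toy usage: three tissue patches on a 2 x 3 grid.
coords = np.array([[0, 0], [0, 1], [1, 1]])
heat = attention_heatmap(coords, [0.1, 0.5, 0.3], grid_shape=(2, 3))
```

The resulting array can be upsampled by the patch size and overlaid on a slide thumbnail. Cells without tissue stay NaN, so they can be masked out of the overlay.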
FIGURE 7
Ablation study.

