[Preprint]. 2025 Oct 13:2025.10.10.25337691.
doi: 10.1101/2025.10.10.25337691.

Real-World Benchmarking and Validation of Foundation Model Transformers for Endometrial Cancer Subtyping from Histopathology

Vincent M Wagner et al. medRxiv.

Abstract

Purpose: To evaluate whether open-source histopathology foundation model pipelines, paired with attention-based multiple instance learning (MIL), can accurately classify molecular subtypes of endometrial cancer (EC) from whole-slide images (WSIs) and maintain performance in a real-world, independent cohort.

Methods: We assembled a public discovery cohort of 815 patients (1,195 WSIs) from The Cancer Genome Atlas and Clinical Proteomic Tumor Analysis Consortium, and an independent external cohort of 720 patients (1,357 WSIs) with molecular subtyping determined by mismatch repair immunohistochemistry plus TP53 and POLE sequencing. Four ImageNet-pretrained convolutional neural networks (CNNs) and six open-source foundation encoders using two MIL aggregation strategies (TransMIL and CLAM) were benchmarked within the STAMP pipeline. Models were trained with five-fold cross-validation and evaluated on an independent cohort. Macro-area under the receiver operating characteristic curve (AUC) was the primary outcome.
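To make the primary outcome concrete, the following minimal sketch computes a macro-AUC as the unweighted mean of one-vs-rest AUCs over the four subtypes. The abstract does not state the exact averaging scheme or software, so the one-vs-rest formulation, the function names, and the toy inputs are assumptions for illustration only.

```python
from typing import List, Sequence

# Subtype names as used in the figure captions; the column order is an assumption.
SUBTYPES = ["POLE", "dMMR", "NSMP", "p53abn"]


def binary_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auc(probs: List[Sequence[float]], y_true: Sequence[str],
              classes: Sequence[str] = SUBTYPES) -> float:
    """Unweighted mean of one-vs-rest AUCs, one per class."""
    aucs = []
    for k, cls in enumerate(classes):
        scores = [row[k] for row in probs]      # predicted probability of class k
        labels = [y == cls for y in y_true]     # one-vs-rest ground truth
        aucs.append(binary_auc(scores, labels))
    return sum(aucs) / len(aucs)
```

Because every class contributes equally to the mean, a rare subtype weighs as much as a common one, which is the usual reason for reporting a macro average on imbalanced cohorts.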

Results: In cross-validation, foundation models outperformed CNNs (macro-AUC 0.799-0.860 vs 0.715-0.829). The best configuration (Virchow2 with CLAM) achieved macro-AUC 0.860 (95% CI, 0.839-0.880), macro-F1 score 0.607, and balanced accuracy 0.647. External validation showed substantial degradation for CNNs, while foundation models retained higher discrimination (macro-AUC 0.667-0.780). UNI2 with CLAM had the highest external macro-AUC (0.780), and Virchow2 with CLAM had the best balanced accuracy (0.525). Subtype-level AUCs for UNI2 with CLAM were highest for p53abn (0.851).
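The two secondary metrics reported above can be sketched as follows, using their standard definitions: balanced accuracy as the mean per-class recall, and macro-F1 as the unweighted mean of per-class F1 scores. The function names and toy labels are hypothetical, and the authors' exact implementation is not described in the abstract.

```python
from typing import Sequence


def _counts(y_true: Sequence[str], y_pred: Sequence[str], cls: str):
    """True positives, false positives, and false negatives for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    return tp, fp, fn


def balanced_accuracy(y_true: Sequence[str], y_pred: Sequence[str],
                      classes: Sequence[str]) -> float:
    """Mean recall over the classes that occur in y_true."""
    recalls = []
    for cls in classes:
        tp, _, fn = _counts(y_true, y_pred, cls)
        if tp + fn > 0:                      # skip classes with no true samples
            recalls.append(tp / (tp + fn))
    return sum(recalls) / len(recalls)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str],
             classes: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    f1s = []
    for cls in classes:
        tp, fp, fn = _counts(y_true, y_pred, cls)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)
```

Both metrics depend on a hard class decision per slide, unlike AUC, which is one plausible reason a model can rank slides well (high macro-AUC) while scoring lower on balanced accuracy.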

Conclusions: Open-source foundation model pipelines with attention-based MIL can deliver accurate and generalizable molecular subtyping of EC directly from WSIs. These models outperform CNNs in real-world validation, supporting their potential as scalable, cost-effective tools to guide precision oncology and triage confirmatory molecular testing.

Conflict of interest statement

Disclaimers: None. The authors declare no potential conflicts of interest.

Figures

Figure 1:
Schematic and Foundation Feature Extractors. Overview of the computational pathology workflow for molecular subtype prediction from whole-slide images, including the convolutional neural network pipeline and the vision transformer pipeline with attention pooling using TransMIL or CLAM. The bottom panel lists the feature extractors used, with the architecture, backbone, training data, parameter count (M), and embedding dimension of each. Abbreviations: CNN, convolutional neural network; ViT, vision transformer; WSI, whole-slide image; MIL, multiple instance learning; CLAM, clustering-constrained attention multiple instance learning; TCGA, The Cancer Genome Atlas; PAIP, Pathology AI Platform.
Figure 2:
Model Macro-Metrics in Cross-Validation and External Validation. Comparison of (a) macro–area under the receiver operating characteristic curve (Macro-AUC), (b) macro–F1 score, and (c) balanced accuracy for CNN, TransMIL, and CLAM pipelines across foundation model feature extractors. Light bars represent mean performance across cross-validation folds and dark bars represent mean performance on external validation; error bars show standard deviation. Abbreviations: AUC, area under the curve; F1, F1 score; CLAM, clustering-constrained attention multiple instance learning; CNN, convolutional neural network; MIL, multiple instance learning.
Figure 3:
Model Subtype-AUC in Cross-Validation and External Validation. Subtype-level area under the receiver operating characteristic curve (AUCs) for (a) p53abn, (b) NSMP, (c) dMMR, and (d) POLE classifiers across CNN, TransMIL, and CLAM pipelines with different feature extractors. Light bars represent mean AUC across cross-validation folds and dark bars represent mean AUC from external validation; error bars show standard deviation. Abbreviations: AUC, area under the curve.
Figure 4:
Whole Slide Image (WSI), Tiles and Attention Maps. Representative WSIs from each molecular subtype (rows) showing: (left to right) original WSI, tile-level subtype predictions, attention heatmaps, and the top tiles most predictive for that final predicted subtype. Tile prediction maps are color-coded by predicted class as seen along the left side; attention maps highlight regions contributing most to the slide-level prediction with darker areas receiving more attention. Abbreviations: WSI, whole-slide image.
Figure 5:
Interpretability Analysis by HoVer-Net. (a) Top tiles most predictive for each molecular subtype; (b) corresponding HoVer-Net nuclear segmentation maps, color-coded by cell type: neoplastic (red), connective (green), inflammatory (yellow), dead (black); and (c) violin plots with jittered dot plots of per-tile proportions (0 to 1), showing the distribution of cell type fractions in top tiles across subtypes.
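
The per-tile proportions plotted in Figure 5(c) can be illustrated with a minimal sketch. It assumes each tile's segmentation has already been reduced to a list of nucleus type labels; that input format and the function name are hypothetical and do not reflect HoVer-Net's actual output schema.

```python
from collections import Counter
from typing import Dict, Iterable

# Cell types named in the Figure 5 caption.
CELL_TYPES = ("neoplastic", "connective", "inflammatory", "dead")


def cell_type_fractions(nucleus_types: Iterable[str]) -> Dict[str, float]:
    """Fraction (0 to 1) of each cell type among the recognized nuclei in one tile."""
    counts = Counter(t for t in nucleus_types if t in CELL_TYPES)
    total = sum(counts.values())
    if total == 0:
        # Tile with no recognized nuclei: report zeros rather than dividing by zero.
        return {t: 0.0 for t in CELL_TYPES}
    return {t: counts[t] / total for t in CELL_TYPES}


# Toy tile with three neoplastic nuclei and one inflammatory nucleus.
fractions = cell_type_fractions(["neoplastic"] * 3 + ["inflammatory"])
```

Collecting these dictionaries over the top tiles of each subtype would give the per-subtype distributions that a violin plot summarizes.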
