This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2025 Oct 13:2025.10.10.25337691.

doi: 10.1101/2025.10.10.25337691.

Real-World Benchmarking and Validation of Foundation Model Transformers for Endometrial Cancer Subtyping from Histopathology

Vincent M Wagner¹, Casey M Cosgrove², Stephanie J Chen³, Daniel T Griffin³, Megan I Samuelson³, Michael J Goodheart¹, Jesus Gonzalez-Bosquet¹

Affiliations

¹ University of Iowa, Department of Obstetrics and Gynecology, Division of Gynecologic Oncology, Iowa City, IA.
² The Ohio State University Comprehensive Cancer Center/James Cancer Hospital, Department of Obstetrics and Gynecology, Division of Gynecologic Oncology, Columbus, OH.
³ University of Iowa, Department of Pathology, Iowa City, IA.

PMID: 41282936
PMCID: PMC12633087
DOI: 10.1101/2025.10.10.25337691

Vincent M Wagner et al. medRxiv. 2025.

Abstract

Purpose: To evaluate whether open-source histopathology foundation model pipelines, paired with attention-based multiple instance learning (MIL), can accurately classify molecular subtypes of endometrial cancer (EC) from whole-slide images (WSIs) and maintain performance in a real-world, independent cohort.

Methods: We assembled a public discovery cohort of 815 patients (1,195 WSIs) from The Cancer Genome Atlas and the Clinical Proteomic Tumor Analysis Consortium, and an independent external cohort of 720 patients (1,357 WSIs) with molecular subtyping determined by mismatch repair immunohistochemistry plus TP53 and POLE sequencing. Within the STAMP pipeline, we benchmarked four ImageNet-pretrained convolutional neural networks (CNNs) and six open-source foundation encoders, the latter paired with two MIL aggregation strategies (TransMIL and CLAM). Models were trained with five-fold cross-validation and evaluated on the independent external cohort. The primary outcome was the macro-averaged area under the receiver operating characteristic curve (macro-AUC).
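To make the attention-based MIL step concrete, the following is a minimal sketch of generic softmax attention pooling over tile embeddings. It is not the TransMIL or CLAM implementation used in the study, and it does not reflect the STAMP API. The function name `attention_pool` and the scoring vector `w` are hypothetical; in a trained model the scoring parameters would be learned rather than supplied by hand.

```python
import math
from typing import List, Sequence, Tuple


def attention_pool(
    tiles: Sequence[Sequence[float]],
    w: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Pool tile embeddings into one slide embedding with softmax attention.

    tiles: one feature vector per tile (stand-in for encoder output).
    w:     hypothetical scoring vector (an assumption, normally learned).
    Returns the slide-level embedding and the per-tile attention weights.
    """
    # Raw attention score per tile: dot product with the scoring vector.
    scores = [sum(x * wi for x, wi in zip(tile, w)) for tile in tiles]

    # Numerically stable softmax across tiles so the weights sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Slide embedding: attention-weighted sum of the tile embeddings.
    dim = len(tiles[0])
    slide = [
        sum(a * tile[d] for a, tile in zip(weights, tiles))
        for d in range(dim)
    ]
    return slide, weights


# Toy input: three 2-dimensional tile embeddings (illustrative values only).
toy_tiles = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_slide, toy_weights = attention_pool(toy_tiles, w=[2.0, 0.0])
```

The per-tile weights are the quantity that attention heatmaps visualize: tiles with larger weights contribute more to the slide-level representation that a downstream classifier would map to a subtype.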

Results: In cross-validation, foundation models outperformed CNNs (macro-AUC 0.799-0.860 vs 0.715-0.829). The best configuration (Virchow2 with CLAM) achieved a macro-AUC of 0.860 (95% CI, 0.839-0.880), a macro-F1 score of 0.607, and a balanced accuracy of 0.647. External validation showed substantial degradation for CNNs, while foundation models retained higher discrimination (macro-AUC 0.667-0.780). UNI2 with CLAM had the highest external macro-AUC (0.780), and Virchow2 with CLAM had the best external balanced accuracy (0.525). Subtype-level AUCs for UNI2 with CLAM were highest for p53abn (0.851).
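The three macro metrics reported here each average a per-class quantity with equal weight across the subtypes, so rare classes count as much as common ones. The sketch below shows one common way to compute them, assuming one-vs-rest AUCs; the abstract does not state the exact averaging convention, so treat that as an assumption. The function names and the toy labels are illustrative and are not study data.

```python
from typing import List, Sequence, Tuple


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly
    (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def macro_auc(
    y_true: Sequence[int],
    probs: Sequence[Sequence[float]],
    n_classes: int,
) -> float:
    """Unweighted mean of one-vs-rest AUCs (assumed convention)."""
    aucs: List[float] = []
    for c in range(n_classes):
        labels = [1 if y == c else 0 for y in y_true]
        aucs.append(binary_auc(labels, [p[c] for p in probs]))
    return sum(aucs) / n_classes


def macro_f1_and_balanced_accuracy(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    n_classes: int,
) -> Tuple[float, float]:
    """Macro-F1 (mean per-class F1) and balanced accuracy (mean recall)."""
    f1s: List[float] = []
    recalls: List[float] = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / n_classes, sum(recalls) / n_classes


# Toy three-class example (illustrative labels only, not study data).
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0]
toy_f1, toy_bal_acc = macro_f1_and_balanced_accuracy(toy_true, toy_pred, 3)
```

Because AUC depends only on how scores rank cases, while macro-F1 and balanced accuracy depend on the final predicted class, a model can keep a fairly high macro-AUC on an external cohort while its balanced accuracy falls more sharply, which is the pattern the external validation numbers show.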

Conclusions: Open-source foundation model pipelines with attention-based MIL can deliver accurate and generalizable molecular subtyping of EC directly from WSIs. These models outperform CNNs in real-world validation, supporting their potential as scalable, cost-effective tools to guide precision oncology and triage confirmatory molecular testing.

Conflict of interest statement

Disclaimers: None. The authors declare no potential conflicts of interest.

Figures

**Figure 1:**
Schematic and Foundation Feature Extractors. Overview of the computational pathology workflow for molecular subtype prediction from whole-slide images, including the convolutional neural network pipeline and the vision transformer pipeline with attention pooling using TransMIL or CLAM. The bottom panel lists the feature extractors used, with each one's architecture, backbone, training data, parameter count (M), and embedding dimension. Abbreviations: CNN, convolutional neural network; ViT, vision transformer; WSI, whole-slide image; MIL, multiple instance learning; CLAM, clustering-constrained attention multiple instance learning; TCGA, The Cancer Genome Atlas; PAIP, Pathology AI Platform.

**Figure 2:**
Model Macro-Metrics in Cross-Validation and External Validation. Comparison of (a) macro–area under the receiver operating characteristic curve (Macro-AUC), (b) macro–F1 score, and (c) balanced accuracy for CNN, TransMIL, and CLAM pipelines across foundation model feature extractors. Light bars represent mean performance across cross-validation folds and dark bars represent mean performance on external validation; error bars show standard deviation. Abbreviations: AUC, area under the curve; F1, F1 score; CLAM, clustering-constrained attention multiple instance learning; CNN, convolutional neural network; MIL, multiple instance learning.

**Figure 3:**
Model Subtype-AUC in Cross-Validation and External Validation. Subtype-level areas under the receiver operating characteristic curve (AUCs) for (a) p53abn, (b) NSMP, (c) dMMR, and (d) POLE classifiers across CNN, TransMIL, and CLAM pipelines with different feature extractors. Light bars represent mean AUC across cross-validation folds and dark bars represent mean AUC from external validation; error bars show standard deviation. Abbreviations: AUC, area under the curve.

**Figure 4:**
Whole-Slide Image (WSI), Tiles and Attention Maps. Representative WSIs from each molecular subtype (rows) showing, from left to right: the original WSI, tile-level subtype predictions, attention heatmaps, and the top tiles most predictive for the final predicted subtype. Tile prediction maps are color-coded by predicted class, as shown along the left side; attention maps highlight the regions contributing most to the slide-level prediction, with darker areas receiving more attention. Abbreviations: WSI, whole-slide image.

**Figure 5:**
Interpretability Analysis by HoVer-Net. (a) Top tiles most predictive for each molecular subtype; (b) corresponding HoVer-Net nuclear segmentation maps, color-coded by cell type: neoplastic (red), connective (green), inflammatory (yellow), and dead (black); and (c) violin plots with jittered dot plots of per-tile proportions (0 to 1), showing the distribution of cell type fractions in top tiles across subtypes.

Grants and funding

K12 TR004382/TR/NCATS NIH HHS/United States
