This is a preprint.
Real-World Benchmarking and Validation of Foundation Model Transformers for Endometrial Cancer Subtyping from Histopathology
- PMID: 41282936
- PMCID: PMC12633087
- DOI: 10.1101/2025.10.10.25337691
Abstract
Purpose: To evaluate whether open-source histopathology foundation model pipelines, paired with attention-based multiple instance learning (MIL), can accurately classify molecular subtypes of endometrial cancer (EC) from whole-slide images (WSIs) and maintain performance in a real-world, independent cohort.
Methods: We assembled a public discovery cohort of 815 patients (1,195 WSIs) from The Cancer Genome Atlas and the Clinical Proteomic Tumor Analysis Consortium, and an independent external cohort of 720 patients (1,357 WSIs) with molecular subtypes determined by mismatch repair immunohistochemistry plus TP53 and POLE sequencing. Four ImageNet-pretrained convolutional neural networks (CNNs) and six open-source foundation encoders were benchmarked with two MIL aggregation strategies (TransMIL and CLAM) within the STAMP pipeline. Models were trained with five-fold cross-validation on the discovery cohort and evaluated on the independent external cohort. The primary outcome was the macro-averaged area under the receiver operating characteristic curve (macro-AUC).
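As an illustration of the primary outcome, the sketch below computes a macro-averaged AUC from per-slide class probabilities. It assumes one-vs-rest averaging with equal class weights, which the abstract does not specify; the function names, integer class indices, and toy probabilities are hypothetical and are not taken from the STAMP pipeline.

```python
# Minimal sketch of macro-AUC, ASSUMING one-vs-rest averaging (not confirmed by the source).
# All names and the toy data below are illustrative only.

def binary_auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auc(probs, y_true, n_classes):
    """Unweighted mean of one-vs-rest AUCs; also returns the per-class values."""
    per_class = []
    for c in range(n_classes):
        scores = [row[c] for row in probs]            # predicted probability of class c
        labels = [1 if y == c else 0 for y in y_true]  # class c versus all other classes
        per_class.append(binary_auc(scores, labels))
    return sum(per_class) / n_classes, per_class


# Toy example: six cases, three classes (class indices are placeholders).
y_true = [0, 0, 1, 1, 2, 2]
probs = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.4, 0.3],
    [0.1, 0.2, 0.7],
    [0.3, 0.3, 0.4],
]
macro, per_class = macro_auc(probs, y_true, n_classes=3)
print(per_class, macro)
```

Because every class contributes equally to the mean, a rare subtype influences the macro-AUC as much as a common one.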
Results: In cross-validation, foundation models outperformed CNNs (macro-AUC 0.799-0.860 vs 0.715-0.829). The best configuration (Virchow2 with CLAM) achieved a macro-AUC of 0.860 (95% CI, 0.839-0.880), a macro-F1 score of 0.607, and a balanced accuracy of 0.647. In external validation, CNN performance degraded substantially, while foundation models retained higher discrimination (macro-AUC 0.667-0.780). UNI2 with CLAM had the highest external macro-AUC (0.780), and Virchow2 with CLAM had the best external balanced accuracy (0.525). Among subtype-level AUCs for UNI2 with CLAM, the highest was for p53abn (0.851).
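For readers less familiar with the secondary metrics, the following sketch shows how macro-F1 and balanced accuracy can be derived from hard class predictions. It uses the common definitions (unweighted mean of per-class F1, and unweighted mean of per-class recall); whether the authors computed them exactly this way is an assumption, and the function names and toy labels are hypothetical.

```python
# Minimal sketch of macro-F1 and balanced accuracy under their common definitions.
# ASSUMPTION: the source does not state its exact computation; toy data is illustrative.

def class_counts(y_true, y_pred, n_classes):
    """Per-class true positives, false positives, and false negatives."""
    tp = [0] * n_classes
    fp = [0] * n_classes
    fn = [0] * n_classes
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the case was not p
            fn[t] += 1  # true class t was missed
    return tp, fp, fn


def balanced_accuracy(y_true, y_pred, n_classes):
    """Mean recall over the classes that occur in y_true."""
    tp, _, fn = class_counts(y_true, y_pred, n_classes)
    recalls = [tp[c] / (tp[c] + fn[c]) for c in range(n_classes) if tp[c] + fn[c] > 0]
    return sum(recalls) / len(recalls)


def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1; a class with no TP, FP, or FN scores 0."""
    tp, fp, fn = class_counts(y_true, y_pred, n_classes)
    f1s = []
    for c in range(n_classes):
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1s.append(2 * tp[c] / denom if denom > 0 else 0.0)
    return sum(f1s) / n_classes


# Toy example with an imbalanced class distribution (class indices are placeholders).
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 0, 2, 1]
bal_acc = balanced_accuracy(y_true, y_pred, n_classes=3)
f1 = macro_f1(y_true, y_pred, n_classes=3)
print(bal_acc, f1)
```

Both metrics depend on a single predicted label per case, so they can fall well below what the macro-AUC suggests when classes are imbalanced or the decision threshold is poorly calibrated.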
Conclusions: Open-source foundation model pipelines with attention-based MIL can deliver accurate and generalizable molecular subtyping of EC directly from WSIs. These models outperform CNNs in real-world validation, supporting their potential as scalable, cost-effective tools to guide precision oncology and triage confirmatory molecular testing.
Conflict of interest statement
Disclaimers: None. The authors declare no potential conflicts of interest.