JCO Precis Oncol. 2025 May;9:e2400859.
doi: 10.1200/PO-24-00859. Epub 2025 May 5.

Training, Validating, and Testing Machine Learning Prediction Models for Endometrial Cancer Recurrence

Jesus Gonzalez Bosquet et al. JCO Precis Oncol. 2025 May.

Abstract

Purpose: Endometrial cancer (EC) is the most common gynecologic cancer in the United States, and both its incidence and its mortality are rising. Despite optimal treatment, 15%-20% of all patients will experience recurrence. To better select patients for adjuvant therapy, it is important to accurately identify those at risk of recurrence. Our objective was to train, validate, and test models of EC recurrence using lasso regression and other machine learning (ML) and deep learning (DL) analytics in a large, comprehensive data set.

Methods: Data from patients with EC were downloaded from the Oncology Research Information Exchange Network database and stratified into three groups: low risk (International Federation of Gynecology and Obstetrics [FIGO] grade 1 or 2, stage I; N = 329); high risk (FIGO grade 3 or stage II, III, or IV; N = 324); and nonendometrioid histology (N = 239). Clinical, pathologic, genomic, and genetic data were used for the analysis. Genomic data included microRNA, long noncoding RNA, isoform, and pseudogene expression. Genetic variation included single-nucleotide variation (SNV) and copy-number variation (CNV). In the discovery phase, we selected variables informative for recurrence (P < .05) using univariate analyses of variance. We then trained, validated, and tested multivariate models with the selected variables using lasso regression, MATLAB (ML), and TensorFlow (DL).
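The discovery-phase screen above keeps a variable when a univariate analysis of variance comparing patients with and without recurrence gives P < .05. A minimal Python sketch of that screen follows, assuming each variable is a numeric vector aligned with a 0/1 recurrence label. The function names, toy data, and the standard-library F-distribution helper (a regularized incomplete beta evaluated by continued fraction) are illustrative assumptions, not the authors' pipeline.

```python
# Sketch of univariate ANOVA screening (P < .05) against a binary recurrence label.
# Assumptions: features are numeric lists aligned with 0/1 recurrence labels;
# all function and feature names here are hypothetical, not the authors' code.
from math import exp, lgamma, log

_TINY = 1e-300


def _guard(v: float) -> float:
    """Keep Lentz's method away from division by zero."""
    return v if abs(v) > _TINY else _TINY


def _beta_cf(a: float, b: float, x: float) -> float:
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 / _guard(1.0 - qab * x / qap)
    h = d
    for m in range(1, 300):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),             # even term
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):   # odd term
            d = 1.0 / _guard(1.0 + aa * d)
            c = _guard(1.0 + aa / c)
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < 3e-14:
            break
    return h


def betainc_reg(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_sf(f: float, d1: float, d2: float) -> float:
    """Upper-tail probability P(F > f) for an F(d1, d2) distribution."""
    if f <= 0.0:
        return 1.0
    return betainc_reg(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def one_way_anova_p(groups: list[list[float]]) -> float:
    """P-value of a one-way ANOVA across the given groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0.0:
        return 0.0 if ss_between > 0.0 else 1.0
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_sf(f, k - 1, n - k)


def screen_features(features: dict[str, list[float]], recurred: list[int],
                    alpha: float = 0.05) -> list[str]:
    """Keep features whose recurrence vs. no-recurrence ANOVA p-value is below alpha."""
    kept = []
    for name, values in features.items():
        groups = [[v for v, y in zip(values, recurred) if y == label] for label in (0, 1)]
        if one_way_anova_p(groups) < alpha:
            kept.append(name)
    return kept


# Toy data: feature_a separates the groups, feature_b does not.
recurred = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "feature_a": [0.1, 0.2, 0.0, 0.15, 1.1, 0.9, 1.3, 1.0],
    "feature_b": [0.5, -0.5, 0.4, -0.4, 0.5, -0.5, 0.4, -0.4],
}
print(screen_features(features, recurred))  # ['feature_a']
```

With only two groups (recurrence vs. no recurrence), the one-way ANOVA F statistic equals the square of the pooled two-sample t statistic, so this screen behaves like a two-sided t test per variable.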

Results: Clinical models of recurrence for the low-risk, high-risk, and high-risk nonendometrioid histology groups had AUCs of 56%, 70%, and 65%, respectively. For training, we selected models with AUC >80%: five for the low-risk group, 20 for the high-risk group, and 20 for the nonendometrioid group. The two best low-risk models included clinical data and CNVs. For the high-risk group, three of the five best-performing models included pseudogene expression. For the nonendometrioid group, pseudogene expression and SNVs were overrepresented among the best models.
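The Methods fit lasso regression on the screened variables, and the Results retain models whose AUC exceeds 80%. The sketch below illustrates both steps on toy data. It assumes an L1-penalized logistic regression solved by proximal gradient descent (a swapped-in solver; the study's actual lasso, MATLAB, and TensorFlow configurations are not described in the abstract), a Mann-Whitney AUC, and a hypothetical `keep_model` filter; the penalty strength, step size, and iteration count are arbitrary illustrative values.

```python
# Sketch: lasso (L1-penalized) logistic regression, AUC, and an AUC > 0.80 filter.
# Assumptions: proximal gradient descent as the solver, lam/lr/iters values, and all
# names are illustrative; the study's own lasso/MATLAB/TensorFlow setups are not reproduced.
from math import exp


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def fit_lasso_logistic(X, y, lam=0.1, lr=0.5, iters=5000):
    """Minimize mean log-loss + lam * sum(|w_j|); the intercept is not penalized."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        resid = [_sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
                 for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        b -= lr * sum(resid) / n
        for j in range(p):
            # Gradient step, then soft-thresholding (the proximal operator of the L1 penalty).
            v = w[j] - lr * grad[j]
            w[j] = max(abs(v) - lr * lam, 0.0) * (1.0 if v > 0.0 else -1.0)
    return w, b


def auc(scores, labels):
    """Mann-Whitney AUC: chance a recurrence case outscores a non-recurrence case (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def keep_model(auc_value, threshold=0.80):
    """Retain a model only if its AUC exceeds the threshold (AUC > 80% in the Results)."""
    return auc_value > threshold


# Toy data: column 0 is informative, column 1 is noise the L1 penalty should zero out.
X = [[-2.0, 0.3], [-1.5, -0.2], [-1.0, 0.1], [-0.5, -0.1],
     [0.5, 0.2], [1.0, -0.3], [1.5, 0.1], [2.0, -0.1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_lasso_logistic(X, y)
scores = [b + sum(wj * xj for wj, xj in zip(w, row)) for row in X]
model_auc = auc(scores, y)
print(w, model_auc, keep_model(model_auc))
```

On this toy data the L1 penalty drives the coefficient of the noise column exactly to zero, which is why lasso can double as a variable selector on top of the univariate screen. In practice the AUC used for model selection would be computed on held-out validation or test data rather than the training rows.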

Conclusion: Prediction models of EC recurrence built with ML and DL analytics had better performance than models with clinical and pathologic data alone. Prospective validation is required to determine clinical utility.

Conflict of interest statement

The following represents disclosure information provided by authors of this manuscript. All relationships are considered compensated unless otherwise noted. Relationships are self-held unless noted. I = Immediate Family Member, Inst = My Institution. Relationships may not relate to the subject matter of this manuscript. For more information about ASCO's conflict of interest policy, please refer to www.asco.org/rwc or ascopubs.org/po/author-center.

Figures

FIG 1.
Selection of best models of EC recurrence after combining data types. EC recurrence models for all risk groups with performance ≥0.8, as measured by the AUC. The three panels represent the risk-based groups: (A) best models for low-risk endometrioid EC (blue); (B) best models for high-risk endometrioid EC (orange); and (C) best models for the nonendometrioid group (red). In all three panels, models are displayed in ascending order of performance. The x-axis shows AUC as a percentage (0%-100%). Red error bars show the 95% CI. In total, more than 300 models with different combinations of data types were tested; only the best models are displayed: (A) five for low-risk endometrioid EC, (B) 19 for high-risk endometrioid EC, and (C) 20 for nonendometrioid EC. EC, endometrial cancer. Genomic variation: CNV, copy-number variation; SNV, single-nucleotide variation. Transcriptome: FUS, fusion transcript expression; ISO, gene isoform expression; LNC, long noncoding RNA expression; MIR, microRNA expression; mRNA, gene expression; PSE, pseudogene expression.
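The caption reports a 95% CI around each AUC but does not say how the intervals were computed. One common closed-form option is the Hanley-McNeil (1982) standard error, sketched below. Both the choice of method and the example counts (50 recurrences and 50 non-recurrences) are assumptions, since per-group recurrence counts are not given here; the function name is hypothetical.

```python
# Sketch: approximate 95% CI for an AUC using the Hanley-McNeil (1982) standard error.
# Assumption: the caption does not state how its CIs were computed; this is one common choice.
from math import sqrt


def auc_ci_hanley_mcneil(auc: float, n_pos: int, n_neg: int, z: float = 1.96):
    """Return (lower, upper) bounds of auc +/- z * SE, clipped to [0, 1]."""
    q1 = auc / (2.0 - auc)             # P(two random positives both outrank one negative)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)  # P(one positive outranks two random negatives)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    se = sqrt(max(var, 0.0))
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Hypothetical counts: 50 recurrences and 50 non-recurrences with AUC = 0.80.
print(auc_ci_hanley_mcneil(0.80, 50, 50))  # roughly (0.713, 0.887)
```

Because the interval width shrinks with the number of recurrence and non-recurrence cases, groups with few recurrences would show wider error bars even at the same AUC. Bootstrap or DeLong intervals are common alternatives when the underlying scores are available.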