JCO Clin Cancer Inform. 2025 Sep:9:e2500073.
doi: 10.1200/CCI-25-00073. Epub 2025 Sep 25.

Development of Machine Learning Systems to Predict Cancer-Related Symptoms With Validation Across a Health Care System

Baijiang Yuan et al. JCO Clin Cancer Inform. 2025 Sep.

Abstract

Purpose: Cancer and its treatment cause symptoms. In this study, we aimed to develop machine learning (ML) systems that predict future symptom deterioration among people receiving treatment for cancer and then validate the systems in a simulated deployment across an entire health care system.

Methods: We trained and tested ML systems that predict a deterioration in nine patient-reported symptoms within 30 days after treatments for aerodigestive cancers, using internal electronic health record (EHR) data at Princess Margaret Cancer Centre (3,229 patients; 20,267 treatments). The primary performance metric was the area under the receiver operating characteristic curve (AUROC). The best-performing systems in the held-out internal test set were then externally validated across 82 cancer centers in Ontario (12,079 patients; 77,003 treatments) by adapting techniques from meta-analysis.
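The primary metric named above, AUROC, can be read as the probability that a randomly chosen treatment followed by symptom deterioration receives a higher predicted risk than a randomly chosen treatment that was not. The sketch below computes it directly from that definition. It is illustrative only: the function name, labels, and scores are hypothetical and are not taken from the study's code or data.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 if the symptom deteriorated after the treatment, else 0.
    scores: predicted risk for the same treatments.
    Tied scores count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six treatments, three followed by deterioration.
toy_labels = [0, 0, 1, 1, 0, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
print(f"AUROC = {auroc(toy_labels, toy_scores):.3f}")
```

The pairwise loop is quadratic in the number of treatments, so a cohort of the size described here would normally use a rank-based implementation instead; the definition is the same.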

Results: The best ML systems predicted symptom deterioration with AUROCs ranging from 0.66 (95% CI, 0.63 to 0.69) for dyspnea to 0.73 (95% CI, 0.71 to 0.75) for drowsiness in the internal test cohort. Treatments flagged as high-risk were significantly associated with future symptom deterioration (odds ratios [ORs], 2.53-6.56; all P < .001) and emergency department visits for dyspnea (OR, 1.85; P = .008), depression (OR, 1.84; P = .04), and anxiety (OR, 2.66; P < .001). In the external validation cohort, the AUROCs for different symptoms meta-analyzed across centers ranged from 0.67 (95% CI, 0.66 to 0.68) to 0.73 (95% CI, 0.72 to 0.74). Performance across centers displayed significant heterogeneity for six of nine symptoms (I², 46.4%-66.9%; P = .004 for dyspnea, P < .001 for the rest).
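The abstract does not state which meta-analytic method was adapted. One common choice for pooling per-center estimates and quantifying heterogeneity is a DerSimonian-Laird random-effects model, which also yields the I² statistic reported above. The sketch below assumes that method and pools AUROCs on their raw scale; the function name and all center-level numbers are hypothetical.

```python
import math


def random_effects_pool(estimates, std_errors):
    """DerSimonian-Laird pooling of per-center estimates (assumed method).

    estimates: one AUROC per center.
    std_errors: the standard error of each AUROC.
    Returns the pooled estimate, a 95% CI, tau^2, and I^2 in percent.
    """
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need matching lists with at least two centers")
    w = [1.0 / se ** 2 for se in std_errors]  # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    # Cochran's Q: weighted squared deviation from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-center variance
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci_low": pooled - 1.96 * se_pooled,
        "ci_high": pooled + 1.96 * se_pooled,
        "tau2": tau2,
        "i2_percent": i2,
    }


# Hypothetical AUROCs and standard errors from four centers.
result = random_effects_pool([0.66, 0.71, 0.74, 0.69], [0.02, 0.03, 0.02, 0.04])
print(result)
```

I² expresses the share of between-center variation that exceeds what sampling error alone would produce, which is why it is reported alongside the pooled AUROC.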

Conclusion: ML can predict future symptoms among people with cancer from routine EHR data, which could guide personalized interventions. Heterogeneous performance across centers must be considered when systems are deployed across a health care system.

Conflict of interest statement

The following represents disclosure information provided by authors of this manuscript. All relationships are considered compensated unless otherwise noted. Relationships are self-held unless noted. I = Immediate Family Member, Inst = My Institution. Relationships may not relate to the subject matter of this manuscript. For more information about ASCO's conflict of interest policy, please refer to www.asco.org/rwc or ascopubs.org/cci/author-center.

Open Payments is a public database containing information reported by companies about payments made to US-licensed physicians (Open Payments).

Robert C. Grant

Consulting or Advisory Role: AstraZeneca, Eisai, Knight Therapeutics, Ipsen, Guardant Health, Incyte

Research Funding: Pfizer

No other potential conflicts of interest were reported.

Figures

FIG 1.
Study schema. The lookback windows for different features were 5 days for laboratory features, 30 days for patient-reported symptoms, and 5 years for emergency department visit data. Icons used in this figure are credited to their respective creators on The Noun Project under CC BY 3.0. These include works by SmashiconsGB (Lock Database), Adrian Fanani (Data cleaning), Hat-TechPK (Data engineering), Scarlett Mckay and Pixelz Studio (Patient), DARAYANI (Infusion), Cherry (Test tube), Vectorstall PK (Medical), yus (Hospital), Siti Solekah (System), DinosoftLabs (Volume control), Angela (Machine learning), Sutriman ID (Calibration), and Gregor Cresnar (Repeat). Adapted from He et al.
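The lookback windows in the study schema can be expressed as a small filter applied before each treatment. This is a sketch under stated assumptions, not the study's pipeline: the record layout, the feature-type names, the inclusive window boundaries, and the approximation of 5 years as 5 × 365 days are all hypothetical.

```python
from datetime import date, timedelta

# Window lengths from the schema; "5 years" is approximated as 5 * 365 days.
LOOKBACK = {
    "laboratory": timedelta(days=5),
    "symptom": timedelta(days=30),
    "ed_visit": timedelta(days=5 * 365),
}


def records_in_window(records, treatment_date):
    """Keep records that fall inside the lookback window for their type.

    records: iterable of (feature_type, record_date, value) tuples.
    A record dated on the treatment day itself is kept (an assumption).
    """
    kept = []
    for feature_type, record_date, value in records:
        age = treatment_date - record_date
        if timedelta(0) <= age <= LOOKBACK[feature_type]:
            kept.append((feature_type, record_date, value))
    return kept


# Hypothetical history for one patient before a treatment on 2024-06-30.
history = [
    ("laboratory", date(2024, 6, 27), 112.0),  # 3 days before: kept
    ("laboratory", date(2024, 6, 20), 118.0),  # 10 days before: dropped
    ("symptom", date(2024, 6, 5), 4),          # 25 days before: kept
    ("ed_visit", date(2021, 1, 1), 1),         # about 3.5 years before: kept
    ("ed_visit", date(2018, 1, 1), 1),         # about 6.5 years before: dropped
]
usable = records_in_window(history, date(2024, 6, 30))
print(len(usable))
```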
FIG 2.
System performance in the internal test cohort. (A) Radar plots compared the areas under the receiver operating characteristic curve for LR, LGBM, and TCN. (B) Receiver operating characteristic curve for LGBM. (C) Calibration curve with isotonic regression for LGBM. (D) Arrow plot: Starting with 100% sensitivity and the event prevalence, corresponding to providing an alert before every treatment, the arrows point to the sensitivity and positive predictive value when the system is set to alert before 10% of treatments for each symptom. ORs compared symptom deterioration after treatments with alarms with treatments without alarms. LGBM, light gradient-boosting machine; LR, ridge logistic regression; MaxCE, maximum calibration error; MCE, mean calibration error; OR, odds ratio; TCN, multi-task temporal convolutional neural network.
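The arrow plot in panel D compares alerting before every treatment with alerting before only the highest-risk 10%. The quantities involved can be sketched as follows. The function name, the tie-breaking behavior, the continuity correction for empty cells, and the toy data are assumptions for illustration, not details taken from the study.

```python
def alert_metrics(labels, scores, alert_rate=0.10):
    """Sensitivity, PPV, and odds ratio when alerting on the top-scoring share.

    labels: 1 if the symptom deteriorated after the treatment, else 0.
    scores: predicted risk for the same treatments.
    alert_rate: fraction of treatments that receive an alert.
    """
    n = len(labels)
    n_alerts = max(1, round(alert_rate * n))
    # Rank treatments from highest to lowest risk and flag the top share.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    flagged = set(order[:n_alerts])
    tp = sum(1 for i in flagged if labels[i] == 1)
    fp = n_alerts - tp
    fn = sum(labels) - tp
    tn = n - tp - fp - fn
    if tp + fn == 0:
        raise ValueError("no deterioration events in the data")
    cells = [tp, fp, fn, tn]
    if 0 in cells:
        # Continuity correction (an assumption) so the ratio stays finite.
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    return {
        "prevalence": (tp + fn) / n,  # PPV if every treatment were alerted
        "sensitivity": tp / (tp + fn),
        "ppv": tp / n_alerts,
        "odds_ratio": (a * d) / (b * c),
    }


# Hypothetical cohort: 20 treatments, 5 followed by deterioration.
toy_scores = [i / 20 for i in range(20)]
toy_labels = [1, 1, 1, 1] + [0] * 14 + [0, 1]
print(alert_metrics(toy_labels, toy_scores))
```

Restricting alerts to 10% of treatments trades sensitivity for a positive predictive value above the baseline prevalence, which is the movement the arrows depict.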
FIG 3.
(A) The importance of groups of features for predictions in the internal test cohort, measured using the sum of mean absolute SHAP values for each symptom (colored dots) and all symptoms (gray bars). (B) Odds ratios for the risk of emergency department visits within 30 days after an alarm. SHAP, Shapley additive explanation.
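The group-level importance in panel A is described as a sum of mean absolute SHAP values. Given a matrix of SHAP values that has already been computed, the aggregation can be sketched as below. The feature names, group labels, and SHAP values are hypothetical, and the SHAP computation itself is outside the scope of this sketch.

```python
def group_importance(shap_values, feature_names, feature_groups):
    """Sum of mean absolute SHAP values within each feature group.

    shap_values: list of rows, one per prediction, one value per feature.
    feature_names: column names in the same order as each row.
    feature_groups: mapping from feature name to group name.
    """
    n_rows = len(shap_values)
    totals = {}
    for j, name in enumerate(feature_names):
        # Mean absolute contribution of this feature across predictions.
        mean_abs = sum(abs(row[j]) for row in shap_values) / n_rows
        group = feature_groups[name]
        totals[group] = totals.get(group, 0.0) + mean_abs
    return totals


# Hypothetical SHAP values for two predictions and three features.
toy_shap = [
    [0.2, -0.1, 0.4],
    [-0.4, 0.3, 0.0],
]
toy_features = ["hemoglobin", "albumin", "prior_pain_score"]
toy_groups = {
    "hemoglobin": "laboratory",
    "albumin": "laboratory",
    "prior_pain_score": "symptoms",
}
print(group_importance(toy_shap, toy_features, toy_groups))
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling, so a group's total reflects how much it moves predictions in either direction.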
FIG 4.
External validation of the system to predict symptom deterioration trained at Princess Margaret Cancer Centre, tested on 82 other cancer centers across Ontario, Canada. Violin plots show AUROCs for predicting symptom deterioration, with each dot representing performance at a single center. AUROC, area under the receiver operating characteristic curve.
