From manual clinical criteria to machine learning algorithms: Comparing outcome endpoints derived from diverse electronic health record data modalities
- PMID: 40367064
- PMCID: PMC12077705
- DOI: 10.1371/journal.pdig.0000755
Abstract
Background: Progression-free survival (PFS) is a critical clinical outcome endpoint in cancer management and treatment evaluation. Yet PFS is often missing from publicly available datasets because generating PFS metrics is currently subjective, dependent on expert review, and time-intensive. Given emerging research in multi-modal machine learning (ML), we explored the benefits and challenges of mining different electronic health record (EHR) data modalities and automating the extraction of PFS metrics via ML algorithms.
Methods: We analyzed EHR data from 92 patients with pathology-proven glioblastoma (GBM), obtaining 233 corticosteroid prescriptions, 2080 radiology reports, and 743 brain MRI scans. Three methods were developed to derive clinical PFS: 1) frequency analysis of corticosteroid prescriptions, 2) natural language processing (NLP) of radiology reports, and 3) computer vision (CV) volumetric analysis of imaging. Outputs from these methods were compared to PFS metrics manually annotated according to clinical guidelines.
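As an illustration of the third method, the sketch below shows one simple way a progression date could be read off a series of tumor volumes. It is not the authors' pipeline: the function name `first_progression_day`, the input layout (pairs of scan day and contrast-enhancing volume), and the 25% increase-over-nadir threshold are all assumptions made for this example.

```python
from typing import Optional, Sequence, Tuple


def first_progression_day(
    scans: Sequence[Tuple[int, float]],
    rel_increase: float = 0.25,  # assumed threshold, not taken from the paper
) -> Optional[int]:
    """Return the day of the first scan whose volume exceeds the running
    nadir (smallest volume seen so far) by at least `rel_increase`.

    `scans` holds hypothetical (day_since_baseline, volume) pairs.
    Returns None when no scan meets the criterion.
    """
    ordered = sorted(scans)  # chronological order by day
    if not ordered:
        return None
    nadir = ordered[0][1]  # baseline volume starts as the nadir
    for day, volume in ordered[1:]:
        # A zero nadir is not handled specially in this sketch:
        # such patients are simply reported as "no progression".
        if nadir > 0 and volume >= nadir * (1.0 + rel_increase):
            return day
        nadir = min(nadir, volume)
    return None


if __name__ == "__main__":
    example = [(0, 10.0), (30, 8.0), (60, 9.0), (90, 10.5)]
    print(first_progression_day(example))
```

Measuring growth against the nadir rather than the baseline is a design choice for this sketch; a patient whose volume never rises past the threshold yields `None`, which is one way a volumetric method can report no progression.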
Results: Employing data-driven methods, standalone progression rates were 63% (prescription), 78% (NLP), and 54% (CV), compared to the 99% progression rate from manually applied clinical guidelines using integrated data sources. The prescription method identified progression an average of 5.2 months later than the clinical standard, while the CV and NLP algorithms identified progression earlier by 2.6 and 6.9 months, respectively. While lesion growth is a clinical guideline progression indicator, only half of patients exhibited increasing contrast-enhancing tumor volumes during scan-based CV analysis.
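The two kinds of summary reported above, a progression rate per method and a mean timing offset relative to the clinical standard, can be computed along the following lines. This is a minimal sketch under stated assumptions, not the authors' analysis code: the function `summarize_method`, the per-patient dictionaries, and the days-to-months conversion factor are hypothetical.

```python
from typing import Dict, Optional, Tuple

DAYS_PER_MONTH = 30.44  # assumed average month length


def summarize_method(
    method_days: Dict[str, Optional[float]],
    reference_days: Dict[str, Optional[float]],
) -> Tuple[float, Optional[float]]:
    """Return (progression_rate, mean_offset_months) for one method.

    Each dict maps a hypothetical patient ID to the day progression was
    identified, or None if none was identified. A positive offset means
    the method flagged progression later than the reference.
    """
    patients = list(reference_days)
    if not patients:
        return 0.0, None
    # Patients for whom the method identified any progression.
    detected = [p for p in patients if method_days.get(p) is not None]
    rate = len(detected) / len(patients)
    # The timing offset is only defined where both dates exist.
    paired = [p for p in detected if reference_days[p] is not None]
    if not paired:
        return rate, None
    total = sum(method_days[p] - reference_days[p] for p in paired)
    return rate, total / len(paired) / DAYS_PER_MONTH


if __name__ == "__main__":
    reference = {"a": 100.0, "b": 200.0, "c": None, "d": 300.0}
    method = {"a": 130.44, "b": 230.44, "c": None, "d": None}
    print(summarize_method(method, reference))
```

Restricting the offset to patients with both dates means the rate and the offset are computed over different subsets, which is worth keeping in mind when comparing methods whose progression rates differ widely.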
Conclusion: Our results indicate that data-driven algorithms can extract tumor progression outcomes from existing EHR data. However, ML methods differ in their exposure to availability bias, in the supporting contextual information they can draw on, and in their pre-processing resource burdens, all of which influence the extracted PFS endpoint distributions. Our scan-based CV results also suggest that the automation of clinical criteria may not align with human intuition. Our findings indicate a need for improved data source integration and validation, and for revisiting clinical criteria in parallel with multi-modal ML algorithm development.
Copyright: © 2025 Chappidi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Conflict of interest statement
The authors have declared that no competing interests exist.