JMIR Cancer. 2025 May 15;11:e64697.
doi: 10.2196/64697.

A Deep Learning-Enabled Workflow to Estimate Real-World Progression-Free Survival in Patients With Metastatic Breast Cancer: Study Using Deidentified Electronic Health Records


Gowtham Varma et al. JMIR Cancer.

Abstract

Background: Progression-free survival (PFS) is a crucial endpoint in cancer drug research. Clinician-confirmed cancer progression recorded in unstructured text (ie, clinical notes), which underlies real-world PFS (rwPFS), serves as a reasonable surrogate for ascertaining progression endpoints in real-world settings. The Response Evaluation Criteria in Solid Tumors (RECIST), which rely on serial imaging evaluations, are traditionally used in clinical trials but are impractical to apply to real-world data. Manual abstraction of clinical progression from unstructured notes remains the gold standard; however, it is resource-intensive and time-consuming. Natural language processing (NLP), a subdomain of machine learning, has shown promise in recent years for accelerating the extraction of tumor progression from real-world data.

Objective: We aim to configure a pretrained, general-purpose health care NLP framework to transform free-text clinical notes and radiology reports into structured progression events for studying rwPFS in metastatic breast cancer (mBC) cohorts.

Methods: This study developed and validated a novel semiautomated workflow to estimate rwPFS in patients with mBC using deidentified electronic health record data from the Nference nSights platform. The workflow was validated in a cohort of 316 patients with hormone receptor-positive, human epidermal growth factor receptor-2 (HER-2)-negative mBC who were started on palbociclib and letrozole combination therapy between January 2015 and December 2021. Ground-truth datasets were curated to evaluate the workflow's performance at both the sentence and patient levels. For rwPFS computation, NLP-captured progression or a change in therapy line was considered an outcome event, while death, loss to follow-up, and end of the study period were considered censoring events. Peak reduction and cumulative decline in Patient Health Questionnaire-8 (PHQ-8) scores were analyzed in the progressed and nonprogressed patient subgroups.
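The event and censoring rules above can be turned into the (time, event) pair that survival analysis needs. The sketch below is a minimal illustration, not the authors' implementation: the `PatientDates` record and its field names are hypothetical, and it assumes that an outcome dated after the first censoring date is ignored and that an outcome on the same day as censoring counts as an event.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class PatientDates:
    """Hypothetical per-patient record; field names are illustrative only."""
    index_date: date                 # start of palbociclib + letrozole therapy
    nlp_progression: Optional[date]  # first NLP-captured progression, if any
    line_change: Optional[date]      # first change in therapy line, if any
    death: Optional[date]            # date of death, if recorded
    last_followup: date              # last documented encounter (loss to follow-up)
    study_end: date                  # end of the study period


def rwpfs_time_and_event(p: PatientDates) -> Tuple[int, bool]:
    """Return (days from index date, event_observed).

    Outcome events: NLP-captured progression or change in therapy line.
    Censoring events: death, loss to follow-up, or end of study period.
    Assumption: an outcome after the earliest censoring date is not counted.
    """
    outcomes = [d for d in (p.nlp_progression, p.line_change) if d is not None]
    censors = [d for d in (p.death, p.last_followup, p.study_end) if d is not None]
    first_censor = min(censors)
    if outcomes and min(outcomes) <= first_censor:
        return (min(outcomes) - p.index_date).days, True
    return (first_censor - p.index_date).days, False
```

Each patient then contributes one pair, and the collection of pairs is what a Kaplan-Meier estimator consumes.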

Results: The configured clinical NLP engine achieved a sentence-level progression capture accuracy of 98.2%. At the patient level, initial progression was captured within ±30 days with 88% accuracy. The median rwPFS for the study cohort (N=316) was 20 (95% CI 18-25) months. In a validation subset (n=100), rwPFS determined by manual curation was 25 (95% CI 15-35) months, closely aligning with the computational workflow's estimate of 22 (95% CI 15-35) months. A subanalysis revealed rwPFS estimates of 30 (95% CI 24-39) months from radiology reports and 23 (95% CI 19-28) months from clinical notes, highlighting the importance of integrating multiple note sources. External validation also demonstrated high accuracy (92.5% at the sentence level; 90.2% at the patient level). Sensitivity analysis revealed stable rwPFS estimates across varying levels of missing source data and event definitions. Peak reduction in PHQ-8 scores during the study period showed significant associations between patient-reported outcomes and disease progression.
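The patient-level figure ("captured within ±30 days") can be read as an agreement rule between the workflow's first progression date and the manually curated one. The sketch below assumes one plausible scoring rule that the abstract does not spell out: a patient counts as correct if both sources report no progression, or both report one and the dates differ by at most 30 days. The function names are hypothetical.

```python
from datetime import date
from typing import Iterable, Optional, Tuple


def within_window(pred: Optional[date], truth: Optional[date],
                  window_days: int = 30) -> bool:
    """Assumed rule: agree on 'no progression', or dates within +/- window_days."""
    if pred is None and truth is None:
        return True
    if pred is None or truth is None:
        return False  # one source found progression, the other did not
    return abs((pred - truth).days) <= window_days


def patient_level_accuracy(pairs: Iterable[Tuple[Optional[date], Optional[date]]],
                           window_days: int = 30) -> float:
    """Fraction of (predicted, ground-truth) first-progression pairs that agree."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no patients to score")
    hits = sum(within_window(p, t, window_days) for p, t in pairs)
    return hits / len(pairs)
```

Widening `window_days` trades date precision for a higher agreement rate, so the window should be reported alongside the accuracy.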

Conclusions: This workflow enables rapid and reliable determination of rwPFS in patients with mBC receiving combination therapy. Further validation across more diverse external datasets and other cancer types is needed to ensure broader applicability and generalizability.

Keywords: EHR; ML; NLP; breast; cancer; data-driven oncology; deep learning; documentation; electronic health record; machine learning; metastatic; metastatic breast cancer; natural language processing; notes; oncology; real-world evidence; real-world progression-free survival; report; survival; workflow.


Conflict of interest statement

Conflicts of Interest: GV, PK, KP, SC, AvA, MM, SJ, AkA, RB, VT, PL, and VS are current employees of Nference, Inc and hold a minority stake in the company. RKY, BSA, and VK are past employees of Nference. SAS is affiliated with Mayo Clinic, Rochester. The authors declare no further competing interests in the findings of the study.

Figures

Figure 1. Methodology flow diagram illustrating the workflow. (A) Workflow for real-world progression (rwP) extraction and determining the real-world progression-free survival (rwPFS). (B) The methodology for capturing progression from unstructured texts in routine clinical documents and radiology reports using Nference’s clinical NLP engine that performs clinical concept recognition, association, and sentiment analysis. BERT: Bidirectional Encoder Representations from Transformers; EHR: electronic health record.
Figure 2. Cohort attrition diagram: structured codes 174* (ICD-9) and C50* (ICD-10) or >4 positive disease sentiments from the augmented curation disease diagnosis model were used for breast cancer. For evidence of metastasis, 197*, 198* (ICD-9), C78*, and C79* (ICD-10) in conjunction with augmented curation were used; * represents all the children codes within the parent code. ECOG: Eastern Cooperative Oncology Group; EHR: electronic health record; HER-2: human epidermal growth factor receptor-2; HR: hormone receptor; ICD-9: International Classification of Diseases, Ninth Revision; ICD-10: International Statistical Classification of Diseases, Tenth Revision; NLP: natural language processing.
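The "parent code plus all children" notation in the Figure 2 caption amounts to prefix matching on diagnosis codes. The sketch below illustrates only the structured-code and sentiment-count criteria as described in the caption; the function names, the dot-stripping normalization, and the integer count of positive sentiment sentences are assumptions, and the augmented curation step used alongside the metastasis codes is not modeled.

```python
from typing import Iterable

# Parent codes from the Figure 2 caption; "*" means all child codes.
BREAST_CANCER_PREFIXES = ("174", "C50")              # ICD-9 174*, ICD-10 C50*
METASTASIS_PREFIXES = ("197", "198", "C78", "C79")   # ICD-9 197*, 198*; ICD-10 C78*, C79*


def matches_any(codes: Iterable[str], prefixes: tuple) -> bool:
    """True if any code (dots removed, upper-cased) starts with a parent prefix."""
    return any(c.replace(".", "").upper().startswith(prefixes) for c in codes)


def has_breast_cancer(codes: Iterable[str], positive_dx_sentences: int) -> bool:
    """Structured code OR more than 4 positive disease-sentiment sentences."""
    return matches_any(codes, BREAST_CANCER_PREFIXES) or positive_dx_sentences > 4


def has_metastasis_code(codes: Iterable[str]) -> bool:
    """Structured-code half of the metastasis criterion only (curation not modeled)."""
    return matches_any(codes, METASTASIS_PREFIXES)
```

Prefix matching keeps the rule aligned with how ICD hierarchies nest, so newly added child codes are picked up without changing the code lists.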
Figure 3. Kaplan-Meier survival plots for the overall study cohort and validation sets: (A) Kaplan-Meier survival plots indicating the real-world progression-free survival (rwPFS) and real-world overall survival (rwOS) in the study cohort of patients with metastatic breast cancer using pooled note sources. (B) Patient-level validation of first progression capture and comparing outcomes estimated by computational workflow with manual curation. mBC: metastatic breast cancer.
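The median rwPFS values reported with these plots come from Kaplan-Meier curves. The sketch below is a plain product-limit estimator with the median taken as the first time the survival estimate falls to 0.5 or below; it is a generic illustration rather than the authors' code, it assumes censorings tied with events at the same time are still at risk at that time (the usual convention), and it omits confidence intervals. Time units are whatever the input uses.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(data: Sequence[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Product-limit estimate: (time, S(t)) at each distinct event time.

    `data` holds (time, event_observed) pairs; False means censored.
    """
    events = Counter(t for t, e in data if e)   # events per time point
    leaving = Counter(t for t, _ in data)       # events + censorings per time point
    at_risk = len(data)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = events.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]  # drop everyone whose follow-up ends at t
    return curve


def km_median(data: Sequence[Tuple[float, bool]]) -> Optional[float]:
    """Smallest event time with S(t) <= 0.5; None if the curve never reaches 0.5."""
    for t, s in kaplan_meier(data):
        if s <= 0.5:
            return t
    return None
```

For example, with `[(5, True), (8, False), (12, True), (20, True), (25, False)]` the estimate steps to 0.8 at 5, about 0.53 at 12, and about 0.27 at 20, so the median is 20.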
Figure 4. Kaplan-Meier survival plots for real-world progression-free survival (rwPFS) based on the patient note source. Survival plots indicating the rwPFS with progressions captured solely from radiology reports (RR) or solely from routine clinical documents (CD). mBC: metastatic breast cancer.
Figure 5. Kaplan-Meier survival curves for subgroup analysis. Each subgroup accounts for a different variation in treatment patterns. The survival curves and risk tables show the effect of other prior or concomitant systemic therapies on the median real-world progression-free survival (rwPFS).

