JCO Clin Cancer Inform. 2025 Apr:9:e2400227.
doi: 10.1200/CCI-24-00227. Epub 2025 Apr 18.

Decoding Recurrence in Early-Stage and Locoregionally Advanced Non-Small Cell Lung Cancer: Insights From Electronic Health Records and Natural Language Processing


Kyeryoung Lee et al. JCO Clin Cancer Inform. 2025 Apr.

Abstract

Purpose: Recurrences after curative resection in early-stage and locoregionally advanced non-small cell lung cancer (NSCLC) are common, necessitating a nuanced understanding of associated risk factors. This study aimed to establish a natural language processing (NLP) system to efficiently curate recurrence data in NSCLC and analyze risk factors longitudinally.

Patients and methods: Electronic health records of 6,351 patients with NSCLC with >700,000 notes were obtained from Mount Sinai's data sets. A deep learning-based customized NLP system was developed to identify cohorts experiencing recurrence. Recurrence types and rates over time were stratified by various clinical features. Cohort description analysis, Kaplan-Meier analysis for overall recurrence-free survival (RFS) and distant metastasis-free survival (DMFS), and Cox proportional hazards analysis were performed.
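As an illustration of the Kaplan-Meier analysis named above, the following is a minimal sketch of the product-limit estimator in plain Python. The function name `kaplan_meier` and the toy follow-up data are hypothetical and are not taken from the study, which does not describe its software; a real analysis would normally rely on an established survival-analysis package.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each patient
    events:    1 if the event (e.g. recurrence) was observed at that time,
               0 if the patient was censored
    """
    event_counts = Counter(t for t, e in zip(durations, events) if e)
    exit_counts = Counter(durations)  # events plus censorings per time point
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Everyone who had an event or was censored at t leaves the risk set.
        at_risk -= exit_counts[t]
    return curve


# Hypothetical toy data (years of follow-up), NOT values from the study.
durations = [1, 2, 2, 3, 4, 5, 5, 5]
events = [1, 0, 1, 1, 0, 1, 0, 0]

for time, prob in kaplan_meier(durations, events):
    print(f"year {time}: event-free probability {prob:.3f}")
```

Censored patients contribute to the risk set up to their last follow-up but never trigger a drop in the curve, which is why the estimator suits cohorts with unequal follow-up.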

Results: Of 1,295 patients with stage I-IIIA NSCLC who underwent surgical resection, 336 (25.9%) experienced recurrence, as identified through NLP. The NLP system achieved a precision of 94.3%, a recall of 93%, and an F1 score of 93.5%. Among these 336 patients, 52.4% had local/regional recurrences, 44% had distant metastases, and 3.6% had recurrences of unknown type. RFS rates at years 1-5 were 93%, 81%, 73%, 67%, and 61%, respectively; the corresponding DMFS rates were 96%, 89%, 84%, 80%, and 75%. Stage-specific RFS rates at year 5 were 73% (IA), 62% (IB), 47% (IIA), 46% (IIB), and 20% (IIIA). Stage IB patients had a significantly higher likelihood of recurrence than stage IA patients (adjusted hazard ratio [aHR], 1.63; P = .02). Among stage IA/IB patients, a clinically significant TP53 alteration (versus TP53-negative or unknown significance) was associated with worse overall RFS (aHR, 1.89; P = .007) and DMFS (aHR, 2.47; P = .009).
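The headline figures above can be cross-checked with simple arithmetic. The sketch below recomputes the F1 score as the harmonic mean of the rounded precision and recall, the overall recurrence rate, and the approximate patient counts implied by the reported percentages. The helper name `f1_score` is hypothetical, and the derived counts are back-calculations from rounded percentages, not figures reported by the authors.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported in the abstract.
f1 = f1_score(0.943, 0.93)
print(f"F1 from rounded precision and recall: {f1:.3f}")

# Overall recurrence rate: 336 of 1,295 resected patients.
recurrence_rate = 336 / 1295
print(f"Recurrence rate: {recurrence_rate:.1%}")

# Approximate counts implied by the reported percentages (back-calculated).
shares = {"local/regional": 0.524, "distant": 0.44, "unknown": 0.036}
counts = {kind: round(336 * share) for kind, share in shares.items()}
print(counts, "total:", sum(counts.values()))
```

The recomputed F1 comes out near 0.936, slightly above the reported 93.5; a gap of that size is within what rounding of the published precision and recall could explain.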

Conclusion: Our scalable NLP system enabled us to generate real-world insights into NSCLC recurrences, paving the way for predictive models for preventing, diagnosing, and treating NSCLC recurrence.

Conflict of interest statement

The following represents disclosure information provided by authors of this manuscript. All relationships are considered compensated unless otherwise noted. Relationships are self-held unless noted. I = Immediate Family Member, Inst = My Institution. Relationships may not relate to the subject matter of this manuscript. For more information about ASCO's conflict of interest policy, please refer to www.asco.org/rwc or ascopubs.org/cci/author-center.

Open Payments is a public database containing information reported by companies about payments made to US-licensed physicians.

Xiaoyan Wang

Employment: IMO Health

No other potential conflicts of interest were reported.

Figures

FIG 1.
(A) Ontology of recurrence, (B) deidentified sample notes with recurrence annotations, and (C) multilayer deep learning NLP architecture for recurrence curation. BiLSTM, bidirectional long short-term memory; CRF, conditional random fields; NLP, natural language processing; PET/CT, positron emission tomography/computed tomography; RLL, right lower lobe; RUL, right upper lobe.
FIG 2.
Overall recurrence-free survival and distant metastasis-free survival analysis stratified by disease (A and B) stages and (C and D) substage. Only the first recurrence events (locoregional or distant) were extracted and analyzed. Locoregional recurrence events were censored in distant metastasis-free survival analysis.
FIG 3.
Multivariable CoxPH analysis for overall recurrence-free survival (A) across all stages (I-III) and (B) subgroup of patients with disease stages IA and IB at initial diagnosis. *P < .05; **P < .01; ***P < .001. CoxPH, Cox proportional hazards; EGFR, epidermal growth factor receptor; KRAS, Kirsten rat sarcoma viral oncogene; TP53, tumor suppressor gene 53.
FIG 4.
Multivariable CoxPH analysis for distant metastasis-free survival (A) across all stages (I-III) and (B) subgroup of patients with disease stages IA and IB at initial diagnosis. *P < .05; **P < .01; ***P < .001. CoxPH, Cox proportional hazards; EGFR, epidermal growth factor receptor; KRAS, Kirsten rat sarcoma viral oncogene; TP53, tumor suppressor gene 53.

