J Oncol Pract. 2016 Feb;12(2):157-8; e169-7.
doi: 10.1200/JOP.2015.004622. Epub 2015 Aug 25.

ReCAP: Feasibility and Accuracy of Extracting Cancer Stage Information From Narrative Electronic Health Record Data

Jeremy L Warner et al. J Oncol Pract. 2016 Feb.

Abstract

Purpose: Cancer stage, one of the most important prognostic factors for cancer-specific survival, is often documented in narrative form in electronic health records (EHRs). Such documentation results in tedious and time-consuming abstraction efforts by tumor registrars and other secondary users. This information may be amenable to extraction by automated methods.

Methods: We developed a natural language processing algorithm to extract stage statements from machine-readable EHR documents, including automated rules to choose the most likely stage when discordance was present in the EHR. These methods were developed in a training set of patients with lung cancer, independently validated in a test set of patients with lung cancer, and compared with the gold standard of Vanderbilt Cancer Registry–determined stage (when available).
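As a rough illustration of the two steps described in Methods (extracting stage statements, then resolving discordance), the following is a minimal sketch and not the authors' algorithm. The regular expression, the function names `extract_stages` and `choose_stage`, and the majority-vote rule are all assumptions made for this example; the abstract does not describe the actual rules used.

```python
import re
from collections import Counter

# Hypothetical pattern: "stage IIIA", "Stage 3a", "stage IV", and similar.
# Longer Roman numerals come first so "III" is not read as "I".
STAGE_RE = re.compile(r"\bstage\s+(IV|III|II|I|[1-4])([ABC])?\b", re.IGNORECASE)

# Map Arabic numerals to Roman so "3a" and "IIIA" count as the same stage.
ROMAN = {"1": "I", "2": "II", "3": "III", "4": "IV"}


def extract_stages(text):
    """Return every stage statement found in one document, normalized."""
    stages = []
    for match in STAGE_RE.finditer(text):
        number = match.group(1).upper()
        number = ROMAN.get(number, number)
        letter = (match.group(2) or "").upper()
        stages.append(number + letter)
    return stages


def choose_stage(documents):
    """Pick the most frequently stated stage across a patient's documents.

    Assumption: a simple majority vote stands in for the discordance
    rules, which the abstract does not specify. Returns None when no
    stage statement is found.
    """
    counts = Counter()
    for text in documents:
        counts.update(extract_stages(text))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Made-up example notes, not data from the study.
notes = [
    "Assessment: stage IIIA adenocarcinoma of the lung.",
    "Pt with Stage 3a NSCLC, here for follow-up.",
    "Initially thought to be stage IIB.",
]
print(choose_stage(notes))
```

In this toy record, two notes agree and one is discordant, so the vote settles on the stage stated most often. A real system would likely also weigh note type, author, and date relative to diagnosis.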

Results: In the combined data set of 2,323 patients (training set, n = 1,103; validation set, n = 1,220), 751,880 documents were analyzed. A stage statement was extracted from 2,239 (96.4%) patient EHRs (median, 24 documents per patient). Stage discordance was common, affecting 83.6% of these EHRs. Nevertheless, algorithmically derived stage accuracy was high in the validation set (κ = 0.906; 95% CI, 0.873 to 0.939), when including notes generated within 14 weeks from diagnosis.
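For readers unfamiliar with the κ statistic reported above, the sketch below computes unweighted Cohen's kappa between two lists of stage labels. Whether the study used a weighted or unweighted κ is not stated in the abstract, so the unweighted form is an assumption here, and the function name `cohens_kappa` and the sample labels are invented for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa: agreement between two raters beyond chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of cases where the two labels match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each rater's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up labels, not study data: registry stage vs. algorithm stage.
registry = ["I", "I", "II", "II"]
algorithm = ["I", "I", "II", "I"]
print(cohens_kappa(registry, algorithm))
```

Here the raters agree on 3 of 4 cases (0.75) while chance agreement is 0.5, giving κ = 0.5. A value near 0.9, as reported for the validation set, indicates agreement far above chance.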

Conclusion: Accurate stage determination can be achieved through automated methods applied to narrative text, despite the frequent presence of discordance in such data. Our results also indicate that stage can be automatically captured in a shorter timeframe than the 6-month window used by cancer registries, as early as 5 weeks from diagnosis. These methods may be generalizable to large narrative cancer data sets.
