Am J Epidemiol. 2014 Mar 15;179(6):749-58.
doi: 10.1093/aje/kwt441. Epub 2014 Jan 30.

Using natural language processing to improve efficiency of manual chart abstraction in research: the case of breast cancer recurrence

David S Carrell et al. Am J Epidemiol.

Abstract

The increasing availability of electronic health records (EHRs) creates opportunities for automated extraction of information from clinical text. We hypothesized that natural language processing (NLP) could substantially reduce the burden of manual abstraction in studies examining outcomes, like cancer recurrence, that are documented in unstructured clinical text, such as progress notes, radiology reports, and pathology reports. We developed an NLP-based system using open-source software to process electronic clinical notes from 1995 to 2012 for women with early-stage incident breast cancers to identify whether and when recurrences were diagnosed. We developed and evaluated the system using clinical notes from 1,472 patients receiving EHR-documented care in an integrated health care system in the Pacific Northwest. A separate study provided the patient-level reference standard for recurrence status and date. The NLP-based system correctly identified 92% of recurrences and estimated diagnosis dates within 30 days for 88% of these. Specificity was 96%. The NLP-based system overlooked 5 of 65 recurrences, 4 because electronic documents were unavailable. The NLP-based system identified 5 other recurrences incorrectly classified as nonrecurrent in the reference standard. If used in similar cohorts, NLP could reduce by 90% the number of EHR charts abstracted to identify confirmed breast cancer recurrence cases at a rate comparable to traditional abstraction.
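The headline accuracy figure can be checked against the counts given in the abstract. The sketch below is illustrative only and is not the authors' code: the function names are hypothetical, and it uses only numbers stated above (65 recurrences, 5 overlooked). The abstract does not report the counts behind the 96% specificity, so the `specificity` helper is shown without example inputs. The `within_window` helper shows one plausible reading of "within 30 days", assumed here to mean an absolute difference of at most 30 calendar days.

```python
from datetime import date


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Share of true recurrences that the system flagged."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Share of non-recurrent patients that the system correctly left unflagged."""
    return true_negatives / (true_negatives + false_positives)


def within_window(estimated: date, reference: date, days: int = 30) -> bool:
    """True if the estimated diagnosis date is within `days` of the reference date.

    Assumption: the window is symmetric (early or late by at most `days`).
    """
    return abs((estimated - reference).days) <= days


# Counts stated in the abstract: 65 recurrences, of which 5 were overlooked.
total_recurrences = 65
missed = 5
detected = total_recurrences - missed

print(f"Sensitivity: {sensitivity(detected, missed):.1%}")  # Sensitivity: 92.3%
```

The result, 60 of 65, rounds to the 92% reported in the abstract.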

Keywords: breast cancer recurrence; chart abstraction; natural language processing.

Figures

Figure 1.
Architecture of the natural language processing (NLP) system used to identify recurrent breast cancer diagnoses among Group Health patients, Pacific Northwest, 1995–2012. cTAKES, Clinical Text Analysis and Knowledge Extraction System.
Figure 2.
Breast cancer recurrence hits per calendar day for Group Health patients, Pacific Northwest, 1995–2012, found by the natural language processing system's clinical module in electronic charts for random samples of A) 16 breast cancer patients without recurrence, and B) 16 breast cancer patients with clinically confirmed recurrence.

