Int J Med Inform. 2019 Aug:128:32-38.
doi: 10.1016/j.ijmedinf.2019.05.008. Epub 2019 May 13.

Automated extraction of sudden cardiac death risk factors in hypertrophic cardiomyopathy patients by natural language processing

Sungrim Moon et al. Int J Med Inform. 2019 Aug.

Abstract

Background: The management of hypertrophic cardiomyopathy (HCM) patients requires the knowledge of risk factors associated with sudden cardiac death (SCD). SCD risk factors such as syncope and family history of SCD (FH-SCD) as well as family history of HCM (FH-HCM) are documented in electronic health records (EHRs) as clinical narratives. Automated extraction of risk factors from clinical narratives by natural language processing (NLP) may expedite management workflow of HCM patients. The aim of this study was to develop and deploy NLP algorithms for automated extraction of syncope, FH-SCD, and FH-HCM from clinical narratives.

Methods and results: We randomly selected 200 patients from the Mayo HCM registry for development (n = 100) and testing (n = 100) of NLP algorithms to extract syncope, FH-SCD, and FH-HCM from clinical narratives in EHRs. The clinical reference standard was manually abstracted by 2 independent annotators. Performance of the NLP algorithms was compared to aggregation and summarization of data entries in the HCM registry for syncope, FH-SCD, and FH-HCM. We also compared the NLP algorithms with billing codes for syncope and with responses to patient survey questions for FH-SCD and FH-HCM. For syncope, NLP had superior sensitivity (0.96 vs 0.39, p < 0.001) and comparable specificity (0.90 vs 0.92, p = 0.74) and positive predictive value (PPV) (0.90 vs 0.83, p = 0.37) compared to billing codes. For FH-SCD, NLP outperformed survey responses on all parameters (sensitivity: 0.91 vs 0.59, p = 0.002; specificity: 0.98 vs 0.50, p < 0.001; PPV: 0.97 vs 0.38, p < 0.001). For FH-HCM, NLP achieved superior sensitivity (0.95 vs 0.24, p < 0.001) with comparable specificity (0.95 vs 1.0, p-value not calculable) and PPV (0.92 vs 1.0, p = 0.09) compared to survey responses.
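
For readers less familiar with the three reported metrics, the following minimal sketch shows how sensitivity, specificity, and PPV are derived from confusion-matrix counts against a reference standard. The function name `diagnostic_metrics` and the counts in the example are hypothetical and chosen for illustration; they are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, and PPV from confusion-matrix counts.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives, and false negatives against a reference standard.
    A metric with a zero denominator is returned as NaN.
    """
    nan = float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else nan  # share of true cases found
    specificity = tn / (tn + fp) if (tn + fp) else nan  # share of non-cases correctly ruled out
    ppv = tp / (tp + fp) if (tp + fp) else nan          # share of positive calls that are correct
    return {"sensitivity": sensitivity, "specificity": specificity, "ppv": ppv}


# Hypothetical counts for a 100-patient test set (NOT the study's data).
metrics = diagnostic_metrics(tp=45, fp=5, tn=45, fn=5)
print(metrics)
```

Note that sensitivity and specificity depend only on how well the method separates cases from non-cases, whereas PPV also depends on how common the risk factor is in the test set.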

Conclusions: Automated extraction of syncope, FH-SCD and FH-HCM using NLP is feasible and has promise to increase efficiency of workflow for providers managing HCM patients.

Keywords: Electronic health records; Hypertrophic cardiomyopathy; Natural language processing; Sudden cardiac death.

Figures

Figure 1.
Study Design. Study subjects were identified from 1,273 patients who participated in a dedicated HCM registry. Patients from the registry who did not have clinical narratives in electronic format were excluded (n = 277). From the remaining cohort (n = 996), 200 subjects were randomly selected and allocated to training and test sets of 100 subjects each. HCM = hypertrophic cardiomyopathy.
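
The cohort arithmetic and random allocation described in this caption can be sketched as follows. This is an illustrative sketch only: the placeholder patient IDs, the fixed seed, and the use of Python's `random` module are assumptions, not the procedure the authors used.

```python
import random

registry_total = 1273           # patients in the HCM registry (from the caption)
no_electronic_narratives = 277  # excluded patients (from the caption)
eligible = registry_total - no_electronic_narratives  # remaining cohort of 996

# Placeholder IDs and a fixed seed are assumptions made for reproducibility here.
rng = random.Random(0)
patient_ids = list(range(eligible))

# sample() draws without replacement and returns the picks in random order,
# so slicing the result gives a random allocation to the two sets.
selected = rng.sample(patient_ids, 200)
training_set, test_set = selected[:100], selected[100:]

print(eligible, len(training_set), len(test_set))
```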
Figure 2.
Extraction of Risk Factors in HCM by Automated Approaches. Diverse automated technologies were used to extract risk factors from each data type. FH-HCM = family history of hypertrophic cardiomyopathy; FH-SCD = family history of sudden cardiac death; HCM = hypertrophic cardiomyopathy; NLP = natural language processing.
Figure 3.
Information Extraction by MedTagger-IE. MedTagger-IE processes clinical narratives containing unique identification numbers for each patient as input and generates risk factor status as output. The status of each risk factor is displayed as “Yes” (present) or “No” (absent). Sentences used for classification are displayed as evidence. IE = information extraction.
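
The input and output shape described in this caption (narratives keyed by patient identifier in, a Yes/No status with evidence sentences out) can be sketched with a toy rule-based extractor. This is not the tool's actual implementation or rule set: the regular expressions, the negation cues, and the function names `is_affirmed` and `extract_status` are hypothetical and far simpler than a production clinical NLP system.

```python
import re

# Hypothetical patterns for illustration only; NOT the study's rules.
RISK_PATTERNS = {
    "syncope": re.compile(r"\b(syncope|syncopal|passed out)\b", re.I),
    "fh_scd": re.compile(
        r"\b(father|mother|brother|sister|son|daughter)\b.*\bsudden cardiac death\b",
        re.I,
    ),
}
# Hypothetical negation cues; real systems use much richer context rules.
NEGATION = re.compile(r"\b(no|denies|denied|without|negative for)\b", re.I)


def is_affirmed(sentence, pattern):
    """True if the pattern matches and no negation cue precedes the match."""
    match = pattern.search(sentence)
    if not match:
        return False
    return NEGATION.search(sentence[: match.start()]) is None


def extract_status(notes):
    """Map each patient ID to a Yes/No status and evidence sentences per risk factor."""
    results = {}
    for patient_id, text in notes.items():
        # Naive sentence split on terminal punctuation followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        results[patient_id] = {}
        for factor, pattern in RISK_PATTERNS.items():
            evidence = [s for s in sentences if is_affirmed(s, pattern)]
            results[patient_id][factor] = {
                "status": "Yes" if evidence else "No",
                "evidence": evidence,
            }
    return results


# Made-up notes for demonstration.
notes = {
    "P001": "She had a syncopal episode in 2015. No chest pain.",
    "P002": "Patient denies syncope.",
    "P003": "His brother died of sudden cardiac death at age 30.",
}
for patient_id, factors in extract_status(notes).items():
    print(patient_id, factors)
```

Returning the evidence sentences alongside each status mirrors the caption's point that the sentences used for classification are shown to the reviewer.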
