J Biomed Inform. 2015 Dec;58 Suppl(0):S120-S127.
doi: 10.1016/j.jbi.2015.06.030. Epub 2015 Jul 22.

Agile text mining for the 2014 i2b2/UTHealth Cardiac risk factors challenge

James Cormack et al. J Biomed Inform. 2015 Dec.

Abstract

This paper describes the use of an agile text mining platform (Linguamatics' Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors from patient records, as defined in the i2b2/UTHealth 2014 challenge. The approach uses a data-driven, rule-based methodology supplemented by a simple supervised classifier. We demonstrate that agile text mining allows rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluated on the test data, this approach achieved an F-Score of 91.7%, one percentage point behind the top-performing system.
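The abstract does not reproduce the I2E queries, so the following is only a rough illustration of what a rule-based, document-level extraction followed by a post-processing step can look like. It is a minimal Python sketch using plain regular expressions, not I2E. The pattern, the helper names (`extract_a1c_values`, `has_high_a1c`) and the 6.5 cut-off are hypothetical choices made for illustration and should be checked against the challenge annotation guidelines.

```python
import re

# Hypothetical rule: an A1C mention ("A1C", "HbA1c", "hemoglobin A1c")
# followed within a few non-digit characters by a numeric value.
A1C_PATTERN = re.compile(
    r"\b(?:hb\s*a1c|hemoglobin\s+a1c|a1c)\b[^0-9]{0,15}?(\d{1,2}(?:\.\d)?)",
    re.IGNORECASE,
)

# Assumed cut-off for an elevated value (illustrative, not taken from the paper).
A1C_THRESHOLD = 6.5


def extract_a1c_values(text: str) -> list[float]:
    """Rule step: return every A1C value mentioned in a record, in text order."""
    return [float(m.group(1)) for m in A1C_PATTERN.finditer(text)]


def has_high_a1c(text: str, threshold: float = A1C_THRESHOLD) -> bool:
    """Post-processing step: collapse mention-level hits into one document-level label."""
    return any(value > threshold for value in extract_a1c_values(text))


if __name__ == "__main__":
    record = "HbA1c 7.2% in May. Repeat A1C was 6.1 in August."
    print(extract_a1c_values(record))  # [7.2, 6.1]
    print(has_high_a1c(record))        # True
```

Separating mention-level matching from the document-level decision mirrors the split between extraction and post-processing described above: the rule can be tuned for recall, while the post-processing step applies guideline logic such as a threshold.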

Keywords: Clinical natural language processing; Information extraction; Text mining.


Figures

Figure 1. Distribution of annotations per risk factor
Figure 2. System outline
Figure 3. Concrete example of the query refinement strategy
Figure 4. Smoking categorization evidence
Figure 5. Plain text table
Figure 6. Extracting temporal information for medications
Figure 7. Extracting temporal information for A1C
Figure 8. F-scores of individual categories
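Figure 8 reports F-scores for individual categories, and the abstract reports an overall F-Score of 91.7%. The Python sketch below shows one common way to compute per-category and micro-averaged precision, recall and F-score over document-level annotations. Representing each annotation as a `(doc_id, category, value)` tuple is a simplifying assumption, the category names in the example are illustrative, and the official challenge scorer is not reproduced here.

```python
from collections import defaultdict


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from raw counts (0.0 when a ratio is undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def evaluate(gold: set, predicted: set):
    """Score sets of (doc_id, category, value) tuples (an assumed, simplified format).

    Returns per-category (P, R, F1) and micro-averaged (P, R, F1); the micro
    average pools TP/FP/FN counts over all categories before computing ratios.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # category -> [tp, fp, fn]
    for item in predicted:
        counts[item[1]][0 if item in gold else 1] += 1
    for item in gold - predicted:
        counts[item[1]][2] += 1
    per_category = {cat: prf(*c) for cat, c in counts.items()}
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    return per_category, prf(tp, fp, fn)


if __name__ == "__main__":
    gold = {("d1", "SMOKER", "past"), ("d1", "HYPERTENSION", "present"),
            ("d2", "DIABETES", "present")}
    pred = {("d1", "SMOKER", "past"), ("d1", "HYPERTENSION", "present"),
            ("d2", "OBESE", "present")}
    per_cat, micro = evaluate(gold, pred)
    print(micro)  # all three values are about 0.667 here
```

Because micro-averaging pools counts, frequent categories dominate the overall score, which is one reason per-category results like those in Figure 8 are worth reporting alongside it when the training data is imbalanced.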
