Agile text mining for the 2014 i2b2/UTHealth Cardiac risk factors challenge

James Cormack¹, Chinmoy Nath², David Milward³, Kalpana Raja², Siddhartha R Jonnalagadda²

Affiliations

¹ Linguamatics Ltd., 324 Cambridge Science Park, Milton Road, Cambridge CB4 0WG, UK. Electronic address: james.cormack@linguamatics.com.
² Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, 750 N. Lake Shore Drive, 11th Floor, Chicago, IL 60611, USA.
³ Linguamatics Ltd., 324 Cambridge Science Park, Milton Road, Cambridge CB4 0WG, UK.

PMID: 26209007
PMCID: PMC4737484
DOI: 10.1016/j.jbi.2015.06.030

James Cormack et al. J Biomed Inform. 2015 Dec;58 Suppl(0):S120-S127.

doi: 10.1016/j.jbi.2015.06.030. Epub 2015 Jul 22.

Abstract

This paper describes the use of an agile text mining platform (Linguamatics' Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors in patient records as defined in the i2b2/UTHealth 2014 challenge. The approach uses a data-driven rule-based methodology with the addition of a simple supervised classifier. We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-score of 91.7%, one percent behind the top-performing system.

Keywords: Clinical natural language processing; Information extraction; Text mining.

Figures

**Figure 1**
Distribution of annotations per risk factor

**Figure 3**
Concrete example of the query refinement strategy

**Figure 4**
Smoking Categorization evidence

**Figure 6**
Extracting temporal information for Medications

**Figure 7**
Extracting temporal information for A1C

**Figure 8**
F-scores of individual categories
