A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries

Min Jiang¹, Yukun Chen, Mei Liu, S Trent Rosenbloom, Subramani Mani, Joshua C Denny, Hua Xu

PMID: 21508414
PMCID: PMC3168315
DOI: 10.1136/amiajnl-2011-000163

Comparative Study

Min Jiang et al. J Am Med Inform Assoc. 2011 Sep-Oct;18(5):601-6. doi: 10.1136/amiajnl-2011-000163. Epub 2011 Apr 20.

Affiliation

¹ Department of Biomedical Informatics, Vanderbilt University, School of Medicine, Nashville, Tennessee 37232, USA.

Abstract

Objective: The authors' goal was to develop and evaluate machine-learning-based approaches to extracting clinical entities (medical problems, tests, and treatments) and their asserted status from hospital discharge summaries written in natural language. This project was part of the 2010 Center of Informatics for Integrating Biology and the Bedside/Veterans Affairs (VA) natural-language-processing challenge.

Design: The authors implemented a machine-learning (ML)-based named entity recognition system for clinical text and systematically evaluated the contributions of different types of features and ML algorithms, using a training corpus of 349 annotated notes. Based on the results from the training data, the authors developed a novel hybrid clinical entity extraction system, which integrated heuristic rule-based modules with the ML-based named entity recognition module. The authors applied the hybrid system to the concept extraction and assertion classification tasks in the challenge and evaluated its performance using a test data set of 477 annotated notes.
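ML-based named entity recognition is commonly framed as token-level sequence labeling. The abstract does not say which label encoding the authors used, so the following is only a minimal sketch assuming the widely used BIO (begin/inside/outside) scheme. The function name `to_bio`, the span layout `(start, end, type)`, and the example sentence are all hypothetical and are not taken from the paper or the challenge data.

```python
from typing import List, Tuple

# Hypothetical span layout: (first_token_index, last_token_index, entity_type).
Span = Tuple[int, int, str]

def to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Convert entity spans into one BIO label per token.

    Assumes spans do not overlap and use inclusive token indices.
    """
    labels = ["O"] * len(tokens)              # default: outside any entity
    for start, end, entity_type in spans:
        labels[start] = f"B-{entity_type}"    # first token of the entity
        for i in range(start + 1, end + 1):
            labels[i] = f"I-{entity_type}"    # remaining tokens of the entity
    return labels

# Made-up sentence for illustration only.
tokens = ["Patient", "denies", "chest", "pain", "after", "aspirin", "."]
spans = [(2, 3, "problem"), (5, 5, "treatment")]
labels = to_bio(tokens, spans)
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

Under this framing, a sequence model such as a CRF would be trained to predict the label column from per-token features.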

Measurements: Standard measures including precision, recall, and F-measure were calculated using the evaluation script provided by the Center of Informatics for Integrating Biology and the Bedside/VA challenge organizers. The overall performance for all three types of clinical entities and all six types of assertions across 477 annotated notes was considered the primary metric in the challenge.
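The three measures named above can be illustrated with a small sketch. This is not the organizers' evaluation script. It assumes exact-match scoring, where a predicted entity counts as correct only if its boundaries and type both match a gold annotation, and it assumes the balanced F-measure (F1). The function name `precision_recall_f1` and the example spans are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical span layout: (first_token_index, last_token_index, entity_type).
Span = Tuple[int, int, str]

def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-match scoring."""
    true_positives = len(gold & predicted)    # spans matching in boundaries and type
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return precision, recall, f1

# Made-up annotations for illustration only.
gold = {(0, 1, "problem"), (5, 5, "test"), (9, 10, "treatment")}
predicted = {(0, 1, "problem"), (5, 5, "treatment"), (9, 10, "treatment"), (12, 12, "test")}
p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.4f} recall={r:.4f} F1={f:.4f}")
```

In this example two of the four predicted spans match the gold set exactly, so precision is 2/4, recall is 2/3, and F1 is 4/7.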

Results and discussion: Systematic evaluation on the training set showed that Conditional Random Fields outperformed Support Vector Machines, and that semantic information from existing natural-language-processing systems substantially improved performance, although contributions from different types of features varied. On the challenge test data set, the authors' hybrid entity extraction system achieved a maximum overall F-score of 0.8391 for concept extraction (ranked second) and 0.9313 for assertion classification (ranked fourth, but not statistically different from the first three systems).

Conflict of interest statement

Competing interests: None.

Figures

**Figure 1**
Architecture of the Medical Named Entity Tagger, a hybrid system for clinical Named Entity Recognition (NER). CRF, conditional random fields.
