Negation's not solved: generalizability versus optimizability in clinical natural language processing

Stephen Wu¹, Timothy Miller², James Masanz³, Matt Coarr⁴, Scott Halgrim⁵, David Carrell⁵, Cheryl Clark⁴

Affiliations

¹ Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, United States of America; Oregon Health and Science University, Portland, Oregon, United States of America.
² Children's Hospital Boston Informatics Program, Harvard Medical School, Boston, Massachusetts, United States of America.
³ Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, United States of America.
⁴ Human Language Technology Department, The MITRE Corporation, Bedford, Massachusetts, United States of America.
⁵ Group Health Research Institute, Seattle, Washington, United States of America.

PMID: 25393544
PMCID: PMC4231086
DOI: 10.1371/journal.pone.0112774

Stephen Wu et al. PLoS One. 2014 Nov 13;9(11):e112774. doi: 10.1371/journal.pone.0112774. eCollection 2014.

Abstract

A review of published work in clinical natural language processing (NLP) may suggest that the negation detection task has been "solved." This work proposes that an optimizable solution does not equal a generalizable solution. We introduce a new machine learning-based Polarity Module for detecting negation in clinical text, and extensively compare its performance across domains. Using four manually annotated corpora of clinical text, we show that negation detection performance suffers when there is no in-domain development (for manual methods) or training data (for machine learning-based methods). Various factors (e.g., annotation guidelines, named entity characteristics, the amount of data, and lexical and syntactic context) play a role in making generalizability difficult, but none completely explains the phenomenon. Furthermore, generalizability remains challenging because it is unclear whether to use a single source for accurate data, combine all sources into a single model, or apply domain adaptation methods. The most reliable means to improve negation detection is to manually annotate in-domain training data (or, perhaps, manually modify rules); this is a strategy for optimizing performance, rather than generalizing it. These results suggest a direction for future work in domain-adaptive and task-adaptive methods for clinical NLP.
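The in-domain versus out-of-domain comparison described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two toy corpora and their labels are invented, the cue-word learner `train_cue_model` is a hypothetical stand-in (it is not the Polarity Module), and `cross_corpus_f1` is a hypothetical helper name. Only the F-score on the negated class is the standard metric.

```python
from typing import Callable, Dict, List, Tuple

# (context words preceding the entity, whether the entity is negated)
Example = Tuple[str, bool]


def f1_negated(gold: List[bool], pred: List[bool]) -> float:
    """F-score on the negated class: harmonic mean of precision and recall."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def train_cue_model(train: List[Example]) -> Callable[[str], bool]:
    """Toy learner (assumption): a cue is any word seen only in negated contexts."""
    negated_words, affirmed_words = set(), set()
    for context, is_negated in train:
        target = negated_words if is_negated else affirmed_words
        target.update(context.lower().split())
    cues = negated_words - affirmed_words
    return lambda context: any(w in cues for w in context.lower().split())


def cross_corpus_f1(
    corpora: Dict[str, Dict[str, List[Example]]]
) -> Dict[Tuple[str, str], float]:
    """Train on each corpus and test on every corpus, including itself."""
    scores = {}
    for train_name, train_split in corpora.items():
        model = train_cue_model(train_split["train"])
        for test_name, test_split in corpora.items():
            gold = [label for _, label in test_split["test"]]
            pred = [model(ctx) for ctx, _ in test_split["test"]]
            scores[(train_name, test_name)] = f1_negated(gold, pred)
    return scores


# Invented toy corpora with different negation vocabularies (not real data).
toy_corpora = {
    "A": {
        "train": [("patient denies", True), ("no evidence of", True),
                  ("patient reports", False), ("presents with", False)],
        "test": [("she denies", True), ("no", True), ("reports", False)],
    },
    "B": {
        "train": [("negative for", True), ("without", True),
                  ("positive for", False), ("with", False)],
        "test": [("negative for", True), ("without", True),
                 ("positive for", False)],
    },
}

scores = cross_corpus_f1(toy_corpora)
for (train_name, test_name), score in sorted(scores.items()):
    print(f"train={train_name} test={test_name} F1={score:.2f}")
```

The toy data is deliberately extreme: each model only knows the cues of its own corpus, so the diagonal (in-domain) cells score well and the off-diagonal (out-of-domain) cells do not. Real corpora overlap far more, and the numbers here say nothing about the results reported in the paper.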

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1. The cTAKES Pipeline.**
The SHARPn Polarity Module is an Attribute Discovery algorithm. Training and evaluation use gold-standard named entities (NEs), skipping the NER step.

**Figure 2. Significance bands of model performance for each test corpus.**
These are labeled with successive letters from right to left in Table 4.

**Figure 3. Learning curve for i2b2 training data on various corpora.**
For each proportion of the i2b2 corpus (x axis), the reported F-score (y axis) is an average of 5 randomly sampled runs.

**Figure 4. The effect of named entity length (in number of words) on performance for each of 6 training configurations.**
SHARP, MiPACQ, and i2b2 test sets are used for evaluation.

**Figure 5. The effect of named entity semantic group on the F-score of 6 models.**
SHARP, MiPACQ, and i2b2 test sets are used for evaluation.
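
The averaging behind the Figure 3 learning curve (for each proportion of the training corpus, the mean F-score over 5 randomly sampled runs) can be sketched as follows. This is a sketch under assumptions, not the authors' code: `learning_curve` and `train_and_score` are hypothetical names, the fixed seed is an illustrative choice, and the scorer used in the usage example is a dummy that returns no real F-score.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def learning_curve(
    train: Sequence,
    proportions: Sequence[float],
    train_and_score: Callable[[list], float],
    runs: int = 5,
    seed: int = 0,
) -> List[Tuple[float, float]]:
    """For each proportion, average the score over `runs` random subsamples.

    `train_and_score` is a hypothetical callable: it trains a model on the
    given subsample and returns its F-score on a fixed test set.
    """
    rng = random.Random(seed)
    curve = []
    for p in proportions:
        # Number of training examples to draw for this proportion.
        k = max(1, round(p * len(train)))
        # Sample without replacement, independently for each run.
        run_scores = [train_and_score(rng.sample(list(train), k))
                      for _ in range(runs)]
        curve.append((p, mean(run_scores)))
    return curve


# Usage with a dummy scorer (assumption): the score is simply the fraction
# of the training set used, so the averaged curve is easy to check by eye.
dummy_train = list(range(10))
curve = learning_curve(
    dummy_train,
    proportions=[0.5, 1.0],
    train_and_score=lambda subset: len(subset) / len(dummy_train),
)
print(curve)
```

Averaging over several random subsamples reduces the variance that a single draw would introduce at small proportions, where the choice of examples matters most.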
