PLoS One. 2014 Nov 13;9(11):e112774.
doi: 10.1371/journal.pone.0112774. eCollection 2014.

Negation's not solved: generalizability versus optimizability in clinical natural language processing


Stephen Wu et al. PLoS One. 2014.

Abstract

A review of published work in clinical natural language processing (NLP) may suggest that the negation detection task has been "solved." This work proposes that an optimizable solution does not equal a generalizable solution. We introduce a new machine learning-based Polarity Module for detecting negation in clinical text, and extensively compare its performance across domains. Using four manually annotated corpora of clinical text, we show that negation detection performance suffers when there is no in-domain development (for manual methods) or training data (for machine learning-based methods). Various factors (e.g., annotation guidelines, named entity characteristics, the amount of data, and lexical and syntactic context) play a role in making generalizability difficult, but none completely explains the phenomenon. Furthermore, generalizability remains challenging because it is unclear whether to use a single source for accurate data, combine all sources into a single model, or apply domain adaptation methods. The most reliable means to improve negation detection is to manually annotate in-domain training data (or, perhaps, manually modify rules); this is a strategy for optimizing performance, rather than generalizing it. These results suggest a direction for future work in domain-adaptive and task-adaptive methods for clinical NLP.
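The cross-domain design described in the abstract (train on one corpus, test on every corpus, and compare negation F-scores) can be sketched as a train-by-test score matrix. This is a minimal illustration under stated assumptions, not the authors' setup: the instance format, the names `f1_negated`, `train_trigger_rule`, and `cross_domain_matrix`, and the two toy corpora are all hypothetical. `train_trigger_rule` is a crude learned trigger-word rule standing in for a real model; it is neither the Polarity Module nor NegEx.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical instance format: (tokens preceding a named entity, gold label),
# where the label is True when the entity is negated.
Instance = Tuple[List[str], bool]
Model = Callable[[List[str]], bool]


def f1_negated(gold: List[bool], pred: List[bool]) -> float:
    """F-score for the negated (positive) class; 0.0 when nothing is found."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum(p and not g for g, p in zip(gold, pred))
    fn = sum(g and not p for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def train_trigger_rule(train: List[Instance]) -> Model:
    """Hypothetical stand-in learner (not the Polarity Module): keep words
    seen more often before negated entities than before affirmed ones."""
    neg: Dict[str, int] = {}
    aff: Dict[str, int] = {}
    for tokens, negated in train:
        counts = neg if negated else aff
        for tok in {t.lower() for t in tokens}:
            counts[tok] = counts.get(tok, 0) + 1
    triggers = {w for w, c in neg.items() if c > aff.get(w, 0)}
    return lambda tokens: any(t.lower() in triggers for t in tokens)


def cross_domain_matrix(
    corpora: Dict[str, Tuple[List[Instance], List[Instance]]],
    learner: Callable[[List[Instance]], Model],
) -> Dict[Tuple[str, str], float]:
    """Train on each corpus's training split; score on every test split."""
    scores: Dict[Tuple[str, str], float] = {}
    for train_name, (train, _) in corpora.items():
        model = learner(train)
        for test_name, (_, test) in corpora.items():
            gold = [label for _, label in test]
            pred = [model(tokens) for tokens, _ in test]
            scores[(train_name, test_name)] = f1_negated(gold, pred)
    return scores


# Invented toy corpora (train split, test split), only to show the shape of the result.
toy_corpora = {
    "A": ([(["no"], True), (["has"], False)], [(["no"], True), (["reports"], False)]),
    "B": ([(["denies"], True), (["with"], False)], [(["denies"], True), (["reports"], False)]),
}
scores = cross_domain_matrix(toy_corpora, train_trigger_rule)
```

The diagonal cells (`("A", "A")`, `("B", "B")`) are the in-domain conditions and score 1.0 on this toy data, while the off-diagonal, out-of-domain cells score 0.0 because each learned rule only fires on its own corpus's trigger word. The toy data are built to make that contrast visible; they do not reproduce any result from the paper.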


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. The cTAKES Pipeline.
The SHARPn Polarity Module is an Attribute Discovery algorithm. Training and evaluation use gold-standard named entities (NEs), so the named entity recognition (NER) step is skipped.
Figure 2. Significance bands of model performance for each test corpus.
These are labeled with successive letters from right to left in Table 4.
Figure 3. Learning curve for i2b2 training data on various corpora.
For each proportion of the i2b2 corpus (x axis), the reported F-score (y axis) is an average of 5 randomly sampled runs.
Figure 4. The effect of named entity length (in number of words) on performance for each of 6 training configurations.
SHARP, MiPACQ, and i2b2 test sets are used for evaluation.
Figure 5. The effect of named entity semantic group on the F-score of 6 models.
SHARP, MiPACQ, and i2b2 test sets are used for evaluation.
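The learning-curve protocol in the Figure 3 caption (for each proportion of the i2b2 training data, average the F-score over 5 randomly sampled runs) can be sketched as follows. This is a hedged illustration: `learning_curve` and the `train_and_score` callback are hypothetical names, the callback stands in for retraining the negation model on a subsample and scoring it on one test corpus, and sampling without replacement with a fixed seed is an assumption the caption does not state.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def learning_curve(
    train: Sequence[T],
    proportions: Sequence[float],
    train_and_score: Callable[[List[T]], float],
    runs: int = 5,
    seed: int = 0,
) -> List[Tuple[float, float]]:
    """For each proportion, average the score of `runs` random subsamples.

    `train_and_score` is a hypothetical callback that retrains a model on the
    subsample and returns its F-score on a fixed test corpus.
    """
    rng = random.Random(seed)
    pool = list(train)
    curve: List[Tuple[float, float]] = []
    for p in proportions:
        k = max(1, round(p * len(pool)))
        # Sample without replacement (an assumption; the caption only says "randomly sampled").
        scores = [train_and_score(rng.sample(pool, k)) for _ in range(runs)]
        curve.append((p, mean(scores)))
    return curve


# Placeholder scorer that reports subsample size as a fraction of the pool,
# so the output is easy to check by hand; a real one would train and evaluate.
curve = learning_curve(list(range(100)), [0.1, 0.5, 1.0], lambda s: len(s) / 100)
```

With the placeholder scorer the curve is `[(0.1, 0.1), (0.5, 0.5), (1.0, 1.0)]`. At proportion 1.0 each run uses the whole corpus (in shuffled order), so with a real scorer the averaging over runs matters mainly for the smaller proportions.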
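Figures 4 and 5 break performance down by named entity length (in words) and by semantic group. One way to produce such breakdowns is to bucket each test entity's gold and predicted negation labels by a key and compute one F-score per bucket, as sketched below. The record layout, the function names, whitespace tokenization for entity length, and the group labels in the toy data are all assumptions for illustration, not details taken from the paper.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

# Hypothetical record: (entity text, semantic group, gold negated, predicted negated)
Record = Tuple[str, str, bool, bool]


def f_score(pairs: List[Tuple[bool, bool]]) -> float:
    """F-score of the negated class over (gold, predicted) pairs."""
    tp = sum(g and p for g, p in pairs)
    fp = sum(p and not g for g, p in pairs)
    fn = sum(g and not p for g, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f_score_by(records: List[Record], key: Callable[[Record], Hashable]) -> Dict[Hashable, float]:
    """Group records by `key` and compute one F-score per group."""
    buckets: Dict[Hashable, List[Tuple[bool, bool]]] = defaultdict(list)
    for rec in records:
        buckets[key(rec)].append((rec[2], rec[3]))
    return {k: f_score(v) for k, v in buckets.items()}


def entity_length(rec: Record) -> int:
    # Length in words, assuming simple whitespace tokenization.
    return len(rec[0].split())


def semantic_group(rec: Record) -> str:
    return rec[1]


# Invented toy predictions; the group labels are illustrative only.
toy = [
    ("chest pain", "Disorders", True, True),
    ("fever", "Disorders", True, False),
    ("fever", "Disorders", False, False),
    ("chest x ray", "Procedures", True, True),
]
by_length = f_score_by(toy, entity_length)
by_group = f_score_by(toy, semantic_group)
```

Passing the key as a function keeps one scoring routine for both breakdowns; adding another stratification (for example, by training configuration) only needs a new key function.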

