J Am Med Inform Assoc. 2001 Nov-Dec;8(6):598-609.
doi: 10.1136/jamia.2001.0080598.

Use of general-purpose negation detection to augment concept indexing of medical documents: a quantitative study using the UMLS

P G Mutalik et al. J Am Med Inform Assoc. 2001 Nov-Dec.

Abstract

Objectives: To test the hypothesis that most instances of negated concepts in dictated medical documents can be detected by a strategy that relies on tools developed for the parsing of formal (computer) languages: specifically, a lexical scanner ("lexer") that uses regular expressions to generate a finite state machine, and a parser that relies on a restricted subset of context-free grammars, known as LALR(1) grammars.

Methods: A diverse training set of 40 medical documents from a variety of specialties was manually inspected and used to develop a program (Negfinder) that contained rules to recognize a large set of negated patterns occurring in the text. Negfinder's lexer and parser were developed using tools normally used to generate programming language compilers. The input to Negfinder consisted of medical narrative that was preprocessed to recognize UMLS concepts: the text of a recognized concept had been replaced with a coded representation that included its UMLS concept ID. The program generated an index with one entry per instance of a concept in the document, where the presence or absence of negation of that concept was recorded. This information was used to mark up the text of each document by color-coding it to make it easier to inspect. The parser was then evaluated in two ways: 1) a test set of 60 documents (30 discharge summaries, 30 surgical notes) marked up by Negfinder was inspected visually to quantify false-positive and false-negative results; and 2) a different test set of 10 documents was independently examined for negatives by a human observer and by Negfinder, and the results were compared.
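
The indexing step described in the Methods can be illustrated with a minimal sketch. This is not Negfinder: it keeps the regular-expression lexer idea but swaps the LALR(1) parser for a single on/off flag, and the trigger list, the scope terminators, and the names TOKEN_RE and index_concepts are all assumptions made for illustration. The input is assumed to be text in which each recognized concept has already been replaced by a ~id:offset:length code.

```python
import re

# Toy lexer: one named group per token class. The negation triggers and the
# scope terminators below are illustrative assumptions, not Negfinder's rules.
TOKEN_RE = re.compile(
    r"(?P<CONCEPT>~\d+:\d+:\d+)"
    r"|(?P<NEG>\b(?:no|not|without|denies|denied)\b)"
    r"|(?P<END>[.;]|\bbut\b)"
    r"|(?P<WORD>[A-Za-z0-9']+)"
    r"|(?P<OTHER>\S)",
    re.IGNORECASE,
)


def index_concepts(coded_text):
    """Return one (concept_id, negated) entry per concept instance.

    A negation trigger switches the flag on; a terminator switches it off.
    This single flag stands in for the grammar a real parser would apply.
    """
    negating = False
    index = []
    for match in TOKEN_RE.finditer(coded_text):
        kind = match.lastgroup
        if kind == "NEG":
            negating = True
        elif kind == "END":
            negating = False
        elif kind == "CONCEPT":
            concept_id = int(match.group()[1:].split(":")[0])
            index.append((concept_id, negating))
    return index


# Concept IDs and offsets here are made up for the example.
sample = "Patient denies ~100:15:10 or ~200:29:6. He has ~300:44:5 but no ~400:57:5."
print(index_concepts(sample))
```

A flag of this kind cannot express nested or long-range negation scope, which is one reason a grammar-based parser is a more natural fit for the approach the abstract describes.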

Results: In the first evaluation using marked-up documents, 8,358 instances of UMLS concepts were detected in the 60 documents, of which 544 were negations detected by the program and verified by human observation (true-positive results, or TPs). Thirteen instances were wrongly flagged as negated (false-positive results, or FPs), and the program missed 27 instances of negation (false-negative results, or FNs), yielding a sensitivity of 95.3 percent and a specificity of 97.7 percent. In the second evaluation using independent negation detection, 1,869 concepts were detected in 10 documents, with 135 TPs, 12 FPs, and 6 FNs, yielding a sensitivity of 95.7 percent and a specificity of 91.8 percent. One of the words "no," "denies/denied," "not," or "without" was present in 92.5 percent of all negations.
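
The percentages in the Results can be rechecked from the reported counts. The short sketch below recomputes them; the function names are illustrative. One observation, offered with caution: the figures the abstract labels "specificity" are reproduced by TP/(TP+FP), a ratio more commonly called positive predictive value or precision, whereas the conventional specificity TN/(TN+FP) would require true-negative counts that the abstract does not state explicitly.

```python
def sensitivity(tp, fn):
    """Fraction of actual negations that the program detected."""
    return tp / (tp + fn)


def positive_predictive_value(tp, fp):
    """Fraction of flagged negations that were correct."""
    return tp / (tp + fp)


# Counts as reported in the Results paragraph.
evaluations = {
    "Evaluation 1": {"tp": 544, "fp": 13, "fn": 27},
    "Evaluation 2": {"tp": 135, "fp": 12, "fn": 6},
}

for name, c in evaluations.items():
    sens = 100 * sensitivity(c["tp"], c["fn"])
    ppv = 100 * positive_predictive_value(c["tp"], c["fp"])
    print(f"{name}: sensitivity {sens:.1f}%, TP/(TP+FP) {ppv:.1f}%")
```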

Conclusions: Negation of most concepts in medical narrative can be reliably detected by a simple strategy. The reliability of detection depends on several factors, the most important being the accuracy of concept matching.

Figures

Figure 1
Excerpts from a discharge summary at various stages of the Negfinder pipeline. Top, Original document. Middle, Document transformed by coding of recognized concepts from UMLS 2000. Concepts are indicated by ~#:#:# where the three numbers indicate the UMLS concept ID, the byte offset in the text, and the length of the phrase. Thus, “pneumonia” is replaced by ~32285:17:9. The only words that remain are stop words and phrases or standard headings; unrecorded homonyms (see discussion in text) such as “rubs,” “S1,” and “S2”; and unrecorded variants of standard terms such as “gallop,” which is a variant of the UMLS preferred form “gallop rhythm.” Bottom, Negfinder mark-up simulated in monochrome. Negating phrases are marked in italics, identified concepts in bold; of these, negated concepts are also italicized.
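
As a rough illustration of the ~#:#:# coding described in this caption, the sketch below extracts the three numbers from coded text and uses the offset and length to recover the phrase from the original document. The names ConceptRef, find_concept_refs, and original_phrase are hypothetical, the sample sentence is invented so that "pneumonia" starts at offset 17, and the code assumes zero-based offsets over ASCII text so that byte offsets and character offsets coincide.

```python
import re
from typing import NamedTuple


class ConceptRef(NamedTuple):
    concept_id: int  # UMLS concept ID
    offset: int      # offset of the phrase in the original text (assumed zero-based)
    length: int      # length of the original phrase


CODE_RE = re.compile(r"~(\d+):(\d+):(\d+)")


def find_concept_refs(coded_text):
    """Return every ~id:offset:length code found in the coded text."""
    return [ConceptRef(*map(int, m.groups())) for m in CODE_RE.finditer(coded_text)]


def original_phrase(original_text, ref):
    """Recover the replaced phrase from the original, uncoded text."""
    return original_text[ref.offset:ref.offset + ref.length]


original = "Diagnosis: acute pneumonia."   # invented sentence
coded = "Diagnosis: acute ~32285:17:9."    # code taken from the caption
for ref in find_concept_refs(coded):
    print(ref, original_phrase(original, ref))
```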
Figure 2
Results of Evaluation 1, showing performance of Negfinder on a test set of 60 documents (30 discharge summaries, 30 surgical notes) using human evaluation of color-coded text previously marked up by Negfinder. This test has the possibility of priming bias.
Figure 3
Results of Evaluation 2, showing performance of Negfinder on a test set of 10 documents (5 discharge summaries, 5 surgical notes), using an unbiased design of independent evaluation by a human observer and Negfinder.
