J Am Med Inform Assoc. 2008 Jan-Feb;15(1):14-24.
doi: 10.1197/jamia.M2408. Epub 2007 Oct 18.

Identifying patient smoking status from medical discharge records

Ozlem Uzuner et al. J Am Med Inform Assoc. 2008 Jan-Feb.

Abstract

The authors organized a Natural Language Processing (NLP) challenge on automatically determining the smoking status of patients from information found in their discharge records. This challenge was issued as a part of the i2b2 (Informatics for Integrating Biology to the Bedside) project, to survey, facilitate, and examine studies in medical language understanding for clinical narratives. This article describes the smoking challenge, details the data and the annotation process, explains the evaluation metrics, discusses the characteristics of the systems developed for the challenge, presents an analysis of the results of received system runs, draws conclusions about the state of the art, and identifies directions for future research. A total of 11 teams participated in the smoking challenge. Each team submitted up to three system runs, providing a total of 23 submissions. The submitted system runs were evaluated with microaveraged and macroaveraged precision, recall, and F-measure. The systems submitted to the smoking challenge represented a variety of machine learning and rule-based algorithms. Despite the differences in their approaches to smoking status identification, many of these systems provided good results. There were 12 system runs with microaveraged F-measures above 0.84. Analysis of the results highlighted the fact that discharge summaries express smoking status using a limited number of textual features (e.g., "smok", "tobac", "cigar", Social History, etc.). Many of the effective smoking status identifiers benefit from these features.
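The microaveraged and macroaveraged measures mentioned in the abstract can be illustrated with a minimal sketch. It assumes each record receives exactly one label, and the label names and toy data below are hypothetical, not taken from the challenge. The macroaveraged F-measure is computed here as the mean of per-class F-measures, which is one common convention; the challenge's exact definition may differ.

```python
from collections import Counter


def prf(tp, fp, fn):
    """Precision, recall, and balanced F-measure from raw counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro_macro(gold, pred):
    """Return (micro, macro, per_class) precision/recall/F triples."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative
    per_class = {c: prf(tp[c], fp[c], fn[c]) for c in labels}
    # Macroaverage: every class counts equally, regardless of its size.
    macro = tuple(
        sum(scores[i] for scores in per_class.values()) / len(labels)
        for i in range(3)
    )
    # Microaverage: pool the counts over all classes, then score once.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro, per_class


# Hypothetical toy data for illustration only.
gold = ["non", "non", "non", "current", "past", "unknown"]
pred = ["non", "non", "current", "current", "unknown", "unknown"]

micro, macro, per_class = micro_macro(gold, pred)
print("micro P/R/F:", micro)
print("macro P/R/F:", macro)
```

Under the one-label-per-record assumption, microaveraged precision, recall, and F-measure all equal plain accuracy, so frequent classes dominate the score. The macroaverage penalizes a system that misses a rare class entirely, as with the hypothetical "past" label above.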

Figures

Figure 1. Results from ▶ sorted by microaveraged F-measure.
