Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2015 Dec;58 Suppl(Suppl):S30-S38.
doi: 10.1016/j.jbi.2015.06.015. Epub 2015 Jul 29.

Automatic detection of protected health information from clinic narratives

Affiliations

Automatic detection of protected health information from clinic narratives

Hui Yang et al. J Biomed Inform. 2015 Dec.

Abstract

This paper presents a natural language processing (NLP) system that was designed to participate in the 2014 i2b2 de-identification challenge. The challenge task aims to identify and classify seven main Protected Health Information (PHI) categories and 25 associated sub-categories. A hybrid model was proposed which combines machine learning techniques with keyword-based and rule-based approaches to deal with the complexity inherent in PHI categories. Our proposed approaches exploit a rich set of linguistic features, both syntactic and word surface-oriented, which are further enriched by task-specific features and regular expression template patterns to characterize the semantics of various PHI categories. Our system achieved promising accuracy on the challenge test data with an overall micro-averaged F-measure of 93.6%, which was the winner of this de-identification challenge.

Keywords: Clinical text mining; De-identification; Hybrid model; Natural language processing; Protected Health Information (PHI).

PubMed Disclaimer

Figures

Figure 1
Figure 1
Example of clinical record with annotated PHI categories
Figure 2
Figure 2
System Diagram for the De-identification Task

References

    1. Aberdeen J, Bayer S, Yeniterzi R, et al. The MITRE Identification Scrubber Toolkit: design, training, and assessment. Int J Med Inform. 2010;79:849–59. - PubMed
    1. Beckwith BA, Mahaadevan R, Balis UJ, Kuo F. Development and evaluation of an open source software tool for deidentification of pathology reports. BMC Med Inform Decis Mak. 2006;6:12. - PMC - PubMed
    1. Benton A, Hill S, Ungar L, et al. A system for de-identifying medical message board text. BMC Bioinformatics. 2011;12(Suppl 3):S2. - PMC - PubMed
    1. Deleger L, Molnar K, Savova G, et al. Large-scale evaluation of automated clinical note de-identification and its impact on information extraction. J Am Med Inform Assoc. 2013;20:84–94. - PMC - PubMed
    1. Ferrández O, South BR, Shen S, Friedlin FJ, Samore MH. Evaluating current automatic de-identification methods with Veteran’s health administration clinical documents. BMC Med Res Methodol. 2012;12:109. - PMC - PubMed