Automatic detection of protected health information from clinic narratives

Hui Yang¹, Jonathan M Garibaldi²

Affiliations

¹ School of Computer Science, University of Nottingham, Nottingham, UK; Advanced Data Analysis Centre, University of Nottingham, Nottingham, UK. Electronic address: Hui.Yang@nottingham.ac.uk.
² School of Computer Science, University of Nottingham, Nottingham, UK; Advanced Data Analysis Centre, University of Nottingham, Nottingham, UK.

PMID: 26231070
PMCID: PMC4989090
DOI: 10.1016/j.jbi.2015.06.015

Hui Yang et al. J Biomed Inform. 2015 Dec.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S30-S38.

doi: 10.1016/j.jbi.2015.06.015. Epub 2015 Jul 29.

Abstract

This paper presents a natural language processing (NLP) system designed for the 2014 i2b2 de-identification challenge. The challenge task is to identify and classify seven main Protected Health Information (PHI) categories and 25 associated sub-categories. We propose a hybrid model that combines machine learning techniques with keyword-based and rule-based approaches to deal with the complexity inherent in PHI categories. Our approach exploits a rich set of linguistic features, both syntactic and word surface-oriented, which are further enriched by task-specific features and regular expression template patterns to characterize the semantics of the various PHI categories. On the challenge test data, our system achieved an overall micro-averaged F-measure of 93.6%, making it the winning entry in this de-identification challenge.
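
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a micro-averaged F-measure can be computed over PHI categories. It is not the challenge's official scorer. It assumes exact span matching, and the function name `micro_f1`, the `(doc_id, start, end)` span representation, and the toy annotations are all hypothetical illustrations that do not come from the paper.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F-measure over categories.

    gold / predicted: dicts mapping a category name to a set of
    (doc_id, start, end) spans. Exact span match is assumed here;
    the actual evaluation criteria may differ.
    """
    tp = fp = fn = 0
    # Pool the counts across all categories before computing the scores;
    # this pooling is what makes the average "micro" rather than "macro".
    for category in set(gold) | set(predicted):
        g = gold.get(category, set())
        p = predicted.get(category, set())
        tp += len(g & p)   # spans found in both
        fp += len(p - g)   # predicted but not in gold
        fn += len(g - p)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy, made-up annotations (not data from the challenge).
gold = {"DATE": {(1, 0, 10), (1, 20, 30)}, "NAME": {(1, 40, 50)}}
pred = {"DATE": {(1, 0, 10)}, "NAME": {(1, 40, 50), (1, 60, 70)}}
score = micro_f1(gold, pred)
```

Because the counts are pooled before the ratio is taken, frequent categories contribute more to a micro-averaged score than rare ones do.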

Keywords: Clinical text mining; De-identification; Hybrid model; Natural language processing; Protected Health Information (PHI).

Figures

**Figure 1**
Example of clinical record with annotated PHI categories

**Figure 2**
System Diagram for the De-identification Task
