J Biomed Inform. 2017 Nov;75S:S28-S33.
doi: 10.1016/j.jbi.2017.06.005. Epub 2017 Jun 7.

Learning to identify Protected Health Information by integrating knowledge- and data-driven algorithms: A case study on psychiatric evaluation notes

Azad Dehghan et al. J Biomed Inform. 2017 Nov.

Abstract

De-identification of clinical narratives is one of the main obstacles to making healthcare free text available for research. In this paper we describe our experience in expanding and tailoring two existing tools as part of Track 1 of the 2016 CEGS N-GRID Shared Tasks, which evaluated de-identification methods on a set of psychiatric evaluation notes for up to 25 different types of Protected Health Information (PHI). Our methods rely on machine learning over either a large or a small feature space, combined with additional strategies, namely two-pass tagging and multi-class models, both of which proved beneficial. The results show that integrating the proposed methods can identify PHI as defined by the Health Insurance Portability and Accountability Act (HIPAA) with overall F1-scores of ∼90% and above. However, some classes (Profession, Organization) again proved challenging because of the variability of the expressions used to refer to them.
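The abstract names two-pass tagging but does not spell out the procedure. A common reading in de-identification work, assumed here rather than taken from the paper, is that a first pass tags PHI with a trained model and a second pass searches the rest of the note for further occurrences of strings already tagged, catching mentions the model missed. The Python sketch below assumes first-pass output is a list of `(start, end, label)` character spans; `second_pass` and `min_len` are hypothetical names, not part of either tool described in the paper.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, PHI label) as character offsets


def second_pass(text: str, first_pass: List[Span], min_len: int = 3) -> List[Span]:
    """Hypothetical second pass: propagate first-pass PHI strings.

    Every string tagged at least once (and at least `min_len` characters long)
    is also tagged wherever else it appears as a whole word, unless that
    region already overlaps an existing span.
    """
    spans = sorted(first_pass)

    # Map each tagged surface string to its label (first label seen wins).
    lexicon = {}
    for start, end, label in spans:
        surface = text[start:end]
        if len(surface) >= min_len:
            lexicon.setdefault(surface, label)

    def overlaps(s: int, e: int) -> bool:
        return any(s < b and a < e for a, b, _ in spans)

    # Longer strings first, so "John Smith" is preferred over "John".
    for surface in sorted(lexicon, key=len, reverse=True):
        # Lookarounds instead of \b so strings such as "Dr." also match.
        pattern = r"(?<!\w)" + re.escape(surface) + r"(?!\w)"
        for m in re.finditer(pattern, text):
            if not overlaps(m.start(), m.end()):
                spans.append((m.start(), m.end(), lexicon[surface]))
    return sorted(spans)


text = "Seen by Dr. Alvarez today. Alvarez recommended follow-up."
print(second_pass(text, [(12, 19, "DOCTOR")]))
# [(12, 19, 'DOCTOR'), (27, 34, 'DOCTOR')]
```

Restricting propagation to exact, whole-word matches of reasonably long strings keeps the second pass from tagging short, common tokens; a practical system would probably need further filters (for example, a stop list of common words), which this sketch omits.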

Keywords: Clinical text mining; De-identification; Electronic health record; Information extraction; Named entity recognition.

Conflict of interest statement

None.

Figures

Figure 1
An example of combining multiple CRFs producing output for the Doctor category
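Figure 1 shows several CRFs whose outputs are combined for the Doctor category, but the caption does not state the combination rule. The sketch below assumes one simple rule purely for illustration: count how many taggers proposed each exact span, optionally require a minimum number of votes, and resolve overlapping spans by preferring more votes, then longer spans. `combine_taggers` and `min_votes` are hypothetical names; with `min_votes=1` the rule behaves like a union of the taggers' outputs with overlaps removed.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Span = Tuple[int, int, str]  # (start, end, label) as character offsets


def combine_taggers(outputs: Iterable[List[Span]], min_votes: int = 1) -> List[Span]:
    """Hypothetical merge of span predictions from several taggers (e.g. CRFs).

    Keeps spans proposed by at least `min_votes` taggers, then resolves
    overlaps greedily: more votes first, then longer spans, then earlier ones.
    """
    votes = Counter()
    for spans in outputs:
        votes.update(set(spans))  # each tagger votes at most once per span

    candidates = [s for s, v in votes.items() if v >= min_votes]
    candidates.sort(key=lambda s: (-votes[s], -(s[1] - s[0]), s[0]))

    chosen: List[Span] = []
    for start, end, label in candidates:
        # Accept a candidate only if it does not overlap any span already chosen.
        if all(end <= a or b <= start for a, b, _ in chosen):
            chosen.append((start, end, label))
    return sorted(chosen)


crf_a = [(12, 19, "DOCTOR")]
crf_b = [(8, 19, "DOCTOR")]
crf_c = [(12, 19, "DOCTOR"), (40, 47, "DOCTOR")]
print(combine_taggers([crf_a, crf_b, crf_c]))               # union, overlaps resolved
# [(12, 19, 'DOCTOR'), (40, 47, 'DOCTOR')]
print(combine_taggers([crf_a, crf_b, crf_c], min_votes=2))  # majority of three
# [(12, 19, 'DOCTOR')]
```

Raising `min_votes` trades recall for precision: a union keeps mentions that only one model found, while a majority vote discards them unless several models agree.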


