2020 May 30;2020:241-250.
eCollection 2020.

A Comparative Analysis of Speed and Accuracy for Three Off-the-Shelf De-Identification Tools

Paul M Heider et al. AMIA Jt Summits Transl Sci Proc.

Abstract

A growing quantity of health data is being stored in Electronic Health Records (EHR). The free-text section of these clinical notes contains important patient and treatment information for research but also contains Personally Identifiable Information (PII), which cannot be freely shared within the research community without compromising patient confidentiality and privacy rights. Significant work has been invested in investigating automated approaches to text de-identification, the process of removing or redacting PII. Few studies have examined the performance of existing de-identification pipelines in a controlled comparative analysis. In this study, we use publicly available corpora to analyze speed and accuracy differences between three de-identification systems that can be run off-the-shelf: Amazon Comprehend Medical PHId, Clinacuity's CliniDeID, and the National Library of Medicine's Scrubber. No single system dominated all the compared metrics. NLM Scrubber was the fastest while CliniDeID generally had the highest accuracy.

Figures

Figure 1: PII categories coverage
Figure 2: Accuracy for shared categories (left), specialty categories (center), and at the PII supertype level (right)
