J Am Med Inform Assoc. 2008 Sep-Oct;15(5):601-10.
doi: 10.1197/jamia.M2702. Epub 2008 Jun 25.

A software tool for removing patient identifying information from clinical documents

F Jeff Friedlin et al. J Am Med Inform Assoc. 2008 Sep-Oct.

Abstract

We sought to create a software tool that accurately removes all patient identifying information from various kinds of clinical data documents, including laboratory and narrative reports. We developed the Medical De-identification System (MeDS), a software tool that de-identifies clinical documents, and performed 2 evaluations. Our first evaluation used 2,400 Health Level Seven (HL7) messages from 10 different HL7 message producers. After modifying the software based on the results of this first evaluation, we performed a second evaluation using 7,190 pathology report HL7 messages. In both evaluations, we compared the output of the MeDS de-identification process to a gold standard of human review for identifying strings. We calculated the number of successful scrubs, missed identifiers, and over-scrubs committed by MeDS, and evaluated the readability and interpretability of the scrubbed messages. We categorized all missed identifiers into 3 groups: (1) complete HIPAA-specified identifiers, (2) HIPAA-specified identifier fragments, and (3) non-HIPAA-specified identifiers (such as provider names and addresses). In the first evaluation, MeDS scrubbed 11,273 (99.06%) of the 11,380 HIPAA-specified identifiers and 38,095 (98.26%) of the 38,768 non-HIPAA-specified identifiers. In the second evaluation (after the software was modified), MeDS scrubbed 79,993 (99.47%) of the 80,418 HIPAA-specified identifiers and 12,689 (96.93%) of the 13,091 non-HIPAA-specified identifiers. Approximately 95% of scrubbed messages were both readable and interpretable. We conclude that MeDS successfully de-identifies a wide range of medical documents from numerous sources and creates scrubbed reports that retain their interpretability, thereby maintaining their usefulness for research.
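The scrub rates quoted in the abstract follow directly from the reported counts. The short sketch below recomputes them as a sanity check. The counts are taken from the abstract; the names `REPORTED`, `scrub_rate`, and `missed` are illustrative only and are not part of MeDS.

```python
# Counts as reported in the abstract: (scrubbed, total) for each
# evaluation and identifier class. The dictionary layout is an assumption
# made for this example, not a MeDS data structure.
REPORTED = {
    ("first evaluation", "HIPAA-specified"): (11273, 11380),
    ("first evaluation", "non-HIPAA-specified"): (38095, 38768),
    ("second evaluation", "HIPAA-specified"): (79993, 80418),
    ("second evaluation", "non-HIPAA-specified"): (12689, 13091),
}


def scrub_rate(scrubbed: int, total: int) -> float:
    """Percentage of identifiers removed, rounded to two decimals."""
    return round(100.0 * scrubbed / total, 2)


def missed(scrubbed: int, total: int) -> int:
    """Number of identifiers left in the text (total minus scrubbed)."""
    return total - scrubbed


if __name__ == "__main__":
    for (evaluation, kind), (scrubbed, total) in REPORTED.items():
        print(
            f"{evaluation}, {kind}: "
            f"{scrub_rate(scrubbed, total):.2f}% scrubbed, "
            f"{missed(scrubbed, total)} missed"
        )
```

The "missed" figure here is simply total minus scrubbed. The abstract does not say how those misses divide among the three categories of missed identifiers, so the sketch does not attempt that breakdown.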

Figures

Figure 1. Nineteen patient identifiers that require removal for de-identification per HIPAA regulations.
Figure 2. Processing schema of the de-identification software.
Figure 3. Examples of alternative date display formats found in sample messages.
Figure 4. Algorithm used by the name scrubbing process.
Figure 5. Example of a scrubbed HL7 narrative report message (endoscopy report).
