J Clin Transl Sci. 2021 Dec 9;6(1):e10.
doi: 10.1017/cts.2021.880. eCollection 2022.

Ensuring a safe(r) harbor: Excising personally identifiable information from structured electronic health record data

Emily R Pfaff et al. J Clin Transl Sci.

Abstract

Recent findings show that the continued expansion in the scope and scale of data collected in electronic health records is making the protection of personally identifiable information (PII) more challenging and, if not addressed, may inadvertently put our institutions and patients at risk. As clinical terminologies expand to include new terms that may capture PII (e.g., Patient First Name, Patient Phone Number), institutions may start using them in clinical data capture (and in some cases, they already have). Once in use, PII-containing values associated with these terms may find their way into laboratory or observation data tables via extract-transform-load jobs intended to process structured data, putting institutions at risk of unintended disclosure. Here we aim to inform the informatics community of these findings and to issue a call to action for remediation by the community.

Keywords: Electronic health records; data privacy; medical terminologies.

Conflict of interest statement

Melissa Haendel is a co-founder of Pryzm Health. Kristin Kostka reports consulting fees from the National Institutes of Health. Matvey Palchuk is a full-time employee of TriNetX, LLC. Emily Niehaus is a full-time employee of Palantir Technologies. All other authors have no competing interests to declare.

Figures

Fig. 1.
Removing an entire column known to contain personally identifiable information (PII) (a) is significantly easier than identifying PII-containing rows (b) that exist among nonidentifying records.
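
The contrast in this caption can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the article: the field names (`term_code`, `patient_name`), the placeholder codes such as `EX-FIRST-NAME` (these are not real terminology codes), and the helper functions `drop_column` and `drop_pii_rows`.

```python
# Hypothetical term codes flagged as PII-bearing.
# These are placeholders, not codes from any real clinical terminology.
PII_TERM_CODES = {"EX-FIRST-NAME", "EX-PHONE"}


def drop_column(rows, column):
    """Case (a): remove one column known to hold PII from every row."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]


def drop_pii_rows(rows, code_field="term_code", pii_codes=PII_TERM_CODES):
    """Case (b): remove rows whose term code appears in the flagged set."""
    return [row for row in rows if row.get(code_field) not in pii_codes]


# Illustrative observation table: one ordinary result and two rows whose
# values carry PII because of the term they were recorded under.
observations = [
    {"patient_id": 1, "term_code": "EX-GLUCOSE", "value": "98", "patient_name": "Jane Doe"},
    {"patient_id": 1, "term_code": "EX-FIRST-NAME", "value": "Jane", "patient_name": "Jane Doe"},
    {"patient_id": 2, "term_code": "EX-PHONE", "value": "555-0100", "patient_name": "John Roe"},
]

# Apply both steps: drop the known PII column, then filter flagged rows.
cleaned = drop_pii_rows(drop_column(observations, "patient_name"))
```

In this sketch, case (a) needs only the column name, while case (b) depends entirely on the flagged set being complete: a PII-bearing term missing from `PII_TERM_CODES` would pass through unchanged, which is one way to read why the row-level case is described as harder.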
