EGEMS (Wash DC). 2019 Mar 29;7(1):6.
doi: 10.5334/egems.270.

Assessing and Minimizing Re-identification Risk in Research Data Derived from Health Care Records

Gregory E Simon et al. EGEMS (Wash DC).

Abstract

Background: Sharing of research data derived from health system records supports the rigor and reproducibility of primary research and can accelerate research progress through secondary use. But public sharing of such data can create risk of re-identifying individuals, exposing sensitive health information.

Method: We describe a framework for assessing re-identification risk that includes: identifying data elements in a research dataset that overlap with external data sources, identifying small classes of records defined by unique combinations of those data elements, and considering the pattern of population overlap between the research dataset and an external source. We also describe alternative strategies for mitigating risk when the external data source can or cannot be directly examined.
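The second step of the framework, finding small classes of records defined by combinations of shared data elements, can be illustrated with a minimal sketch. The field names, the toy records, and the threshold of 3 are all hypothetical assumptions made for illustration; the abstract does not specify which elements or which cell-size cutoff the authors used.

```python
from collections import Counter

# Hypothetical toy records. Field names and values are illustrative only
# and are not taken from the study dataset.
records = [
    {"site": "A", "year": 2012, "sex": "F", "age_group": "30-39"},
    {"site": "A", "year": 2012, "sex": "F", "age_group": "30-39"},
    {"site": "A", "year": 2013, "sex": "M", "age_group": "40-49"},
    {"site": "B", "year": 2012, "sex": "F", "age_group": "30-39"},
    {"site": "B", "year": 2012, "sex": "F", "age_group": "30-39"},
    {"site": "B", "year": 2012, "sex": "F", "age_group": "30-39"},
]

# Data elements assumed to overlap with an external source (an assumption).
QUASI_IDENTIFIERS = ("site", "year", "sex", "age_group")


def cell_sizes(rows, keys):
    """Count how many records share each combination of the given keys."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def small_cells(rows, keys, threshold):
    """Return the cells that contain fewer than `threshold` records."""
    return {cell: n for cell, n in cell_sizes(rows, keys).items() if n < threshold}


# The threshold is an assumed cutoff, chosen only for this example.
risky = small_cells(records, QUASI_IDENTIFIERS, threshold=3)
for cell, n in sorted(risky.items()):
    print(cell, n)
```

A small cell found this way is only a candidate risk. Whether it permits re-identification still depends on how the research population overlaps with the external source, which is the third step of the framework.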

Results: We illustrate this framework using the example of a large database used to develop and validate models predicting suicidal behavior after an outpatient visit. We identify elements in the research dataset that might create risk and propose a specific risk mitigation strategy: deleting indicators for health system (a proxy for state of residence) and visit year.
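The effect of the proposed mitigation can be checked by counting records that fall in small cells before and after removing the health system and visit year indicators. The sketch below uses hypothetical toy data, hypothetical field names, and an assumed cell-size threshold of 3; it shows the kind of comparison involved, not the authors' actual analysis.

```python
from collections import Counter


def records_in_small_cells(rows, keys, threshold):
    """Number of records whose key combination is shared by < threshold rows."""
    sizes = Counter(tuple(row[k] for k in keys) for row in rows)
    return sum(n for n in sizes.values() if n < threshold)


# Hypothetical toy records, not drawn from the study dataset.
rows = [
    {"health_system": "A", "visit_year": 2012, "sex": "F", "age_group": "30-39"},
    {"health_system": "A", "visit_year": 2013, "sex": "F", "age_group": "30-39"},
    {"health_system": "B", "visit_year": 2012, "sex": "F", "age_group": "30-39"},
    {"health_system": "B", "visit_year": 2013, "sex": "M", "age_group": "40-49"},
    {"health_system": "A", "visit_year": 2012, "sex": "M", "age_group": "40-49"},
    {"health_system": "B", "visit_year": 2012, "sex": "M", "age_group": "40-49"},
]

ALL_KEYS = ("health_system", "visit_year", "sex", "age_group")
# Mitigation from the text: drop the health system and visit year indicators.
REDUCED_KEYS = tuple(k for k in ALL_KEYS if k not in ("health_system", "visit_year"))

THRESHOLD = 3  # assumed cutoff for what counts as a "small" cell

before = records_in_small_cells(rows, ALL_KEYS, THRESHOLD)
after = records_in_small_cells(rows, REDUCED_KEYS, THRESHOLD)
print(f"records in small cells: before={before}, after={after}")
```

In this toy example every record is unique on the full set of elements, and none sits in a small cell once the two indicators are removed. Real data will rarely collapse so cleanly, so the same comparison would need to be run on the actual dataset.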

Discussion: Researchers holding health system data must balance the public health value of data sharing against the duty to protect the privacy of health system members. Specific steps can provide a useful estimate of re-identification risk and point to effective risk mitigation strategies.

Keywords: HIPAA; confidentiality; data sharing; electronic health records; privacy.

Conflict of interest statement

The authors have no competing interests to declare.

Figures

Figure 1. Possible relationships between populations covered by sensitive research dataset and identified external dataset. Stars represent a small cell or class of individuals defined by data elements common to two datasets.

Figure 2. Potential linkage of suicide risk prediction dataset to state mortality records using shared data elements.

Figure 3. Relationship of specific small cell in suicide risk prediction dataset to matching records in state mortality data.

Figure 4. Relationship of specific small cell in suicide risk prediction dataset to matching records in state mortality data after exclusion of health system (i.e., state) identifier.

Figure 5. Relationship of specific small cell in suicide risk prediction dataset to matching records in national mortality data after exclusion of health system (i.e., state) identifier.

