J Am Med Inform Assoc. 2010 Mar-Apr;17(2):169-77.
doi: 10.1136/jamia.2009.000026.

Evaluating re-identification risks with respect to the HIPAA privacy rule

Kathleen Benitez et al. J Am Med Inform Assoc. 2010 Mar-Apr.

Abstract

Objective: Many healthcare organizations follow data protection policies that specify which patient identifiers must be suppressed to share "de-identified" records. Such policies, however, are often applied without knowledge of the risk of "re-identification". The goals of this work are: (1) to estimate re-identification risk for data sharing policies of the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule; and (2) to evaluate the risk of a specific re-identification attack using voter registration lists.

Measurements: We define several risk metrics: (1) expected number of re-identifications; (2) estimated proportion of a population in a group of size g or less; and (3) monetary cost per re-identification. For each US state, we estimate the risk posed to hypothetical datasets protected by the HIPAA Safe Harbor and Limited Dataset policies, first from an attacker with full knowledge of patient identifiers and then from an attacker with limited knowledge in the form of voter registries.
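
As an illustration of metrics (1) and (2), the following minimal Python sketch groups a toy population by the fields a policy leaves visible. The field names (`birth_year`, `sex`, `zip5`), the toy records, and the function names are hypothetical and do not come from the study. The expected-count function also assumes an attacker who guesses uniformly at random within each group of indistinguishable people; the abstract does not state the formula, so this is one plausible reading and not the authors' definition.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def proportion_in_groups_of_size_at_most(records, quasi_identifiers, g):
    """Fraction of the population falling in a group of size g or less."""
    sizes = group_sizes(records, quasi_identifiers)
    at_risk = sum(n for n in sizes.values() if n <= g)
    return at_risk / len(records)


def expected_reidentifications(records, quasi_identifiers):
    """Expected correct matches under the ASSUMED uniform-guess attacker.

    Each person in a group of size n is matched correctly with probability
    1/n, so every non-empty group contributes n * (1/n) = 1 to the total.
    """
    return len(group_sizes(records, quasi_identifiers))


# Hypothetical toy population; not data from the study.
population = [
    {"birth_year": 1970, "sex": "F", "zip5": "37203"},
    {"birth_year": 1970, "sex": "F", "zip5": "37203"},
    {"birth_year": 1970, "sex": "M", "zip5": "37203"},
    {"birth_year": 1982, "sex": "F", "zip5": "37212"},
    {"birth_year": 1982, "sex": "F", "zip5": "37215"},
]

coarse = ["birth_year", "sex"]        # fewer fields visible
fine = ["birth_year", "sex", "zip5"]  # more fields visible

unique_coarse = proportion_in_groups_of_size_at_most(population, coarse, g=1)
unique_fine = proportion_in_groups_of_size_at_most(population, fine, g=1)
```

In this toy case, releasing the extra field raises the share of uniquely identifiable people (g=1) from one in five to three in five, which mirrors the direction of the Safe Harbor versus Limited Dataset comparison without reproducing its numbers.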

Results: The percentage of a state's population estimated to be vulnerable to unique re-identification (ie, g=1) when protected via Safe Harbor and Limited Datasets ranges from 0.01% to 0.25% and 10% to 60%, respectively. In the voter attack, this number drops for many states, and for some states is 0%, due to the variable availability of voter registries in the real world. We also find that re-identification cost ranges from $0 to $17,000, further confirming risk variability.
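
The cost metric can be sketched in the same spirit. The abstract names a monetary cost per re-identification but does not give its formula; the sketch below assumes it is the purchase price of the identified resource (for example, a voter registry) divided by the expected number of re-identifications. The function name and the dollar figures are hypothetical.

```python
import math


def cost_per_reidentification(registry_price_usd, expected_reids):
    """ASSUMED definition: price paid for the registry per expected correct match.

    Returns infinity when no re-identification is expected, since the
    attacker then pays without obtaining any match.
    """
    if expected_reids <= 0:
        return math.inf
    return registry_price_usd / expected_reids


# Hypothetical inputs; not figures from the study.
free_registry = cost_per_reidentification(0, 50)    # registry available at no charge
paid_registry = cost_per_reidentification(1000, 4)  # $1,000 registry, 4 expected matches
no_matches = cost_per_reidentification(1000, 0)     # nobody is re-identifiable
```

Under this reading, a freely available registry gives a cost of $0 per re-identification, while an expensive registry that yields few matches drives the cost up.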

Conclusions: This work illustrates that blanket protection policies, such as Safe Harbor, leave different organizations vulnerable to re-identification at different rates. It provides justification for locally performed re-identification risk estimates prior to sharing data.

Conflict of interest statement

Competing interests: None.

Figures

Figure 1. Example of de-identification and re-identification using public records.
Figure 2. Interplay of data sources in re-identification.
Figure 3. g-Distinct risk analysis for the state of Ohio. (A) g=1 to 5; (B) g=1 to 20 000.
Figure 4. Trust differential (plotted on log scale) between Limited Dataset and Safe Harbor for the state of Ohio.
Figure 5. Distribution of g-distinct computations for all US states, clockwise from top left: (A) <SAFE, GENERAL>; (B) <LIMITED, GENERAL>; (C) <SAFE, VOTER>; and (D) <LIMITED, VOTER>.
Figure 6. Ranks for top and bottom two states. (A) <LIMITED, GENERAL>; (B) <LIMITED, VOTER>.
Figure 7. Ranks for top and bottom two states. (A) <SAFE, GENERAL>; (B) <SAFE, VOTER>.
