IHI. 2010:2010:163-172. doi: 10.1145/1882992.1883017.

Beyond Safe Harbor: Automatic Discovery of Health Information De-identification Policy Alternatives

Kathleen Benitez et al. IHI. 2010.

Abstract

Regulations in various countries permit the reuse of health information without patient authorization, provided the data is "de-identified". In the United States, for instance, the Privacy Rule of the Health Insurance Portability and Accountability Act defines two distinct approaches to de-identification. The first is Safe Harbor, which requires the removal of a list of identifiers. The second is Expert Determination, which requires an expert to certify that the re-identification risk inherent in the data is sufficiently low. In practice, most healthcare organizations eschew the expert route because there are no standardized approaches and Safe Harbor is much simpler to interpret. This, however, precludes a wide range of worthwhile endeavors that depend on features suppressed by Safe Harbor, such as gerontological studies requiring detailed ages over 89. In response, we propose a novel approach to automatically discover alternative de-identification policies that carry no more re-identification risk than Safe Harbor. We model this task as a lattice-search problem, introduce a measure to capture the re-identification risk, and develop an algorithm that efficiently discovers policies by exploring the lattice. Using a cohort of approximately 3000 patient records from the Vanderbilt University Medical Center, as well as the Adult dataset from the UCI Machine Learning Repository, we also experimentally verify that a large number of alternative policies can be discovered efficiently.

Keywords: De-identification; Privacy; Safe Harbor.
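
The lattice search described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy records, the generalization levels for Age and Gender, the exhaustive enumeration, and the risk proxy (the number of records left unique after generalization). The abstract does not specify the paper's actual risk measure or search algorithm, so this is not a reproduction of either. The only detail taken from Safe Harbor itself is that ages over 89 are collapsed into a single "90+" category.

```python
from collections import Counter
from itertools import product

# Hypothetical generalization levels, ordered from most specific (level 0)
# to most general. Illustrative only; not the paper's hierarchies.
AGE_LEVELS = [
    lambda a: a,             # exact age
    lambda a: a // 5 * 5,    # 5-year bins
    lambda a: a // 10 * 10,  # 10-year bins
    lambda a: "*",           # fully suppressed
]
GENDER_LEVELS = [
    lambda g: g,    # exact value
    lambda g: "*",  # fully suppressed
]

# Made-up (age, gender) records, used purely to exercise the sketch.
RECORDS = [
    (34, "F"), (34, "F"), (36, "M"), (37, "M"), (52, "F"),
    (53, "F"), (91, "M"), (93, "M"), (95, "F"),
]


def safe_harbor_like(record):
    """Baseline policy: exact age, except ages over 89 become '90+'."""
    age, gender = record
    return ("90+" if age >= 90 else age, gender)


def unique_count(records, generalize):
    """Assumed risk proxy: how many records are unique after generalization."""
    groups = Counter(generalize(r) for r in records)
    return sum(1 for size in groups.values() if size == 1)


def discover_policies(records):
    """Enumerate every (age level, gender level) node of the lattice and keep
    those whose risk proxy does not exceed the Safe Harbor-like baseline."""
    baseline = unique_count(records, safe_harbor_like)
    found = []
    for i, j in product(range(len(AGE_LEVELS)), range(len(GENDER_LEVELS))):
        def policy(r, i=i, j=j):
            return (AGE_LEVELS[i](r[0]), GENDER_LEVELS[j](r[1]))
        risk = unique_count(records, policy)
        if risk <= baseline:
            found.append(((i, j), risk))
    return baseline, found


if __name__ == "__main__":
    baseline, found = discover_policies(RECORDS)
    print("baseline risk:", baseline)
    for levels, risk in found:
        print("policy levels", levels, "risk", risk)
```

On this toy data, the policy with 5-year age bins and exact gender keeps some age detail above 89 (90 to 94 versus 95 to 99) while leaving fewer unique records than the baseline, which is the kind of alternative the abstract has in mind. A realistic search would avoid visiting every node; the exhaustive loop is used here only to keep the sketch short.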

Figures

Figure 1. A general architecture of the alternative policy discovery process.
Figure 2. Standard generalization hierarchy for Age.
Figure 3. A generalization hierarchy that supports the HIPAA Safe Harbor policy for Age.
Figure 4. Sample dataset and data sharing policies, where Gender and Age are quasi-identifiers.
Figure 5. Example lattice for the de-identification policies that can be discovered from the sample dataset in Figure 4.
Figure 6. Analysis of policy search runtime for the datasets.
Figure 7. Autoregressive analysis of policy discovery time of iteration x on iteration x − 1 for the BPS algorithm with the Van dataset.
Figure 8. Effectiveness plots at the iteration level for the Vanderbilt dataset.
Figure 9. Effectiveness plots at the node search level.
