Comparative Study
J Am Med Inform Assoc. 2009 Sep-Oct;16(5):670-82.
doi: 10.1197/jamia.M3144. Epub 2009 Jun 30.

A globally optimal k-anonymity method for the de-identification of health data

Khaled El Emam et al. J Am Med Inform Assoc. 2009 Sep-Oct.

Abstract

Background: Explicit patient consent requirements in privacy laws can have a negative impact on health research, leading to selection bias and reduced recruitment. Often legislative requirements to obtain consent are waived if the information collected or disclosed is de-identified.

Objective: The authors developed and empirically evaluated a new globally optimal de-identification algorithm that satisfies the k-anonymity criterion and that is suitable for health datasets.
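To make the k-anonymity criterion concrete: a dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The following is a minimal Python sketch of that check, not the authors' implementation; the function names, column names, and toy records are hypothetical.

```python
from collections import Counter

def frequency_set(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every equivalence class contains at least k records."""
    counts = frequency_set(records, quasi_identifiers)
    return all(n >= k for n in counts.values())

# Hypothetical toy data: ages are already generalized to 10-year ranges.
records = [
    {"gender": "M", "age": "30-39"},
    {"gender": "M", "age": "30-39"},
    {"gender": "M", "age": "30-39"},
    {"gender": "F", "age": "40-49"},
    {"gender": "F", "age": "40-49"},
]

print(is_k_anonymous(records, ["gender", "age"], 2))  # True
print(is_k_anonymous(records, ["gender", "age"], 3))  # False: one class has only 2 records
```

The smallest equivalence class determines the result, so the second call fails because only two records share the values (F, 40-49).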

Design: The authors compared OLA (Optimal Lattice Anonymization) empirically to three existing k-anonymity algorithms (Datafly, Samarati, and Incognito) on six public, hospital, and registry datasets for different values of k and suppression limits.

Measurement: Three information loss metrics were used for the comparison: precision, discernability metric, and non-uniform entropy. Each algorithm's speed was also evaluated.
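As an illustration of how one of these metrics and a suppression limit interact, the sketch below uses a commonly cited formulation of the discernability metric: each retained record is penalized by the size of its equivalence class, and each suppressed record by the size of the whole dataset. This is an assumption for illustration; the exact definitions used in the paper may differ, and the function names and class sizes are hypothetical.

```python
def discernability(class_sizes, k):
    """Discernability metric for a list of equivalence class sizes.

    Classes with fewer than k records are treated as suppressed.
    """
    total = sum(class_sizes)
    dm = 0
    for size in class_sizes:
        if size >= k:
            dm += size * size   # each of `size` records is penalized by `size`
        else:
            dm += total * size  # each suppressed record is penalized by the dataset size
    return dm

def suppression_fraction(class_sizes, k):
    """Fraction of records that fall in classes smaller than k."""
    total = sum(class_sizes)
    return sum(s for s in class_sizes if s < k) / total

# Hypothetical class sizes before and after generalizing one variable.
before = [3, 3, 2, 2]
after = [6, 4]

print(discernability(before, 3), suppression_fraction(before, 3))  # 58 0.4
print(discernability(after, 3), suppression_fraction(after, 3))    # 52 0.0
```

In this toy case the less generalized dataset would need 40% of its records suppressed, which exceeds a 5% limit, while the more generalized dataset needs none.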

Results: The Datafly and Samarati algorithms had higher information loss than OLA and Incognito; OLA was consistently faster than Incognito in finding the globally optimal de-identification solution.

Conclusions: For the de-identification of health datasets, OLA is an improvement on existing k-anonymity algorithms in terms of information loss and performance.

Figures

Figure 1
Examples of value generalization hierarchies for three common quasi-identifiers: (a) admission date, (b) gender, and (c) age in years.
Figure 2
(a) An example of a lattice of generalizations. Each node indicates the generalization level for each of the three variables and, in parentheses, the percentage of suppression and the value of the Prec information loss metric. (b) The same lattice showing two generalization strategies through it. The two strategies go through the node <d0, g1, a2>.
Figure 3
Three possible datasets representing different nodes in the lattice. Dataset (a) represents node <d0, g0, a0>. Dataset (b) represents node <d0, g0, a1> and is a generalization of (a). Dataset (c) represents node <d0, g1, a0> and is a generalization of (a). We assume that the objective is to achieve 3-anonymity.
Figure 4
Panel (a) is a sub-lattice of the lattice in ▶. Panel (b) is a sub-lattice of (a). Sub-lattices used in the illustrative example of finding the k-anonymous nodes. The shaded nodes are k-anonymous.
Figure 5
An example of a frequency set for the datasets shown in ▶. Table (b) is an age generalization of Table (a).
Figure 8
The performance metrics comparing our algorithm to Incognito at the 5% suppression limit. Our algorithm is set at 100% on the y-axis; a value above 100% means that Incognito performs more computation, and a value below 100% means that it performs less. The panels show: (a) the total number of nodes that must be checked for k-anonymity, (b) the node complexity score given by Equation 3, and (c) the number of nodes for which information loss must be computed.
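The lattice of generalizations described in the Figure 2 caption can be sketched in a few lines of Python. This is an illustrative sketch only, not the OLA algorithm: each node is a tuple of generalization levels for admission date (d), gender (g), and age (a), and the hierarchy heights in `max_levels` are assumed values, since the captions here do not state them.

```python
from itertools import product

def lattice_nodes(max_levels):
    """All lattice nodes, each a tuple of generalization levels."""
    return list(product(*(range(m + 1) for m in max_levels)))

def successors(node, max_levels):
    """Direct generalizations of a node: raise exactly one variable by one level."""
    result = []
    for i, (level, top) in enumerate(zip(node, max_levels)):
        if level < top:
            result.append(node[:i] + (level + 1,) + node[i + 1:])
    return result

def label(node, names=("d", "g", "a")):
    """Format a node in the caption's notation, e.g. <d0, g1, a2>."""
    return "<" + ", ".join(f"{n}{lvl}" for n, lvl in zip(names, node)) + ">"

# Assumed hierarchy heights for (date, gender, age); hypothetical values.
max_levels = (2, 1, 2)

nodes = lattice_nodes(max_levels)
print(len(nodes))                                           # 18
print(label((0, 1, 2)))                                     # <d0, g1, a2>
print([label(n) for n in successors((0, 0, 0), max_levels)])
# ['<d1, g0, a0>', '<d0, g1, a0>', '<d0, g0, a1>']
```

A generalization strategy in this representation is a path that starts at the bottom node and repeatedly moves to one of the current node's successors until it reaches the top node.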
