Interact J Med Res. 2012 Nov 13;1(2):e14.
doi: 10.2196/ijmr.2140.

An Approach to Reducing Information Loss and Achieving Diversity of Sensitive Attributes in k-anonymity Methods

Sunyong Yoo et al.

Abstract

Electronic health records (EHRs) enable the sharing of patients' medical data. Because EHRs include patients' private data, access by researchers is restricted. k-anonymity is therefore needed to keep patients' private data safe without damaging useful medical information. However, k-anonymity cannot prevent sensitive attribute disclosure. An alternative, l-diversity, has been proposed as a solution to this problem and is defined as follows: each Q-block (ie, each set of rows sharing the same quasi-identifier values) contains at least l well-represented values for each sensitive attribute. While l-diversity protects against sensitive attribute disclosure, it is limited in that it focuses only on diversifying sensitive attributes.

The aim of this study is to develop a k-anonymity method that not only minimizes information loss but also achieves diversity of the sensitive attribute.

This paper proposes a new privacy protection method that uses conditional entropy and mutual information. The method considers both information loss and the diversity of sensitive attributes: conditional entropy measures the information lost through generalization, and mutual information is used to achieve diversity of sensitive attributes. The method can thus provide appropriate Q-blocks for generalization.

Using the Adult database from the UCI Machine Learning Repository, we found that the proposed method greatly reduces information loss compared with a recent l-diversity study. Counting the number of Q-blocks that lack diversity showed that it also achieves diversity of sensitive attributes.

This study provides a privacy protection method that can improve data utility and protect against sensitive attribute disclosure. The method is viable and should be of interest for further privacy protection in EHR applications.
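The two information-theoretic quantities named in the abstract can be illustrated with a minimal sketch. It uses the standard Shannon definitions, H(X|Y) = H(X,Y) - H(Y) and I(X;Y) = H(X) - H(X|Y), and is not the paper's own formulation (its Equations (1) to (8) are not reproduced here). The toy columns `ages`, `generalized`, and `disease`, and all function names, are hypothetical. Reading H(original | generalized) as information loss and I(Q-block; sensitive attribute) as a diversity signal is an assumption made for illustration only.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy H(X), in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(xs, ys):
    """H(X | Y) = H(X, Y) - H(Y), in bits."""
    return entropy(list(zip(xs, ys))) - entropy(ys)


def mutual_information(xs, ys):
    """I(X; Y) = H(X) - H(X | Y), in bits."""
    return entropy(xs) - conditional_entropy(xs, ys)


# Hypothetical toy data (not from the paper): original ages, the ranges
# they are generalized to, and a sensitive attribute.
ages = [23, 27, 31, 38]
generalized = ["20-29", "20-29", "30-39", "30-39"]
disease = ["flu", "cancer", "flu", "cancer"]

# Assumed reading of "information loss": remaining uncertainty about the
# original value once only the generalized value is known.
loss = conditional_entropy(ages, generalized)

# Assumed reading of "diversity": low mutual information means the
# generalized block reveals little about the sensitive attribute.
mi = mutual_information(generalized, disease)

print(f"H(age | generalized age) = {loss:.3f} bits")
print(f"I(generalized age; disease) = {mi:.3f} bits")
```

In this toy table each generalized range hides two equally likely ages, so one bit is lost, and each range contains both diseases equally often, so the range carries no information about the disease.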

Keywords: Conditional entropy; Information loss; Mutual information; k-anonymity; l-diversity.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1. Equations (1) to (8).
Figure 2. Individual conditional entropies and mutual information for a pair of correlated subsystems.
Figure 3. Simplified concept of the proposed method.
Figure 4. Comparison of total information loss with respect to the number of instances.
Figure 5. Comparison of the number of Q-blocks with l=1 (homogeneity attack), l=2 (background knowledge attack), and l=3 (safe), as a measure of diversity (the Q-block size is set to 3).
Figure 6. Comparison of execution time with respect to the number of instances.
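
The kind of count described for Figure 5 can be sketched as follows: group rows into Q-blocks by their quasi-identifier values, then tally blocks by the number of distinct sensitive values they contain. This is a minimal illustration, not the authors' implementation. It assumes "distinct" l-diversity as the simplest reading of "well-represented", and the table, column names, and function names are all hypothetical.

```python
from collections import Counter, defaultdict


def q_blocks(rows, quasi_ids, sensitive):
    """Group sensitive values by the tuple of quasi-identifier values."""
    blocks = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_ids)
        blocks[key].append(row[sensitive])
    return blocks


def is_k_anonymous(blocks, k):
    """True if every Q-block contains at least k rows."""
    return all(len(values) >= k for values in blocks.values())


def diversity_histogram(blocks):
    """Map l (distinct sensitive values in a block) to the number of blocks."""
    return Counter(len(set(values)) for values in blocks.values())


# Hypothetical generalized table with Q-blocks of size 3.
rows = [
    {"age": "20-29", "sex": "M", "disease": "flu"},
    {"age": "20-29", "sex": "M", "disease": "flu"},
    {"age": "20-29", "sex": "M", "disease": "flu"},
    {"age": "30-39", "sex": "F", "disease": "flu"},
    {"age": "30-39", "sex": "F", "disease": "cancer"},
    {"age": "30-39", "sex": "F", "disease": "flu"},
    {"age": "40-49", "sex": "M", "disease": "flu"},
    {"age": "40-49", "sex": "M", "disease": "cancer"},
    {"age": "40-49", "sex": "M", "disease": "asthma"},
]

blocks = q_blocks(rows, quasi_ids=("age", "sex"), sensitive="disease")
hist = diversity_histogram(blocks)

print("3-anonymous:", is_k_anonymous(blocks, 3))
for l in sorted(hist):
    print(f"l={l}: {hist[l]} block(s)")
```

With a block size of 3, a block with l=1 exposes its single sensitive value outright, a block with l=2 can be narrowed to one value by an attacker who can rule out the other, and a block with l=3 has the maximum possible diversity.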
