An Approach to Reducing Information Loss and Achieving Diversity of Sensitive Attributes in k-anonymity Methods
- PMID: 23612074
- PMCID: PMC3626125
- DOI: 10.2196/ijmr.2140
Abstract
Electronic Health Records (EHRs) enable the sharing of patients' medical data. Because EHRs include patients' private data, access by researchers is restricted. k-anonymity is therefore needed to keep patients' private data safe without damaging useful medical information. However, k-anonymity cannot prevent sensitive attribute disclosure. An alternative, l-diversity, has been proposed as a solution to this problem and is defined as follows: each Q-block (ie, each set of rows sharing the same quasi-identifier values) contains at least l well-represented values for each sensitive attribute. While l-diversity protects against sensitive attribute disclosure, it is limited in that it focuses only on diversifying sensitive attributes and does not account for information loss. The aim of this study is to develop a k-anonymity method that not only minimizes information loss but also achieves diversity of the sensitive attribute. This paper proposes a new privacy protection method that uses conditional entropy and mutual information and considers both information loss and diversity of sensitive attributes. Conditional entropy measures the information lost through generalization, and mutual information is used to achieve diversity of sensitive attributes. The method can therefore select appropriate Q-blocks for generalization. We used the Adult database from the UCI Machine Learning Repository and found that the proposed method can greatly reduce information loss compared with a recent l-diversity study. It also achieves diversity of sensitive attributes, as assessed by counting the number of Q-blocks that lack diversity. This study provides a privacy protection method that can improve data utility and protect against sensitive attribute disclosure. The method is viable and should be of interest for further privacy protection in EHR applications.
Keywords: Conditional entropy; Information loss; Mutual information; k-anonymity; l-diversity.
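The quantities named in the abstract can be illustrated with a minimal Python sketch. It shows only the textbook definitions of Q-blocks, k-anonymity, conditional entropy, and mutual information; how the authors combine them to choose Q-blocks is not specified in the abstract and is not reproduced here. The toy table, the column names, the helper function names, and the use of distinct-value counting as the reading of "well-represented" are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def q_blocks(rows, quasi_ids):
    """Group rows by their quasi-identifier values; each group is one Q-block."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[tuple(row[q] for q in quasi_ids)].append(row)
    return blocks

def is_k_anonymous(blocks, k):
    """k-anonymity: every Q-block holds at least k rows."""
    return all(len(block) >= k for block in blocks.values())

def blocks_lacking_diversity(blocks, sensitive, l):
    """Q-blocks with fewer than l distinct sensitive values.
    Assumption: 'well-represented' is read as 'distinct' (simplest variant)."""
    return [key for key, block in blocks.items()
            if len({row[sensitive] for row in block}) < l]

def entropy(values):
    """Shannon entropy H(X) in bits of an empirical distribution."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def conditional_entropy(xs, ys):
    """H(X | Y) = sum over y of p(y) * H(X | Y = y)."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[y].append(x)
    return sum(len(g) / len(xs) * entropy(g) for g in groups.values())

def mutual_information(xs, ys):
    """I(X; Y) = H(X) - H(X | Y)."""
    return entropy(xs) - conditional_entropy(xs, ys)

# Hypothetical toy table: 'age' is the original value, 'age_gen' its generalization.
rows = [
    {"age": 23, "age_gen": "20-29", "zip": "130**", "disease": "flu"},
    {"age": 27, "age_gen": "20-29", "zip": "130**", "disease": "cancer"},
    {"age": 31, "age_gen": "30-39", "zip": "148**", "disease": "flu"},
    {"age": 38, "age_gen": "30-39", "zip": "148**", "disease": "flu"},
]

blocks = q_blocks(rows, ["age_gen", "zip"])
k_ok = is_k_anonymous(blocks, k=2)
weak_blocks = blocks_lacking_diversity(blocks, "disease", l=2)

# Uncertainty about the original age that remains once only the generalized age is known.
info_loss = conditional_entropy([r["age"] for r in rows],
                                [r["age_gen"] for r in rows])

# Dependence between Q-block membership and the sensitive attribute.
block_ids = [(r["age_gen"], r["zip"]) for r in rows]
mi = mutual_information([r["disease"] for r in rows], block_ids)
```

In this toy table both Q-blocks satisfy 2-anonymity, yet the second block contains only "flu", so an observer who can place a person in that block learns the diagnosis. That is the sensitive attribute disclosure that k-anonymity alone does not prevent.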
Conflict of interest statement
Conflicts of Interest: None declared.