Ten quick tips for protecting health data using de-identification and perturbation of structured datasets
- PMID: 40986522
- PMCID: PMC12456793
- DOI: 10.1371/journal.pcbi.1013507
Abstract
Structured patient data generated within the health data ecosystem are shared both internally for operational use and externally for research and public health benefit. Protecting individual privacy and health data confidentiality in these contexts relies on data de-identification and anonymisation, although there are no universally accepted standards for these processes and the techniques involved can be technically complex. We present practical recommendations grounded in the principle of data minimisation: avoiding unnecessary granularity and identifying variables that could lead to re-identification when combined with other datasets. We provide practical guidance for anonymising and perturbing structured health data in ways that support compliance with data protection laws, describing technical and operational methods for reducing re-identification risk. These methods include rounding numerical values, replacing precise values with ranges, adding jitter to numeric fields, aggregating data, managing date values, and separating sensitive fields from identifying data to prevent linkage that could lead to re-identification. While some methods require advanced technical knowledge, we focus here on accessible strategies that can be implemented without specialist expertise, recognising the importance of the legal and governance frameworks in which anonymisation occurs. These guidelines support researchers, data managers and institutions in sharing health data responsibly, maintaining data utility while upholding privacy and promoting ethical and legal data stewardship for data-driven health research.
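As a rough illustration of the kinds of techniques the abstract lists, the sketch below shows rounding, range replacement, jitter, date shifting and field separation in plain Python. It is not taken from the article: every function name, parameter, default value and field name here is a hypothetical choice made for the example, and real parameters would need to be set according to the dataset and the applicable governance framework.

```python
import random
from datetime import date, timedelta


def round_to(value, base=5):
    """Round a numeric value to the nearest multiple of `base` (assumed default: 5)."""
    return base * round(value / base)


def to_range(value, width=10):
    """Replace a precise integer value with a range label such as '40-49'."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"


def jitter(value, spread, rng):
    """Add uniform random noise in [-spread, +spread] to a numeric value."""
    return value + rng.uniform(-spread, spread)


def shift_date(d, offset_days):
    """Shift a date by a fixed offset; reusing one offset per patient
    keeps the intervals between that patient's events unchanged."""
    return d + timedelta(days=offset_days)


def split_identifiers(record, id_fields):
    """Split a record into identifying fields and the remaining fields,
    so the two parts can be stored and shared separately."""
    identifying = {k: v for k, v in record.items() if k in id_fields}
    remaining = {k: v for k, v in record.items() if k not in id_fields}
    return identifying, remaining


if __name__ == "__main__":
    rng = random.Random()  # unseeded, so the noise is not reproducible
    # Hypothetical patient record, invented purely for illustration.
    record = {
        "name": "Example Patient",
        "age": 47,
        "weight_kg": 67,
        "admitted": date(2024, 3, 15),
    }
    identifying, clinical = split_identifiers(record, {"name"})
    clinical["age"] = to_range(clinical["age"])                       # '40-49'
    clinical["weight_kg"] = round_to(clinical["weight_kg"])           # 65
    clinical["admitted"] = shift_date(clinical["admitted"], -10)      # 2024-03-05
    noisy_weight = jitter(67.0, 2.0, rng)                             # within 65.0..69.0
    print(clinical, round(noisy_weight, 1))
```

Each helper trades some precision for a lower chance that a value, alone or combined with other datasets, singles out an individual; how much precision to give up is a judgement that depends on the intended use of the data.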
Copyright: © 2025 Lulamba et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Conflict of interest statement
The authors have declared that no competing interests exist.