Reducing patient re-identification risk for laboratory results within research datasets
- PMID: 22822040
- PMCID: PMC3555327
- DOI: 10.1136/amiajnl-2012-001026
Abstract
Objective: To lower patient re-identification risk in biomedical research databases containing laboratory test results while minimizing changes in clinical data interpretation.
Materials and methods: In our threat model, an attacker obtains 5-7 laboratory results from one patient and uses them as a search key to discover the corresponding record in a de-identified biomedical research database. To test our models, the existing Vanderbilt TIME database of 8.5 million Safe Harbor de-identified laboratory results from 61 280 patients was used. The uniqueness of unaltered laboratory results in the dataset was examined, and then two data perturbation models were applied: simple random offsets and an expert-derived clinical meaning-preserving model. A rank-based re-identification algorithm was used to mimic an attack. The re-identification risk and the retention of clinical meaning for each model's perturbed laboratory results were assessed.
Results: Differences in re-identification rates between the algorithms were small despite substantial divergence in altered clinical meaning. The expert algorithm maintained the clinical meaning of laboratory results better (affecting up to 4% of test results) than simple perturbation (affecting up to 26%).
Discussion and conclusion: With growing impetus for sharing clinical data for research, and in view of healthcare-related federal privacy regulation, methods to mitigate risks of re-identification are important. A practical, expert-derived perturbation algorithm that demonstrated potential utility was developed. Similar approaches might enable administrators to select data protection scheme parameters that meet their preferences in the trade-off between the protection of privacy and the retention of clinical meaning of shared data.
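The threat model and the simple random-offset perturbation described in the methods can be illustrated with a minimal sketch. The abstract does not specify the offset distribution, the distance measure, or the details of the rank-based algorithm, so everything below is an assumption made for illustration: the function names, the uniform relative offset, the summed relative difference used for ranking, and the synthetic laboratory values are all hypothetical. The expert-derived clinical meaning-preserving model is not reproduced here.

```python
import random

def perturb(values, max_frac, rng):
    """Shift each value by a uniform random offset of up to +/- max_frac of itself.

    Assumption: a relative uniform offset; the paper's offset scheme may differ.
    """
    return [v * (1 + rng.uniform(-max_frac, max_frac)) for v in values]

def rank_candidates(key, records):
    """Return record ids ordered from closest to farthest from the attacker's key."""
    def distance(rec):
        # Hypothetical distance: sum of relative absolute differences per test.
        return sum(abs(a - b) / abs(a) for a, b in zip(key, rec))
    return sorted(records, key=lambda pid: distance(records[pid]))

def reidentification_rate(originals, max_frac, seed=0):
    """Fraction of patients whose own released record ranks first for their key."""
    rng = random.Random(seed)
    released = {pid: perturb(vals, max_frac, rng) for pid, vals in originals.items()}
    hits = sum(
        1 for pid, vals in originals.items()
        if rank_candidates(vals, released)[0] == pid
    )
    return hits / len(originals)

# Synthetic example: five results per patient (values are made up).
originals = {
    "p1": [140.0, 4.1, 102.0, 24.0, 0.9],
    "p2": [138.0, 3.8, 100.0, 26.0, 1.1],
    "p3": [142.0, 4.5, 105.0, 22.0, 0.8],
    "p4": [135.0, 5.0, 98.0, 28.0, 1.4],
}

for frac in (0.0, 0.05, 0.20):
    print(frac, reidentification_rate(originals, frac))
```

Larger offsets would be expected to lower the rate at which the attacker's key ranks the correct record first, at the cost of moving more values across clinically meaningful thresholds, which is the trade-off the abstract describes.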
Grants and funding
- U01 HG006385/HG/NHGRI NIH HHS/United States
- R01 LM009018/LM/NLM NIH HHS/United States
- T32 GM007347/GM/NIGMS NIH HHS/United States
- U01 HG006378/HG/NHGRI NIH HHS/United States
- R01 LM009989/LM/NLM NIH HHS/United States
- R01 LM010828/LM/NLM NIH HHS/United States
- R01 LM007995/LM/NLM NIH HHS/United States
- T15 LM007450/LM/NLM NIH HHS/United States
