Reducing patient re-identification risk for laboratory results within research datasets
- PMID: 22822040
- PMCID: PMC3555327
- DOI: 10.1136/amiajnl-2012-001026
Abstract
Objective: To lower the risk of patient re-identification in biomedical research databases that contain laboratory test results, while minimizing changes to the clinical interpretation of the data.
Materials and methods: In our threat model, an attacker obtains 5-7 laboratory results from one patient and uses them as a search key to discover the corresponding record in a de-identified biomedical research database. To test our models, the existing Vanderbilt TIME database of 8.5 million Safe Harbor de-identified laboratory results from 61,280 patients was used. The uniqueness of unaltered laboratory results in the dataset was examined, and then two data perturbation models were applied: simple random offsets and an expert-derived, clinical meaning-preserving model. A rank-based re-identification algorithm was used to mimic an attack. The re-identification risk and the retention of clinical meaning were assessed for each model's perturbed laboratory results.
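The threat model and the simple random-offset perturbation can be illustrated with a minimal Python sketch. This is not the study's implementation: the function names `perturb` and `rank_candidates`, the proportional offset, the mean relative distance used for ranking, and the three synthetic patients are all assumptions made for illustration. The expert-derived model is not reproduced here.

```python
import random

def perturb(record, max_frac, rng):
    """Apply an independent random offset of up to +/- max_frac to each result.

    Assumption: offsets are proportional to the value; the abstract does not
    specify how the offsets in the study were drawn.
    """
    return {test: value * (1 + rng.uniform(-max_frac, max_frac))
            for test, value in record.items()}

def rank_candidates(key, database):
    """Rank patient IDs by mean relative distance to the attacker's search key.

    Assumption: a simple distance-based ranking stands in for the rank-based
    re-identification algorithm, whose details the abstract does not give.
    Key values must be nonzero.
    """
    def distance(record):
        shared = [t for t in key if t in record]
        if not shared:
            return float("inf")
        return sum(abs(record[t] - key[t]) / abs(key[t]) for t in shared) / len(shared)

    return sorted(database, key=lambda pid: distance(database[pid]))

# Synthetic, made-up laboratory results for three hypothetical patients.
originals = {
    "p1": {"sodium": 140.0, "potassium": 4.1, "glucose": 95.0,
           "creatinine": 0.9, "hemoglobin": 13.5},
    "p2": {"sodium": 137.0, "potassium": 4.6, "glucose": 110.0,
           "creatinine": 1.2, "hemoglobin": 15.0},
    "p3": {"sodium": 143.0, "potassium": 3.8, "glucose": 88.0,
           "creatinine": 0.7, "hemoglobin": 12.1},
}

rng = random.Random(0)
# The released research dataset holds perturbed copies of the originals.
released = {pid: perturb(rec, 0.05, rng) for pid, rec in originals.items()}

# The attacker knows five true results for patient p2 and searches with them.
attacker_key = originals["p2"]
ranking = rank_candidates(attacker_key, released)
print("candidate ranking:", ranking)
```

In a sketch like this, the re-identification risk would be estimated as the share of patients whose true record lands at rank 1 (or within the top few ranks) after perturbation.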
Results: Differences in re-identification rates between the algorithms were small despite substantial divergence in altered clinical meaning. The expert algorithm maintained the clinical meaning of laboratory results better (affecting up to 4% of test results) than simple perturbation (affecting up to 26%).
Discussion and conclusion: With growing impetus for sharing clinical data for research, and in view of healthcare-related federal privacy regulation, methods to mitigate risks of re-identification are important. A practical, expert-derived perturbation algorithm that demonstrated potential utility was developed. Similar approaches might enable administrators to select data protection scheme parameters that meet their preferences in the trade-off between the protection of privacy and the retention of clinical meaning of shared data.
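One way to picture the trade-off described above is to measure, for a given offset size, the share of results whose clinical interpretation changes. The following Python sketch makes several assumptions: the reference ranges are illustrative values rather than those used in the study, "clinical meaning" is reduced to a low/normal/high category, and `interpret` and `fraction_changed` are hypothetical helper names.

```python
import random

# Hypothetical reference ranges (low, high), for illustration only.
REFERENCE_RANGES = {
    "sodium": (135.0, 145.0),
    "potassium": (3.5, 5.0),
    "glucose": (70.0, 100.0),
}

def interpret(test, value):
    """Classify a result as low, normal, or high against its reference range."""
    low, high = REFERENCE_RANGES[test]
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"

def fraction_changed(results, max_frac, rng):
    """Share of results whose category changes after a random proportional offset."""
    if not results:
        return 0.0
    changed = 0
    for test, value in results:
        perturbed = value * (1 + rng.uniform(-max_frac, max_frac))
        if interpret(test, perturbed) != interpret(test, value):
            changed += 1
    return changed / len(results)

# Synthetic results; values near a range boundary are the most likely to flip.
results = [("sodium", 140.0), ("sodium", 135.5), ("potassium", 4.9),
           ("potassium", 4.2), ("glucose", 99.0), ("glucose", 85.0)]

# Sweep the offset size to see how much clinical meaning each setting costs.
for max_frac in (0.0, 0.02, 0.05, 0.10):
    share = fraction_changed(results, max_frac, random.Random(0))
    print(f"max offset {max_frac:.0%}: {share:.0%} of results changed category")
```

An administrator could pair a measure like this with an estimated re-identification rate at each offset size and pick the setting that matches local preferences.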
Grants and funding
- U01 HG006385/HG/NHGRI NIH HHS/United States
- U01 HG006378/HG/NHGRI NIH HHS/United States
- R01 LM009018/LM/NLM NIH HHS/United States
- R01 LM009989/LM/NLM NIH HHS/United States
- R01 LM010828/LM/NLM NIH HHS/United States
- R01 LM007995/LM/NLM NIH HHS/United States
- T32 GM007347/GM/NIGMS NIH HHS/United States
- T15 LM007450/LM/NLM NIH HHS/United States