J Am Med Inform Assoc. 2013 Jan 1;20(1):95-101.

doi: 10.1136/amiajnl-2012-001026. Epub 2012 Jul 21.

Reducing patient re-identification risk for laboratory results within research datasets

Ravi V Atreya¹, Joshua C Smith, Allison B McCoy, Bradley Malin, Randolph A Miller

PMID: 22822040
PMCID: PMC3555327
DOI: 10.1136/amiajnl-2012-001026

Ravi V Atreya et al. J Am Med Inform Assoc. 2013.

Affiliation

¹ Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN 37232-8340, USA. ravi.v.atreya@vanderbilt.edu

Abstract

Objective: To lower patient re-identification risks for biomedical research databases containing laboratory test results while minimizing changes in clinical data interpretation.

Materials and methods: In our threat model, an attacker obtains 5-7 laboratory results from one patient and uses them as a search key to discover the corresponding record in a de-identified biomedical research database. To test our models, we used the existing Vanderbilt TIME database of 8.5 million Safe Harbor de-identified laboratory results from 61 280 patients. We examined the uniqueness of unaltered laboratory results in the dataset and then applied two data perturbation models: simple random offsets and an expert-derived clinical meaning-preserving model. A rank-based re-identification algorithm was used to mimic an attack. For each model's perturbed laboratory results, we assessed the re-identification risk and the retention of clinical meaning.
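The threat model and the simple random-offset perturbation can be illustrated with a minimal sketch. The abstract does not specify the offset distribution or the distance measure behind the rank-based attack, so the uniform relative offset, the summed relative difference, the function names, and the three seven-value panels below are all assumptions made for illustration, not the authors' implementation.

```python
import random

def perturb_panel(panel, level, rng):
    """Apply an independent random relative offset of at most +/- level to each result.
    Assumption: offsets are uniform; the abstract does not state the distribution."""
    return [v * (1.0 + rng.uniform(-level, level)) for v in panel]

def panel_distance(key, panel):
    """Sum of relative differences between a search key and a candidate panel.
    Assumption: this metric is a stand-in for whatever the study actually used."""
    return sum(abs(k - p) / max(abs(k), 1e-9) for k, p in zip(key, panel))

def rank_of_record(key, dataset, true_id):
    """1-based rank of true_id when records are sorted by distance to the key."""
    ordered = sorted(dataset, key=lambda rid: panel_distance(key, dataset[rid]))
    return ordered.index(true_id) + 1

def top_k_match(key, dataset, true_id, k=10):
    """True if the attacker's target appears among the k closest records."""
    return rank_of_record(key, dataset, true_id) <= k

# Hypothetical seven-value panels (made-up numbers, not study data).
original = {
    "A": [140.0, 4.0, 100.0, 24.0, 15.0, 1.0, 90.0],
    "B": [135.0, 4.8, 98.0, 28.0, 22.0, 1.4, 110.0],
    "C": [145.0, 3.2, 108.0, 20.0, 9.0, 0.7, 75.0],
}

rng = random.Random(0)
# The released research dataset holds perturbed values (2% level here).
released = {rid: perturb_panel(p, 0.02, rng) for rid, p in original.items()}

# The attacker knows patient A's true, unperturbed results.
attacker_key = original["A"]
rank = rank_of_record(attacker_key, released, "A")
```

In this toy dataset the records differ far more than a 2% offset can hide, so the target still ranks first. The study's question is how that rank degrades as the perturbation level rises across a much larger dataset.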

Results: Differences in re-identification rates between the algorithms were small despite substantial divergence in altered clinical meaning. The expert algorithm maintained the clinical meaning of laboratory results better (affecting up to 4% of test results) than simple perturbation (affecting up to 26%).
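One way to quantify "altered clinical meaning" is the share of results that move into a different result range bin after perturbation. The sketch below assumes bins are defined by ascending cutoffs. The cutoff values and sample results are illustrative placeholders, not the ranges used in the study.

```python
from bisect import bisect_right

def range_bin(value, cutoffs):
    """Index of the interval a value falls into, given ascending cutoffs."""
    return bisect_right(cutoffs, value)

def proportion_bin_changed(original, perturbed, cutoffs):
    """Fraction of results whose range bin differs before and after perturbation."""
    changed = sum(
        range_bin(o, cutoffs) != range_bin(p, cutoffs)
        for o, p in zip(original, perturbed)
    )
    return changed / len(original)

# Illustrative cutoffs splitting results into low / normal / high (assumed values).
cutoffs = [3.5, 5.0]
original_values = [3.4, 4.0, 4.9, 5.2]
perturbed_values = [3.6, 4.1, 5.1, 5.3]

share_changed = proportion_bin_changed(original_values, perturbed_values, cutoffs)
```

Here two of the four results cross a cutoff, so half of them change bins. A meaning-preserving scheme would aim to keep this proportion low at a given perturbation level.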

Discussion and conclusion: With growing impetus for sharing clinical data for research, and in view of healthcare-related federal privacy regulation, methods to mitigate risks of re-identification are important. A practical, expert-derived perturbation algorithm that demonstrated potential utility was developed. Similar approaches might enable administrators to select data protection scheme parameters that meet their preferences in the trade-off between the protection of privacy and the retention of clinical meaning of shared data.

Conflict of interest statement

Competing interests: None.

Figures

**Figure 1**
Illustration of the threat model in this study. The attacker leverages a known patient's laboratory panel as a search key to discover a corresponding record in a biomedical research database. EMR, Electronic medical record; HIPAA, Health Insurance Portability and Accountability Act.

**Figure 2**
Top-10 match rate as a function of perturbation level for the protection algorithms. CBC, complete blood count; CHEM7, blood test measuring electrolytes, glucose, and renal function.

**Figure 3**
Proportion of CBC and CHEM7 laboratory results where a perturbation algorithm changed result range bins. CBC, complete blood count; CHEM7, blood test measuring electrolytes, glucose, and renal function.

**Figure 4**
The proportion of consecutive laboratory pairs that retained their original monotonic trajectory after perturbation.

**Figure 5**
Disclosure risk-data utility map that compares the proportion of results that retain their clinical meaning after perturbation and the rate of correctly re-identifying a search key in a dataset. The points along the lines represent the perturbation rates of, from left to right, 20, 15, 10, 7, 5, and 2%. The rightmost point represents analysis of unaltered test panel results. CBC, complete blood count; CHEM7, blood test measuring electrolytes, glucose, and renal function.
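The trajectory measure described for Figure 4 could be computed along the following lines. This is a sketch under assumptions: the function names are hypothetical, ties are treated as their own direction, and the sample series is invented.

```python
def direction(a, b):
    """Return 1 if the series rises from a to b, -1 if it falls, 0 if unchanged."""
    return (b > a) - (b < a)

def proportion_trajectory_retained(original, perturbed):
    """Fraction of consecutive result pairs whose direction survives perturbation."""
    if len(original) < 2 or len(original) != len(perturbed):
        raise ValueError("need two equal-length series with at least two results")
    n_pairs = len(original) - 1
    kept = sum(
        direction(original[i], original[i + 1])
        == direction(perturbed[i], perturbed[i + 1])
        for i in range(n_pairs)
    )
    return kept / n_pairs

# Invented serial results for one patient, before and after perturbation.
original_series = [1.0, 1.2, 1.1, 1.5]
perturbed_series = [1.0, 1.1, 1.15, 1.4]

retained = proportion_trajectory_retained(original_series, perturbed_series)
```

In this example the middle pair flips from falling to rising, so two of the three consecutive pairs keep their original direction.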

Cited by

Lost in Anonymization - A Data Anonymization Reference Classification Merging Legal and Technical Considerations.
Vokinger KN, Stekhoven DJ, Krauthammer M. J Law Med Ethics. 2020 Mar;48(1):228-231. doi: 10.1177/1073110520917025. PMID: 32342783.
Reidentification of Participants in Shared Clinical Data Sets: Experimental Study.
Wiepert D, Malin BA, Duffy JR, Utianski RL, Stricker JL, Jones DT, Botha H. JMIR AI. 2024 Mar 15;3:e52054. doi: 10.2196/52054. PMID: 38875581.
Detecting the Presence of an Individual in Phenotypic Summary Data.
Liu Y, Wan Z, Xia W, Kantarcioglu M, Vorobeychik Y, Clayton EW, Kho A, Carrell D, Malin BA. AMIA Annu Symp Proc. 2018 Dec 5;2018:760-769. PMID: 30815118.
Regulating the Secondary Use of Data for Research: Arguments Against Genetic Exceptionalism.
Martani A, Geneviève LD, Pauli-Magnus C, McLennan S, Elger BS. Front Genet. 2019 Dec 20;10:1254. doi: 10.3389/fgene.2019.01254. PMID: 31956328.
Information technology for clinical, translational and comparative effectiveness research. Findings from the section clinical research informatics.
Daniel C, Choquet R. Yearb Med Inform. 2014 Aug 15;9(1):224-7. doi: 10.15265/IY-2014-0040. PMID: 25123747. Review.
