2013 Jan 1;20(1):95-101.
doi: 10.1136/amiajnl-2012-001026. Epub 2012 Jul 21.

Reducing patient re-identification risk for laboratory results within research datasets

Ravi V Atreya et al. J Am Med Inform Assoc.

Abstract

Objective: To reduce patient re-identification risk in biomedical research databases containing laboratory test results while minimizing changes in clinical data interpretation.

Materials and methods: In our threat model, an attacker obtains 5-7 laboratory results from one patient and uses them as a search key to discover the corresponding record in a de-identified biomedical research database. To test our models, the existing Vanderbilt TIME database of 8.5 million Safe Harbor de-identified laboratory results from 61,280 patients was used. The uniqueness of unaltered laboratory results in the dataset was examined, and then two data perturbation models were applied: simple random offsets and an expert-derived, clinical meaning-preserving model. A rank-based re-identification algorithm was used to mimic an attack. The re-identification risk and the retention of clinical meaning were assessed for each model's perturbed laboratory results.
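The threat model and the simple random-offset perturbation can be illustrated with a small sketch. The abstract does not specify the offset distribution or the ranking metric, so the code below assumes uniform proportional offsets and a sum of absolute relative differences; the function names and the toy CHEM7-like panels are hypothetical and are not taken from the study.

```python
import random

def perturb(panel, level, rng):
    """Shift each result by a random offset of up to +/- level (a fraction of its value).
    Assumption: uniform proportional offsets; the study's exact scheme is not given here."""
    return [v * (1 + rng.uniform(-level, level)) for v in panel]

def distance(key, record):
    """Sum of absolute relative differences between two panels of equal length.
    Assumption: an illustrative metric, not the paper's."""
    return sum(abs(k - r) / abs(k) for k, r in zip(key, record) if k != 0)

def rank_of_true_record(key, database, true_id):
    """Rank (1 = closest) of the true record when records are sorted by distance to the key."""
    ordered = sorted(database, key=lambda rid: distance(key, database[rid]))
    return ordered.index(true_id) + 1

def top_k_match_rate(originals, level, k=10, seed=0):
    """Fraction of patients whose perturbed record ranks in the top k for their own search key."""
    rng = random.Random(seed)
    released = {rid: perturb(panel, level, rng) for rid, panel in originals.items()}
    hits = sum(
        rank_of_true_record(panel, released, rid) <= k
        for rid, panel in originals.items()
    )
    return hits / len(originals)

# Made-up panels (sodium, potassium, chloride, bicarbonate, BUN, creatinine, glucose).
originals = {
    "A": [140, 4.1, 102, 25, 14, 0.9, 92],
    "B": [136, 3.8, 99, 22, 18, 1.1, 110],
    "C": [142, 4.5, 105, 27, 11, 0.8, 85],
}

for level in (0.0, 0.05, 0.20):
    print(level, top_k_match_rate(originals, level, k=1))
```

With no perturbation every distinct panel matches itself exactly, so the top-1 rate is 1.0; larger perturbation levels can only be judged on a realistically sized dataset, which this toy example is not.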

Results: Differences in re-identification rates between the algorithms were small despite substantial divergence in altered clinical meaning. The expert algorithm maintained the clinical meaning of laboratory results better (affecting up to 4% of test results) than simple perturbation (affecting up to 26%).

Discussion and conclusion: With growing impetus for sharing clinical data for research, and in view of healthcare-related federal privacy regulation, methods to mitigate risks of re-identification are important. A practical, expert-derived perturbation algorithm that demonstrated potential utility was developed. Similar approaches might enable administrators to select data protection scheme parameters that meet their preferences in the trade-off between the protection of privacy and the retention of clinical meaning of shared data.

Conflict of interest statement

Competing interests: None.

Figures

Figure 1
Illustration of the threat model in this study. The attacker leverages a known patient's laboratory panel as a search key to discover a corresponding record in a biomedical research database. EMR, Electronic medical record; HIPAA, Health Insurance Portability and Accountability Act.
Figure 2
Top-10 match rate as a function of perturbation level for the protection algorithms. CBC, complete blood count; CHEM7, blood test measuring electrolytes, glucose, and renal function.
Figure 3
Proportion of CBC and CHEM7 laboratory results where a perturbation algorithm changed result range bins. CBC, complete blood count; CHEM7, blood test measuring electrolytes, glucose, and renal function.
Figure 4
The proportion of consecutive laboratory pairs that retained their original monotonic trajectory after perturbation.
Figure 5
Disclosure risk-data utility map that compares the proportion of results that retain their clinical meaning after perturbation and the rate of correctly re-identifying a search key in a dataset. The points along the lines represent the perturbation rates of, from left to right, 20, 15, 10, 7, 5, and 2%. The rightmost point represents analysis of unaltered test panel results. CBC, complete blood count; CHEM7, blood test measuring electrolytes, glucose, and renal function.
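
The two clinical-meaning measures named in the captions for Figures 3 and 4 (range-bin changes and retained monotonic trajectory) could be computed roughly as follows. This is a sketch under stated assumptions: the cut points, the potassium-like values, and all function names are hypothetical, and the study's actual bin definitions are not given in this text.

```python
from bisect import bisect_right

def bin_index(value, cut_points):
    """Index of the range bin a value falls into, given sorted cut points."""
    return bisect_right(cut_points, value)

def bin_change_rate(original, perturbed, cut_points):
    """Proportion of results whose range bin differs after perturbation."""
    changed = sum(
        bin_index(o, cut_points) != bin_index(p, cut_points)
        for o, p in zip(original, perturbed)
    )
    return changed / len(original)

def direction(a, b):
    """+1 if the series rises from a to b, -1 if it falls, 0 if unchanged."""
    return (b > a) - (b < a)

def trajectory_retention(original, perturbed):
    """Proportion of consecutive pairs that keep their original direction."""
    n_pairs = len(original) - 1
    kept = sum(
        direction(original[i], original[i + 1])
        == direction(perturbed[i], perturbed[i + 1])
        for i in range(n_pairs)
    )
    return kept / n_pairs

# Made-up series and illustrative low/normal/high boundaries (assumptions).
cut_points = [3.5, 5.0]
original = [3.4, 3.6, 4.9, 5.2]
perturbed = [3.6, 3.5, 5.1, 5.0]

print(bin_change_rate(original, perturbed, cut_points))  # 0.5
print(trajectory_retention(original, perturbed))         # one of three pairs retained
```

Plotting the retained proportion against a re-identification rate at each perturbation level would give a map of the kind described for Figure 5.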
