PLoS One. 2011;6(12):e28071.
doi: 10.1371/journal.pone.0028071. Epub 2011 Dec 2.

A systematic review of re-identification attacks on health data

Khaled El Emam et al. PLoS One. 2011.

Erratum in

Abstract

Background: Privacy legislation in most jurisdictions allows the disclosure of health data for secondary purposes without patient consent if it is de-identified. Some recent articles in the medical, legal, and computer science literature have argued that de-identification methods do not provide sufficient protection because they are easy to reverse. If so, this would have significant implications for how health information is disclosed, including: (a) potentially limiting its availability for secondary purposes such as research, and (b) resulting in more identifiable health information being disclosed. Our objectives in this systematic review were to: (a) characterize known re-identification attacks on health data and contrast them with re-identification attacks on other kinds of data, (b) compute the overall proportion of records that have been correctly re-identified in these attacks, and (c) assess whether these attacks demonstrate weaknesses in current de-identification methods.

Methods and findings: Searches were conducted in IEEE Xplore, ACM Digital Library, and PubMed. After screening, fourteen eligible articles representing distinct attacks were identified. On average, approximately a quarter of the records were re-identified across all studies (0.26, 95% CI 0.046-0.478), and about a third were re-identified in attacks on health data (0.34, 95% CI 0-0.744). The wide confidence intervals indicate considerable uncertainty around these proportions, and the mean proportion of records re-identified was sensitive to unpublished studies. Two of the fourteen attacks were performed on data that was de-identified using existing standards. Only one of these two attacks was on health data, and it had a success rate of 0.00013.
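To make the width of these intervals concrete, the short sketch below works through the arithmetic on the four reported numbers only. It assumes the intervals are symmetric around the mean, which the abstract does not state; the helper name `interval_summary` and the variable names are illustrative and do not come from the article.

```python
def interval_summary(lower, upper):
    """Return (midpoint, half_width) of a reported confidence interval."""
    midpoint = (lower + upper) / 2
    half_width = (upper - lower) / 2
    return midpoint, half_width


# Values as reported in the abstract.
all_mean, all_ci = 0.26, (0.046, 0.478)
health_mean, health_ci = 0.34, (0.0, 0.744)

# All studies: compare the interval midpoint with the reported mean.
mid_all, hw_all = interval_summary(*all_ci)
print(f"all studies: midpoint={mid_all:.3f}, half-width={hw_all:.3f}")

# Health studies: ASSUMING a symmetric interval, the upper bound implies
# a half-width, and from that an untruncated lower bound.
hw_health = health_ci[1] - health_mean
implied_lower = health_mean - hw_health
print(f"health studies: half-width={hw_health:.3f}, "
      f"implied lower bound={implied_lower:.3f}")
```

Under that symmetry assumption, the all-studies midpoint (0.262) agrees with the reported mean of 0.26, and the half-width (0.216) is almost as large as the mean itself. For the health studies the implied lower bound is slightly negative, which would be consistent with the reported lower limit of 0 being a truncation, since a proportion cannot be below zero.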

Conclusions: The current evidence shows a high re-identification rate but is dominated by small-scale studies on data that was not de-identified according to existing standards. This evidence is insufficient to draw conclusions about the efficacy of de-identification methods.

Conflict of interest statement

Competing Interests: The authors have read the journal's policy and have the following conflicts: all co-authors consult for federal and provincial governments and commercial entities in the US and Canada on de-identification. KEE and BM sit on federal and provincial government advisory committees related to health information privacy. This does not alter the authors' adherence to all the PLoS ONE policies on sharing data and materials.

Figures

Figure 1. PRISMA diagram.
PRISMA diagram summarizing the steps involved in the systematic review of the re-identification attack literature.
Figure 2. Caterpillar plot (all studies).
Caterpillar plot of the individual means and confidence intervals for all studies, with the overall mean proportion.
Figure 3. Caterpillar plot (health studies).
Caterpillar plot of the individual means and confidence intervals for health studies, with the overall mean proportion.
Figure 4. Sensitivity (all studies).
The number of new studies with success rates below/above the current mean that would need to be performed to significantly change the current mean for all studies.
Figure 5. Sensitivity (health studies).
The number of new studies with success rates below/above the current mean that would need to be performed to significantly change the current mean for health studies.
Figure 6. Funnel plot (all studies).
Funnel plot showing the proportion of records re-identified in all studies against standard error. The points were slightly jittered to reveal overlap.
Figure 7. Funnel plot (health studies).
Funnel plot showing the proportion of records re-identified in health studies against standard error. The points were slightly jittered to reveal overlap.
