A multi-institution evaluation of clinical profile anonymization
- PMID: 26567325
- PMCID: PMC4954623
- DOI: 10.1093/jamia/ocv154
Abstract
Background and objective: There is an increasing desire to share de-identified electronic health records (EHRs) for secondary uses, but there are concerns that clinical terms can be exploited to compromise patient identities. Anonymization algorithms mitigate such threats while enabling novel discoveries, but their evaluation has been limited to single institutions. Here, we study how an existing clinical profile anonymization algorithm fares at multiple medical centers.
Methods: We apply a state-of-the-art k-anonymization algorithm, with k set to the standard value of 5, to the International Classification of Diseases, Ninth Revision (ICD-9) codes of patients in a hypothyroidism association study at three medical centers: Marshfield Clinic, Northwestern University, and Vanderbilt University. We assess utility when anonymizing at three population levels: all patients in 1) the EHR system; 2) the biorepository; and 3) the hypothyroidism study. We evaluate utility using 1) changes to the number of patients included in the dataset, 2) the number of codes included, and 3) the regions where generalization and suppression were required.
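The generalize-then-suppress idea behind the Methods can be illustrated with a deliberately simplified sketch. It is not the study's algorithm. It assumes a per-code protection rule (every released code or code group must be shared by at least k = 5 patients), a hypothetical one-step hierarchy that generalizes an ICD-9 code to its three-character category, and hypothetical helper names (`anonymize`, `utility`, `category`) and toy profiles. A practical clinical-profile anonymizer would also need to protect combinations of codes, since an adversary may know several of a patient's diagnoses.

```python
from collections import defaultdict

K = 5  # anonymity parameter; the study used k = 5


def category(code: str) -> str:
    """Hypothetical one-step hierarchy: ICD-9 code -> 3-character category ('250.01' -> '250')."""
    return code.split(".")[0]


def anonymize(profiles: dict[str, set[str]], k: int = K):
    """Simplified sketch (not the published algorithm): release codes carried by
    >= k patients, merge rarer codes within their category, and suppress any
    merged group that still has fewer than k carriers."""
    carriers = defaultdict(set)  # code -> set of patient IDs carrying it
    for pid, codes in profiles.items():
        for code in codes:
            carriers[code].add(pid)

    mapping = {}  # original code -> released item (None means suppressed)
    rare_by_category = defaultdict(list)
    for code, pids in carriers.items():
        if len(pids) >= k:
            mapping[code] = code  # frequent enough: release unchanged
        else:
            rare_by_category[category(code)].append(code)

    for cat, codes in rare_by_category.items():
        # Patients who carry any of the rare codes in this category
        group = set().union(*(carriers[c] for c in codes))
        item = f"{cat}.*" if len(group) >= k else None
        for code in codes:
            mapping[code] = item

    released = {}
    for pid, codes in profiles.items():
        items = {mapping[c] for c in codes if mapping[c] is not None}
        if items:  # patients left with no codes drop out of the dataset
            released[pid] = items
    return released, mapping


def utility(profiles, released, mapping):
    """Fractions of patients kept, codes kept (not suppressed), and codes generalized."""
    n_codes = len(mapping)
    suppressed = sum(1 for item in mapping.values() if item is None)
    generalized = sum(1 for code, item in mapping.items() if item not in (None, code))
    return {
        "patients_retained": len(released) / len(profiles),
        "codes_retained": (n_codes - suppressed) / n_codes,
        "codes_generalized": generalized / n_codes,
    }


# Toy profiles (hypothetical); 244.9 is the ICD-9 code for unspecified hypothyroidism
profiles = {
    "p0": {"244.9", "250.01"},
    "p1": {"244.9", "250.01"},
    "p2": {"244.9", "250.01"},
    "p3": {"244.9", "250.02"},
    "p4": {"244.9", "250.02"},
    "p5": {"244.9", "714.0"},
    "p6": {"714.0"},
}
released, mapping = anonymize(profiles)
print(mapping)
print(utility(profiles, released, mapping))
# 250.01 and 250.02 become '250.*'; 714.0 is suppressed, so p6 drops out:
# patients_retained = 6/7, codes_retained = 0.75, codes_generalized = 0.5
```

The three returned fractions loosely correspond to the three utility measures listed in the Methods. Adding more patients who carry `714.0` to the toy population lets that code be released unchanged, a miniature version of the intuition that anonymizing against a larger population needs less generalization.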
Results: Our analysis yields two notable findings. First, we show that anonymizing in the context of the entire EHR yields a significantly greater quantity of data, reducing the proportion of generalized regions from ∼15% to ∼0.5%. Second, in the largest anonymization, ∼70% of the codes that required generalization were generalized into groups of only two or three codes.
Conclusions: Sharing large volumes of clinical data in support of phenome-wide association studies is possible while safeguarding the privacy of the underlying individuals.
Keywords: anonymization; clinical codes; generalization; privacy; secondary use.
© The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Grants and funding
- U01 HG006385/HG/NHGRI NIH HHS/United States
- U01 HG006378/HG/NHGRI NIH HHS/United States
- U01 HG006388/HG/NHGRI NIH HHS/United States
- U01 HG006389/HG/NHGRI NIH HHS/United States
- U01 HG008673/HG/NHGRI NIH HHS/United States
- R01 HG006844/HG/NHGRI NIH HHS/United States
- R01 GM105688/GM/NIGMS NIH HHS/United States
- R01 LM010685/LM/NLM NIH HHS/United States
- R01 LM009989/LM/NLM NIH HHS/United States
- UL1 TR000135/TR/NCATS NIH HHS/United States
- 8UL1 TR000150-05/TR/NCATS NIH HHS/United States
- UL1 TR001422/TR/NCATS NIH HHS/United States