Multicenter Study
J Am Med Inform Assoc. 2016 Apr;23(e1):e131-7.
doi: 10.1093/jamia/ocv154. Epub 2015 Nov 13.

A multi-institution evaluation of clinical profile anonymization


Raymond Heatherly et al. J Am Med Inform Assoc. 2016 Apr.

Abstract

Background and objective: There is an increasing desire to share de-identified electronic health records (EHRs) for secondary uses, but there are concerns that clinical terms can be exploited to compromise patient identities. Anonymization algorithms mitigate such threats while enabling novel discoveries, but their evaluation has been limited to single institutions. Here, we study how an existing clinical profile anonymization fares at multiple medical centers.

Methods: We apply a state-of-the-art k-anonymization algorithm, with k set to the standard value 5, to the International Classification of Disease, ninth edition codes for patients in a hypothyroidism association study at three medical centers: Marshfield Clinic, Northwestern University, and Vanderbilt University. We assess utility when anonymizing at three population levels: all patients in 1) the EHR system; 2) the biorepository; and 3) a hypothyroidism study. We evaluate utility using 1) changes to the number included in the dataset, 2) number of codes included, and 3) regions where generalization and suppression were required.
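To make the role of k concrete, here is a minimal Python sketch of a k-anonymity check over clinical code profiles. It is not the algorithm evaluated in the study: it assumes a simplified notion in which a dataset is k-anonymous when every distinct set of codes is shared by at least k patients. The patient identifiers, the example codes, and the merged label "250.0x" are hypothetical and serve only as illustration.

```python
from collections import Counter


def equivalence_class_sizes(profiles):
    """Count how many patients share each exact set of codes."""
    return Counter(frozenset(codes) for codes in profiles.values())


def is_k_anonymous(profiles, k=5):
    """True if every distinct code profile is shared by at least k patients."""
    return all(size >= k for size in equivalence_class_sizes(profiles).values())


def generalize(profiles, mapping):
    """Replace each code with its generalized group; unmapped codes are kept."""
    return {
        patient: {mapping.get(code, code) for code in codes}
        for patient, codes in profiles.items()
    }


# Hypothetical toy data: 5 + 3 + 2 patients, one code each.
profiles = {}
for i in range(5):
    profiles[f"a{i}"] = {"244.9"}
for i in range(3):
    profiles[f"b{i}"] = {"250.00"}
for i in range(2):
    profiles[f"c{i}"] = {"250.01"}

before = is_k_anonymous(profiles, k=5)

# Merging the two rare codes into one generalized code (an assumed grouping)
# yields two groups of five patients each.
merged = generalize(profiles, {"250.00": "250.0x", "250.01": "250.0x"})
after = is_k_anonymous(merged, k=5)
```

In this toy case the groups of three and two patients violate k = 5 until their codes are merged. That is the trade-off the utility measures above quantify: generalization protects privacy at the cost of coarser codes.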

Results: We report two notable findings. First, anonymizing in the context of the entire EHR yields a significantly greater quantity of data, reducing the amount of generalized regions from ∼15% to ∼0.5%. Second, in the largest anonymization, ∼70% of the codes that needed generalization were generalized over only two or three codes.

Conclusions: Sharing large volumes of clinical data in support of phenome-wide association studies is possible while safeguarding the privacy of the underlying individuals.

Keywords: anonymization; clinical codes; generalization; privacy; secondary use.


Figures

Figure 1:
Distribution of generalizations required for each a) Marshfield, b) Northwestern, and c) Vanderbilt dataset. Note the y-axis is depicted in a log scale.
Figure 2:
Generalization in the datasets by PheWAS code. Each row corresponds to one of the three sites in the study. Notice that anonymizing at the EHR-level leads to fewer merged codes than anonymizing at the study-level.
