Proc Natl Acad Sci U S A. 2010 Apr 27;107(17):7898-903.
doi: 10.1073/pnas.0911686107. Epub 2010 Apr 12.

Anonymization of electronic medical records for validating genome-wide association studies

Grigorios Loukides et al. Proc Natl Acad Sci U S A. 2010.

Abstract

Genome-wide association studies (GWAS) facilitate the discovery of genotype-phenotype relations from population-based sequence databases, which is an integral facet of personalized medicine. The increasing adoption of electronic medical records allows large amounts of patients' standardized clinical features to be combined with the genomic sequences of these patients and shared to support validation of GWAS findings and to enable novel discoveries. However, disseminating these data "as is" may lead to patient reidentification when genomic sequences are linked to resources that contain the corresponding patients' identity information based on standardized clinical features. This work proposes an approach that provably prevents this type of data linkage and furnishes a result that helps support GWAS. Our approach automatically extracts potentially linkable clinical features and modifies them so that they can no longer be used to link a genomic sequence to a small number of patients, while preserving the associations between genomic sequences and specific sets of clinical features corresponding to GWAS-related diseases. Extensive experiments with real patient data derived from the Vanderbilt University Medical Center verify that our approach generates data that eliminate the threat of individual reidentification, while supporting GWAS validation and clinical case analysis tasks.
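The linkage threat described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: it assumes each patient record is a set of clinical codes, that a set of codes is "risky" when it matches at least one but fewer than k records, and that codes are modified by merging them into a coarser generalized code. The codes `a`, `b`, `c`, the merged code `(a,b)`, the threshold `k = 3`, and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, List

Record = FrozenSet[str]


def support(records: List[Record], codes: FrozenSet[str]) -> int:
    """Count the records that contain every code in `codes`."""
    return sum(1 for record in records if codes <= record)


def risky_sets(records: List[Record], protected: List[FrozenSet[str]], k: int) -> List[FrozenSet[str]]:
    """Return the protected code sets matched by at least 1 but fewer than k records.

    Assumption: a code set matched by no record, or by k or more records,
    is treated as safe.
    """
    return [codes for codes in protected if 0 < support(records, codes) < k]


def generalize(codes: FrozenSet[str], mapping: Dict[str, str]) -> FrozenSet[str]:
    """Replace each code by its generalized code, if the mapping has one."""
    return frozenset(mapping.get(code, code) for code in codes)


# Fictional research records: one set of clinical codes per patient.
records = [frozenset(r) for r in (["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"], ["c"])]
protected = [frozenset({"a", "b"})]  # code combination an attacker might know
k = 3

before = risky_sets(records, protected, k)  # {a, b} matches only 2 records

# Merge the hypothetical codes a and b into one coarser code.
mapping = {"a": "(a,b)", "b": "(a,b)"}
anon_records = [generalize(r, mapping) for r in records]
anon_protected = [generalize(s, mapping) for s in protected]

after = risky_sets(anon_records, anon_protected, k)  # generalized set matches 4 records
print(before, after)
```

In this toy example the combination {a, b} links to only two patients before generalization, and to four patients once a and b are reported as the single code (a,b). A real policy would also have to check that the merged code still supports the GWAS-related associations that the abstract says must be preserved.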

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1. Biomedical datasets (fictional) and policies used by the proposed anonymization approach. (A) Research data, (B) identified EMR data, (C) utility policy, (D) privacy policy, and (E) a 5-anonymization for the research data.
Fig. 2. Reidentification risk (shown as a cumulative distribution function).
Fig. 3. Utility constraint satisfaction at various levels of protection for (A) VNEC and (B) VNECKC.
Fig. 4. Relative error in query answering for the single-visit case and for (A) VNEC and (B) VNECKC. Points correspond to the mean RE, and error bars are of 1 SD.
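The relative error (RE) named in the Fig. 4 caption is not defined on this page. The sketch below assumes the common definition, |true count - estimated count| / true count, applied to a workload of counting queries, and it summarizes the workload by mean and standard deviation as the caption describes. The query counts are made up, and the choice of population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def relative_error(true_count: float, estimated_count: float) -> float:
    """Assumed definition: absolute error divided by the true count."""
    if true_count == 0:
        raise ValueError("relative error is undefined for a true count of 0")
    return abs(true_count - estimated_count) / true_count


# Hypothetical (true count, count estimated from anonymized data) pairs.
workload = [(10, 8), (20, 20), (4, 6)]

errors = [relative_error(true, est) for true, est in workload]
mean_re = mean(errors)  # the plotted point, under the assumptions above
sd_re = pstdev(errors)  # the error bar (population SD assumed)
print(errors, mean_re, sd_re)
```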

