Anonymization of electronic medical records for validating genome-wide association studies
- PMID: 20385806
- PMCID: PMC2867915
- DOI: 10.1073/pnas.0911686107
Abstract
Genome-wide association studies (GWAS) facilitate the discovery of genotype-phenotype relations from population-based sequence databases, which is an integral facet of personalized medicine. The increasing adoption of electronic medical records allows large amounts of patients' standardized clinical features to be combined with the genomic sequences of these patients and shared to support validation of GWAS findings and to enable novel discoveries. However, disseminating these data "as is" may lead to patient reidentification when genomic sequences are linked, on the basis of standardized clinical features, to resources that contain the corresponding patients' identity information. This work proposes an approach that provably prevents this type of data linkage and furnishes a result that helps support GWAS. Our approach automatically extracts potentially linkable clinical features and modifies them so that they can no longer be used to link a genomic sequence to a small number of patients, while preserving the associations between genomic sequences and specific sets of clinical features corresponding to GWAS-related diseases. Extensive experiments with real patient data derived from the Vanderbilt University Medical Center verify that our approach generates data that eliminate the threat of individual reidentification, while supporting GWAS validation and clinical case analysis tasks.
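The linkage threat described above can be illustrated with a small sketch. It is not the authors' algorithm: it only checks, under the assumption that each record is a set of clinical codes and that an attacker knows a patient's full code set, which records are matched by fewer than k records in the released data. The function names, the threshold `k`, and the codes `dx_A`, `dx_B`, `dx_C` are hypothetical placeholders.

```python
def support(records, codes):
    """Count how many records contain every code in `codes`."""
    return sum(1 for r in records if codes <= r)


def linkable(records, k):
    """Return indices of non-empty records whose code set is
    contained in fewer than k records (hypothetical risk check)."""
    return [i for i, r in enumerate(records)
            if r and support(records, r) < k]


# Hypothetical toy data: each record is one patient's set of codes.
records = [
    {"dx_A", "dx_B"},
    {"dx_A", "dx_B"},
    {"dx_A"},
    {"dx_C"},
]

# Records whose code combination points to fewer than 2 patients.
at_risk = linkable(records, k=2)
```

In this toy data only the last record is flagged, because no other record contains `dx_C`. An anonymization method in the spirit of the abstract would then modify such codes until no record is flagged, while trying to keep the code sets relevant to GWAS intact.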
Conflict of interest statement
The authors declare no conflict of interest.
