A cryptographic approach to securely share and query genomic sequences
- PMID: 18779075
- DOI: 10.1109/TITB.2007.908465
Abstract
To support large-scale biomedical research projects, organizations need to share person-specific genomic sequences without violating the privacy of their data subjects. In the past, organizations protected subjects' identities by removing identifiers, such as name and social security number; however, recent investigations illustrate that deidentified genomic data can be "reidentified" to named individuals using simple automated methods. In this paper, we present a novel cryptographic framework that enables organizations to support genomic data mining without disclosing the raw genomic sequences. Organizations contribute encrypted genomic sequence records to a centralized repository, where the administrator can perform queries, such as frequency counts, without decrypting the data. We evaluate the efficiency of our framework with existing databases of single nucleotide polymorphism (SNP) sequences and demonstrate that count queries can be completed in times that are practical for real-world applications. For example, our experiments indicate that a count query over 40 SNPs in a database of 5000 records can be completed in approximately 30 min with off-the-shelf technology. We further show that approximation strategies can significantly reduce query execution times with minimal loss in accuracy. The framework can be implemented on top of existing information and network technologies in biomedical environments.
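The abstract does not name the cryptosystem or spell out the protocol, so the following is only an illustrative sketch of how a count query over encrypted records could work, not the authors' implementation. It assumes a toy additively homomorphic (Paillier-style) scheme, a binary 0/1 encoding per SNP, and a separate key holder that decrypts only blinded per-record results. All function names, the encoding, and the tiny primes are hypothetical and far too small for real use.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier-style key pair (assumption: real keys need much larger primes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m under public modulus n with fresh randomness."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    """Recover the plaintext (mod n) using the private key."""
    lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n

def blinded_mismatch(n, enc_record, query):
    """Repository side: encrypted count of queried SNPs that differ,
    scaled by a random nonzero factor so only zero vs. nonzero is revealed."""
    n2 = n * n
    acc = 1  # trivial encryption of 0
    for pos, want in query.items():
        c = enc_record[pos]
        if want == 0:
            acc = acc * c % n2  # homomorphically adds x
        else:
            acc = acc * (n + 1) * pow(c, n - 1, n2) % n2  # adds 1 - x
    return pow(acc, random.randrange(1, n), n2)

def count_query(n, priv, enc_db, query):
    """Key-holder side: a blinded value of 0 means the record matches the query."""
    return sum(decrypt(n, priv, blinded_mismatch(n, rec, query)) == 0
               for rec in enc_db)

# Hypothetical data: four records, three binary-encoded SNPs each.
n, priv = keygen()
db = [[0, 1, 1], [0, 1, 0], [1, 1, 1], [0, 1, 1]]
enc_db = [[encrypt(n, x) for x in rec] for rec in db]
print(count_query(n, priv, enc_db, {0: 0, 2: 1}))  # prints 2
```

In this sketch the repository never sees plaintext SNP values, and the key holder learns only whether each blinded result is zero. The cost grows with the number of records times the number of queried SNPs, which is in line with the abstract's observation that query time depends on both.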