Bioinformatics. 2022 Apr 12;38(8):2202-2210.
doi: 10.1093/bioinformatics/btac070.

Efficient privacy-preserving whole-genome variant queries



Mete Akgün et al.

Abstract

Motivation: Diagnosis and treatment decisions based on genomic data have become widespread as the cost of genome sequencing has gradually decreased. In this context, disease-gene association studies are of great importance. However, genomic data are especially sensitive compared with other data types and contain information about individuals and their relatives. Many studies have shown that this information can be obtained from query-response pairs on genomic databases. In this work, we propose a method that uses secure multi-party computation to query genomic databases in a privacy-preserving manner. The proposed solution privately outsources genomic data from arbitrarily many sources to two non-colluding proxies and allows genomic databases to be stored safely in semi-honest cloud environments. By using XOR-based sharing, it provides data privacy, query privacy and output privacy, and, unlike previous solutions, it allows queries to run efficiently on hundreds of thousands of genomes.
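The abstract states that data are outsourced to the two proxies using XOR-based sharing. As a hedged illustration of that primitive only (not the authors' code), here is a minimal Python sketch. It assumes each variant has already been encoded as a fixed 48-bit integer (the variant length used in the Fig. 4 experiments); the names `share`, `reconstruct` and `xor_shares` are hypothetical.

```python
import secrets

# Assumption: each variant is already encoded as a fixed-length 48-bit integer
# (48 bits is the variant length used in the Fig. 4 experiments).
VARIANT_BITS = 48
MASK = (1 << VARIANT_BITS) - 1

Shares = tuple[int, int]  # (share held by proxy D1, share held by proxy D2)


def share(value: int) -> Shares:
    """Split an encoded variant into two XOR shares (hypothetical helper)."""
    if not 0 <= value <= MASK:
        raise ValueError("value does not fit in VARIANT_BITS bits")
    r = secrets.randbits(VARIANT_BITS)  # uniformly random, so D1 alone learns nothing
    return r, value ^ r                 # D2's share is also uniformly distributed


def reconstruct(s: Shares) -> int:
    """Recombine both shares; only possible if both proxies reveal their shares."""
    return s[0] ^ s[1]


def xor_shares(a: Shares, b: Shares) -> Shares:
    """Shares of a XOR b, computed locally by each proxy without any interaction."""
    return a[0] ^ b[0], a[1] ^ b[1]


variant = 0x1A2B3C4D5E6F  # arbitrary 48-bit example value, not a real variant encoding
s = share(variant)
print(reconstruct(s) == variant)  # True
```

Because XOR is linear, XOR-ing shared values costs the proxies no communication; nonlinear steps such as comparisons are where an interactive secure-computation protocol between D1 and D2 becomes necessary.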

Results: We measure the performance of our solution with parameters similar to those of real-world applications. A genomic database with 3 000 000 variants can be queried with five genomic query predicates in under 400 ms. Querying 1 048 576 genomes, each containing 1 000 000 variants, for the presence of five different query variants takes approximately 6 min with a small amount of dedicated hardware and connectivity. These execution times are in the right range to enable real-world applications in medical research and healthcare. Unlike previous studies, our solution can query multiple databases with response times fast enough for practical use. To the best of our knowledge, this is the first solution that provides this performance for querying large-scale genomic data.
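The results above describe checking genomes for the presence of query variants. The abstract does not spell out the comparison protocol, so the following Python sketch uses a generic stand-in technique: GMW-style evaluation of a Boolean circuit on XOR shares, with Beaver multiplication triples for AND gates, to test whether a shared query variant equals any shared stored variant. It is not the authors' method (Fig. 3 indicates their solution builds a variant tree). The trusted dealer for triples, the 48-bit encoding and all function names are assumptions, and both proxies are simulated in one process.

```python
import secrets

VARIANT_BITS = 48  # assumption: fixed-length variant encoding, as in Fig. 4
Shares = tuple[int, int]  # (proxy D1's share, proxy D2's share)


def share(value: int, width: int = VARIANT_BITS) -> Shares:
    r = secrets.randbits(width)
    return r, value ^ r


def beaver_triple(width: int) -> tuple[Shares, Shares, Shares]:
    # Assumption: a trusted dealer (or an offline phase) supplies
    # XOR-shared random a, b and c = a & b.
    a, b = secrets.randbits(width), secrets.randbits(width)
    return share(a, width), share(b, width), share(a & b, width)


def secure_and(x: Shares, y: Shares, width: int) -> Shares:
    """Bitwise AND on XOR shares using one Beaver triple (both parties simulated)."""
    (a0, a1), (b0, b1), (c0, c1) = beaver_triple(width)
    # Each party publishes its masked shares; d = x ^ a and e = y ^ b look random.
    d = (x[0] ^ a0) ^ (x[1] ^ a1)
    e = (y[0] ^ b0) ^ (y[1] ^ b1)
    z0 = c0 ^ (d & b0) ^ (e & a0) ^ (d & e)  # only D1 adds the public d & e term
    z1 = c1 ^ (d & b1) ^ (e & a1)
    return z0, z1


def and_reduce(s0: int, s1: int, width: int) -> Shares:
    """AND together the low `width` bits of a shared word, halving the width each round."""
    while width > 1:
        half = width // 2
        lo = (1 << half) - 1
        z0, z1 = secure_and((s0 & lo, s1 & lo),
                            ((s0 >> half) & lo, (s1 >> half) & lo), half)
        if width % 2:  # carry an unpaired top bit into the next round
            top = width - 1
            z0 |= ((s0 >> top) & 1) << half
            z1 |= ((s1 >> top) & 1) << half
            width = half + 1
        else:
            width = half
        s0, s1 = z0, z1
    return s0 & 1, s1 & 1


def secure_equal(x: Shares, y: Shares, width: int = VARIANT_BITS) -> Shares:
    """Shares of the bit [x == y]."""
    mask = (1 << width) - 1
    d0, d1 = x[0] ^ y[0], x[1] ^ y[1]  # local: shares of x ^ y (zero iff equal)
    return and_reduce(d0 ^ mask, d1, width)  # local NOT by D1, then AND of all bits


def secure_contains(stored: list[Shares], query: Shares,
                    width: int = VARIANT_BITS) -> Shares:
    """Shares of the bit [query in stored], as OR(eq_i) = NOT AND(NOT eq_i)."""
    acc = (1, 0)  # shares of the constant 1
    for s in stored:
        e0, e1 = secure_equal(s, query, width)
        acc = secure_and(acc, (e0 ^ 1, e1), 1)
    return acc[0] ^ 1, acc[1]


db = [share(0x1A2B3C4D5E6F), share(0x000000000001)]
hit = secure_contains(db, share(0x000000000001))
print(hit[0] ^ hit[1])  # 1: the query variant is present
```

Only the final presence bit is reconstructed. In a real two-server deployment each proxy would run its half of `secure_and` and send its masked shares to the other over the network, so the number of AND rounds (six per 48-bit equality test in this sketch) would largely determine latency.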

Availability and implementation: https://gitlab.com/DIFUTURE/privacy-preserving-variant-queries.

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1.
General system architecture of our solution. Genomic variant stores H1, …, HN communicate with the two non-colluding proxy servers D1 and D2. Users can query all data through these proxy servers in a secure manner.
Fig. 2.
Encoding of a single variant.
Fig. 3.
Generation of the variant tree.
Fig. 4.
Runtime comparison of our solution and the solution of Demmler et al. (Demmler et al., 2017) for varying numbers of variants and query variants. (a) Runtime with a single patient, a varying number of variants, a fixed variant length of 48 bits and 5 query variants; (b) runtime with a single patient, 100 000 variants, a fixed variant length of 48 bits and a varying number of query variants.
Fig. 5.
Runtime of our solution for varying numbers of masks and patients. (a) Runtime with a varying number of masks, 1 000 000 variants, a fixed variant length of 48 bits and 5 query variants; (b) runtime with a varying number of patients, 1 000 000 variants, a fixed variant length of 64 bits and 5 query variants.

References

    1. Akgün M. et al. (2015) Privacy preserving processing of genomic data: a survey. J. Biomed. Inf., 56, 103–111.
    2. Amendola L.M. et al.; CSER Consortium. (2018) The clinical sequencing evidence-generating research consortium: integrating genomic sequencing in diverse and medically underserved populations. Am. J. Hum. Genet., 103, 319–327.
    3. Asharov G. et al. (2013) More efficient oblivious transfer and extensions for faster secure computation. In: Sadeghi A. et al. (eds.) 2013 ACM SIGSAC Conference on Computer and Communications Security, CCS’13, Berlin, Germany, November 4–8, 2013. ACM, Berlin, Germany, pp. 535–548.
    4. Asharov G. et al. (2018) Privacy-preserving search of similar patients in genomic data. PoPETs, 2018, 104–124.
    5. Aziz M.M.A. et al. (2017) Privacy-preserving techniques of genomic data: a survey. Brief. Bioinform., 20, 887–895.
