Cell Syst. 2021 Nov 17;12(11):1108-1120.e4.
doi: 10.1016/j.cels.2021.07.010. Epub 2021 Aug 30.

Ultrafast homomorphic encryption models enable secure outsourcing of genotype imputation

Miran Kim et al. Cell Syst. 2021.

Abstract

Genotype imputation is a fundamental step in genomic data analysis, where missing variant genotypes are predicted using the existing genotypes of nearby "tag" variants. Although researchers can outsource genotype imputation, privacy concerns may prohibit genetic data sharing with an untrusted imputation service. Here, we developed secure genotype imputation using efficient homomorphic encryption (HE) techniques. In HE-based methods, the genotype data remain encrypted in transit, at rest, and during analysis, and can only be decrypted by the owner. We compared secure imputation with three state-of-the-art non-secure methods and found that HE-based methods provide genetic data security with comparable accuracy for common variants. HE-based methods have time and memory requirements that are comparable to or lower than those of the non-secure methods. Our results provide evidence that HE-based methods can practically perform resource-intensive computations for high-throughput genetic data analysis. The source code is freely available for download at https://github.com/K-miran/secure-imputation.

Keywords: genetic data encryption; genomic privacy; genotype imputation.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

Figure 1.
Illustration of secure genotype imputation. a) Illustration of the genotype imputation scenario. The incomplete genotypes are measured by genotyping arrays, with missing genotypes represented by stars. Encryption generates a random-looking string from the genotypes. At the server, the encrypted genotypes are encoded and then used to compute the missing variant genotype probabilities. The encrypted probabilities are sent to the researcher, who decrypts them and identifies the genotypes with the highest probabilities (italic values). b) Building of the plaintext model for genotype imputation. The server uses a publicly available panel to build genotype estimation models for each variant. The models are stored in the plaintext domain. The model in the current study is a linear model in which each variant genotype is modeled using the genotypes of variants within a k-variant vicinity of the target variant. c) The plaintext models implemented under the secure frameworks.
Figure 2.
Accuracy benchmarks. Accuracy for all genotypes (a) and for the non-reference genotypes (b) is shown for each method (x-axis). The average accuracy value is shown at the top of each bar for comparison. Precision-recall curves are plotted for all genotypes (c) and the non-reference genotypes (d). Plaintext indicates the non-secure methods.
Figure 3.
Accuracy stratified by population is shown for EUR all (a) and non-ref (b) genotypes, AMR all (c) and non-ref (d) genotypes, and AFR all (e) and non-ref (f) genotypes. The precision-recall curve for rare variants is shown in (g). The box plots (h) illustrate the super-population-specific minor allele frequency (MAF) distribution (y-axis) for the common (top) and uncommon (bottom) variants. ALL indicates the MAF distribution for all populations. The center and the two ends of the box plots show the median and the 25–75% values of the MAF distributions.
Figure 4.
Memory and time requirements of the secure methods. Each method is divided into four steps: (1) key generation, (2) encryption, (3) evaluation, and (4) decryption. The bar plots show the time requirements (a) using 20K, 40K, and 80K target variant sets. The aggregated time (b) and the maximum memory usage (c) of the methods are also shown.
Figure 5.
Illustration of a secure outsourced imputation service (a). The time (b) and memory requirements (c) are illustrated in the bar plots, where colors indicate the security context. The y-axis shows the time (in seconds) and main memory (in gigabytes) used by each method to impute the 80K variants; the secure outsourced method includes the plaintext model training and secure model evaluation steps.
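
The linear model described in the Figure 1 caption can be sketched in plaintext as follows. This is a minimal illustration, not the authors' implementation: the function names, the reading of "k-variant vicinity" as the k nearest tag variants, the per-genotype-class scoring, and all weights and positions are assumptions made for the example. The scoring step uses only additions and multiplications by plaintext weights, which are the kinds of operations HE schemes can apply to encrypted inputs; the final selection of the highest-scoring genotype is the step the caption assigns to the researcher after decryption.

```python
from typing import List, Sequence


def select_tags(tag_positions: Sequence[int], target_position: int, k: int) -> List[int]:
    """Return the indices of the k tag variants closest to the target.

    Assumption: "vicinity" is taken to mean the k nearest tags by position.
    """
    by_distance = sorted(
        range(len(tag_positions)),
        key=lambda i: (abs(tag_positions[i] - target_position), i),
    )
    return sorted(by_distance[:k])


def linear_scores(
    tag_genotypes: Sequence[int],
    weights: Sequence[Sequence[int]],
    intercepts: Sequence[int],
) -> List[int]:
    """Compute one linear score per genotype class: intercept + sum(w * g)."""
    return [
        b + sum(w * g for w, g in zip(ws, tag_genotypes))
        for ws, b in zip(weights, intercepts)
    ]


def call_genotype(scores: Sequence[int]) -> int:
    """Pick the genotype class (0, 1, or 2) with the highest score."""
    return max(range(len(scores)), key=lambda c: scores[c])


# Hypothetical data: five tag variants and one untyped target variant.
tag_positions = [100, 250, 400, 900, 1500]
tag_genotypes = [0, 1, 2, 2, 0]  # alternate-allele counts at each tag
target_position = 420

chosen = select_tags(tag_positions, target_position, k=3)
selected_genotypes = [tag_genotypes[i] for i in chosen]

# Hypothetical weights, one row per genotype class. Integers are used
# only to keep the arithmetic in this example exact.
weights = [[-2, -2, -3], [1, 2, 0], [1, 0, 3]]
intercepts = [9, 1, 0]

scores = linear_scores(selected_genotypes, weights, intercepts)
imputed = call_genotype(scores)
```

In an outsourced setting, `selected_genotypes` would be ciphertexts and `linear_scores` would run on the server, while `call_genotype` would run on the researcher's side once the scores are decrypted.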


