Cell Syst. 2021 Nov 17;12(11):1108-1120.e4.

doi: 10.1016/j.cels.2021.07.010. Epub 2021 Aug 30.

Ultrafast homomorphic encryption models enable secure outsourcing of genotype imputation

Miran Kim¹, Arif Ozgun Harmanci², Jean-Philippe Bossuat³, Sergiu Carpov⁴, Jung Hee Cheon⁵, Ilaria Chillotti⁶, Wonhee Cho⁷, David Froelicher³, Nicolas Gama⁸, Mariya Georgieva⁸, Seungwan Hong⁷, Jean-Pierre Hubaux³, Duhyeong Kim⁷, Kristin Lauter⁹, Yiping Ma¹⁰, Lucila Ohno-Machado¹¹, Heidi Sofia¹², Yongha Son¹³, Yongsoo Song¹⁴, Juan Troncoso-Pastoriza³, Xiaoqian Jiang¹⁵

Affiliations

¹ Department of Computer Science and Engineering and Graduate School of Artificial Intelligence, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea.
² Center for Precision Health, School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX 77030, USA. Electronic address: arif.o.harmanci@uth.tmc.edu.
³ École polytechnique fédérale de Lausanne, Lausanne, Switzerland.
⁴ Inpher, EPFL Innovation Park Bâtiment A, 3rd Fl, 1015 Lausanne, Switzerland; CEA, LIST, 91191 Gif-sur-Yvette Cedex, France.
⁵ Department of Mathematical Sciences, Seoul National University, Seoul 08826, Republic of Korea; Crypto Lab Inc., Seoul 08826, Republic of Korea.
⁶ Zama, Paris, France and imec-COSIC, KU Leuven, Leuven, Belgium.
⁷ Department of Mathematical Sciences, Seoul National University, Seoul 08826, Republic of Korea.
⁸ Inpher, EPFL Innovation Park Bâtiment A, 3rd Fl, 1015 Lausanne, Switzerland.
⁹ West Coast Head of Research Science, Facebook AI Research (FAIR), Seattle, Washington.
¹⁰ University of Pennsylvania, Philadelphia, PA 19104, USA.
¹¹ UCSD Health Department of Biomedical Informatics, University of California, San Diego, San Diego, CA 92093, USA.
¹² National Institutes of Health (NIH) - National Human Genome Research Institute, Bethesda, MD 20892, USA.
¹³ Samsung SDS, Seoul, Republic of Korea.
¹⁴ Department of Computer Science and Engineering, Seoul National University, Seoul 08826, Republic of Korea.
¹⁵ Center for Secure Artificial intelligence For hEalthcare (SAFE), School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX 77030, USA. Electronic address: xiaoqian.jiang@uth.tmc.edu.

PMID: 34464590
PMCID: PMC9898842
DOI: 10.1016/j.cels.2021.07.010

Miran Kim et al. Cell Syst. 2021.

Abstract

Genotype imputation is a fundamental step in genomic data analysis, where missing variant genotypes are predicted using the existing genotypes of nearby "tag" variants. Although researchers can outsource genotype imputation, privacy concerns may prohibit genetic data sharing with an untrusted imputation service. Here, we developed secure genotype imputation using efficient homomorphic encryption (HE) techniques. In HE-based methods, the genotype data are secure while in transit, at rest, and in analysis. They can only be decrypted by the owner. We compared secure imputation with three state-of-the-art non-secure methods and found that HE-based methods provide genetic data security with comparable accuracy for common variants. HE-based methods have time and memory requirements that are comparable to or lower than those of the non-secure methods. Our results provide evidence that HE-based methods can practically perform resource-intensive computations for high-throughput genetic data analysis. The source code is freely available for download at https://github.com/K-miran/secure-imputation.

Keywords: genetic data encryption; genomic privacy; genotype imputation.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

**Figure 1.**
Illustration of secure genotype imputation. (a) Illustration of the genotype imputation scenario. The incomplete genotypes are measured by genotyping arrays, with missing genotypes represented by stars. Encryption generates a random-looking string from the genotypes. At the server, the encrypted genotypes are encoded and then used to compute the missing variant genotype probabilities. The encrypted probabilities are sent to the researcher, who decrypts them and identifies the genotypes with the highest probabilities (italic values). (b) Building of the plaintext model for genotype imputation. The server uses a publicly available panel to build genotype estimation models for each variant. The models are stored in the plaintext domain. The model in the current study is a linear model in which each variant genotype is modeled using the genotypes of variants within a k-variant vicinity of the target variant. (c) The plaintext models implemented under the secure frameworks.
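As a rough illustration of the linear model described in the caption, the following minimal sketch selects the tag variants around a target variant and scores each genotype class with a linear function. It is not the authors' implementation: the function names, the choice of k tags on each side of the target, the three genotype classes (0, 1, 2), and all weights and biases are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left

def vicinity(tag_positions, target_position, k):
    """Return indices of up to k tag variants on each side of the target.

    Assumes tag_positions is sorted in ascending order.
    """
    split = bisect_left(tag_positions, target_position)
    left = range(max(0, split - k), split)
    right = range(split, min(len(tag_positions), split + k))
    return list(left) + list(right)

def impute(tag_genotypes, indices, weights, biases):
    """Score each genotype class with a linear model.

    weights[g] and biases[g] are the (hypothetical) coefficients for
    genotype class g. Returns the best-scoring class and all scores.
    """
    features = [tag_genotypes[i] for i in indices]
    scores = [
        b + sum(w * x for w, x in zip(ws, features))
        for ws, b in zip(weights, biases)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Toy data: six tag variants and one untyped target at position 350.
tag_positions = [100, 200, 300, 400, 500, 600]
tag_genotypes = [0, 1, 2, 2, 1, 0]
idx = vicinity(tag_positions, 350, k=2)

# Made-up coefficients, one row per genotype class (0, 1, 2).
weights = [[-0.5, -0.5, -0.5, -0.5], [0.1, 0.1, 0.1, 0.1], [0.0, 0.5, 0.5, 0.0]]
biases = [2.0, 0.0, -1.0]
best, scores = impute(tag_genotypes, idx, weights, biases)
print(idx, best)
```

In the secure setting described above, the same linear scores would be computed by the server on encrypted tag genotypes, and only the researcher would decrypt them and pick the highest-scoring genotype.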

**Figure 2.**
Accuracy benchmarks. Accuracy for all genotypes (a) and the non-reference genotypes (b) is shown for each method (x-axis). The average accuracy value is shown at the top of each bar for comparison. Precision-recall curves are plotted for all genotypes (c) and the non-reference genotypes (d). Plaintext indicates the non-secure methods.

**Figure 3.**
The population stratification of the accuracy is shown for EUR all genotypes (a) and non-ref genotypes (b), AMR all genotypes (c) and non-ref genotypes (d), and AFR all genotypes (e) and non-ref genotypes (f). The precision-recall curve for rare variants is shown in (g). The box plots in (h) illustrate the super-population-specific minor allele frequency (MAF) distribution (y-axis) for the common (top) and uncommon (bottom) variants. ALL indicates the MAF distribution for all populations. The center and the two ends of the box plots show the median and the 25% and 75% values of the MAF distributions.

**Figure 4.**
Memory and time requirements of the secure methods. Each method is divided into four steps: (1) key generation, (2) encryption, (3) evaluation, and (4) decryption. The bar plots show the time requirements (a) using 20K, 40K, and 80K target variant sets. The aggregated time (b) and the maximum memory usage of the methods (c) are also shown.
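The four steps named in this caption can be sketched end to end with a toy example. The sketch below uses a textbook Paillier cryptosystem, which is additively homomorphic, purely as a stand-in: it is not the scheme used by the secure methods benchmarked here, the primes are far too small for real security, and the integer weights, bias, and function names are hypothetical. Real-valued model weights are assumed to have been scaled to integers beforehand.

```python
import math
import secrets

def keygen(p=2**31 - 1, q=2**61 - 1):
    """Step 1: key generation (toy Mersenne primes, insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)   # public key n, secret key (lam, mu)

def encrypt(n, m):
    """Step 2: the data owner encrypts an integer genotype m."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^m mod n^2 equals 1 + m*n mod n^2
    return (1 + m * n) % n2 * pow(r, n, n2) % n2

def evaluate(n, ciphertexts, weights, bias):
    """Step 3: the server computes Enc(bias + sum(w_i * x_i)) blindly."""
    n2 = n * n
    acc = (1 + bias * n) % n2  # trivial encryption of the bias
    for c, w in zip(ciphertexts, weights):
        # Multiplying ciphertexts adds plaintexts; exponentiation scales them.
        acc = acc * pow(c, w % n, n2) % n2
    return acc

def decrypt(n, secret, c):
    """Step 4: the data owner decrypts and decodes a signed integer."""
    lam, mu = secret
    u = pow(c, lam, n * n)
    m = (u - 1) // n * mu % n
    return m - n if m > n // 2 else m

n, sk = keygen()
encrypted_tags = [encrypt(n, g) for g in [0, 1, 2, 1]]
encrypted_score = evaluate(n, encrypted_tags, weights=[3, -2, 5, 1], bias=-4)
print(decrypt(n, sk, encrypted_score))  # -4 + 0 - 2 + 10 + 1
```

The server in this sketch only ever handles ciphertexts and the plaintext model, which mirrors the division of roles in the caption; the per-step timings reported in the figure refer to the authors' methods, not to this toy scheme.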

**Figure 5.**
Illustration of a secure outsourced imputation service (a). The time (b) and memory requirements (c) are illustrated in the bar plots, where colors indicate security context. The y-axis shows the time (in seconds) and main memory (in gigabytes) used by each method to perform the imputation of the 80K variants; the secure outsourced method includes the plaintext model training and secure model evaluation steps.

Comment in

Paving the path toward genomic privacy with secure imputation.
Sherman MA. Cell Syst. 2021 Oct 20;12(10):950-952. doi: 10.1016/j.cels.2021.09.006. PMID: 34672957
