Cell Syst. 2021 Oct 20;12(10):983-993.e7.
doi: 10.1016/j.cels.2021.08.001. Epub 2021 Aug 26.

Privacy-preserving genotype imputation in a trusted execution environment

Natnatee Dokmai et al. Cell Syst.

Abstract

Genotype imputation is an essential tool in genomics research, whereby missing genotypes are inferred using reference genomes to enhance downstream analyses. Recently, public imputation servers have allowed researchers to leverage large-scale genomic data resources for imputation. However, privacy concerns about uploading one's genetic data to a server limit the utility of these services. We introduce a secure hardware-based solution for privacy-preserving genotype imputation, which keeps the input genomes private by processing them within Intel SGX's trusted execution environment. Our solution features SMac, an efficient and secure imputation algorithm designed for Intel SGX, which employs a state-of-the-art imputation strategy also utilized by existing imputation servers. SMac achieves imputation accuracy equivalent to existing tools and provides protection against known side-channel attacks on SGX while maintaining scalability. We also show the necessity of our enhanced security by identifying vulnerabilities in existing imputation software. Our work represents a step toward privacy-preserving genomic analysis services.

Keywords: Intel SGX; genomic privacy; genotype imputation; imputation server; privacy enhancing technologies; privacy-preserving data analysis; secure computation; secure enclaves; trusted execution environment.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

Figure 1. Workflow of SMac.
(1) The user and the enclave engage in the remote attestation (RA) protocol, which verifies to the user the integrity of SMac’s application binary and the configuration of the enclave. (2) Using a secure channel established during RA, the user securely uploads the input genotypes directly to the enclave. The uploaded input genotypes reside in an encrypted random-access memory (RAM) region, called the Enclave Page Cache (EPC), and are kept private from the service provider and other users of the system. (3) SMac securely performs imputation inside the enclave, leveraging the reference panel held by the service provider. (4) Once imputation is complete, the user securely downloads the imputed genotypes from the enclave through the secure channel.
Figure 2. Illustration of SMac’s side-channel resilience.
We compare a standard imputation tool running in SGX (top) and SMac (bottom) with respect to their security against side-channel attacks. When a standard imputation tool is used, an attacker who can measure the runtime of the SGX process can exploit the runtime discrepancy between different logical branches of the program to infer the user’s genotype (x_i). As an example, we depict a well-established timing difference in floating-point multiplication between subnormal numbers (very small; e.g., 10^−38) and normal numbers (e.g., 0.3). In contrast, SMac uses constant-time operations based on log-transformed fixed-point numbers to ensure that the runtimes of SMac components do not leak information about the input genotypes, while accurately carrying out the same overall computation (Method Details).
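The idea behind log-transformed fixed-point arithmetic in the Figure 2 caption can be illustrated with a minimal sketch. It assumes a 16-bit fractional precision and natural logarithms; the names `FRAC_BITS`, `to_log_fixed`, `log_mul`, and `select` are hypothetical and are not taken from SMac. Python integers are not guaranteed to run in constant time, so this shows only the arithmetic, not a hardened implementation.

```python
import math

FRAC_BITS = 16            # assumed fixed-point precision (not from the paper)
SCALE = 1 << FRAC_BITS


def to_log_fixed(p: float) -> int:
    """Encode a probability p > 0 as a fixed-point integer holding ln(p)."""
    return round(math.log(p) * SCALE)


def from_log_fixed(a: int) -> float:
    """Decode a fixed-point log value back to a probability."""
    return math.exp(a / SCALE)


def log_mul(a: int, b: int) -> int:
    """Multiply two probabilities: in log space this is one integer addition,
    so no subnormal floating-point operand is ever involved."""
    return a + b


def select(bit: int, if_one: int, if_zero: int) -> int:
    """Choose between two values without branching on the secret bit (0 or 1)."""
    mask = -bit           # 0 stays 0; 1 becomes -1 (all ones in two's complement)
    return (if_one & mask) | (if_zero & ~mask)


# The caption's example operands: a very small value times a normal one.
product = from_log_fixed(log_mul(to_log_fixed(1e-38), to_log_fixed(0.3)))
```

In this sketch the multiplication costs the same integer addition whatever the magnitude of the operands, and `select` replaces an `if` on the secret value with masking.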
Figure 3. SMac imputation accuracy is identical to Minimac and substantially higher than homomorphic encryption-based solutions.
We conducted a cross-validation experiment on the 1KG and HRC datasets (chromosome 20) to compare the accuracy of SMac to that of Minimac3 (the most accurate version of Minimac) and recent homomorphic encryption-based imputation methods (HE-EPFL and HE-UTMSR) (A). Accuracy is measured by Pearson r^2 within each minor allele frequency (MAF) range of the target variant (0.01-0.5%, 0.5-5%, 5-50%). Black dots indicate the mean r^2, red horizontal lines indicate the median, boxes extend to the upper and lower quartiles, and whiskers extend to the extreme values excluding outliers, which are marked by red plus symbols. We also plot the r^2 of SMac and the HE-based solutions for individual test subjects (y-axis) against the Minimac r^2 (x-axis) on HRC data in the 0.5-5% MAF category (B). Note that the blue dots representing SMac lie on the diagonal, showing that the SMac and Minimac results are identical.
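The accuracy metric in the Figure 3 caption can be sketched as follows. The sketch assumes that r^2 is computed for one test subject across the variants in each MAF range and that the ranges are half-open (lower bound exclusive, upper bound inclusive); both choices, and the names `pearson_r2` and `r2_by_maf_bin`, are assumptions made for illustration and are not taken from the paper.

```python
# MAF ranges from the caption, as fractions: 0.01-0.5%, 0.5-5%, 5-50%.
MAF_BINS = [(0.0001, 0.005), (0.005, 0.05), (0.05, 0.5)]


def pearson_r2(x, y):
    """Squared Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # undefined when either vector is constant
    return (sxy * sxy) / (sxx * syy)


def r2_by_maf_bin(true_dosage, imputed_dosage, maf):
    """Return {(lo, hi): r^2 or None} for one subject, one entry per MAF range."""
    result = {}
    for lo, hi in MAF_BINS:
        idx = [i for i, f in enumerate(maf) if lo < f <= hi]
        if len(idx) < 2:
            result[(lo, hi)] = None  # too few variants to correlate
            continue
        result[(lo, hi)] = pearson_r2(
            [true_dosage[i] for i in idx], [imputed_dosage[i] for i in idx]
        )
    return result


# Toy data: three common variants imputed perfectly, three rarer ones less so.
example = r2_by_maf_bin(
    true_dosage=[0, 1, 2, 0, 1, 2],
    imputed_dosage=[0, 1, 2, 0, 2, 1],
    maf=[0.1, 0.2, 0.3, 0.01, 0.02, 0.03],
)
```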
Figure 4. SMac achieves practical performance with respect to runtime and memory usage.
We measured the runtime (A) and memory usage (B) of SMac and Minimac4 for imputing chromosome 20 of a single sample on reference panels of varying sizes, including 1KG, HRC, and subsampled HRC datasets with 10k and 25k haplotypes each. We also show the performance of SMac-lite, a less secure version of SMac that runs the same imputation algorithm as SMac in SGX without our additional protection against side-channel leakage. All results reflect the average of five trials. All methods show linear scaling in both runtime and memory with respect to data size, and SMac incurs a modest 54% runtime overhead on the largest dataset with 54k haplotypes while additionally providing protection for the user’s data. SMac and SMac-lite use ~100 MB in the enclave page cache (EPC) and the rest in non-EPC RAM for swap (the total is shown as EPC+swap). Overall, this amounts to half the memory usage of Minimac4.
Figure 5. Demonstration of side-channel vulnerabilities in the original Minimac imputation algorithm.
We identified two vulnerable gadgets in Minimac, renormalization (top row) and emission (bottom row), and demonstrated two types of attacks (port contention and Prime+Probe), as described in the main text. We provide the pseudocode of the two gadgets (A and D), with extractable secrets highlighted in magenta and measurable attack surfaces in blue. B and E show a successful attack on both gadgets via the port contention side channel. Each data point reflects the amount of contention in clock cycles. Horizontal lines in red and green indicate the sample mean for each segment of the victim process; these means successfully distinguish the underlying secrets (0 or 1) shown at the top. C and F demonstrate another type of attack based on Prime+Probe, which, unlike port contention, holds even when the simultaneous multithreading (SMT) feature of SGX is turned off as a security measure. Each data point reflects the latency of the probe step of the attack in clock cycles. A clear distinction in latency is observed between the different user secrets (0 or 1) shown at the top.
Figure 6. SMac enhances eQTL identification in the GTEx dataset.
We performed an eQTL analysis of whole blood samples from 466 individuals in the GTEx dataset with and without SMac imputation. (A) Number of significant eQTL associations (Bonferroni-corrected p < 0.05) identified for each gene in the first 23 Mbp of chromosome 20. SMac imputation leads to a greater number of significant associations. (B) Example eQTL association signals (shown as −log10 p-values) in a small genomic window (chr20:3,000,000–3,500,000). Significant associations corresponding to the two most represented genes, ITPA and C20orf194, are highlighted in blue and red, respectively. SMac imputation leads to a higher-resolution map of eQTLs and yields results identical to those of an eQTL analysis based on Minimac4 imputation.
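The significance criterion in the Figure 6 caption (Bonferroni-corrected p < 0.05, plotted as −log10 p-values) can be sketched as below. The caption does not state the set of tests over which the correction is applied, so the sketch assumes it is the full list of p-values passed in; the function names are hypothetical.

```python
import math


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each test whose Bonferroni-adjusted p-value (p * m, capped at 1)
    falls below alpha, where m is the number of tests supplied."""
    m = len(p_values)
    return [min(1.0, p * m) < alpha for p in p_values]


def neg_log10(p):
    """Transform a p-value to the -log10 scale used for association plots."""
    return -math.log10(p)


# Toy p-values for three variant-gene pairs (made up for illustration).
flags = bonferroni_significant([1e-6, 0.01, 0.04])
```

With three tests, the adjusted values are 3e-6, 0.03, and 0.12, so only the first two associations pass the threshold.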
