2024 Jul 24;20(7):e1012142.
doi: 10.1371/journal.pcbi.1012142. eCollection 2024 Jul.

FedGMMAT: Federated generalized linear mixed model association tests

Wentao Li et al. PLoS Comput Biol.

Abstract

Increasing genetic and phenotypic data size is critical for understanding the genetic determinants of diseases. Establishing practical means for collaboration and data sharing among institutions remains a fundamental methodological barrier to performing high-powered studies. As the samples become more heterogeneous, complex statistical approaches, such as generalized linear mixed effects models, must be used to correct for the confounders that may bias results. On another front, due to privacy concerns around Protected Health Information (PHI), the sharing of genetic information is tightly restricted by regulations such as the Health Insurance Portability and Accountability Act (HIPAA). This limits data sharing among institutions and hampers efforts to carry out high-powered collaborative studies. Federated approaches are a promising way to alleviate the privacy and performance issues, since sensitive data never leaves the local sites. Motivated by these considerations, we developed FedGMMAT, a federated genetic association testing tool that uses a federated statistical testing approach for efficient association tests that can correct for confounding fixed effects and additive polygenic random effects across collaborating sites. Genetic data is never shared among collaborating sites, and the intermediate statistics are protected by encryption. Using simulated and real datasets, we demonstrate that FedGMMAT can achieve virtually the same results as pooled analysis under a privacy-preserving framework with practical resource requirements.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Scatter plot, on a -log10 scale, of P-values for synthetic data in FedGMMAT and GMMAT (Experimental Setup 3).
Fig 2. Scatter plot on -log10 scale of P-values in FedGMMAT and GMMAT.
(A) Heterogeneity; (B) Homogeneity.
Fig 3. Scatter plot on -log10 scale of P-values in FedGMMAT and GMMAT on real data.
Fig 4. Comparison of computation time (in minutes) for different numbers of SNPs per batch between HE and non-HE protection in FedGMMAT.
Fig 5. The total run-time per site in seconds (y-axis) with respect to the number of subjects on each site (x-axis).
Fig 6. Matrix Splitting at the Central Server and aggregation among local data repositories.
In the FedGMMAT framework, each local data repository maintains its own dataset locally and gathers intermediate model information from global updates. The figure illustrates the vertical splitting of Σ1 among local repositories as an example. Each site receives nj × n sized covariate matrices (Xj), which are multiplied locally with nj × p sized matrices. The resulting n × p matrices are encrypted, aggregated among sites via a round-robin schedule, and sent to the server. Splitting and aggregation of other matrices are accomplished with a similar protocol.
Fig 7. Key setup and aggregation via round-robin schedule.
The central server generates the public/private key pair at initialization and broadcasts the public key to all sites. Aggregation of split matrices is performed among sites, which take turns adding their contributions. Each site encrypts its matrix “share” before aggregation. Symmetric keys (secret seeds) for OTP-like encryption are sent to all sites at the scoring stage whenever an aggregation is performed (S2 and S3 Figs).
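
The split-multiply-aggregate pattern described in the Fig 6 and Fig 7 captions can be pictured with a small sketch. This is only an illustration under stated assumptions, not the FedGMMAT implementation: the function names, the modulus, the toy matrices, and the seed-derived additive masks are all hypothetical, and the masks merely stand in for the encryption used in the paper. Each site multiplies its nj × n block with its nj × p local matrix, masks the resulting n × p share, and adds it to a running total that is passed from site to site; a party that knows the seeds then removes the masks.

```python
import random

MODULUS = 2**61 - 1  # hypothetical modulus for the additive masks


def matmul_t(a, b):
    """Return a^T b, where a is (m x n) and b is (m x p); result is (n x p)."""
    m, n, p = len(a), len(a[0]), len(b[0])
    return [[sum(a[k][i] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def mask_from_seed(seed, rows, cols):
    """Derive a mask matrix from a secret seed.

    Assumption: random.Random is used only for illustration; it is not a
    cryptographically secure generator.
    """
    rng = random.Random(seed)
    return [[rng.randrange(MODULUS) for _ in range(cols)] for _ in range(rows)]


def add_mod(a, b):
    return [[(x + y) % MODULUS for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub_mod(a, b):
    return [[(x - y) % MODULUS for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def round_robin_aggregate(blocks, local_matrices, seeds):
    """Sum the per-site n x p shares without exposing any single share."""
    n, p = len(blocks[0][0]), len(local_matrices[0][0])
    running = [[0] * p for _ in range(n)]
    # Sites take turns: each adds its masked share to the running total.
    for block, x_j, seed in zip(blocks, local_matrices, seeds):
        share = matmul_t(block, x_j)  # local (n x p) product
        running = add_mod(running, add_mod(share, mask_from_seed(seed, n, p)))
    # A party holding all seeds regenerates the masks and removes them.
    for seed in seeds:
        running = sub_mod(running, mask_from_seed(seed, n, p))
    return running


# Toy data (non-negative integers so modular arithmetic stays exact).
S = [[2, 1, 0, 1], [1, 3, 1, 0], [0, 1, 2, 1], [1, 0, 1, 4]]  # n x n, n = 4
X = [[1, 0], [2, 1], [0, 3], [1, 1]]                          # n x p, p = 2

# Two sites, each holding two rows of S and the matching rows of X.
blocks = [S[:2], S[2:]]
local_matrices = [X[:2], X[2:]]
seeds = [101, 202]  # hypothetical per-site secret seeds

pooled = matmul_t(S, X)
federated = round_robin_aggregate(blocks, local_matrices, seeds)
print(federated == pooled)
```

The sketch relies on the fact that a product over all n rows decomposes into a sum of per-site products over each site's nj rows, so the aggregated result matches the pooled one exactly while no unmasked share leaves a site.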
