Bioinformatics. 2023 Apr 3;39(4):btad193.
doi: 10.1093/bioinformatics/btad193.

Multivariate genome-wide association analysis by iterative hard thresholding

Benjamin B Chu et al. Bioinformatics. 2023.

Abstract

Motivation: In a genome-wide association study, analyzing multiple correlated traits simultaneously is potentially superior to analyzing the traits one by one. Standard methods for multivariate genome-wide association study operate marker-by-marker and are computationally intensive.

Results: We present a sparsity-constrained regression algorithm for multivariate genome-wide association studies based on iterative hard thresholding and implement it in the Julia package MendelIHT.jl. In simulation studies with up to 100 quantitative traits, iterative hard thresholding exhibits similar true positive rates, smaller false positive rates, and faster execution times than GEMMA's linear mixed models and mv-PLINK's canonical correlation analysis. On UK Biobank data with 470 228 variants, MendelIHT completed a three-trait joint analysis (n=185 656) in 20 h and an 18-trait joint analysis (n=104 264) in 53 h with an 80 GB memory footprint. In short, MendelIHT enables geneticists to fit a single regression model that simultaneously considers the effects of all SNPs on dozens of traits.
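
Illustration: the core idea of iterative hard thresholding can be sketched as projected gradient descent, in which each gradient step on the loss is followed by a projection that keeps only the k largest-magnitude coefficients. The Python sketch below is a simplified assumption-laden example, not the MendelIHT.jl implementation: it uses a plain least-squares loss, a fixed user-supplied step size, and hypothetical names (`hard_threshold`, `iht`, `step`, `iters`) chosen only for illustration. It makes no claim to reproduce the package's loss function, step-size selection, or API.

```python
def hard_threshold(B, k):
    """Keep the k largest-magnitude entries of matrix B (a list of rows); zero the rest."""
    entries = [(abs(v), i, j) for i, row in enumerate(B) for j, v in enumerate(row)]
    entries.sort(reverse=True)
    keep = {(i, j) for _, i, j in entries[:k]}
    return [[v if (i, j) in keep else 0.0 for j, v in enumerate(row)]
            for i, row in enumerate(B)]


def iht(X, Y, k, step, iters=100):
    """Sparse multi-trait least squares by iterative hard thresholding (illustrative only).

    X: n x p genotype matrix, Y: n x r trait matrix, both as lists of rows.
    Returns a p x r coefficient matrix with at most k nonzero entries.
    """
    n, p, r = len(X), len(X[0]), len(Y[0])
    B = [[0.0] * r for _ in range(p)]
    for _ in range(iters):
        # Residuals R = Y - X B under the current estimate.
        R = [[Y[i][t] - sum(X[i][j] * B[j][t] for j in range(p)) for t in range(r)]
             for i in range(n)]
        # Gradient step on the least-squares loss: B + step * X^T R.
        G = [[B[j][t] + step * sum(X[i][j] * R[i][t] for i in range(n))
              for t in range(r)]
             for j in range(p)]
        # Projection onto the set of matrices with at most k nonzero entries.
        B = hard_threshold(G, k)
    return B


if __name__ == "__main__":
    # Toy data: 4 samples, 4 "SNPs", 2 traits, noise-free, orthonormal design.
    X = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    B_true = [[2.0, 0.0], [0.0, 0.0], [0.0, -1.5], [0.0, 0.0]]
    Y = [[sum(X[i][j] * B_true[j][t] for j in range(4)) for t in range(2)]
         for i in range(4)]
    print(iht(X, Y, k=2, step=1.0))
```

Because the sparsity level k is enforced jointly over all SNP-trait pairs, a single fit considers every predictor and every trait at once, rather than testing markers one by one. In practice the step size must be small enough for the gradient step to be stable, and k would need to be chosen by some model-selection procedure such as cross-validation; both are left as inputs in this sketch.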

Availability and implementation: Software, documentation, and scripts to reproduce our results are available from https://github.com/OpenMendel/MendelIHT.jl.

Conflict of interest statement

None declared.

Figures

Figure 1
FP counts evaluated on LD-pruned genotypes reveal that mIHT maintains low FP counts even on datasets that are increasingly close to linkage equilibrium. The x-axis corresponds to filtering the original NFBC chr1 genotypes at different pairwise correlation cutoffs. A smaller value means more aggressive pruning.
Figure 2
An 18-trait joint analysis of UK Biobank's metabolomic traits using mIHT. The effect size for each trait is plotted against its chromosome position. The larger effect sizes are labeled with their SNP names. Note that a one-unit increase in effect size does not directly translate to a one-unit increase in lipid levels on the original scale, because all traits were log-transformed and standardized. The featured metabolomic traits are available under category 220 of the UK Biobank; their field IDs appear in Supplementary Table S6.
