Nat Genet. 2018 Jul;50(7):906-908.
doi: 10.1038/s41588-018-0144-6.

Mixed-model association for biobank-scale datasets

Po-Ru Loh et al. Nat Genet. 2018 Jul.

Abstract

Biobank-based genome-wide association studies are enabling exciting insights in complex trait genetics, but much uncertainty remains over best practices for optimizing statistical power and computational efficiency in GWAS while controlling confounders. Here, we introduce a much faster version of our BOLT-LMM Bayesian mixed model association method—capable of running analyses of the full UK Biobank cohort in a few days on a single compute node—and show that it produces highly powered, robust test statistics when run on all 459K European samples (retaining related individuals). When used to conduct a GWAS for height in UK Biobank, BOLT-LMM achieved power equivalent to linear regression on 650K samples—a 93% increase in effective sample size versus the common practice of analyzing unrelated British samples using linear regression (UK Biobank documentation; Bycroft et al. bioRxiv). Across a broader set of 23 highly heritable traits, the total number of independent GWAS loci detected increased from 5,839 to 10,759, an 84% increase. We recommend the use of BOLT-LMM (retaining related individuals) for biobank-scale analyses, and we have publicly released BOLT-LMM summary association statistics for the 23 traits analyzed as a resource for all researchers.
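The two percentage gains quoted in the abstract can be checked with simple arithmetic. The sketch below is illustrative only: the baseline sample size of 337,539 unrelated British individuals is taken from the Figure 1a caption, the 650K figure is the rounded value given in the abstract, and the helper name `percent_increase` is our own.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return 100.0 * (new - old) / old


# Values quoted in the abstract and the Figure 1a caption.
n_unrelated_british = 337_539   # linear regression baseline (Figure 1a)
n_effective_height = 650_000    # "650K", rounded in the abstract
loci_linear_regression = 5_839  # independent loci, 23 traits
loci_bolt_lmm = 10_759          # independent loci, 23 traits

ess_gain = percent_increase(n_unrelated_british, n_effective_height)
loci_gain = percent_increase(loci_linear_regression, loci_bolt_lmm)

print(f"Effective sample size gain for height: {ess_gain:.0f}%")
print(f"Gain in independent GWAS loci: {loci_gain:.0f}%")
```

Both values round to the figures reported in the abstract (93% and 84%).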

Figures

Figure 1: Power, calibration, and speed of BOLT-LMM v2.3 in UK Biobank analyses.
(a) Numbers of independent genome-wide significant associations ($p < 5 \times 10^{-9}$) identified by BOLT-LMM analyses of all European-ancestry individuals (N=459,327) versus linear regression analyses of unrelated British individuals (N=337,539, following common practice [5]). Results for 23 phenotypes are plotted, with 8 representative phenotypes highlighted. (b) Variance explained by genome-wide SNPs on which BOLT-LMM implicitly conditions to increase power. Conditioning on BOLT-LMM’s polygenic predictions, which attain accuracy ($r^2_{\text{BOLT-LMM}}$) approaching SNP-heritability ($h_g^2$) for some traits, achieves effective sample sizes as high as ~700K. (We measured effective sample size by comparing $\chi^2$ statistics at associated SNPs; Supplementary Note.) (c) Test statistic calibration of BOLT-LMM on all European individuals versus linear regression on unrelated British individuals (using 20 principal component covariates). Attenuation ratios from LD score regression [7, 8] match closely between the two methods, indicating that BOLT-LMM properly controls false positives (Supplementary Fig. 2). Error bars, jackknife s.e. (d) Computational cost of association analysis using BOLT-LMM v2.3, the previous version of BOLT-LMM [3], and linear regression (implemented efficiently within the BOLT-LMM software) on the UK Biobank N=150K and N=500K data releases. Analyses were run on 8 threads on a 2.10 GHz Intel Xeon E5-2683 v4 processor. Additional details and numerical data are provided in the Supplementary Note, Supplementary Fig. 1, and Supplementary Tables 1–7.
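As a rough illustration of panel (b), the sketch below shows one way an effective sample size could be estimated from χ² statistics, and how it relates to the accuracy of the polygenic prediction. It is not the procedure from the Supplementary Note, which is not reproduced here. It rests on two assumptions of ours: that χ² statistics at strongly associated SNPs scale roughly linearly with sample size, and that conditioning on a predictor with accuracy r² shrinks residual variance by a factor of (1 − r²), so that N_eff ≈ N / (1 − r²). The function names are hypothetical.

```python
from typing import Sequence


def effective_sample_size(chi2_method: Sequence[float],
                          chi2_reference: Sequence[float],
                          n_reference: float) -> float:
    """Scale the reference sample size by the ratio of summed chi-square
    statistics at the same set of associated SNPs (assumed approximation)."""
    if len(chi2_method) != len(chi2_reference):
        raise ValueError("Both methods must be evaluated at the same SNPs")
    return n_reference * sum(chi2_method) / sum(chi2_reference)


def implied_prediction_r2(n_actual: float, n_effective: float) -> float:
    """Invert N_eff = N / (1 - r^2) to get the prediction accuracy that
    would explain a given effective sample size (assumed approximation)."""
    return 1.0 - n_actual / n_effective


# Toy chi-square values, invented purely to exercise the function.
toy_neff = effective_sample_size([60.0, 90.0], [40.0, 60.0], n_reference=337_539)
print(f"Toy effective sample size: {toy_neff:,.0f}")

# Figures from the caption: N=459,327 analyzed, effective N up to ~700K.
r2 = implied_prediction_r2(459_327, 700_000)
print(f"Implied prediction r^2 at N_eff ~700K: {r2:.2f}")
```

Under these assumptions, an effective sample size of about 700K from 459,327 analyzed individuals would correspond to a prediction r² of roughly 0.34.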

References

    1. Yu J et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature Genetics 38, 203–208 (2006).
    2. Yang J, Zaitlen NA, Goddard ME, Visscher PM & Price AL. Advantages and pitfalls in the application of mixed-model association methods. Nature Genetics 46, 100–106 (2014).
    3. Loh P-R et al. Efficient Bayesian mixed model analysis increases association power in large cohorts. Nature Genetics 47, 284–290 (2015).
    4. Sudlow C et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLOS Medicine 12, 1–10 (2015).
    5. Bycroft C et al. Genome-wide genetic data on ~500,000 UK Biobank participants. bioRxiv (2017).
