Nat Genet. 2019 Feb;51(2):277-284.
doi: 10.1038/s41588-018-0279-5. Epub 2018 Dec 3.

SumHer better estimates the SNP heritability of complex traits from summary statistics

Doug Speed et al. Nat Genet. 2019 Feb.

Abstract

We present SumHer, software for estimating confounding bias, SNP heritability, enrichments of heritability and genetic correlations using summary statistics from genome-wide association studies. The key difference between SumHer and the existing software LD Score Regression (LDSC) is that SumHer allows the user to specify the heritability model. We apply SumHer to results from 24 large-scale association studies (average sample size 121,000) using our recommended heritability model. We show that these studies tended to substantially over-correct for confounding, and as a result the number of genome-wide significant loci was under-reported by about a quarter. We also estimate enrichments for 24 categories of SNPs defined by functional annotations. A previous study using LDSC reported that conserved regions were 13-fold enriched, and found a further six categories with above threefold enrichment. By contrast, our analysis using SumHer finds that none of the categories have enrichment above twofold. SumHer provides an improved understanding of the genetic architecture of complex traits, which enables more efficient analysis of future genetic data.
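To make the notion of enrichment concrete, here is a minimal sketch. It assumes the common definition of enrichment as the share of SNP heritability attributed to a category divided by the share expected for that category under the chosen heritability model. The function name and the numbers are hypothetical and are not taken from the paper or from the SumHer or LDSC software.

```python
def enrichment(h2_category, h2_total, expected_share):
    """Fold enrichment of a SNP category (hypothetical helper).

    h2_category    -- heritability attributed to the category
    h2_total       -- total SNP heritability
    expected_share -- share of heritability the category is expected to
                      contribute under the assumed heritability model
                      (for example, its share of SNPs under a uniform model)
    """
    if h2_total <= 0:
        raise ValueError("h2_total must be positive")
    if not 0 < expected_share <= 1:
        raise ValueError("expected_share must lie in (0, 1]")
    observed_share = h2_category / h2_total
    return observed_share / expected_share


# Illustrative numbers only: a category expected to hold 5% of the
# heritability but estimated to hold 10% of it is twofold enriched.
example = enrichment(h2_category=0.05, h2_total=0.5, expected_share=0.05)
```

Under this definition, the expected share depends on the heritability model, so the same estimated heritability can yield different enrichments under different models.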

Conflict of interest statement

Competing interests

The authors declare no competing financial interests.

Figures

Figure 1. Importance of the heritability model.
We generated 100 phenotypes assuming the GCTA heritability model, 100 assuming the LDAK heritability model, then analyzed each using LDSC-Zero, LDSC, SumHer-Zero and SumHer-GC (see main text for details of the heritability models and methods). a, Average estimates of $h^2_{\mathrm{SNP}}$ (true $h^2_{\mathrm{SNP}}$ is 0.5). b, Average estimates of the enrichment of heritability in conserved regions (true enrichment is 1). c, Average estimates of genetic correlation between pairs of phenotypes (true correlation is 0.5). In all plots, vertical line segments mark 95% confidence intervals for the average estimates.
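As an illustration of how an average estimate and its 95% confidence interval might be summarized across simulated phenotypes, here is a minimal sketch. It assumes a normal approximation (mean plus or minus 1.96 standard errors); the caption does not state how the intervals were computed, and the function name and example values are hypothetical.

```python
import math
import statistics


def mean_with_ci(estimates, z=1.96):
    """Return (mean, lower, upper) for a list of replicate estimates.

    Assumes the replicates are independent and that the mean is
    approximately normal, so the interval is mean +/- z * standard error.
    """
    n = len(estimates)
    if n < 2:
        raise ValueError("need at least two estimates")
    mean = statistics.fmean(estimates)
    standard_error = statistics.stdev(estimates) / math.sqrt(n)
    return mean, mean - z * standard_error, mean + z * standard_error


# Hypothetical replicate estimates of SNP heritability.
mean, lower, upper = mean_with_ci([0.4, 0.5, 0.6])
```

An estimator can then be called approximately unbiased in the simulation when the interval covers the true value (0.5 for SNP heritability in panel a).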
Figure 2. Comparing the GCTA and LDAK heritability models.
These analyses use Hybrid-Zero and Hybrid-GC, versions of SumHer that assign weights 1 - p and p to the GCTA and LDAK heritability models, respectively. a, Average estimates of p from Hybrid-Zero for GCTA phenotypes (true p = 0), LDAK phenotypes (true p = 1) and hybrid phenotypes (true p = 0.5). b, Estimates of p from Hybrid-Zero for the 25 raw GWAS. Colors distinguish between the 13 WTCCC, 5 binary eMERGE and 7 quantitative eMERGE traits (black denotes the 25-trait average). A precise estimate of p was not possible for shingles (Segment 17), due to the trait having very low $h^2_{\mathrm{SNP}}$. c, Estimates of p from Hybrid-GC for the 24 summary GWAS. Colors distinguish between the 9 binary and 15 quantitative traits (black denotes the 24-trait average). In all plots, vertical line segments mark 95% confidence intervals.
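The weighting described in this caption can be sketched as a mixture of two per-SNP heritability profiles. This is a sketch under stated assumptions, not the SumHer implementation. It assumes that the GCTA model gives every SNP the same expected heritability and that the LDAK model gives SNP j an expected heritability proportional to w_j [f_j (1 - f_j)]^0.75, where f_j is the minor allele frequency and w_j an LD-based weight. These model forms come from the LDAK literature rather than from the text above, and all function names are hypothetical.

```python
def gcta_shares(n_snps):
    """Uniform share of heritability per SNP (assumed GCTA model)."""
    return [1.0 / n_snps] * n_snps


def ldak_shares(mafs, weights, alpha=-0.25):
    """Shares proportional to w_j * [f_j (1 - f_j)]^(1 + alpha) (assumed LDAK model)."""
    raw = [w * (f * (1.0 - f)) ** (1.0 + alpha) for f, w in zip(mafs, weights)]
    total = sum(raw)
    if total <= 0:
        raise ValueError("weights must give a positive total")
    return [r / total for r in raw]


def hybrid_expected_h2(mafs, weights, p, h2_snp):
    """Expected per-SNP heritability with weight 1 - p on GCTA and p on LDAK."""
    if len(mafs) != len(weights) or not mafs:
        raise ValueError("mafs and weights must be non-empty and equal in length")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    gcta = gcta_shares(len(mafs))
    ldak = ldak_shares(mafs, weights, alpha=-0.25)
    return [h2_snp * ((1.0 - p) * g + p * l) for g, l in zip(gcta, ldak)]


# Hypothetical inputs: three SNPs with made-up frequencies and weights.
per_snp = hybrid_expected_h2([0.1, 0.3, 0.5], [1.0, 0.5, 0.0], p=0.5, h2_snp=0.5)
```

With p = 0 the sketch reduces to the uniform profile and with p = 1 to the LDAK-style profile, mirroring the true values of p used for the GCTA and LDAK phenotypes in panel a.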
Figure 3. Functional enrichments across the 24 summary GWAS.
a, Average estimates of enrichments for the 24 functional categories from LDSC (red bars) and from SumHer-GC (blue bars). b, Average estimates of enrichments from SumHer-GC, based either on the 9 binary or on the 15 quantitative traits. In both plots, horizontal and vertical line segments mark 95% confidence intervals.
Figure 4. Prediction of five quantitative traits.
For each trait, we use data from the 24 summary GWAS to construct Bayesian polygenic risk scores (PRS) corresponding to four heritability models: GCTA, Enriched GCTA, LDAK and Enriched LDAK (see main text for details of each model). a, The distribution of $E[v_j^2]$, the expected heritability tagged by SNP j, corresponding to each heritability model. b, Prediction accuracy, measured as correlation between observed and predicted phenotypes in the (independent) eMERGE data, for the Classical PRS (effect sizes are frequentist estimates from single-SNP analysis) and for each of the four Bayesian PRS (effect sizes are posterior means). Vertical line segments mark 95% confidence intervals.
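Prediction accuracy of the kind shown in panel b can be sketched as a Pearson correlation between observed and predicted phenotypes. The interval below uses the Fisher z-transformation, which is an assumption made for illustration: the caption does not say how the 95% confidence intervals were obtained. The function names and data are hypothetical.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation between two equal-length sequences."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (q - mean_p) for o, q in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((q - mean_p) ** 2 for q in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_o * var_p)


def fisher_ci(r, n, z=1.96):
    """Approximate 95% interval for a correlation via Fisher's z-transformation."""
    if n <= 3 or abs(r) >= 1:
        raise ValueError("need n > 3 and |r| < 1")
    center = math.atanh(r)
    half_width = z / math.sqrt(n - 3)
    return math.tanh(center - half_width), math.tanh(center + half_width)


# Hypothetical observed and predicted phenotype values.
r = pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
low, high = fisher_ci(r, n=5)
```

With only five individuals the interval is very wide; in practice the sample size of the independent test data determines how precisely the polygenic risk scores can be compared.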
