PLoS Comput Biol. 2021 Oct 21;17(10):e1009483.
doi: 10.1371/journal.pcbi.1009483. eCollection 2021 Oct.

Estimation of regional polygenicity from GWAS provides insights into the genetic architecture of complex traits

Ruth Johnson et al. PLoS Comput Biol.

Abstract

The number of variants that have a non-zero effect on a trait (i.e., polygenicity) is a fundamental parameter in the study of the genetic architecture of a complex trait. Although many previous studies have investigated polygenicity at a genome-wide scale, a detailed understanding of how polygenicity varies across genomic regions is currently lacking. In this work, we propose an accurate and scalable statistical framework to estimate regional polygenicity for a complex trait. We show that our approach yields approximately unbiased estimates of regional polygenicity in simulations across a wide range of genetic architectures. We then partition the polygenicity of anthropometric and blood pressure traits across 6-Mb genomic regions (N = 290K, UK Biobank) and observe that all analyzed traits are highly polygenic: for each trait, over one-third of regions harbor at least one causal variant. Additionally, we observe wide variation in regional polygenicity: on average across all traits, 48.9% of regions contain at least 5 causal SNPs and 5.44% of regions contain at least 50 causal SNPs. Finally, we find that heritability is proportional to polygenicity at the regional level, which is consistent with the hypothesis that heritability enrichments are largely driven by variation in the number of causal SNPs.
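The summary statistics quoted in the abstract (the share of regions with at least 1, 5, or 50 causal SNPs) can be illustrated with a small sketch. This is not the authors' code: the function name and the example counts are hypothetical, and the input is assumed to be one estimated causal-SNP count per region.

```python
def fraction_of_regions_at_least(causal_counts, threshold):
    """Fraction of regions whose estimated causal-SNP count is >= threshold."""
    if not causal_counts:
        raise ValueError("need at least one region")
    return sum(c >= threshold for c in causal_counts) / len(causal_counts)


# Made-up causal-SNP counts for eight regions, for illustration only.
example_counts = [0.2, 3.0, 7.5, 12.0, 55.0, 1.1, 5.0, 80.0]

frac_ge_1 = fraction_of_regions_at_least(example_counts, 1)
frac_ge_5 = fraction_of_regions_at_least(example_counts, 5)
frac_ge_50 = fraction_of_regions_at_least(example_counts, 50)
```

Averaging such fractions across traits would give figures of the kind reported above, under the assumption that each region contributes one point estimate.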

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. BEAVR is relatively unbiased in simulated data.
We ran 100 replicates (M = 1,000 SNPs, N = 500K individuals) where the genome-wide heritability was set to $h^2_{GW} = 0.5$ and the true polygenicity of the region was $p_r$ = 0.005, 0.01, 0.05, or 0.10. We compared BEAVR to GENESIS-M2 and GENESIS-M3, which employ a spike-and-slab model with 2 or 3 components, respectively (a point mass plus 1 or 2 slabs). All methods are unbiased when the polygenicity is low ($p_r$ = 0.005, 0.01). However, when polygenicity is higher ($p_r$ = 0.05, 0.10), both GENESIS-M2 and GENESIS-M3 are severely downward biased, whereas BEAVR provides unbiased estimates across all settings. Dashed red lines denote the true regional polygenicity value in each setting.
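To make the simulated quantity concrete, here is a minimal sketch of drawing SNP effects from a two-component spike-and-slab (point-normal) distribution. It is an illustration under stated assumptions, not the authors' simulation pipeline: the function name is hypothetical, genotypes are assumed standardized and uncorrelated, and the heritability argument is treated as the variance explained by the region.

```python
import random


def simulate_point_normal_effects(num_snps, p_causal, h2_region, seed=0):
    """Draw effects: zero with prob 1 - p_causal, else N(0, h2 / (M * p))."""
    if not 0.0 < p_causal <= 1.0:
        raise ValueError("p_causal must be in (0, 1]")
    rng = random.Random(seed)
    # Slab standard deviation chosen so the expected sum of squared
    # effects equals h2_region (assumption: standardized, independent SNPs).
    slab_sd = (h2_region / (num_snps * p_causal)) ** 0.5
    effects = []
    for _ in range(num_snps):
        if rng.random() < p_causal:
            effects.append(rng.gauss(0.0, slab_sd))  # causal SNP (slab)
        else:
            effects.append(0.0)  # non-causal SNP (spike at zero)
    return effects


effects = simulate_point_normal_effects(1000, 0.05, 0.5, seed=1)
n_causal = sum(1 for b in effects if b != 0.0)
realized_polygenicity = n_causal / len(effects)
```

The realized polygenicity fluctuates around `p_causal` from replicate to replicate, which is why estimates are compared against the true value over many replicates.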
Fig 2. BEAVR is relatively unbiased across various genetic architectures.
We ran 100 replicates in which we varied the genome-wide heritability ($h^2_{GW}$ = 0.10, 0.25, 0.5), the polygenicity of the region ($p_r$ = 0.005, 0.01, 0.05, 0.10), and the sample size (N = 50K, 500K, 1 million individuals). We compared BEAVR to GENESIS-M2 (2-component) and GENESIS-M3 (3-component). The x-axis denotes the simulated values for the regional polygenicity and the y-axis denotes the estimated values across 100 replicates. Dashed red lines denote the true regional polygenicity value in each setting.
Fig 3. BEAVR is robust in realistic settings.
(A) Using SNP data from chromosome 22 (M = 9,564 array SNPs, N = 337K individuals), we simulated 100 replicates where the genome-wide heritability was $h^2_{GW} = 0.50$ and $p = 0.01$. We divided the data into consecutive 6-Mb regions, for a total of 6 regions, and estimated the regional heritability using external software (HESS [12]). Using BEAVR with the estimated regional heritability, we found the regional polygenicity estimates to be unbiased across all regions. (B) We ran 100 replicates where the genome-wide heritability was fixed at $h^2_{GW} = 0.50$, the polygenicity at $p_r = 0.01$, and the sample size at N = 500K, and then varied the number of SNPs in the region (M = 500, 1K, 5K). We used BEAVR to estimate the polygenicity in each region and found our results to be unbiased across all regions. (C) We set the genome-wide heritability to $h^2_{GW} = 0.50$, the regional polygenicity to $p_r = 0.01$, and the sample size to N = 500K. We find that the accuracy of our results is invariant to our choice of prior hyper-parameter (α).
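The partitioning step in panel (A) can be sketched as follows. This is a hedged illustration only: the function name is hypothetical, and the choice to start the first region at the first SNP position (rather than at coordinate zero) is an assumption, since the caption does not say where region boundaries begin.

```python
from collections import defaultdict

REGION_SIZE_BP = 6_000_000  # 6 Mb, as in the caption


def assign_regions(positions_bp, region_size_bp=REGION_SIZE_BP, start_bp=None):
    """Map region index -> list of SNP indices in consecutive fixed-width bins."""
    if not positions_bp:
        return {}
    if start_bp is None:
        # Assumption: the first region starts at the first observed SNP.
        start_bp = min(positions_bp)
    regions = defaultdict(list)
    for snp_index, pos in enumerate(positions_bp):
        regions[(pos - start_bp) // region_size_bp].append(snp_index)
    return dict(regions)


# Made-up base-pair positions on one chromosome, for illustration only.
example_positions = [16_000_000, 17_500_000, 22_000_000, 28_100_000, 51_000_000]
example_regions = assign_regions(example_positions)
```

Each resulting group of SNP indices would then be analyzed separately, with the regional heritability supplied by an external estimator as the caption describes.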
Fig 4. BEAVR is computationally efficient.
(A) We show the runtime in seconds per iteration of the Gibbs sampler (log scale). We compare the version of BEAVR with the algorithmic speedup outlined in Materials and methods (‘speedup’) against the straightforward implementation (‘baseline’). We vary the number of SNPs in the region while fixing the polygenicity of each region to $p_r = 0.01$. (B) We show the runtime of the sampler when the number of SNPs in the region is fixed to M = 1,000 and we vary the polygenicity.
Fig 5. Distribution of regional polygenicity and heritability.
We divide the genome into 6-Mb regions and report the posterior mean of the regional polygenicity for each region across height and diastolic blood pressure. Using external software [12], we report the distribution of regional heritability for each trait.
Fig 6. Heritability is proportional to the number of causal SNPs in a region.
We show the relationship between the number of causal SNPs and heritability for each region across height and diastolic blood pressure. We fit a linear regression for each trait and report the slope of the regression, which can be interpreted as the increase of heritability per additional causal SNP. Horizontal error bars represent two posterior standard deviations around our estimates for the number of causal SNPs. Vertical error bars represent twice the standard error around the estimates of regional heritability. Dots in black denote outlier regions which have an absolute studentized residual larger than 3.
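The regression and outlier rule described in this caption can be sketched in a few lines. This is an illustration rather than the authors' analysis code: the function name and data are hypothetical, and the residuals are assumed to be internally studentized, since the caption does not specify which studentization was used.

```python
import math


def fit_line_and_flag_outliers(x, y, cutoff=3.0):
    """Ordinary least squares of y on x; flag |studentized residual| > cutoff."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have equal length of at least 3")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx  # heritability increase per additional causal SNP
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    resid_sd = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    outliers = []
    for i, (xi, e) in enumerate(zip(x, residuals)):
        leverage = 1.0 / n + (xi - x_mean) ** 2 / sxx
        denom = resid_sd * math.sqrt(max(0.0, 1.0 - leverage))
        if denom > 0 and abs(e) / denom > cutoff:
            outliers.append(i)
    return slope, intercept, outliers


# Made-up regions: causal-SNP counts 0..19 with one aberrant heritability value.
num_causal = list(range(20))
region_h2 = [0.01 * c + (0.001 if c % 2 else -0.001) for c in num_causal]
region_h2[10] += 0.5
slope, intercept, outlier_regions = fit_line_and_flag_outliers(num_causal, region_h2)
```

Note that with internal studentization the absolute residual cannot exceed the square root of n - 2, so a cutoff of 3 can only flag points when there are more than 11 regions.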

References

    1. Chatterjee N, Wheeler B, Sampson J, Hartge P, Chanock SJ, Park JH. Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies. Nature genetics. 2013;45(4):400. doi: 10.1038/ng.2579
    2. Zhang Y, Qi G, Park JH, Chatterjee N. Estimation of complex effect-size distributions using summary-level statistics from genome-wide association studies across 32 complex traits. Nature genetics. 2018;50(9):1318. doi: 10.1038/s41588-018-0193-x
    3. Zeng J, Vlaming R, Wu Y, Robinson MR, Lloyd-Jones LR, Yengo L, et al. Signatures of negative selection in the genetic architecture of human complex traits. Nature genetics. 2018;50(5):746. doi: 10.1038/s41588-018-0101-4
    4. O’Connor LJ, Schoech AP, Hormozdiari F, Gazal S, Patterson N, Price AL. Extreme polygenicity of complex traits is explained by negative selection. The American Journal of Human Genetics. 2019;105(3):456–476. doi: 10.1016/j.ajhg.2019.07.003
    5. Boyle EA, Li YI, Pritchard JK. An expanded view of complex traits: from polygenic to omnigenic. Cell. 2017;169(7):1177–1186. doi: 10.1016/j.cell.2017.05.038