Genetics. 2021 Mar 31;217(3):iyaa046.

doi: 10.1093/genetics/iyaa046.

The genetic architecture of human complex phenotypes is modulated by linkage disequilibrium and heterozygosity

Dominic Holland¹, Oleksandr Frei², Rahul Desikan³, Chun-Chieh Fan¹, Alexey A Shadrin², Olav B Smeland², Ole A Andreassen², Anders M Dale³

Affiliations

¹ Center for Multimodal Imaging and Genetics, University of California at San Diego, La Jolla, CA 92037, USA.
² NORMENT, KG Jebsen Centre for Psychosis Research, Institute of Clinical Medicine, University of Oslo, Oslo 0424, Norway.
³ Department of Radiology, University of California, San Francisco, San Francisco, CA 94158, USA.

PMID: 33789345
PMCID: PMC8045737
DOI: 10.1093/genetics/iyaa046

Dominic Holland et al. Genetics. 2021.

Erratum in

Erratum to: The genetic architecture of human complex phenotypes is modulated by linkage disequilibrium and heterozygosity.
Holland D, Frei O, Desikan R, Fan CC, Shadrin AA, Smeland OB, Andreassen OA, Dale AM. Genetics. 2021 Jun 24;218(2):iyab064. doi: 10.1093/genetics/iyab064. PMID: 34167151. Free PMC article.

Abstract

We propose an extended Gaussian mixture model for the distribution of causal effects of common single nucleotide polymorphisms (SNPs) for human complex phenotypes that depends on linkage disequilibrium (LD) and heterozygosity (H), while also allowing for independent components for small and large effects. Using a precise methodology showing how genome-wide association study (GWAS) summary statistics (z-scores) arise through LD with underlying causal SNPs, we applied the model to GWASs of multiple human phenotypes. Our findings indicated that causal effects are distributed with dependence on total LD and H, whereby SNPs with lower total LD and H are more likely to be causal with larger effects; this dependence is consistent with models of the influence of negative pressure from natural selection. Compared with the basic Gaussian mixture model it is built on, the extended model, primarily through quantification of selection pressure, reproduces with greater accuracy the empirical distributions of z-scores, thus providing better estimates of genetic quantities, such as polygenicity and heritability, that arise from the distribution of causal effects.
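
The abstract describes causal effects drawn from a Gaussian mixture in which both the chance of being causal and the effect size depend on total LD (L) and heterozygosity (H). The sketch below illustrates that idea under loudly stated assumptions: the falling curve `prior_prob` (a hypothetical Hill-type form in L) and the power-law effect variance `sigma2 * H**S` with a hypothetical selection exponent `S` are stand-ins chosen for illustration, not the paper's Equations (3)–(5). It also uses a single causal Gaussian, whereas the extended model has several components ("b," "c," "d"), and every parameter value is made up.

```python
import numpy as np

def prior_prob(L, p1, m, k):
    """Hypothetical probability that a SNP with total LD L is causal.

    Assumed form: p1 / (1 + (L/m)**k), which tends to p1 at small L and
    equals p1/2 at L = m; k sets how sharp the fall-off is.
    """
    return p1 / (1.0 + (np.asarray(L, dtype=float) / m) ** k)

def draw_effects(L, H, p1, m, k, sigma2, S, rng):
    """Draw causal effects beta for SNPs with total LD L and heterozygosity H.

    Each SNP is causal with probability prior_prob(L, ...). A causal SNP's
    effect is drawn from N(0, sigma2 * H**S); with S < 0 (an assumption),
    lower-H SNPs receive larger effects. Non-causal SNPs get beta = 0.
    """
    L = np.asarray(L, dtype=float)
    H = np.asarray(H, dtype=float)
    causal = rng.random(L.shape) < prior_prob(L, p1, m, k)
    sd = np.sqrt(sigma2 * H ** S)
    beta = np.where(causal, rng.standard_normal(L.shape) * sd, 0.0)
    return beta, causal

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L = rng.uniform(1.0, 500.0, 100_000)   # hypothetical total-LD values
    H = rng.uniform(0.02, 0.5, 100_000)    # hypothetical heterozygosities
    beta, causal = draw_effects(L, H, p1=1e-2, m=100.0, k=2.0,
                                sigma2=1e-5, S=-0.5, rng=rng)
    print("causal SNPs:", int(causal.sum()))
```

Under these assumptions, low-L SNPs are far more often causal than high-L SNPs, and among causal SNPs the effect spread grows as H falls, which is the qualitative pattern the abstract reports.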

Keywords: effect size; heritability; linkage disequilibrium; minor allele frequency; natural selection; polygenicity.

Figures

**Figure 1**
Examples of $\pi_1 \times$ prior probability functions $p_c(L)$ used in Equations (4) and (5), where $L$ is reference SNP total LD (see Equation (3) for the general expression, and Appendix Table G for parameter values). These functions can be summarized by three quantities: the maximum value, $p_{c1}$, which occurs at $L = 1$; the total LD value $L = m_c$ at which $p_c(m_c) = p_{c1}/2$, marked by the gray dashed lines in the figure; and the total LD width of the transition region, $w_c$, defined as the distance between the points where $p_c(L)$ falls to 95% and 5% of $p_{c1}$, marked by the flanking red dashed lines in the figure. Numerical values of $p_{c1}$, $m_c$, and $w_c$ are given in Table 1 and Figures 2 and 3. $p_d(L)$ is similar. Plots of $p_c(L)$ and $p_d(L)$, where relevant, for all phenotypes are shown in Supplementary Figures S3–S5.
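
The three summary quantities defined above can be read off any sampled prior curve. A minimal sketch, assuming the curve is sampled on an increasing total-LD grid that starts at $L = 1$ and is strictly decreasing, so that each crossing is unique; linear interpolation between grid points is a choice of this sketch, and `summarize_prior` is a hypothetical helper name.

```python
import numpy as np

def summarize_prior(L, p):
    """Return (p1, m, w) for a prior curve p(L) sampled on an increasing grid L.

    p1: value at the first grid point (taken to be L = 1, where the maximum is);
    m:  total LD at which p falls to p1 / 2;
    w:  distance between the L values where p falls to 95% and 5% of p1.
    Assumes p is strictly decreasing. If the curve never reaches a level
    within the grid, np.interp clamps to the grid end.
    """
    L = np.asarray(L, dtype=float)
    p = np.asarray(p, dtype=float)
    p1 = p[0]

    def level(frac):
        # np.interp needs increasing x-coordinates, so interpolate L as a
        # function of p on the reversed (now increasing) arrays.
        return float(np.interp(frac * p1, p[::-1], L[::-1]))

    m = level(0.5)
    w = level(0.05) - level(0.95)
    return p1, m, w
```

For example, a linear ramp falling from 1 at $L = 1$ to 0 at $L = 101$ gives $m_c = 51$ and $w_c = 96 - 6 = 90$.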

**Figure 2**
QQ plots of (pruned) z-scores for qualitative phenotypes (dark blue, 95% confidence interval in light blue) with model prediction (yellow): (A) Alzheimer’s Disease, excluding chromosome 19; (B) Amyotrophic Lateral Sclerosis, chromosome 9 only; (C) Bipolar Disorder; (D) Schizophrenia; (E) AD, chromosome 19 only; (F) Crohn’s Disease; (G) Ulcerative Colitis; and (H) Coronary Artery Disease. See Supplementary Figures S15–S22. The value given for $p_{c1}$ is the amplitude of the full $p_c(L)$ function, which occurs at $L = 1$; the values ($m_c$, $w_c$) in parentheses following it are the total LD ($m_c$) where the function falls to half its amplitude (the middle gray dashed lines in Figure 1 are examples), and the total LD width ($w_c$) of the transition region (distance between flanking red dashed lines in Figure 1). Similarly for $p_{d1}$ ($m_d$, $w_d$), where given. $h_b^2$, $h_c^2$, and $h_d^2$ are the heritabilities associated with the “b,” “c,” and “d” Gaussians, respectively. $h^2$ is the total SNP heritability, re-expressed as $h_l^2$ on the liability scale for binary phenotypes. Parameter values are also given in Table 1 and heritabilities in Table 3; numbers of causal SNPs are in Table 2. Reading the plots: on the vertical axis, choose a P-value threshold for typed or imputed SNPs (SNPs with z-scores; more extreme values are further from the origin); the horizontal axis then gives the proportion, q, of typed SNPs exceeding that threshold (higher proportions are closer to the origin). See also Supplementary Figure S1, where the y-axis is restricted to $0 \le -\log_{10}(p) \le 10$. The $h_b^2$ values reported here are from one component in the extended model; values for the exclusive basic model are reported as $h_\beta^2$ in Holland *et al.* (2020).
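
The reading instructions in the caption amount to pairing each P-value threshold with the fraction q of SNPs more significant than it, both on a $-\log_{10}$ scale. A minimal sketch using two-sided normal P-values; it omits the pruning, confidence intervals, and model prediction shown in the figures, and `two_sided_p` and `qq_points` are hypothetical helper names.

```python
import math

def two_sided_p(z):
    """Two-sided P-value for a standard-normal z-score: P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def qq_points(zscores, thresholds):
    """For each P-value threshold t, return (-log10 q, -log10 t).

    q is the proportion of SNPs whose P-value is below t, i.e. the
    horizontal-axis quantity; -log10 t is the vertical-axis quantity.
    """
    pvals = [two_sided_p(z) for z in zscores]
    n = len(pvals)
    points = []
    for t in thresholds:
        q = sum(p < t for p in pvals) / n
        if q > 0:  # -log10(0) is undefined, so skip thresholds no SNP passes
            points.append((-math.log10(q), -math.log10(t)))
    return points
```

Under the null, q tracks the threshold itself and the points fall on the diagonal; an excess of small P-values pushes the curve above it.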

**Figure 3**
QQ plots of (pruned) z-scores for quantitative phenotypes (dark blue, 95% confidence interval in light blue) with model prediction (yellow): (A) Body Mass Index; (B) Intelligence; (C) Education; (D) Height (2010); (E) High-density Lipoprotein; (F) Low-density Lipoprotein; (G) Total Cholesterol; and (H) Height (2014). See Supplementary Figures S23–S29. For HDL, $p_c(L) = p_{c1}$ for all $L$; for bipolar disorder and LDL, $p_d(L) = p_{d1}$ for all $L$. See caption to Figure 2 for further description. See also Supplementary Figure S2, where the y-axis is restricted to $0 \le -\log_{10}(p) \le 10$. For (A) BMI, see also Supplementary Figure S37.

**Figure 4**
A 4 × 4 subset from a 10 × 10 heterozygosity × total LD (TLD) grid of QQ plots for HDL; see Figure 3E for the overall summary plot. Similar plots for all phenotypes are in Supplementary Figures S15–S29. The light gray curves are 95% confidence intervals for the data; $\hat{\lambda}_D$ and $\hat{\lambda}_M$ are the “genomic inflation factors” calculated from the QQ subplots for the data and the model prediction, respectively; $n$ is the number of SNPs; $H$ is heterozygosity, $L$ is total LD, and the square brackets give their ranges for GWAS SNPs in each grid element.
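
The caption reports genomic inflation factors computed from the QQ subplots, but the exact computation is not given here. The sketch below assumes the conventional median-based definition, $\hat{\lambda} = \mathrm{median}(z^2) / \mathrm{median}(\chi^2_1)$, where $\mathrm{median}(\chi^2_1) = (\Phi^{-1}(0.75))^2 \approx 0.4549$; under the null, $\hat{\lambda} \approx 1$. `genomic_inflation` is a hypothetical helper name.

```python
import statistics
from statistics import NormalDist

# Median of a 1-df chi-square variable: (Phi^{-1}(0.75))^2, about 0.4549.
CHI2_1_MEDIAN = NormalDist().inv_cdf(0.75) ** 2

def genomic_inflation(zscores):
    """Median-based genomic inflation factor: median(z^2) / median(chi2_1)."""
    return statistics.median(z * z for z in zscores) / CHI2_1_MEDIAN
```

Applied separately to the data and to model-predicted z-scores within each grid cell, this would give a $\hat{\lambda}_D$ and $\hat{\lambda}_M$ pair for that cell, under the assumption stated above.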

**Figure 5**
Model results for height (2014) using the BC model. The reference panel SNPs are binned with respect to both heterozygosity ($H$) and total LD ($L$) in a 50 × 50 grid for $0.02 \le H \le 0.5$ and $1 \le L \le 500$. Shown are model estimates of: (A) $\log_{10}$ of the percentage of heritability in each grid element; (B) for each element, the average heritability per causal SNP in the element; (C) $\log_{10}$ of the number of causal SNPs in each element; and (D) the expected $\beta^2$ for the element-wise causal SNPs. Note that $H$ increases from top to bottom.
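
The four panels are per-element aggregates over an H × L grid. A minimal sketch, assuming each reference SNP has a (hypothetical) causal probability `pi` and conditional effect variance `sigma2`, and that a causal SNP's heritability contribution is $H\beta^2$ (per-allele effects, unit phenotypic variance). Binning uses `numpy.histogram2d` with weights; the first array axis indexes $H$, so displaying the result as an image with row 0 at the top shows $H$ increasing from top to bottom, as in the figure. `grid_summaries` is a hypothetical helper name.

```python
import numpy as np

def grid_summaries(H, L, pi, sigma2, n_bins=50,
                   h_range=(0.02, 0.5), l_range=(1.0, 500.0)):
    """Bin SNPs on an H x L grid and return expected per-element quantities.

    pi[i]     : probability that SNP i is causal (hypothetical input)
    sigma2[i] : variance of SNP i's effect beta, given that it is causal
    Returns (percent of heritability, heritability per causal SNP,
    expected number of causal SNPs, expected beta^2 of causal SNPs).
    Empty elements give NaN for the two ratio quantities.
    """
    H, L, pi, sigma2 = (np.asarray(a, dtype=float) for a in (H, L, pi, sigma2))
    bins = [np.linspace(*h_range, n_bins + 1), np.linspace(*l_range, n_bins + 1)]

    def binned(weights):
        totals, _, _ = np.histogram2d(H, L, bins=bins, weights=weights)
        return totals

    n_causal = binned(pi)               # expected number of causal SNPs
    h2 = binned(pi * H * sigma2)        # expected heritability, sum of H * beta^2
    beta2_sum = binned(pi * sigma2)     # expected sum of beta^2 over causal SNPs
    with np.errstate(invalid="ignore", divide="ignore"):
        pct_h2 = 100.0 * h2 / h2.sum()
        h2_per_causal = h2 / n_causal
        mean_beta2 = beta2_sum / n_causal
    return pct_h2, h2_per_causal, n_causal, mean_beta2
```

Panels (A) and (C) would then be `np.log10` of the first and third outputs.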

**Figure 6**
QQ plots of (pruned) z-scores for simulated qualitative phenotypes (dark blue, 95% confidence interval in light blue) with model prediction (yellow). See Figure 2. The value given for $p_{c1}$ is the amplitude of the full $p_c(L)$ function, which occurs at $L = 1$; the values ($m_c$, $w_c$) in parentheses following it are the total LD ($m_c$) where the function falls to half its amplitude (the middle gray dashed lines in Figure 1 are examples), and the total LD width ($w_c$) of the transition region (distance between flanking red dashed lines in Figure 1). Similarly for $p_{d1}$ ($m_d$, $w_d$), where given. $h_b^2$, $h_c^2$, and $h_d^2$ are the heritabilities associated with the “b,” “c,” and “d” Gaussians, respectively. $h^2$ is the total SNP heritability, re-expressed as $h_l^2$ on the liability scale for binary phenotypes. Reading the plots: on the vertical axis, choose a P-value threshold for typed SNPs (SNPs with z-scores; more extreme values are further from the origin); the horizontal axis then gives the proportion, q, of typed SNPs exceeding that threshold (higher proportions are closer to the origin). See Appendix Tables C1 and C2 for a comparison of numerical values between model estimates for real phenotypes and Hapgen-based simulations in which the distributions of simulated causal effects were set from the real-phenotype model parameters (with $\sigma_0 = 1$).
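
For binary phenotypes, $h^2$ is re-expressed as $h_l^2$ on the liability scale. The conversion used by the paper is not shown in this section; a common choice for case-control samples is the observed-to-liability transformation of Lee et al. (2011), sketched below as an assumption about the method rather than a statement of it. Here `K` is the population prevalence and `P` the fraction of cases in the sample; `observed_to_liability` is a hypothetical helper name.

```python
from statistics import NormalDist

def observed_to_liability(h2_obs, K, P):
    """Convert heritability from the observed case-control scale to liability.

    h2_l = h2_obs * K^2 (1-K)^2 / (P (1-P) phi(t)^2), with t = Phi^{-1}(1-K)
    the liability threshold and phi the standard-normal density.
    """
    std = NormalDist()
    t = std.inv_cdf(1.0 - K)   # liability threshold for prevalence K
    phi_t = std.pdf(t)         # normal density at the threshold
    return h2_obs * K**2 * (1.0 - K) ** 2 / (P * (1.0 - P) * phi_t**2)
```

As a check, with $K = P = 0.5$ the threshold is 0, $\phi(0)^2 = 1/(2\pi)$, and the factor reduces to $\pi/2$.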

**Figure 7**
QQ plots of (pruned) z-scores for simulated quantitative phenotypes (dark blue, 95% confidence interval in light blue) with model prediction (yellow). See Figure 3. See caption to Figure 2 for further description. See Appendix Tables C1 and C2 for a comparison of numerical values between model estimates for real phenotypes and Hapgen-based simulations in which the distributions of simulated causal effects were set from the real-phenotype model parameters (with $\sigma_0 = 1$).
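
The simulation captions rest on z-scores arising from specified causal effects through LD. A simplified sketch of that generative step, assuming per-allele effects, unit phenotypic variance, an LD correlation matrix `R`, and independent residual noise scaled by `sigma0`. Reading $\sigma_0$ as that noise scale is an assumption here, and real GWAS noise is itself correlated through LD; `simulate_z` is a hypothetical helper name.

```python
import numpy as np

def simulate_z(beta, H, R, N, sigma0, rng):
    """Simulate GWAS z-scores arising through LD with causal SNPs.

    z_j = sqrt(N) * sum_i R[j, i] * sqrt(H[i]) * beta[i] + sigma0 * eps_j,
    with eps_j ~ N(0, 1) drawn independently per SNP (a simplification).
    sqrt(H[i]) * beta[i] is the effect per standardized genotype when the
    phenotype has unit variance.
    """
    beta = np.asarray(beta, dtype=float)
    H = np.asarray(H, dtype=float)
    R = np.asarray(R, dtype=float)
    signal = np.sqrt(N) * R @ (np.sqrt(H) * beta)
    return signal + sigma0 * rng.standard_normal(signal.shape)
```

A tag SNP with no causal effect of its own still picks up signal in proportion to its LD with causal SNPs, which is why total LD enters the model.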

References

1. Al-Chalabi A, Fang F, Hanby MF, Leigh PN, Shaw CE, et al. 2010. An estimate of amyotrophic lateral sclerosis heritability using twin data. J Neurol Neurosurg Psychiatry. 81:1324–1326.
1. Alzheimer’s Association. 2018. 2018 Alzheimer’s disease facts and figures. Alzheimer’s Dement. 14:367–429.
1. Berg JJ, Harpak A, Sinnott-Armstrong N, Joergensen AM, Mostafavi H, et al. 2019. Reduced signal for polygenic adaptation of height in UK Biobank. eLife. 8:e39725.
1. Bulik-Sullivan BK, Loh P-R, Finucane HK, Ripke S, Yang J, Schizophrenia Working Group of the Psychiatric Genomics Consortium, et al. 2015. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet. 47:291–295.
1. Burisch J, Jess T, Martinato M, Lakatos PL, ECCO EpiCom. 2013. The burden of inflammatory bowel disease in Europe. J Crohns Colitis. 7:322–337.

Grants and funding

U24 DA041123/DA/NIDA NIH HHS/United States
