Am J Hum Genet. 2022 Apr 7;109(4):647-668.
doi: 10.1016/j.ajhg.2022.02.010. Epub 2022 Mar 2.

The individual and global impact of copy-number variants on complex human traits

Chiara Auwerx et al. Am J Hum Genet. 2022.

Abstract

The impact of copy-number variations (CNVs) on complex human traits remains understudied. We called CNVs in 331,522 UK Biobank participants and performed genome-wide association studies (GWASs) between the copy number of CNV-proxy probes and 57 continuous traits, revealing 131 signals spanning 47 phenotypes. Our analysis recapitulated well-known associations (e.g., 1q21 and height), revealed the pleiotropy of recurrent CNVs (e.g., 26 and 16 traits for 16p11.2-BP4-BP5 and 22q11.21, respectively), and suggested gene functionalities (e.g., MARF1 in female reproduction). Forty-eight CNV signals (38%) overlapped with single-nucleotide polymorphism (SNP)-GWAS signals for the same trait. For instance, deletion of PDZK1, which encodes a urate transporter scaffold protein, decreased serum urate levels, while deletion of RHD, which encodes the Rhesus blood group D antigen, was associated with hematological traits. Other signals overlapped Mendelian disorder regions, suggesting variable expressivity and broad impact of these loci, as illustrated by signals mapping to Rotor syndrome (SLCO1B1/3), renal cysts and diabetes syndrome (HNF1B), or Charcot-Marie-Tooth (PMP22) loci. Total CNV burden negatively impacted 35 traits, leading to increased adiposity, liver/kidney damage, and decreased intelligence and physical capacity. Thirty traits remained burden associated after correcting for CNV-GWAS signals, pointing to a polygenic CNV architecture. The burden negatively correlated with socio-economic indicators, parental lifespan, and age (survivorship proxy), suggesting a contribution to decreased longevity. Together, our results showcase how studying CNVs can expand biological insights, emphasizing the critical role of this mutational class in shaping human traits and arguing in favor of a continuum between Mendelian and complex diseases.

Keywords: CNV; GWAS; UK Biobank; lifespan; mutational burden; penetrance; pleiotropy; polygenicity; structural variants; variable expressivity.

Conflict of interest statement

Declaration of interests The authors declare no competing interests.

Figures

Figure 1
CNV frequency landscape in the UK Biobank. (A and B) Miami plot of high-confidence probe-level duplication (A) and deletion (B) frequencies [%] in the UKBB. Consecutive probes with identical duplication and deletion frequencies were clustered so that each dot represents one probe group. Loci with duplication frequency ≥ 0.3% or deletion frequency ≥ 0.2% are labeled with cytogenetic bands.
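
The probe clustering described in this legend can be sketched as a run-length grouping. This is a minimal illustration only, not the authors' pipeline: the tuple layout `(position, dup_freq, del_freq)`, the function name `cluster_probes`, and the example values are assumptions, and the probes are taken to be position-sorted on a single chromosome.

```python
from itertools import groupby

def cluster_probes(probes):
    """Merge consecutive probes that share identical (dup_freq, del_freq) values.

    `probes` is a position-sorted list of (position, dup_freq, del_freq)
    tuples with frequencies in percent (hypothetical layout).
    """
    groups = []
    for (dup, dele), run in groupby(probes, key=lambda p: (p[1], p[2])):
        run = list(run)
        groups.append({
            "start": run[0][0],
            "end": run[-1][0],
            "n_probes": len(run),
            "dup_freq": dup,
            "del_freq": dele,
            # Labeling rule from the legend: dup >= 0.3% or del >= 0.2%.
            "labeled": dup >= 0.3 or dele >= 0.2,
        })
    return groups

# Made-up example: four probes, the first two share their frequencies.
probes = [(100, 0.1, 0.0), (200, 0.1, 0.0), (300, 0.3, 0.0), (400, 0.1, 0.0)]
groups = cluster_probes(probes)
```

Only adjacent probes are merged, so the fourth probe forms its own group even though its frequencies match the first two.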
Figure 2
CNV-GWAS roadmap of the UK Biobank. (A–C) CNV-GWAS association models with PLINK encoding: the mirror model assumes equal-sized but opposite-direction effects of deletion and duplication and estimates the impact of each additional copy (A); the duplication-only model disregards deletion carriers and estimates the effect of duplications (B); the deletion-only model disregards duplication carriers and estimates the effect of deletions (C). (D) Independent genome-wide significant associations (p ≤ 0.05/11,804 = 4.2 × 10⁻⁶) between CNV regions (x axis; as cytogenetic bands) and traits (y axis). Color tiles represent the model(s) through which the association was detected (dark green, mirror and duplication-only; light green, duplication-only; dark orange, mirror and deletion-only; light orange, deletion-only; dark purple, mirror, duplication-only, and deletion-only; light purple, mirror; white, none), and signs show directionality, so that the duplication (greens), deletion (oranges), or copy number (purples) of a CNV region associated with a phenotypic increase (+) or decrease (−). 16p11.2 (16p11.2 BP2-BP3 and 16p11.2 BP4-BP5) and 22q11.21 recurrent CNVs (LCR B at chr22: 20,400,000) are assessed separately. For each CNV region, average duplication (green) and deletion (orange) frequencies [%] of the lead probe (according to the most significant model) are depicted at the top. Deletion frequency of 1p36.11 was truncated from 3.76%. (E) Boxplot representing height in individuals with CNVs overlapping the Xp22.33 pseudoautosomal region (chrX: 285,850–1,720,422). Sample size is reported for each copy-number category at the top; boxes show the first (Q1), second (median, thick line), and third (Q3) quartiles; lower and upper whiskers show the most extreme value within Q1 minus and Q3 plus 1.5× the interquartile range, respectively; dots show the mean; outliers are not shown.
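
The three association models in (A–C) differ only in how each carrier state enters the regression. The sketch below shows one plausible numeric coding. It is an assumption for illustration and not the PLINK genotype encoding used in the study: the numeric codes, the state labels, and the function name `encode_copy_number` are all hypothetical, and `None` stands for an individual excluded from that model.

```python
def encode_copy_number(state, model):
    """Map a carrier state to a regression predictor under one model.

    state: "del", "neutral", or "dup" (hypothetical labels).
    model: "mirror", "dup_only", or "del_only".
    Returns a number, or None when the model disregards that carrier.
    """
    codes = {
        # Mirror: each additional copy shifts the predictor by +1.
        "mirror": {"del": -1, "neutral": 0, "dup": 1},
        # Duplication-only: deletion carriers are disregarded.
        "dup_only": {"del": None, "neutral": 0, "dup": 1},
        # Deletion-only: duplication carriers are disregarded.
        "del_only": {"del": 1, "neutral": 0, "dup": None},
    }
    return codes[model][state]

carriers = ["del", "neutral", "dup", "neutral"]
design = {
    model: [encode_copy_number(state, model) for state in carriers]
    for model in ("mirror", "dup_only", "del_only")
}
```

Under the mirror coding a single slope captures both directions, which is why that model assumes equal-sized but opposite effects of deletions and duplications.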
Figure 3
Replication of CNV-GWAS signals in the Estonian Biobank. (A) Estonian Biobank (EstBB; y axis) versus UK Biobank (UKBB; x axis) standardized effect sizes. The identity line is in red; point size reflects power at α = 0.05/61; non-significant signals (p > 0.05) are in gray; nominally significant signals (p ≤ 0.05) with 95% confidence intervals are colored according to replication models: mirror (purple), duplication-only (green), or deletion-only (orange); signals surviving multiple-testing correction (p ≤ 8.2 × 10⁻⁴) are circled in black and listed in (B), with the first column’s color corresponding to the association model and numbers matching labels in (A). (B) Effect sizes (β; unit in the effect column) and p values (p) for the UKBB and EstBB GWAS, along with the number of individuals with available phenotypic data carrying a deletion, no CNV, or a duplication overlapping the CNV region. Labels indicate:
(1) platelet count: 1p36.11 (chr1: 25,599,041–25,648,747);
(2) glycated hemoglobin (HbA1c): 1q21.1–1q21.2 (chr1: 146,478,785–147,832,715);
(3) height: 1q21.1–1q21.2 (chr1: 146,478,785–147,832,715);
(4) age at menarche: 1q21.1 (chr1: 145,368,664–145,738,611);
(5) platelet count: 16p11.2 BP2-BP3 (chr16: 28,818,541–29,043,450);
(6) weight: 16p11.2 BP2-BP3 (chr16: 28,818,541–29,043,450);
(7) age at menarche: 16p11.2 BP4-BP5 (chr16: 29,596,230–30,208,637);
(8) body mass index (BMI): 16p11.2 BP4-BP5 (chr16: 29,596,230–30,208,637);
(9) waist-to-hip ratio (WHR): 16p11.2 BP4-BP5 (chr16: 29,596,230–30,208,637);
(10) height: 16p11.2 BP4-BP5 (chr16: 29,596,230–30,208,637);
(11) weight: 16p11.2 BP4-BP5 (chr16: 29,596,230–30,208,637);
(12) alanine aminotransferase (ALT): 16p11.2 BP4-BP5 (chr16: 29,624,931–30,208,637);
(13) age at menopause: 16p13.11 (chr16: 15,151,451–16,308,285);
(14) age at menarche: 16p13.11 (chr16: 15,120,501–16,308,285);
(15) serum creatinine (SCr): 17p12 (chr17: 14,098,277–15,468,444);
(16) SCr: 17q12 (chr17: 34,797,651–36,249,489);
(17) C-reactive protein (CRP): 17q12 (chr17: 34,797,651–36,249,489);
(18) platelet count: 22q11.21 LCR A-D (chr22: 19,024,651–21,174,444);
(19) BMI: 22q11.21 LCR A-D (chr22: 19,024,651–21,463,515);
(20) weight: 22q11.21 LCR A-D (chr22: 19,024,651–21,463,545);
(21) eosinophil count: 22q11.21 LCR B-D (chr22: 20,457,855–21,463,545);
(22) γ-glutamyl transferase (GGT): 22q11.23 (chr22: 23,688,345–24,990,213).
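
The three significance tiers used in panel (A) follow from a Bonferroni correction over 61 tests. The sketch below reproduces that arithmetic and the resulting classification. The function names, the tier labels, and the example p values are illustrative assumptions rather than anything taken from the study's code or data.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

def replication_status(p_value, n_tests=61, alpha=0.05):
    """Classify a replication p value into the tiers described in the legend."""
    if p_value <= bonferroni_threshold(alpha, n_tests):
        return "survives correction"    # circled in black in the figure
    if p_value <= alpha:
        return "nominally significant"  # colored by replication model
    return "not significant"            # shown in gray

threshold = bonferroni_threshold(0.05, 61)  # about 8.2e-4, as in the legend

# Made-up p values, one per tier.
statuses = [replication_status(p) for p in (1e-5, 0.01, 0.3)]
```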
Figure 4
CNV-GWAS associations at SNP-GWAS loci. (A and B) Boxplots representing levels of (A) serum urate in individuals with a 1q21.1 (chr1: 145,383,239–145,765,206) overlapping small (start ≥ 145.6 Mb) or large (start < 145.6 Mb) deletion, copy-neutrality, or duplication and (B) γ-glutamyl transferase (GGT) in individuals with a 22q11.23 (chr22: 23,688,345–24,990,213) overlapping deletion, copy-neutrality, or duplication. Copy number (CN) and sample size (n) are reported for each category; boxes show the first (Q1), second (median, thick line), and third (Q3) quartiles; lower and upper whiskers show the most extreme value within Q1 minus and Q3 plus 1.5× the interquartile range, respectively; dots show the mean; outliers are not shown; light green backgrounds show normal clinical range for serum urate: 89–476 μmol/L (A) and GGT: 4–6 U/L (B). The p value of a two-sided t test comparing serum urate levels of small and large 1q21.1 deletion carriers is shown. (C) Association plot for the 1p36.11 deletion (chr1: 25,599,041–25,648,747). Red dashed lines delimit the deletion-only CNV region; left y axis shows the negative logarithm of association p value for reticulocyte count (blue), platelet count (purple), and glycated hemoglobin (HbA1c; red); right y axis shows deletion frequency [%] (orange); encompassed genes are schematically represented at the bottom; retained exons for the most strongly expressed isoform in whole blood are shown for RHD (ENST00000328664) and RSRP1 (ENST00000243189), and shaded color represents the full gene sequence; star indicates the RHD and RSRP1 expression quantitative trait locus rs55794721. (D and E) GTEx v8 gene expression in 33 tissues for RHD (D) and RSRP1 (E). Brain, cervix, esophagus, and skin are not shown for visibility. Whole blood is shown with a red label.
Figure 5
CNV-GWAS associations at Mendelian disorder loci. (A–C) Boxplots showing total bilirubin levels in copy-neutral individuals, small (start ≥ 21.1 Mb) or large (start < 21.1 Mb) 12p12.2-p12.1 (chr12: 21,008,080–21,403,457) overlapping deletion carriers, and Rotor or Dubin-Johnson syndrome-affected individuals (ICD-10 E80.6) (A), cystatin C levels in individuals with a 17q12 (chr17: 34,797,651–36,249,489) overlapping deletion, copy-neutrality, or duplication (B), and hand grip strength in individuals with a 17p12 (chr17: 14,098,277–15,457,056) overlapping deletion, copy-neutrality, or duplication, split according to the presence (w/) or absence (w/o) of a neuropathy (ICD-10 G60.0; red stripes) (C). Copy number (CN) and sample size (n) are reported for each category; boxes show the first (Q1), second (median, thick line), and third (Q3) quartiles; lower and upper whiskers show the most extreme value within Q1 minus and Q3 plus 1.5× the interquartile range, respectively; dots show the mean; outliers are not shown; light green backgrounds show normal clinical range for total bilirubin: 5–17 μmol/L (A) and cystatin C: 0.6–1.2 mg/L (B). The p value of a two-sided t test comparing total bilirubin levels of small and large 12p12.2-p12.1 deletion carriers is shown. The p values of one-sided t tests comparing hand grip strength of copy-neutral individuals and 17p12 duplication carriers with or without a neuropathy diagnosis are shown.
Figure 6
MARF1 as a putative gene involved in human female reproduction. (A and B) Boxplots representing age at menarche (A) and menopause (B) in individuals with a 16p13.11 (A, chr16: 15,120,501–16,308,285; B, chr16: 15,151,451–16,308,285) overlapping deletion, copy-neutrality, or duplication. Copy number (CN) and sample size (n) are reported for each category; dots show the mean; boxes show the first (Q1), second (median, thick line), and third (Q3) quartiles; lower and upper whiskers show the most extreme value within Q1 minus and Q3 plus 1.5× the interquartile range, respectively; notches represent median ± 1.58 × IQR/√n; outliers are not shown; light red backgrounds indicate pathogenic values corresponding to primary amenorrhea (age at menarche > 16 years) (A) and premature ovarian insufficiency (age at menopause < 40 years) (B). (C) Mapping of CNVs overlapping the 16p13.11 CNV region (chr16: 15,120,501–16,308,285). Number and frequency of duplications and deletions are at the top left; left plot shows all overlapping CNVs; right plot focuses on the associated CNV region delineated with red dashed lines; duplications are in green, deletions in orange; black lines indicate the lead signal for age at menarche (mirror) and menopause (duplication-only); purple line indicates age at menarche-associated SNP; overlapping recurrent DECIPHER CNV is shown in black and protein-coding genes are colored according to the upper bound of the confidence interval for the observed/expected (o/e) mutation ratio in gnomAD.
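
The boxplot elements defined in this legend (quartiles, 1.5 × IQR whiskers, notches at median ± 1.58 × IQR/√n, and the mean) can be computed as in the sketch below. It is an illustration under stated assumptions: the quartile convention (`method="inclusive"` from Python's `statistics` module) is a choice made here, the legend does not say which convention the plotting software used, and the input values are made up.

```python
import math
import statistics

def boxplot_summary(values):
    """Compute the boxplot elements described in the legend."""
    data = sorted(values)
    n = len(data)
    # Quartile convention is an assumption; plotting tools differ slightly.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers: most extreme observed values still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    half_notch = 1.58 * iqr / math.sqrt(n)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whiskers": (min(inside), max(inside)),
        "notch": (median - half_notch, median + half_notch),
        "mean": statistics.fmean(data),
    }

# Made-up values with one outlier (100) that falls beyond the upper fence.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

The outlier pulls the mean well above the median while leaving the box and whiskers unchanged, which is one reason for plotting the mean as a separate dot.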
Figure 7
The negative impact of the CNV burden on complex traits. (A) Pearson correlation across six burden metrics. (B) Significant associations (p ≤ 0.05/63 = 7.9 × 10⁻⁴) between the CNV burden, expressed as the number of Mb or genes affected by CNVs (x axis), and traits assessed through CNV-GWASs (y axis). Color represents the type of burden (dark green, CNV and duplication-only; light green, duplication-only; dark orange, CNV and deletion-only; light orange, deletion-only; dark purple, CNV, duplication-only, and deletion-only; light purple, CNV; white, none) found to increase (+) or decrease (−) the considered phenotype. (C) Schematic representation of the correction for modifier CNVs. Top: individuals carrying a CNV overlapping a CNV-GWAS region were identified (i.e., modifier CNV carrier; yellow). Bottom: phenotype and burden were corrected (green arrows) and a new linear regression was fitted. (D) Significant associations (p ≤ 0.05/63 = 7.9 × 10⁻⁴) between the CNV burden and traits after correction for modifier CNVs. Phenotype label color indicates whether the number of associated metrics between the CNV burden and the trait was fully lost (0 associations; red), decreased (gray), identical (black), or increased (blue) after the correction. Green stars mark highly polygenic traits associating with the CNV burden without having any significant CNV-GWAS signals. (E) Significant associations (p ≤ 0.05/63 = 7.9 × 10⁻⁴) between the CNV burden and life history traits (y axis). (D and E) follow the legend in (B).
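
A burden expressed as the number of Mb affected by CNVs, regressed against a trait, can be sketched as follows. This is a simplified illustration and not the study's model: it fits an unadjusted least-squares slope with no covariates, it does not merge overlapping CNVs, and the identifiers, CNV coordinates, and trait values are made up.

```python
def burden_mb(cnvs):
    """Total length in Mb of an individual's CNVs, given (start, end) in bp.

    Assumes the CNVs do not overlap one another.
    """
    return sum(end - start for start, end in cnvs) / 1e6

def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

# Hypothetical CNV calls and trait values for three individuals.
cnv_calls = {
    "id1": [(0, 500_000)],
    "id2": [],
    "id3": [(1_000_000, 3_000_000)],
}
trait = {"id1": 9.0, "id2": 10.0, "id3": 6.0}

ids = sorted(cnv_calls)
burdens = [burden_mb(cnv_calls[i]) for i in ids]
slope = ols_slope(burdens, [trait[i] for i in ids])
```

A negative slope means the trait decreases as the burden grows, which corresponds to a (−) tile in the figure.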
