Cell. 2022 Oct 27;185(22):4233-4248.e27.
doi: 10.1016/j.cell.2022.09.028.

Influences of rare copy-number variation on human complex traits

Margaux L A Hujoel et al. Cell. 2022.

Abstract

The human genome contains hundreds of thousands of regions harboring copy-number variants (CNVs). However, the phenotypic effects of most such polymorphisms are unknown because only larger CNVs have been ascertainable from SNP-array data generated by large biobanks. We developed a computational approach leveraging haplotype sharing in biobank cohorts to more sensitively detect CNVs. Applied to UK Biobank, this approach accounted for approximately half of all rare gene inactivation events produced by genomic structural variation. This CNV call set enabled a detailed analysis of associations between CNVs and 56 quantitative traits, identifying 269 independent associations (p < 5 × 10⁻⁸) likely to be causally driven by CNVs. Putative target genes were identifiable for nearly half of the loci, enabling insights into dosage sensitivity of these genes and uncovering several gene-trait relationships. These results demonstrate the ability of haplotype-informed analysis to provide insights into the genetic basis of human complex traits.

Keywords: Complex traits; Copy-number variants; Genetic associations; Haplotypes; Structural variation.

Conflict of interest statement

Declaration of interests: V.G.S. serves as an advisor to and/or has equity in Branch Biosciences, Ensoma, Novartis, Forma, and Cellarity, all unrelated to the present work.

Figures

Figure 1. Haplotype-informed CNV detection from SNP-array data in UK Biobank.
A The HI-CNV framework improves power to detect CNVs by analyzing SNP-array data from an individual together with corresponding data from individuals with long shared haplotypes (“haplotype neighbors”). In contrast, standard approaches analyze data from the individual alone. B SNP-specific genotype cluster priors map allele-specific (A and B allele) probe intensity measurements to probabilistic information about copy-number likelihoods. C Average number of CNVs called by PennCNV and HI-CNV per UK Biobank participant. D Distribution of total CNV length per individual in the HI-CNV call set. E Validation rate of CNV calls from PennCNV and HI-CNV on 43 UK Biobank participants with independent whole-genome sequencing data. Error bars, 95% CIs. F Distribution of CNV lengths in the HI-CNV call set. G Distributions (across increasingly constrained gene sets) of observed counts of whole-gene deletions and duplications and pLoF CNVs in n=452,500 UK Biobank participants. Centers, medians; box edges, 25th and 75th percentiles; whiskers, 5th and 95th percentiles.
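The intuition behind panel A, that evidence pooled across haplotype neighbors is more decisive than evidence from one individual, can be illustrated with a toy calculation. This is not the HI-CNV algorithm: the three-state model, the prior, the likelihood values, and the function name `posterior` are all assumptions chosen for illustration, and the sketch treats each sample's evidence as independent given a shared copy-number state.

```python
import math
from typing import Dict, List

# Hypothetical copy-number states: 1 = deletion, 2 = normal, 3 = duplication.
STATES = (1, 2, 3)

def posterior(log_liks: List[Dict[int, float]],
              prior: Dict[int, float]) -> Dict[int, float]:
    """Posterior over copy number when all samples share one state.

    Simplifying assumption: per-sample log-likelihoods are independent
    given the shared state, so they add.
    """
    log_post = {s: math.log(prior[s]) + sum(ll[s] for ll in log_liks)
                for s in STATES}
    top = max(log_post.values())  # subtract the max for numerical stability
    norm = sum(math.exp(v - top) for v in log_post.values())
    return {s: math.exp(v - top) / norm for s, v in log_post.items()}

# Illustrative numbers only: CNVs are rare a priori, and each sample's
# probe intensities favor a deletion over the normal state by 20:1.
prior = {1: 0.001, 2: 0.998, 3: 0.001}
sample = {1: math.log(0.20), 2: math.log(0.01), 3: math.log(0.001)}

alone = posterior([sample], prior)               # the individual only
with_neighbors = posterior([sample] * 3, prior)  # plus two haplotype neighbors
```

With these made-up numbers, the individual's data alone leave the deletion unlikely (posterior below 5%), whereas the same evidence repeated in two neighbors pushes it above 85%.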
Figure 2. HI-CNV performance benchmarks on subsamples of the UK Biobank data set.
To evaluate the extent to which HI-CNV improves detection sensitivity in smaller sample sizes, we benchmarked the performance of HI-CNV across a range of subsamples of UK Biobank (N = 5K, 15K, 50K, and 150K). A For a subset of 500 individuals included in all subsamples, for each CNV call made in these individuals in the full N~500K analysis, we determined the minimal sample (N = 5K, 15K, 50K, 150K, or full cohort) in which the call was detected. Full bar heights indicate average numbers of calls across the 500 individuals (from the full N~500K analysis) stratified by event size and CNV type (deletion vs duplication). Shading reflects the subsample in which each call was first detected (defined as a call in the subsample overlapping or perfectly replicating the given call). These analyses showed that while detection sensitivity increased with sample size as expected (especially for small CNVs <10kb), most CNV calls made using the full UK Biobank cohort were already detectable by HI-CNV at a sample size of N=5K. B We compared the average number of calls per individual made by HI-CNV (on N = 5K, 15K, 50K, 150K, or all samples) to PennCNV. The average number of called CNVs per individual is plotted across the various subsamples, colored by CNV type. The horizontal lines reflect the average number of events detected by PennCNV across the entire UK Biobank cohort. (In each subsample, ~90% of calls (range: 89%–93%) replicated or overlapped calls made using the full cohort, indicating effective false-positive control in these downsampled analyses.)
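The "first detected" bookkeeping described for panel A can be sketched as follows. This is a minimal illustration rather than the authors' code: the tuple layout, the function names `overlaps` and `first_detected`, and the toy coordinates are hypothetical, and requiring the same chromosome and CNV type for a match is an assumption about how "overlapping" is applied.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical call representation: (chromosome, start, end, cnv_type),
# using half-open coordinates [start, end).
Call = Tuple[str, int, int, str]

def overlaps(a: Call, b: Call) -> bool:
    """True if two calls share chromosome and CNV type and overlap by >= 1 bp."""
    return a[0] == b[0] and a[3] == b[3] and a[1] < b[2] and b[1] < a[2]

def first_detected(full_call: Call,
                   subsample_calls: Dict[str, List[Call]],
                   order: List[str]) -> Optional[str]:
    """Return the smallest subsample (in `order`) with a call overlapping `full_call`."""
    for label in order:
        if any(overlaps(full_call, c) for c in subsample_calls.get(label, [])):
            return label
    return None  # not detected in any listed analysis

# Toy data for one individual; labels follow the subsample sizes in the caption.
order = ["5K", "15K", "50K", "150K", "full"]
subsample_calls = {
    "5K": [("chr1", 1000, 5000, "DEL")],
    "50K": [("chr1", 1000, 5000, "DEL"), ("chr2", 200, 900, "DUP")],
    "full": [("chr1", 1000, 5000, "DEL"), ("chr2", 200, 900, "DUP"),
             ("chr3", 50, 80, "DEL")],
}
result = {c: first_detected(c, subsample_calls, order)
          for c in subsample_calls["full"]}
```

Here the chr1 deletion is first seen at 5K, the chr2 duplication at 50K, and the short chr3 deletion only in the full cohort, mirroring the shading scheme of the bar plot.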
Figure 3. Fine-mapping analyses reveal likely-causal CNV-trait associations.
A Association and fine-mapping pipeline; inset depicts the three categories of CNVs tested. B Effect size versus minor allele frequency for 269 likely-causal CNV-phenotype associations, colored by phenotype category. C Distributions of CNV length (left) and genic context (right) across all CNVs and across likely-causal CNVs. D Breakdown of 97 CNV loci according to prior literature status and whether a putative target gene was identified. E Candidate target genes, categorized according to whether (i) the CNV-phenotype association was previously reported, (ii) the target gene was previously implicated (either by a previously-reported coding variant association or by previous experimental work), or (iii) neither of the above. The rightmost column lists syndromic CNVs re-identified here. Colors indicate CNV type; bold font indicates noncoding CNVs potentially regulating the target gene. F Genic context of syndromic CNVs (bottom) and non-syndromic CNVs (top) stratified by the number of phenotype categories associated with the CNV.
Figure 4. Corroboration and replication of CNV-phenotype associations.
A Loss-of-function burden analyses in UK Biobank. For associations involving CNVs that we believed acted on a candidate target (focal) gene (Figure 3E), we compared the estimated effect of CNVs predicted to cause loss-of-function (pLoF) of the putative target gene to the estimated effect of ultra-rare pLoF SNP and indel variants in the same gene (recently reported in a whole-exome analysis of UK Biobank that performed SNP/indel pLoF burden tests (Backman et al., 2021)). Effect sizes and 95% confidence intervals are shown in red for the pLoF CNVs and in black for the pLoF SNP/indel burden; markers and error bars for the pLoF SNP/indel burden are shaded based on power to detect an association (assuming an effect size equal to the pLoF CNV and accounting for the combined allele frequency of the pLoF SNPs and indels). Previously reported associations are shown with a triangle, genes previously implicated are shown with a circle, and the remaining genes are shown with a square. B Replication of CNV-phenotype associations in BioBank Japan. We attempted to replicate 14 associations (selected based on available phenotyping and power in BioBank Japan) involved in gene-trait relationships putatively uncovered by our analysis of UK Biobank. Effect sizes and 95% confidence intervals are shown in red for pLoF CNVs and in blue for whole-gene duplications.
Figure 5. CNV-phenotype associations stronger than nearby SNPs.
A UHRF2 locus. Top: height associations for UHRF2 pLoF CNVs and nearby SNPs. Bottom: locations of UHRF2 pLoF CNVs and SNP and indel PTVs; left: effect sizes for height. B SLC2A3 locus. Top: menarche age associations for SLC2A3 duplications and deletions and nearby SNPs. Bottom: locations of SLC2A3 deletions and duplications; left: effect sizes for menarche age, height, and basophil and lymphocyte counts. C BMP5 locus. Top: bone mineral density associations for a deletion upstream of BMP5 and nearby SNPs (colored according to linkage disequilibrium with the deletion, for SNPs with R² > 0.1 to the deletion). Bottom: locations of the upstream deletion, BMP5 pLoF CNVs, and SNP and indel PTVs; left: effect sizes for bone mineral density. In all panels, deletions are colored red and duplications are colored blue. Error bars on effect sizes, 95% CIs. Numerical results are available in Table S5; example signal intensity plots are in Figure S3.
Figure 6. Allelic series involving both regulatory and gene-altering CNVs.
A HBA locus. Eight classes of CNVs at the α-globin locus and their effect sizes for mean corpuscular hemoglobin and red blood cell counts. Genomic annotations indicate accessible chromatin regions in erythroblasts (Ulirsch et al., 2019) and distal DNase I hypersensitive sites (DHS) for HBA2/HBA1 (Thurman et al., 2012), highlighting the HS-40 super-enhancer. B JAK2 locus. Four classes of variants – JAK2 pLoF CNVs, JAK2 SNP and indel PTVs, a deletion of a distal enhancer, and the common SNP rs12005199 within the enhancer – and their effect sizes for platelet counts. Genomic annotations indicate accessible chromatin regions in megakaryocytes (Ulirsch et al., 2019) and JAK2 distal DHS pairs (Thurman et al., 2012), which colocalize with common-SNP platelet count associations (top) at the enhancer region ~220kb upstream of JAK2. C IRF8 locus. Fine-mapped common variants and rare pLoF variants at the IRF8 locus – including a putatively regulatory distal deletion, IRF8 pLoF CNVs, and IRF8 SNP and indel PTVs – and their effect sizes for monocyte counts. Genomic annotations indicate accessible chromatin regions in monocytes (Ulirsch et al., 2019) and GeneHancer connections (Fishilevich et al., 2017) between downstream regulatory regions and IRF8. D R3HDM4 locus. Rare CNVs, SNP and indel PTVs, and a common intronic SNP at R3HDM4 and their effect sizes for reticulocyte counts. Genomic annotations indicate ChromHMM (Ernst and Kellis, 2017) annotations, accessible chromatin regions in erythroblasts (Ulirsch et al., 2019), and GeneHancer connections (Fishilevich et al., 2017), all indicating regulatory function in the first intron of R3HDM4. The lead-associated SNP rs1683587 (top) also lies within this intron, suggesting regulatory function. In A and B, DHS pairs are colored by their correlation value, from light red (correlation < 0.8) to dark red (correlation > 0.95). Error bars on effect sizes, 95% CIs. Numerical results are available in Table S5; example signal intensity plots are in Figure S3.
Figure 7. Contrasting phenotypic effects of deletions and duplications.
A,B Mean height (A) and years of education (B) as a function of total genomic length affected by deletions and duplications. Individuals carrying a known syndromic CNV were excluded from analysis. Numerical results are presented in Table S7. C Associations between whole-gene deletions and quantitative traits in targeted analyses of 41 gene-trait pairs for which we previously identified likely trait-altering PTVs (Barton et al., 2021) and for which the HI-CNV call set contained at least two whole-gene deletions. Effect sizes and 95% confidence intervals are shown in red for 16 genes for which whole-gene deletions exhibited nominally significant associations (P < 0.05); effect sizes for SNP or indel PTVs (Barton et al., 2021) are shown in black. D Observing 16 nominally significant associations was consistent with whole-gene deletions having the same effects as PTVs. Probability distributions indicate numbers of significant associations in simulations in which whole-gene deletions have no effect (grey), half the effect magnitude of PTVs (light pink), or the same effect magnitude as PTVs (red). E,F Analogous results for whole-gene duplications in targeted analyses of 139 gene-trait pairs, which produced 27 significant associations (P < 0.05), consistent with whole-gene duplications having less than half the effect magnitude of PTVs. (The aberrant effect directions of DOCK8 deletions and duplications relative to the DOCK8 PTV rs192864327 may be explained by this variant only causing loss of function in one of several transcripts.)
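The logic of panels D and F, comparing the observed number of nominally significant associations with what each effect-size scenario would predict, can be sketched analytically. The caption describes simulations; the sketch below swaps in a closed-form expected count under a normal approximation. The function names `power` and `expected_hits` and the per-pair effect sizes and standard errors are hypothetical and not taken from the paper; only the count of 41 pairs and the P < 0.05 threshold come from the caption.

```python
from statistics import NormalDist
from typing import List, Tuple

_norm = NormalDist()  # standard normal

def power(effect: float, se: float, alpha: float = 0.05) -> float:
    """Two-sided power of a z-test when the estimate is N(effect, se^2)."""
    z_crit = _norm.inv_cdf(1 - alpha / 2)
    shift = effect / se
    return _norm.cdf(-z_crit - shift) + 1 - _norm.cdf(z_crit - shift)

def expected_hits(pairs: List[Tuple[float, float]], scale: float,
                  alpha: float = 0.05) -> float:
    """Expected number of significant pairs if the CNV effect is scale * PTV effect.

    Each pair is (ptv_effect, se_of_cnv_estimate).
    """
    return sum(power(scale * beta, se, alpha) for beta, se in pairs)

# Hypothetical inputs: 41 gene-trait pairs, each with a PTV effect of 0.5 SD
# and a standard error of 0.25 SD for the whole-gene deletion estimate.
pairs = [(0.5, 0.25)] * 41

null_hits = expected_hits(pairs, scale=0.0)  # deletions have no effect
half_hits = expected_hits(pairs, scale=0.5)  # half the PTV effect
full_hits = expected_hits(pairs, scale=1.0)  # same effect as PTVs
```

Under the null, the expected count is simply 0.05 × 41 ≈ 2, so an observed count of 16 is far above chance; which non-null scenario it best matches depends on the real per-pair effects and standard errors, which this sketch does not reproduce.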

References

    1. Abel HJ, Larson DE, Regier AA, Chiang C, Das I, Kanchi KL, Layer RM, Neale BM, Salerno WJ, Reeves C, et al. (2020). Mapping and characterization of structural variation in 17,795 human genomes. Nature 583, 83–89. 10.1038/s41586-020-2371-0.
    2. Abyzov A, Urban AE, Snyder M, and Gerstein M (2011). CNVnator: An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res. 21, 974–984. 10.1101/gr.114876.110.
    3. Aguet F, Barbeira AN, Bonazzola R, Brown A, Castel SE, Jo B, and Kasela S (2020). The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330.
    4. Aguirre M, Rivas MA, and Priest J (2019). Phenome-wide Burden of Copy-Number Variation in the UK Biobank. Am. J. Hum. Genet. 105, 373–383. 10.1016/j.ajhg.2019.07.001.
    5. Akiyama M, Okada Y, Kanai M, Takahashi A, Momozawa Y, Ikeda M, Iwata N, Ikegawa S, Hirata M, Matsuda K, et al. (2017). Genome-wide association study identifies 112 new loci for body mass index in the Japanese population. Nat. Genet. 49, 1458–1467. 10.1038/ng.3951.
