2022 Jun 2;109(6):1065-1076.
doi: 10.1016/j.ajhg.2022.04.016. Epub 2022 May 23.

A phenome-wide association study identifies effects of copy-number variation of VNTRs and multicopy genes on multiple human traits

Paras Garg et al. Am J Hum Genet. 2022.

Abstract

The human genome contains tens of thousands of large tandem repeats and hundreds of genes that show common and highly variable copy-number changes. Due to their large size and repetitive nature, these variable number tandem repeats (VNTRs) and multicopy genes are generally recalcitrant to standard genotyping approaches and, as a result, this class of variation is poorly characterized. However, several recent studies have demonstrated that copy-number variation of VNTRs can modify local gene expression, epigenetics, and human traits, indicating that many have a functional role. Here, using read depth from whole-genome sequencing to profile copy number, we report results of a phenome-wide association study (PheWAS) of VNTRs and multicopy genes in a discovery cohort of ∼35,000 samples, identifying 32 traits associated with copy number of 38 VNTRs and multicopy genes at 1% FDR. We replicated many of these signals in an independent cohort and observed that VNTRs showing trait associations were significantly enriched for expression QTLs with nearby genes, providing strong support for our results. Fine-mapping studies indicated that in the majority (∼90%) of cases, the VNTRs and multicopy genes we identified represent the causal variants underlying the observed associations. Furthermore, several lie in regions where prior SNV-based GWASs have failed to identify any significant associations with these traits. Our study indicates that copy number of VNTRs and multicopy genes contributes to diverse human traits and suggests that complex structural variants potentially explain some of the so-called "missing heritability" of SNV-based GWASs.
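The read-depth idea mentioned in the abstract can be illustrated with a minimal sketch: copy number in a region is roughly the ploidy times the ratio of mean depth inside the region to a baseline depth from presumed two-copy sequence. The function name `diploid_copy_number` and the toy depth values are hypothetical, and the sketch leaves out steps a real pipeline would likely need (GC correction, mappability filtering, binning); it is not the authors' implementation.

```python
from statistics import median


def diploid_copy_number(region_depths, reference_depths, ploidy=2):
    """Estimate copy number from read depth (illustrative assumption only).

    region_depths: per-bin read depth inside the VNTR or multicopy gene.
    reference_depths: per-bin read depth from sequence assumed to be present
        in `ploidy` copies (for example, unique flanking or genome-wide bins).
    """
    if not region_depths or not reference_depths:
        raise ValueError("depth lists must be non-empty")
    baseline = median(reference_depths)  # robust to a few outlier bins
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    region_mean = sum(region_depths) / len(region_depths)
    return ploidy * region_mean / baseline


# Toy example: the region has about twice the baseline depth,
# which corresponds to about four diploid copies.
region = [61.0, 59.0, 60.0]
reference = [29.0, 30.0, 31.0, 30.0]
print(diploid_copy_number(region, reference))  # 4.0
```

The median is used for the baseline so that a handful of aberrant reference bins do not shift every estimate; that choice is a design assumption of this sketch.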

Keywords: CNV; GWAS; read depth; tandem repeat; variable number tandem repeat.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

Figure 1
Identification of extreme variations in gene copy number in the human population. (A) Diploid copy number estimates generated using mosdepth for the β-defensin gene family at 8p23.1 in ∼45,000 individuals from eight TOPMed cohorts used in this study. While most individuals carry between 2 and 8 copies of the β-defensin locus, we observed rare individuals with up to 16 copies and conversely one individual who apparently completely lacked β-defensin (Figure S5). Other examples of genes exhibiting extreme variations in copy number include (B) HPR, where one individual carried an estimated ∼42 copies, compared to a median of two in the general population, and (C) CCL3L1/CCL4L1. Plots in (B) and (C) show CNVnator relative diploid copy number per 500 bp bin in 225 selected individuals. Below each plot is an image of the region taken from the UCSC Genome Browser showing gene and segmental duplication annotations. Additional examples are shown in Figure S4.
Figure 2
Manhattan plots showing genomic location of variants identified in a PheWAS using 283 traits. Results for (A) 54,479 VNTRs and (B) 878 multicopy genes in ∼35,000 TOPMed individuals. Name(s) of multicopy genes are shown adjacent to each significant association in the lower panel. Note the discontinuous y axis used to display the results for multicopy genes, resulting from the very strong association of LPA copy number with lipoprotein A levels. Dashed green and blue horizontal lines indicate the p = 0.05 Bonferroni and 1% FDR significance thresholds, respectively. Points in red indicate significant associations at <1% FDR. Full results are shown in Tables S3 and S4.
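The two significance thresholds named in this caption can be contrasted with a short sketch. It assumes the FDR is controlled with the Benjamini-Hochberg step-up procedure, which the text here does not state; the function names and the toy p values are hypothetical, and the toy run uses a 5% FDR only so that the small example has a visible cutoff.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p value cutoff that controls the family-wise error rate."""
    return alpha / n_tests


def benjamini_hochberg(p_values, fdr):
    """Return indices of tests declared significant by the BH step-up rule.

    Assumption: BH is used here for illustration; the source only says
    '1% FDR' without naming the procedure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p value falls under its BH line.
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Toy p values (not from the study).
p = [0.0001, 0.004, 0.019, 0.03, 0.5]
print(bonferroni_threshold(0.05, len(p)))
print(benjamini_hochberg(p, fdr=0.05))
```

In this toy set, Bonferroni at p = 0.05 keeps only the two smallest p values, while the FDR rule keeps four, which mirrors why the FDR line in a Manhattan plot sits below the Bonferroni line.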
Figure 3
Genomic copy number of multicopy genes often correlates with their own expression level and that of multiple neighboring genes in cis. (A and B) The RHD locus (chr1: 25,200,000–25,450,000) (A) and the HPR locus (chr16: 71,900,000–72,125,000) (B). At the base of each plot, the colored bar plots show custom UCSC Genome Browser tracks indicating significant (<10% FDR) correlation (R) values between estimated copy number of the CNV (red shaded) region and gene expression level across the 48 GTEx tissues analyzed. Direct correlations are indicated by positive R values (i.e., projecting up above the baseline), while inverse correlations are indicated by negative R values (i.e., projecting down below the baseline). For both RHD and HPR, increased genomic copy number resulted in increased expression (i.e., positive correlations) for genes within the CNV region. In addition, despite being located outside the CNV region, the expression levels of multiple neighboring genes also showed positive or negative correlations with copy number of RHD and HPR. The upper plot in each panel shows CNVnator relative diploid copy number per 500 bp bin in 225 selected individuals, and the common copy number variable region is shaded in red. The lower plot in each panel shows an image of the region taken from the UCSC Genome Browser showing gene and segmental duplication annotations in addition to significant eQTL results. Complete eQTL data for all multicopy genes in the GTEx cohort are shown in Table S5.
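The sign convention for R in this caption can be made concrete with a small sketch that correlates copy number with expression across individuals. It assumes a plain Pearson correlation on toy values; the function name `pearson_r` and all numbers are hypothetical, and any covariate adjustment or normalization used in the actual eQTL analysis is not shown.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Toy data: one value per individual (not from the study).
copy_number = [2, 3, 4, 5, 6]
gene_inside_cnv = [1.0, 1.5, 2.0, 2.5, 3.0]   # rises with copy number
neighboring_gene = [3.0, 2.5, 2.0, 1.5, 1.0]  # falls with copy number

print(pearson_r(copy_number, gene_inside_cnv))   # positive R, bar above baseline
print(pearson_r(copy_number, neighboring_gene))  # negative R, bar below baseline
```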
Figure 4
Copy number of a 37mer VNTR located within a large cluster of T cell receptor genes at 14q11.2 (chr14: 22,355,658–22,355,834) is the likely causal variant associated with lymphocyte concentration in blood. (A) We identified a strong and consistent association between copy number of this VNTR and lymphocyte concentration across all TOPMed cohorts and ancestries tested (discovery meta-analysis p = 2.9 × 10⁻³⁰). In contrast, no prior GWASs have reported signals for lymphocyte concentration in this region. (B) We confirmed the absence of SNV associations in this region by repeating the association analysis with lymphocyte concentration using all SNVs located within ±100 kb of the VNTR (gray circles), which all yielded non-significant p values compared to the VNTR (black square). MsCAVIAR confirmed the VNTR as the single most likely causal variant to explain the observed association with lymphocyte concentration (posterior p = 0.988) (Table S8). The same VNTR was also significantly associated with white cell count, neutrophil count, and interleukin 6 levels.
