Am J Hum Genet. 2021 May 6;108(5):809-824.
doi: 10.1016/j.ajhg.2021.03.016. Epub 2021 Mar 31.

Pervasive cis effects of variation in copy number of large tandem repeats on local DNA methylation and gene expression

Paras Garg et al. Am J Hum Genet. 2021.

Abstract

Variable number tandem repeats (VNTRs) are composed of large tandemly repeated motifs, many of which are highly polymorphic in copy number. However, because of their large size and repetitive nature, they remain poorly studied. To investigate the regulatory potential of VNTRs, we used read-depth data from Illumina whole-genome sequencing to perform association analysis between the copy number of ∼70,000 VNTRs (motif size ≥10 bp) and both gene expression (404 samples in 48 tissues) and DNA methylation (235 samples in peripheral blood), identifying thousands of VNTRs that are associated with local gene expression (eVNTRs) and DNA methylation levels (mVNTRs). Using an independent cohort, we validated 73%–80% of signals observed in the two discovery cohorts, while allelic analysis of VNTR length and CpG methylation in 30 Oxford Nanopore genomes gave additional support for mVNTR loci, providing robust evidence that these represent genuine associations. Further, conditional analysis indicated that many eVNTRs and mVNTRs act as QTLs independently of other local variation. We also observed strong enrichments of eVNTRs and mVNTRs for regulatory features such as enhancers and promoters. Using the Human Genome Diversity Panel, we define sets of VNTRs that show highly divergent copy numbers among human populations and show that these are enriched for regulatory effects and preferentially associate with genes that have been linked with human phenotypes through GWASs. Our study provides strong evidence supporting functional variation at thousands of VNTRs and defines candidate sets of VNTRs, copy number variation of which potentially plays a role in numerous human phenotypes.
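The abstract does not state which statistical model was used for the association tests. As a rough illustration only, the sketch below tests a single VNTR:target pair with a Pearson correlation and a permutation p value, then compares it with a Bonferroni-adjusted threshold of the kind mentioned in the Figure 2 caption. The function names, the permutation approach, and the toy numbers are assumptions for illustration, not the authors' pipeline.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p value for the correlation of x and y."""
    rng = random.Random(seed)          # fixed seed keeps the sketch reproducible
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)   # add-one correction avoids p = 0


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical toy data: VNTR copy number and expression in ten samples.
copy_number = [12, 15, 18, 21, 24, 27, 30, 33, 36, 39]
expression = [1.1, 1.4, 1.6, 2.1, 2.3, 2.4, 3.0, 3.1, 3.5, 3.9]
r = pearson_r(copy_number, expression)
p = permutation_p(copy_number, expression)
```

A real analysis would run one such test per VNTR:gene or VNTR:CpG pair in cis and would usually include covariates, which this sketch omits.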

Keywords: VNTR; eQTL; mQTL; macrosatellite; minisatellite.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Copy number variation at thousands of VNTRs is associated with variation in gene expression and DNA methylation in cis. (A) CNVnator-estimated copy number per 100 bp bin over a VNTR region shows highly variable read depth among samples from the GTEx cohort. Shown is read depth data for a 44-mer repeat that has 43 copies in the reference genome (chr12: 132,148,891–132,150,764, hg38), located intronic within NOC4L, which shows >10-fold difference in copy number within the population. (B) Read depth provides good accuracy for estimating diploid VNTR copy number. Using 14 samples where both Illumina and PacBio WGS data were available, at 1,891 eVNTR loci we compared diploid VNTR copy number estimates from WGS read depth by using CNVnator with direct genotypes derived from Pacific Biosciences long-read diploid assemblies. We observed a high correlation between the two approaches (R² = 0.81). (C) Q-Q plots showing the distribution of observed versus expected p values for eVNTRs in 16 representative GTEx tissues. Variations in the observed p value distribution among GTEx tissues are a reflection of the varying sample sizes available, which strongly influence statistical power. (D) Manhattan plot showing results of cis-association analysis between VNTR copy number and gene expression in skeletal muscle samples from the GTEx cohort. The high frequency of significant associations in subtelomeric and centromeric regions is consistent with the known enrichment of VNTRs in these regions. (E) Significant eVNTRs are highly enriched within close proximity to the genes whose expression level they are associated with, mirroring similar observations made for SNV eQTLs. We also observed a similar relationship for mVNTRs and the CpGs they associate with (Figure S10), although we note an approximate order of magnitude difference in the distances over which significant eVNTRs and mVNTRs were typically observed to function.
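As a quick arithmetic check on panel (A), the reference copy count of the NOC4L repeat can be approximated by dividing the span of the annotated region by the motif length. The sketch below assumes the coordinates are 1-based and inclusive, which the caption does not state; the helper names are hypothetical.

```python
def region_length(region: str) -> int:
    """Length in bp of a 'chrN: start–end' region (assumed 1-based, inclusive)."""
    coords = region.split(":")[1]
    # Strip thousands separators and normalise the en dash used in the caption.
    start, end = coords.replace(",", "").replace("–", "-").split("-")
    return int(end) - int(start) + 1


def reference_copies(region: str, motif_bp: int) -> float:
    """Approximate number of motif copies spanned by the region."""
    return region_length(region) / motif_bp


noc4l_vntr = "chr12: 132,148,891–132,150,764"
span_bp = region_length(noc4l_vntr)          # 1,874 bp under the inclusive assumption
copies = reference_copies(noc4l_vntr, 44)    # about 42.6, in line with the stated 43 copies
```

The small gap between 42.6 and 43 is what one would expect if the annotated copy count is rounded or the repeat boundaries are defined slightly differently.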
Figure 2
Example associations of VNTRs with cis-linked DNA methylation and gene expression. Copy number of a 72-mer tandem repeat (chr14: 105,271,805–105,272,305, hg38) is associated with DNA methylation levels at multiple CpGs spread over >80 kb and the expression of multiple genes in cis. (A) Manhattan plot of associations between copy number of this VNTR and CpG methylation within ±50 kb. Significant CpGs (p < 0.01 after Bonferroni correction for the number of pairwise tests performed genome wide) are shown in color: red represents positive correlations with VNTR copy number and green indicates negative correlations. The location of the 72-mer VNTR is indicated by the vertical red bar in the center of the plot. The dashed gray line indicates the Bonferroni significance threshold. Above the plot is an image from the UCSC Genome Browser showing location of CpG islands, simple repeats, and RefSeq genes. (B and C) Correlation of VNTR copy number with CpG methylation (cg25733327) that lies 1 kb downstream of the TSS of BRF1 and expression of BRF1 in thyroid. (D and E) Correlation of VNTR copy number with CpG methylation (cg01181307) that lies 500 bp upstream of the TSS of BTBD6 and expression of BTBD6 in esophagus muscularis. For both genes, increased methylation levels around the TSS are associated with reduced gene expression, which is consistent with the known repressive effects of promoter methylation. (F) A 107-mer repeat (chr15: 100,554,293–100,558,659, hg38), increased copy number of which causes local hypermethylation. This VNTR also associates with the expression level of multiple nearby genes in many different tissues. (G) A 40-mer repeat (chr17: 82,764,738–82,765,449, hg38), which associates with methylation of multiple CpGs over an ∼50 kb region. This VNTR also associates with the expression level of multiple nearby genes in many different tissues. In (F) and (G), the location of the associated VNTR is shown by a red bar in the simple repeats track.
Figure 3
Copy number variation at the majority of VNTRs shows association with gene expression and DNA methylation independently of SNV eQTLs and mQTLs. We performed conditional analysis of eVNTRs and mVNTRs after removing the effect of the strongest SNV QTL on the same target. Shown is an example locus, a 44-mer repeat that has 43 copies in the reference genome (chr12: 132,148,891–132,150,764, hg38), corresponding to the same VNTR shown in Figure 1A. This VNTR is located intronic within NOC4L and is significantly associated with NOC4L expression. (A) We identified rs11543305, a C/T variant that is located 1.6 kb proximal to the VNTR, as being the lead SNV associated with NOC4L expression. (B–D) After stratifying samples on the basis of genotype at rs11543305, copy number of this VNTR still shows a significant association with NOC4L expression (B). Considering all significant VNTRs we identified, including eVNTRs observed in GTEx (C) and mVNTRs observed in the PCGC cohort (D), there is a clear trend where the majority of observed VNTR associations retain their original signal even after conditioning on the genotype of the lead SNV QTL. These data indicate that the majority of VNTR associations we identified act independently of local SNV QTLs. In each plot, colored points represent VNTR associations that retain the same directionality after conditioning on the lead SNV QTL: either positive associations (green) or negative associations (red).
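One common way to condition on a lead SNV is to regress its genotype dosage out of both variables and then correlate the residuals (a partial correlation). The caption does not spell out the exact procedure used, so the sketch below is an assumption-laden illustration; the function names and the toy data are hypothetical.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def residualize(y, g):
    """Residuals of y after simple least-squares regression on g."""
    mg, my = mean(g), mean(y)
    slope = sum((a - mg) * (b - my) for a, b in zip(g, y)) / sum((a - mg) ** 2 for a in g)
    intercept = my - slope * mg
    return [b - (intercept + slope * a) for a, b in zip(g, y)]


def conditional_r(copy_number, target, snv_dosage):
    """Correlation of VNTR copy number with the target after removing the SNV effect."""
    return pearson_r(residualize(copy_number, snv_dosage),
                     residualize(target, snv_dosage))


# Hypothetical toy data: SNV dosage (0/1/2), VNTR copy number, expression.
dosage = [0, 0, 0, 1, 1, 1, 2, 2, 2]
copies = [10, 12, 14, 10, 12, 14, 10, 12, 14]
expr = [3.0 * g + 0.5 * c for g, c in zip(dosage, copies)]
r_conditional = conditional_r(copies, expr, dosage)
```

In this toy example the VNTR effect survives conditioning because copy number varies within each genotype class, which is the pattern panels (B) to (D) describe for most loci.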
Figure 4
Replication of the majority of significant eVNTRs and mVNTRs in an independent cohort. We performed replication analysis in the PPMI cohort, which comprises 712 individuals, with Illumina WGS, DNA methylation, and RNA-seq data derived from whole blood. We observed that 73% of significant eVNTRs detected in GTEx whole blood were also identified as significant in the PPMI cohort. Similarly, 80% of significant mVNTRs detected in the PCGC discovery cohort were also significant in the PPMI cohort. Points shown in gray were non-significant in both discovery and replication cohorts, points in orange were significant in one cohort, while points in red were significant in both cohorts.
Figure 5
Additional replication of mVNTRs from direct VNTR genotyping and methylation profiling in 30 genomes sequenced with Oxford Nanopore long reads. (A) Outline of how phased long reads can be used to perform allelic association analysis of VNTR genotype with cis-linked CpG methylation levels. In each individual, ONT reads are phased into the two haplotypes via SNVs (colored letters), VNTRs (blue blocks) are genotyped directly on each haplotype based on the phased assemblies, and CpG methylation levels (lollipops) on each haplotype are estimated on the basis of electrical current signals from each phased read. (B) For mVNTR:CpG pairs identified in the PCGC discovery cohort that had ≥20 haplotypes each with ≥10× coverage in the 30 available ONT genomes, 163 of 228 (71%) showed the same directionality of association in this independent dataset. (C) Copy number of an 83-mer VNTR (chr17: 216,953–218,561, hg38, indicated by the red bar) that lies intronic within RPH3AL is positively associated with local DNA methylation, including an annotated enhancer of RPH3AL. This same VNTR was negatively associated with RPH3AL expression in 22 GTEx tissues. (D) Copy number of a 32-mer VNTR (chr1: 1,080,637–1,081,029, hg38, indicated by the red bar) that lies ∼800 bp upstream of C1orf159 is negatively associated with local DNA methylation, including a region of H3K4 mono-methylation and DNaseI hypersensitivity. This same VNTR was positively associated with C1orf159 expression in six GTEx tissues. In (C) and (D), plots show the correlation (R) values and unadjusted p values between VNTR copy number and CpG methylation measured directly from ONT reads. The dashed vertical lines indicate the position of a CpG that was associated with VNTR copy number in the PCGC discovery cohort. Correlation values are colored according to their significance in the 30 ONT genomes: yellow indicates p < 0.1, orange p < 0.05, and red p < 0.01. Below the plots are screenshots from the UCSC Genome Browser showing annotations of RefSeq genes, simple repeats, and regulatory regions.
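To put the 163 of 228 concordance in panel (B) in context, one can ask how likely such agreement would be if directions matched by chance half the time. The exact one-sided sign test below is an illustration added here, not a test reported in the caption, and it assumes the mVNTR:CpG pairs are independent, which may not hold for CpGs near the same VNTR.

```python
from math import comb


def concordance(n_same: int, n_total: int) -> float:
    """Fraction of pairs whose direction of association matches."""
    return n_same / n_total


def sign_test_upper_tail(n_same: int, n_total: int) -> float:
    """Exact P(X >= n_same) for X ~ Binomial(n_total, 0.5)."""
    favourable = sum(comb(n_total, k) for k in range(n_same, n_total + 1))
    return favourable / 2 ** n_total


fraction_same = concordance(163, 228)        # about 0.71, matching the caption
p_chance = sign_test_upper_tail(163, 228)    # chance of >= 163 matches at 50:50
```

Under the independence assumption the chance probability is very small, but correlated CpGs would make the true evidence weaker than this figure suggests.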
Figure 6
VNTRs with high population divergence are enriched for functional associations with gene expression, methylation, and human traits. We estimated population stratification of VNTR copy number with the VST statistic in samples from the Human Genome Diversity Panel. Both eVNTRs and mVNTRs were enriched for VNTRs with high VST, and consistent with the notion that selection may have acted to modify copy number of functional VNTR loci in specific populations, we also observed that eVNTRs with elevated VST were enriched for putative phenotype associations. Shown are six example VNTRs with high VST. (A) A 40-mer VNTR (chr12: 499,333–499,718, hg38) expanded in the Oceanic population. (B) A 33-mer VNTR (chr21: 27,626,691–27,627,440, hg38) expanded in Asians. (C) A 20-mer VNTR (chr2: 220,823,510–220,823,980, hg38) expanded in Americans. (D) An 81-mer VNTR (chr2: 241,457,351–241,457,836, hg38) expanded in Americans is associated with expression level of SEPT2 (MIM: 601506) in skin and thyroid and is potentially linked to multiple human traits by GWASs. (E) A 24-mer VNTR (chr20: 62,825,064–62,825,209, hg38) expanded in East Asians is associated with expression level of COL9A3 (MIM: 120270) in adipose tissue, muscle, and blood. (F) A 39-mer VNTR (chr19: 3,177,632–3,178,287, hg38) expanded in East Asians is associated with expression level of S1PR4 (MIM: 603751) in mammary tissue, thyroid, and esophagus.
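The caption does not define VST. A commonly used definition for copy number data is VST = (VT − VS) / VT, where VT is the variance of copy number across all samples and VS is the average within-population variance weighted by population size. The sketch below assumes that definition and uses population (not sample) variances, which is a further assumption; the population labels and values are hypothetical.

```python
from statistics import pvariance


def vst(copy_number_by_population):
    """VST = (VT - VS) / VT for a dict mapping population -> list of copy numbers."""
    groups = list(copy_number_by_population.values())
    all_values = [value for group in groups for value in group]
    v_total = pvariance(all_values)
    # Within-population variance, weighted by the number of samples per population.
    v_within = sum(len(group) * pvariance(group) for group in groups) / len(all_values)
    return (v_total - v_within) / v_total


# Hypothetical toy data: one population carries an expanded allele.
toy = {
    "pop_A": [10, 11, 10, 12],
    "pop_B": [11, 10, 12, 11],
    "pop_C": [25, 27, 26, 24],
}
toy_vst = vst(toy)   # close to 1 because most variance lies between populations
```

Values near 0 indicate similar copy number distributions in every population, while values near 1 indicate that nearly all variance is between populations, as for the expanded loci in panels (A) to (F).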
