Genome Res. 2021 Aug;31(8):1313-1324.
doi: 10.1101/gr.275560.121. Epub 2021 Jul 9.

Characterizing nucleotide variation and expansion dynamics in human-specific variable number tandem repeats

Meredith M Course et al. Genome Res. 2021 Aug.

Abstract

There are more than 55,000 variable number tandem repeats (VNTRs) in the human genome, notable for both their striking polymorphism and mutability. Despite their role in human evolution and genomic variation, they have yet to be studied collectively and in detail, partially owing to their large size, variability, and predominant location in noncoding regions. Here, we examine 467 VNTRs that are human-specific expansions, unique to one location in the genome, and not associated with retrotransposons. We leverage publicly available long-read genomes, including from the Human Genome Structural Variant Consortium, to ascertain the exact nucleotide composition of these VNTRs and compare their composition of alleles. We then confirm repeat unit composition in more than 3000 short-read samples from the 1000 Genomes Project. Our analysis reveals that these VNTRs contain highly structured repeat motif organization, modified by frequent deletion and duplication events. Although overall VNTR compositions tend to remain similar between 1000 Genomes Project superpopulations, we describe a notable exception with substantial differences in repeat composition (in PCBP3), as well as several VNTRs that are significantly different in length between superpopulations (in ART1, PROP1, DYNC2I1, and LOC102723906). We also observe that most of these VNTRs are expanded in archaic human genomes, yet remain stable in length between single generations. Collectively, our findings indicate that repeat motif variability, repeat composition, and repeat length are all informative modalities to consider when characterizing VNTRs and their contribution to genomic variation.

Figures

Figure 1.
Characterization of 467 human-specific VNTR expansions assessed. (A) Violin plot of motif size for each VNTR. Lines show mean (40.1 bp) and standard deviation (±28.6 bp). (B) Scatter plot of motif size versus repeat copy number in the GRCh38 human reference genome. Axes are log2 and line is log-log (best-fit slope = −0.74). (C) Scatter plot of motif size versus total repeat length in the GRCh38 human reference genome. Axes are log2 and line is log-log (best-fit slope = 0.16). (D) Scatter plot of motif size versus average total repeat length in eight SMRT long-read sequenced genomes representing different superpopulations. Axes are log2 and line is log-log (best-fit slope = 0.034). (E) Bar chart summarizing overall VNTR locations in the genome. (F) Pie chart breaking down specific locations of VNTRs in exons.
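The log-log best-fit slopes quoted in panels B through D can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit on log2-transformed values, which the caption does not specify; the function name and the input values are hypothetical and are not data from the paper.

```python
import math

def log2_fit_slope(xs, ys):
    """Ordinary least-squares slope of log2(y) regressed on log2(x).

    Assumption: the caption's "best-fit" line is an OLS fit in log2 space.
    """
    lx = [math.log2(x) for x in xs]
    ly = [math.log2(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    # Covariance and variance terms (unnormalized; the factor cancels).
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    return sxy / sxx

# Hypothetical motif sizes (bp) and copy numbers, chosen so that
# doubling the motif size halves the copy number.
motif_sizes = [8, 16, 32, 64]
copy_numbers = [64, 32, 16, 8]
slope = log2_fit_slope(motif_sizes, copy_numbers)
```

In this toy input the slope is -1, meaning total repeat length stays constant as motif size grows. A slope between -1 and 0, such as the -0.74 reported for panel B, would mean copy number falls more slowly than motif size rises.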
Figure 2.
Timing of expansion in 467 human-specific VNTRs. (A) Pie chart showing number of VNTRs expanded or not expanded in Neanderthal and/or Denisovan genomes. In total, 460 VNTRs were successfully assessed. (B) XY plot showing mean (black dots) and standard deviation (gray lines) ratio of child VNTR length to average parent VNTR length. Data are from 585 trios from the 1000 Genomes Project, and 455 VNTRs were successfully assessed and are ranked by mean ratio on the x-axis.
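The statistic plotted in panel B can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function name and the trio lengths are hypothetical, and the use of the sample standard deviation is an assumption, since the caption does not say which estimator was used.

```python
from statistics import mean, stdev

def child_parent_ratio(child_len, mother_len, father_len):
    """Ratio of the child's VNTR length to the average of both parents' lengths."""
    return child_len / ((mother_len + father_len) / 2)

# Hypothetical (child, mother, father) lengths for one VNTR across three trios.
# Any consistent unit works (base pairs or repeat copies).
trios = [(100, 100, 100), (110, 100, 100), (90, 100, 100)]

ratios = [child_parent_ratio(c, m, f) for c, m, f in trios]
mean_ratio = mean(ratios)   # a value near 1 indicates stable length
sd_ratio = stdev(ratios)    # sample standard deviation (assumed estimator)
```

Repeating this for each VNTR and ranking the VNTRs by `mean_ratio` would give the x-axis ordering the caption describes.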
Figure 3.
Composition plots illustrating modes of variability in VNTRs. (A) Schematic overview of how composition plots are generated from long-read sequencing. Example is from the CHM1 genome for the VNTR in ZNF667. (B–H) Composition plots showing varying patterns in example VNTRs: (B) ZNF667, (C) PDE4D, (D) SORL1, (E) LOC102725191, (F) PLCB4, (G) WDR7, (H) VPS53. At the left of each plot are listed the genomes from which the allele has been obtained, which were previously sequenced and published, and which represent geographically diverse populations (Audano et al. 2019). Different alleles from the same individual are denoted with “a1” and “a2.” At the top of each plot is the motif length given by Tandem Repeats Finder, which can vary in length by one or more bases depending on insertions or deletions in each motif. Black segments in the plots denote motifs with private variants. (I) Composition plot for the VNTR in SLC22A1, with examples of duplication and deletion boxed in black and gray, respectively. The genomes used for this plot were previously sequenced and published (Ebert et al. 2021).
Figure 4.
A VNTR in PCBP3 shows repeat motif differences in modern superpopulations. (A) PCBP3 repeat motifs with variable positions highlighted. At left is the assigned color code for each motif. At right is the relative abundance of each motif in the 1000 Genomes Project and in ancient genomes. (B) Composition plot of the PCBP3 VNTR in geographically diverse populations. At the left are listed the genomes from which the allele has been obtained, which were previously sequenced and published (Audano et al. 2019). Black segments in the plot denote motifs with private variants. (C) Cumulative frequency of repeat motifs in PCBP3 across superpopulations. Repeat motifs are ordered by decreasing abundance in the African superpopulation, and numbers on the y-axis correspond to their global ranked abundance.
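The cumulative frequency curves in panel C can be sketched as follows: motifs are ordered by decreasing abundance in one reference population, and the running sum of relative frequencies is then taken for another population in that same order. The motif labels, counts, and function name are hypothetical and are not data from the paper.

```python
from collections import Counter
from itertools import accumulate

def cumulative_frequency(motif_counts, motif_order):
    """Running sum of relative motif frequencies, following motif_order."""
    total = sum(motif_counts.values())
    freqs = [motif_counts.get(motif, 0) / total for motif in motif_order]
    return list(accumulate(freqs))

# Hypothetical motif counts in two populations.
reference_pop = Counter({"m1": 50, "m2": 30, "m3": 20})
other_pop = Counter({"m1": 20, "m2": 20, "m3": 60})

# Order motifs by decreasing abundance in the reference population.
order = [motif for motif, _ in reference_pop.most_common()]
other_cumulative = cumulative_frequency(other_pop, order)
```

A curve for `other_pop` that rises more slowly than the reference curve indicates that the motifs most common in the reference population are rarer there. Motifs absent from the reference population do not appear in `order`, so in that case the curve would end below 1.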
Figure 5.
Comparing VNTR lengths across modern superpopulations. Volcano plots showing pair-wise comparisons of average VNTR lengths between superpopulations from the 1000 Genomes Project. The VNTRs with the greatest length differences (as determined by DESeq2) are labeled by the nearest gene, or gene in which they reside, and were determined based on both log2-fold change and P-value. Trial size for each superpopulation is 347–660 individuals.
Figure 6.
Length differences in the top four differentially expanded VNTRs in modern superpopulations. (A–D) Individual VNTR copy numbers plus mean and standard deviation for each superpopulation (left) and cumulative abundance binned into groups of 10 repeat motifs (right) for VNTRs in LOC102723906 (A), PROP1 (B), ART1 (C), and DYNC2I1 (D). Trial size for each superpopulation is 347–660. One-way ANOVAs gave P < 0.0001 for each comparison of superpopulations for each VNTR.
Figure 7.
Composition plots and copy number of the top four differentially expanded VNTRs in modern superpopulations. Composition plots for VNTRs in LOC102723906 (A), PROP1 (B), ART1 (C), and DYNC2I1 (D). The colors at the left of the plot denote the superpopulation from which the alleles were obtained (see key), which were previously sequenced and published (Ebert et al. 2021). Gray segments in the plot denote motifs that are rare or private. The y-axis shows the length of the repeat in number of repeat motifs. The heat map legend in B denotes the length of each repeat found in the PROP1 VNTR, which has been plotted based on this unique feature, instead of the motif structure used for the other VNTRs. (E–H) Plots comparing average number of repeat motifs estimated from short-read data and average number of repeat motifs (from both alleles per individual) from phased long-read genomes are given for the same VNTRs, along with their R² values.
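The comparison in panels E through H can be sketched as follows, assuming that R² is the squared Pearson correlation between the short-read estimate and the per-individual average of the two phased long-read alleles. The caption does not state how R² was computed, and all names and values below are hypothetical.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical repeat-motif counts for the two alleles of three individuals
# (phased long-read genomes).
long_read_alleles = [(10, 14), (20, 22), (30, 34)]
long_read_avg = [(a1 + a2) / 2 for a1, a2 in long_read_alleles]

# Hypothetical short-read estimates for the same individuals.
short_read_est = [12.5, 20.0, 33.0]

r2 = r_squared(short_read_est, long_read_avg)
```

Averaging the two alleles puts the long-read values on the same footing as the short-read estimate, which is described in the caption as an average number of repeat motifs per individual.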

References

    1. The 1000 Genomes Project Consortium. 2015. A global reference for human genetic variation. Nature 526: 68–74. doi: 10.1038/nature15393
    2. Audano PA, Sulovari A, Graves-Lindsay TA, Cantsilieris S, Sorensen M, Welch AE, Dougherty ML, Nelson BJ, Shah A, Dutcher SK, et al. 2019. Characterizing the major structural variant alleles of the human genome. Cell 176: 663–675.e19. doi: 10.1016/j.cell.2018.12.019
    3. Benson G. 1999. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27: 573–580. doi: 10.1093/nar/27.2.573
    4. Berg IL, Neumann R, Lam KWG, Sarbajna S, Odenthal-Hesse L, May CA, Jeffreys AJ. 2010. PRDM9 variation strongly influences recombination hot-spot activity and meiotic instability in humans. Nat Genet 42: 859–863. doi: 10.1038/ng.658
    5. Calderón MDC, Rey MD, Cabrera A, Prieto P. 2014. The subtelomeric region is important for chromosome recognition and pairing during meiosis. Sci Rep 4: 6488. doi: 10.1038/srep06488