Cell Genom. 2024 Dec 11;4(12):100699.
doi: 10.1016/j.xgen.2024.100699. Epub 2024 Nov 27.

Genome-wide investigation of VNTR motif polymorphisms in 8,222 genomes: Implications for biological regulation and human traits

Sijia Zhang et al. Cell Genom.

Abstract

Variable number tandem repeat (VNTR) is a pervasive and highly mutable genetic feature that varies in both length and repeat sequence. Despite the well-studied copy-number variants, the functional impacts of repeat motif polymorphisms remain unknown. Here, we present the largest genome-wide VNTR polymorphism map to date, with over 2.5 million VNTR length polymorphisms (VNTR-LPs) and over 11 million VNTR motif polymorphisms (VNTR-MPs) detected in 8,222 high-coverage genomes. Leveraging the large-scale NyuWa cohort, we identified 2,982,456 (31.8%) NyuWa-specific VNTR-MPs, of which 95.3% were rare. Moreover, we found 1,937 out of 38,685 VNTRs that were associated with gene expression through VNTR-MPs in lymphoblastoid cell lines. Specifically, we clarified that the expansion of a likely causal motif could upregulate gene expression by improving the binding concentration of PU.1. We also explored the potential impacts of VNTR polymorphisms on phenotypic differentiation and disease susceptibility. This study expands our knowledge of VNTR-MPs and their functional implications.

Keywords: PU.1; VNTR; gene regulation; motif polymorphism; tandem repeat; whole-genome sequencing.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

Graphical abstract
Figure 1
VNTR-LPs and VNTR-MPs identified in this study. (A) Comparison of VNTR length polymorphisms (VNTR-LPs; top) and VNTR motif polymorphisms (VNTR-MPs; bottom) identified in the NyuWa dataset with those identified in the 1KGP and HGDP datasets (left) and the East Asian samples from the 1KGP dataset (right). (B) Bar plot of frequency distributions for k-mers identified in the NyuWa dataset with k-mers identified in the 1KGP and HGDP datasets (left) and the East Asian samples from the 1KGP dataset (right). Inset, the distribution of rare k-mers (0 < frequency [freq] ≤ 0.01) is shown at a finer scale.
Figure 2
Cis-regulatory effects of eVNTRs and eMotifs in gene expression. (A and B) Quantile-quantile plot comparing observed p values for VNTR gene association tests (two-sided t test in linear model) versus the expected uniform distribution in eVNTR (A) and eMotif (B) analysis. The blue dots represent the observed association tests, and gray dots represent p values for permutation control. The black line gives the expected p value distribution under the null hypothesis of no association. (C and D) Correlations of eVNTR (C) and eMotif (D) effect sizes identified in this study and a previous study by Lu et al. The blue points indicate eVNTRs and eMotifs whose directions of effect were concordant in two studies, and gray points denote eVNTRs and eMotifs with discordant directions of effect in two studies. The eVNTRs and eMotifs detected in both studies are colored red regardless of the concordance of effect. (E and F) Fold enrichment of eVNTRs (E) and eMotifs (F) in genome and epigenetic regions in the GM12878 cell line. A permutation test was repeated 1,000 times, and empirical p values were computed together with the enrichment values by GAT v.1.3.4. Points denote the enrichment values. Red and blue points denote significant enrichments or depletions (p < 0.05 after Benjamini and Hochberg correction), and bars show 95% confidence intervals. (G and H) Fold enrichment of eVNTRs (G) and eMotifs (H) in chromatin states defined by ChromHMM in the GM12878 cell line. A permutation test was repeated 1,000 times, and empirical p values were computed together with the enrichment values by GAT v.1.3.4. Points denote the enrichment values. Red and blue points denote significant enrichments or depletions (p < 0.05 after Benjamini and Hochberg correction), and bars show 95% confidence intervals. (I) Conservative evaluation of eVNTRs. We randomly selected 500 VNTRs unrelated to gene expression as controls. The median LINSIGHT scores in 50 bp windows were measured for visualization. (J) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment for annotated genes of eMotifs. (K) Classification of predicted binding transcription factors for eMotifs.
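The Benjamini and Hochberg correction named in the Figure 2 caption can be illustrated with a minimal sketch. This is not the GAT v.1.3.4 implementation; the function name and the example p values below are hypothetical and chosen only to show how adjusted p values are compared with the 0.05 threshold.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical empirical p values (not taken from the paper).
example_p = [0.001, 0.008, 0.039, 0.041, 0.60]
adjusted_p = benjamini_hochberg(example_p)
significant = [q < 0.05 for q in adjusted_p]
```

With these made-up inputs, only the two smallest p values remain below 0.05 after adjustment, although four of the five raw values are below 0.05.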
Figure 3
Gene regulation of specific eMotif MAD1L1 polymorphism. (A) Fine-mapped eMotif MAD1L1 (chr7:1,940,924). p value was obtained from eQTL association tests. Dots with diverse colors and shapes represent different PIP intervals. (B) Correlations between the dosage of the eMotif MAD1L1 and normalized expression of MAD1L1. The red line indicates the best fit under simple linear regression. (C) Predicted number of PU.1 binding sites across 9 length-divergent haplotypes. Each haplotype was scanned for matches with a 20 bp PU.1 binding motif (MA0080.5) using FIMO with a cutoff of p-adjusted < 10^-2. (D) EMSA analysis of the 1CM and 2CM double-stranded DNA (dsDNA; 5 μM, 1 μL) binding to PU.1 ETS-domain (0.17 μg/μL, 1 μL) with (+) or without (−) specific competitor (20 μM, 4 μL). (E and F) Kinetic analysis of recombinant PU.1 ETS-domain binding to 1CM and 2CM dsDNA by surface plasmon resonance (SPR). The concentrations of the PU.1 ETS-domain protein are 6.25, 12.5, 25, 50, 100, 200, and 400 nM in each graph, respectively. Equilibrium and kinetic constants were calculated by a global fit to 1:1 Langmuir binding model. The gray dashed lines are curves fit to the data using the 1:1 binding model. RU, resonance units. (G) Dual-luciferase assays of enhancer activity for the empty vector (control) and the plasmids with the insertion of 1CM and 2CM in the forward position. The luciferase activity of each construct was normalized against the activity of Renilla luciferase. Data are shown as the median (minimum to maximum), from six independent experiments for each construct. p values were calculated by a two-sided Wilcoxon rank-sum test. ∗p < 0.05 and ∗∗p < 0.01.
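The 1:1 Langmuir binding model mentioned for the SPR panels (E and F) can be sketched as follows. The concentrations are those listed in the caption; the rate constants `KA_PER_NM_S` and `KD_PER_S` and the maximum response `R_MAX_RU` are hypothetical placeholders, not fitted values from the study, and the function names are illustrative only.

```python
import math


def equilibrium_response(conc_nm, kd_nm, r_max):
    """Equilibrium response for 1:1 binding: R_eq = R_max * C / (K_D + C)."""
    return r_max * conc_nm / (kd_nm + conc_nm)


def association_response(t_s, conc_nm, ka_per_nm_s, kd_per_s, r_max):
    """Response during the association phase, assuming R(0) = 0."""
    k_obs = ka_per_nm_s * conc_nm + kd_per_s   # observed rate constant
    kd_nm = kd_per_s / ka_per_nm_s             # K_D = k_d / k_a
    r_eq = equilibrium_response(conc_nm, kd_nm, r_max)
    return r_eq * (1.0 - math.exp(-k_obs * t_s))


# Protein concentrations from the caption, in nM.
CONCENTRATIONS_NM = [6.25, 12.5, 25, 50, 100, 200, 400]

# Hypothetical parameters for illustration only (K_D = 50 nM).
KA_PER_NM_S = 1e-4
KD_PER_S = 5e-3
R_MAX_RU = 100.0

# Simulated response after 120 s of association at each concentration.
curves = {
    c: association_response(120.0, c, KA_PER_NM_S, KD_PER_S, R_MAX_RU)
    for c in CONCENTRATIONS_NM
}
```

In a global fit, a single pair of rate constants is shared across all concentrations, which is why one set of parameters generates every curve here.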
Figure 4
Hypervariable VNTR motifs within superpopulations associated with human phenotypes. (A) Gene Ontology (GO) enrichment analysis for annotated genes of hypervariable VNTR motifs in all superpopulations. The top ten items with significant p values are shown. (B) Tissue-specific gene enrichment analysis of hypervariable VNTR motifs, shown as a bar plot. The y axis represents the adjusted p value derived from a hypergeometric test and corrected using the Benjamini and Hochberg correction by TissueEnrich v.1.16.0. (C) Principal-component analysis of hypervariable VNTR motifs in all superpopulation samples and 200 randomly selected NyuWa samples. Shapes represent the superpopulation of each sample. Colors represent the population of each sample. (D) Distributions of lengths for two phenotype-related VNTRs with significant differences across superpopulations. p values were computed using the two-sided Wilcoxon rank-sum test. ∗∗∗p < 0.001 and ∗∗∗∗p < 0.0001. In (C) and (D), AFR denotes African superpopulation, AMR denotes American superpopulation, EAS denotes East Asian superpopulation, EUR denotes European superpopulation, and CSA denotes Central/South Asian superpopulation.
Figure 5
Length comparisons of 360 human-specific expansion VNTRs across five superpopulations. Volcano plots show pairwise comparisons of average lengths for 360 human-specific expansion VNTRs between superpopulations from the 1KGP and HGDP. VNTRs with significant length differences are shown in red, while two novel VNTRs (black) and one previously reported VNTR (gray) are labeled by the nearest gene or the gene in which they reside. AFR, African superpopulation; AMR, American superpopulation; EAS, East Asian superpopulation; EUR, European superpopulation; SAS, South Asian superpopulation. p values are reported for the two-sided Wilcoxon rank-sum test and adjusted using the Benjamini and Hochberg correction.
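The two-sided Wilcoxon rank-sum test used for the pairwise length comparisons can be sketched with a normal approximation. This is an assumption-laden illustration: the paper does not state which implementation or approximation was used, the function name is hypothetical, no continuity correction is applied, and the repeat lengths below are invented.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (U statistic for x, z score, two-sided p value).
    Assumes the pooled values are not all identical.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of the ranks they span.
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_x - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return u_x, z, p


# Invented VNTR lengths (bp) for two superpopulations.
lengths_a = [10, 11, 12, 13, 14]
lengths_b = [20, 21, 22, 23, 24]
u_stat, z_score, p_value = rank_sum_test(lengths_a, lengths_b)
```

For samples this small an exact test would normally be preferred; the approximation is shown only because it keeps the sketch short. When many VNTRs are tested, as in the volcano plots, the resulting p values would then be adjusted for multiple testing.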
