Cell Genom. 2024 Dec 11;4(12):100699.

doi: 10.1016/j.xgen.2024.100699. Epub 2024 Nov 27.

Genome-wide investigation of VNTR motif polymorphisms in 8,222 genomes: Implications for biological regulation and human traits

Sijia Zhang¹, Qiao Song², Peng Zhang², Xiaona Wang², Rong Guo², Yanyan Li², Shuai Liu², Xiaoyu Yan², Jingjing Zhang², Yiwei Niu², Yirong Shi², Tingrui Song², Tao Xu³, Shunmin He⁴

Affiliations

¹ Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; Department of Scientific Research, Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research & Affiliated Cancer Hospital of Nanjing Medical University, Nanjing, China.
² Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China.
³ College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, Shandong 250117, China. Electronic address: xutao@ibp.ac.cn.
⁴ Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China. Electronic address: heshunmin@ibp.ac.cn.

PMID: 39609246
PMCID: PMC11701250
DOI: 10.1016/j.xgen.2024.100699

Sijia Zhang et al. Cell Genom. 2024.

Abstract

Variable number tandem repeats (VNTRs) are pervasive and highly mutable genetic features that vary in both length and repeat sequence. Although VNTR copy-number variation has been well studied, the functional impacts of repeat motif polymorphisms remain unknown. Here, we present the largest genome-wide VNTR polymorphism map to date, with over 2.5 million VNTR length polymorphisms (VNTR-LPs) and over 11 million VNTR motif polymorphisms (VNTR-MPs) detected in 8,222 high-coverage genomes. Leveraging the large-scale NyuWa cohort, we identified 2,982,456 (31.8%) NyuWa-specific VNTR-MPs, of which 95.3% were rare. Moreover, we found that 1,937 of 38,685 VNTRs were associated with gene expression through VNTR-MPs in lymphoblastoid cell lines. Specifically, we showed that expansion of a likely causal motif could upregulate gene expression by enhancing PU.1 binding. We also explored the potential impacts of VNTR polymorphisms on phenotypic differentiation and disease susceptibility. This study expands our knowledge of VNTR-MPs and their functional implications.

Keywords: PU.1; VNTR; gene regulation; motif polymorphism; tandem repeat; whole-genome sequencing.
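To make the distinction between the two polymorphism types concrete, the Python sketch below flags a single locus as length-polymorphic when allele lengths differ and as motif-polymorphic when the set of k-mers present differs between alleles. These criteria, the choice of k, and the function names `motif_counts` and `classify_locus` are hypothetical simplifications for illustration only; they are not the genotyping pipeline used in the study.

```python
from collections import Counter


def motif_counts(allele: str, k: int) -> Counter:
    """Count overlapping k-mers (candidate repeat motifs) in one VNTR allele sequence."""
    return Counter(allele[i:i + k] for i in range(len(allele) - k + 1))


def classify_locus(alleles: list[str], k: int) -> dict[str, bool]:
    """Flag one VNTR locus as length- and/or motif-polymorphic.

    Hypothetical, simplified criteria:
    - length polymorphism: the alleles differ in total length;
    - motif polymorphism: the alleles differ in which k-mers they contain.
    """
    lengths = {len(allele) for allele in alleles}
    # Iterating a Counter yields its keys, so each frozenset is the allele's k-mer set.
    motif_sets = {frozenset(motif_counts(allele, k)) for allele in alleles}
    return {
        "length_polymorphic": len(lengths) > 1,
        "motif_polymorphic": len(motif_sets) > 1,
    }


if __name__ == "__main__":
    # Same length (12 bp), different motif content.
    print(classify_locus(["ACGTACGTACGT", "ACGTACCTACGT"], k=4))
```

With k = 4, the example reports a motif polymorphism without a length polymorphism, because both alleles are 12 bp long but the second carries k-mers such as `TACC` that the first lacks. Conversely, `ACGTACGTACGT` versus `ACGTACGT` differ only in length under these criteria.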

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

**Figure 1**
VNTR-LPs and VNTR-MPs identified in this study. (A) Comparison of VNTR length polymorphisms (VNTR-LPs; top) and VNTR motif polymorphisms (VNTR-MPs; bottom) identified in the NyuWa dataset with those identified in the 1KGP and HGDP datasets (left) and in the East Asian samples from the 1KGP dataset (right). (B) Bar plots comparing the frequency distribution of k-mers identified in the NyuWa dataset with that of k-mers identified in the 1KGP and HGDP datasets (left) and in the East Asian samples from the 1KGP dataset (right). Inset: the distribution of rare k-mers (0 < frequency [freq] ≤ 0.01) at a finer scale.
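The "rare" bin in panel (B) can be illustrated with a small sketch. It assumes that a k-mer's frequency is the fraction of haplotypes carrying it; the caption does not define frequency beyond the 0 < freq ≤ 0.01 threshold, so this definition and the helper names `kmer_frequencies` and `rare_kmers` are hypothetical.

```python
from collections import Counter


def kmer_frequencies(haplotype_kmers: list[set[str]]) -> dict[str, float]:
    """Fraction of haplotypes in which each k-mer is observed (assumed definition)."""
    n = len(haplotype_kmers)
    # Each haplotype is a set, so a k-mer is counted at most once per haplotype.
    carriers = Counter(kmer for kmers in haplotype_kmers for kmer in kmers)
    return {kmer: count / n for kmer, count in carriers.items()}


def rare_kmers(freqs: dict[str, float], threshold: float = 0.01) -> set[str]:
    """k-mers in the rare bin, 0 < frequency <= threshold."""
    return {kmer for kmer, freq in freqs.items() if 0 < freq <= threshold}


if __name__ == "__main__":
    # 200 toy haplotypes: AAAA is in all of them; CCCC, GGGG, TTTT are in 1, 2, 3.
    haplotypes = [{"AAAA"} for _ in range(200)]
    haplotypes[0].add("CCCC")
    for i in range(2):
        haplotypes[i].add("GGGG")
    for i in range(3):
        haplotypes[i].add("TTTT")
    freqs = kmer_frequencies(haplotypes)
    print(sorted(rare_kmers(freqs)))  # ['CCCC', 'GGGG']
```

In the toy data, `TTTT` (3/200 = 0.015) falls just outside the rare bin, while `GGGG` (2/200 = 0.01) sits exactly on the inclusive upper bound.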

**Figure 2**
*Cis*-regulatory effects of eVNTRs and eMotifs on gene expression. (A and B) Quantile-quantile plots comparing observed p values for VNTR-gene association tests (two-sided t test in a linear model) with the expected uniform distribution in the eVNTR (A) and eMotif (B) analyses. Blue dots represent the observed association tests, and gray dots represent p values for the permutation control. The black line gives the expected p value distribution under the null hypothesis of no association. (C and D) Correlations between eVNTR (C) and eMotif (D) effect sizes identified in this study and in a previous study by Lu et al. Blue points indicate eVNTRs and eMotifs whose directions of effect were concordant between the two studies, and gray points denote those with discordant directions of effect. eVNTRs and eMotifs detected in both studies are colored red regardless of the concordance of effect. (E and F) Fold enrichment of eVNTRs (E) and eMotifs (F) in genomic and epigenetic regions in the GM12878 cell line. A permutation test was repeated 1,000 times, and empirical p values were computed together with the enrichment values by GAT v.1.3.4. Points denote the enrichment values. Red and blue points denote significant enrichments or depletions (p < 0.05 after Benjamini and Hochberg correction), and bars show 95% confidence intervals. (G and H) Fold enrichment of eVNTRs (G) and eMotifs (H) in chromatin states defined by ChromHMM in the GM12878 cell line; the permutation test, statistics, and symbols are as in (E) and (F). (I) Conservation evaluation of eVNTRs. We randomly selected 500 VNTRs unrelated to gene expression as controls. The median LINSIGHT scores in 50 bp windows were measured for visualization. (J) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment for annotated genes of eMotifs. (K) Classification of predicted binding transcription factors for eMotifs.
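Panels (A) and (B) rest on a per-pair linear-model association test plus a permutation control. The sketch below assumes one VNTR (or motif) dosage vector and one expression vector with no covariates; the study's covariates and expression normalization are not described here, and the helper names are hypothetical. `scipy.stats.linregress` reports the two-sided p value for the slope from a t test, and shuffling expression across samples breaks any true association, giving null p values analogous to the gray permutation points.

```python
import numpy as np
from scipy import stats


def association_test(dosage: np.ndarray, expression: np.ndarray) -> tuple[float, float]:
    """Fit expression ~ dosage; return (slope, two-sided t-test p value for the slope)."""
    fit = stats.linregress(dosage, expression)
    return float(fit.slope), float(fit.pvalue)


def permuted_pvalues(dosage, expression, n_perm: int, seed: int = 0) -> np.ndarray:
    """Null p values obtained by shuffling expression across samples."""
    rng = np.random.default_rng(seed)
    return np.array(
        [association_test(dosage, rng.permutation(expression))[1] for _ in range(n_perm)]
    )


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy data: per-sample dosage (0-4 copies) and expression with a true linear effect.
    dosage = rng.integers(0, 5, size=500)
    expression = 1.0 * dosage + rng.normal(0.0, 1.0, size=500)
    slope, p = association_test(dosage, expression)
    null_p = permuted_pvalues(dosage, expression, n_perm=200)
    print(f"slope={slope:.2f}, p={p:.1e}, mean null p={null_p.mean():.2f}")
```

Under the null, permuted p values should be roughly uniform (mean near 0.5), which is what the diagonal reference line in a Q-Q plot encodes.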

**Figure 3**
Gene regulation by a specific eMotif polymorphism in *MAD1L1*. (A) Fine-mapped eMotif in *MAD1L1* (chr7:1,940,924). p values were obtained from eQTL association tests. Dots of different colors and shapes represent different posterior inclusion probability (PIP) intervals. (B) Correlation between the dosage of the *MAD1L1* eMotif and normalized expression of *MAD1L1*. The red line indicates the best fit under simple linear regression. (C) Predicted number of PU.1 binding sites across 9 length-divergent haplotypes. Each haplotype was scanned for matches to a 20 bp PU.1 binding motif (MA0080.5) using FIMO with a cutoff of adjusted p < 10⁻². (D) EMSA analysis of 1CM and 2CM double-stranded DNA (dsDNA; 5 μM, 1 μL) binding to the PU.1 ETS domain (0.17 μg/μL, 1 μL) with (+) or without (−) specific competitor (20 μM, 4 μL). (E and F) Kinetic analysis of recombinant PU.1 ETS-domain binding to 1CM and 2CM dsDNA by surface plasmon resonance (SPR). The PU.1 ETS-domain protein concentrations are 6.25, 12.5, 25, 50, 100, 200, and 400 nM in each graph. Equilibrium and kinetic constants were calculated by a global fit to a 1:1 Langmuir binding model; the gray dashed lines are the fitted curves. RU, resonance units. (G) Dual-luciferase assays of enhancer activity for the empty vector (control) and plasmids carrying the 1CM or 2CM insert in the forward orientation. The luciferase activity of each construct was normalized against the activity of Renilla luciferase. Data are shown as the median (minimum to maximum) from six independent experiments for each construct. p values were calculated by a two-sided Wilcoxon rank-sum test. ∗p < 0.05 and ∗∗p < 0.01.
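For reference, the 1:1 Langmuir model named in (E) and (F) has a closed-form response, sketched below for the listed concentrations. The rate constants `ka` and `kd`, the maximum response `rmax`, and the end-of-injection time `t_off` are hypothetical values, not the study's fitted constants. In a global fit, ka, kd, and Rmax would be shared across all concentration curves and estimated jointly, for example with a nonlinear least-squares routine; that fitting step is not shown.

```python
import math

import numpy as np


def langmuir_1to1(t, conc_nM: float, ka: float, kd: float, rmax: float, t_off: float) -> np.ndarray:
    """SPR response (RU) for a 1:1 Langmuir interaction.

    Association (t <= t_off):  R = Req * (1 - exp(-(ka*C + kd) * t)),
                               Req = Rmax * C / (C + KD),  KD = kd / ka.
    Dissociation (t > t_off):  R = R(t_off) * exp(-kd * (t - t_off)).
    Units: ka in M^-1 s^-1, kd in s^-1; C is converted from nM to M.
    """
    t = np.asarray(t, dtype=float)
    conc = conc_nM * 1e-9
    kobs = ka * conc + kd  # observed association rate constant
    req = rmax * conc / (conc + kd / ka)  # steady-state response at this concentration
    assoc = req * (1.0 - np.exp(-kobs * np.minimum(t, t_off)))
    r_off = req * (1.0 - math.exp(-kobs * t_off))  # response when injection ends
    dissoc = r_off * np.exp(-kd * np.maximum(t - t_off, 0.0))
    return np.where(t <= t_off, assoc, dissoc)


if __name__ == "__main__":
    # Hypothetical constants (not the study's values), giving KD = kd / ka = 10 nM.
    ka, kd, rmax, t_off = 1e5, 1e-3, 100.0, 300.0
    print("KD (nM):", kd / ka * 1e9)
    for c in [6.25, 12.5, 25, 50, 100, 200, 400]:  # concentrations from (E) and (F)
        r = float(langmuir_1to1(t_off, c, ka, kd, rmax, t_off))
        print(f"{c} nM -> {r:.1f} RU at end of injection")
```

The model makes two properties easy to check: the response at the end of injection rises monotonically with concentration toward Rmax, and during dissociation the signal halves every ln(2)/kd seconds.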

**Figure 4**
Hypervariable VNTR motifs within superpopulations associated with human phenotypes. (A) Gene Ontology (GO) enrichment analysis for annotated genes of hypervariable VNTR motifs in all superpopulations. The top ten significant terms are shown. (B) Tissue-specific gene enrichment analysis of hypervariable VNTR motifs, shown as a bar plot. The y axis represents the adjusted p value derived from a hypergeometric test and corrected using the Benjamini and Hochberg procedure by TissueEnrich v.1.16.0. (C) Principal-component analysis of hypervariable VNTR motifs in all superpopulation samples and 200 randomly selected NyuWa samples. Shapes represent the superpopulation of each sample; colors represent the population of each sample. (D) Length distributions of two phenotype-related VNTRs that differ significantly across superpopulations. p values were computed using the two-sided Wilcoxon rank-sum test. ∗∗∗p < 0.001 and ∗∗∗∗p < 0.0001. In (C) and (D), AFR denotes the African superpopulation, AMR the American superpopulation, EAS the East Asian superpopulation, EUR the European superpopulation, and CSA the Central/South Asian superpopulation.
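The statistic behind panel (B) can be sketched as a one-sided hypergeometric over-representation test followed by Benjamini-Hochberg adjustment. This is a generic sketch, not TissueEnrich's implementation; the background universe and gene-set definitions TissueEnrich uses are not reproduced, and the helper names are hypothetical.

```python
import numpy as np
from scipy.stats import hypergeom


def enrichment_pvalue(overlap: int, set_size: int, query_size: int, universe_size: int) -> float:
    """One-sided hypergeometric test: P(X >= overlap) for a random gene query.

    universe_size: background genes; set_size: genes in a tissue-enriched set;
    query_size: genes annotated to hypervariable VNTR motifs; overlap: genes in both.
    """
    # scipy's hypergeom.sf(k, M, n, N) is P(X > k), so k = overlap - 1 gives P(X >= overlap).
    return float(hypergeom.sf(overlap - 1, universe_size, set_size, query_size))


def benjamini_hochberg(pvalues) -> np.ndarray:
    """Benjamini-Hochberg adjusted p values (step-up false discovery rate control)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: each adjusted value is the minimum over its rank and all larger ranks.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    result = np.empty(m)
    result[order] = np.minimum(adjusted, 1.0)
    return result


if __name__ == "__main__":
    # Hypothetical counts: 20,000 background genes, 100 query genes, three tissue gene sets.
    raw = [enrichment_pvalue(k, 200, 100, 20_000) for k in (10, 2, 0)]
    print(benjamini_hochberg(raw))
```

With 100 query genes and a 200-gene set in a 20,000-gene background, about one shared gene is expected by chance, so an overlap of 10 yields a very small p value, while an overlap of 0 yields p = 1.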

**Figure 5**
Length comparisons of 360 human-specific expansion VNTRs across five superpopulations. Volcano plots show pairwise comparisons of average lengths for 360 human-specific expansion VNTRs between superpopulations from the 1KGP and HGDP. VNTRs with significant length differences are shown in red; two novel VNTRs (black) and one previously reported VNTR (gray) are labeled by the nearest gene or the gene in which they reside. AFR, African superpopulation; AMR, American superpopulation; EAS, East Asian superpopulation; EUR, European superpopulation; SAS, South Asian superpopulation. p values were computed with the two-sided Wilcoxon rank-sum test and adjusted using the Benjamini and Hochberg correction.
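One volcano panel can be sketched as follows: for a single pair of superpopulations, each VNTR gets a mean length difference (x axis) and a Benjamini-Hochberg-adjusted two-sided Wilcoxon rank-sum p value (plotted as −log10 on the y axis). The input layout, the helper names, and the choice to adjust across VNTRs within each pairwise comparison are assumptions; the caption does not state the correction scope. `scipy.stats.mannwhitneyu` is used because the Mann-Whitney U test is equivalent to the Wilcoxon rank-sum test.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def bh_adjust(pvalues) -> np.ndarray:
    """Benjamini-Hochberg adjusted p values."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]  # keep adjusted values monotone
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out


def volcano_points(lengths: dict, pop_a: str, pop_b: str) -> list[tuple[str, float, float]]:
    """For one superpopulation pair, return (VNTR id, mean length difference, adjusted p).

    lengths maps VNTR id -> {superpopulation -> per-sample VNTR lengths} (hypothetical layout).
    """
    ids, diffs, pvals = [], [], []
    for vntr_id, by_pop in lengths.items():
        a = np.asarray(by_pop[pop_a], dtype=float)
        b = np.asarray(by_pop[pop_b], dtype=float)
        ids.append(vntr_id)
        diffs.append(float(a.mean() - b.mean()))
        # Two-sided Wilcoxon rank-sum (Mann-Whitney U) test on the two length samples.
        pvals.append(mannwhitneyu(a, b, alternative="two-sided").pvalue)
    return list(zip(ids, diffs, bh_adjust(pvals).tolist()))


if __name__ == "__main__":
    toy = {
        "vntr_1": {"AFR": list(range(100, 130)), "EAS": list(range(150, 180))},
        "vntr_2": {"AFR": list(range(100, 130)), "EAS": list(range(100, 130))},
    }
    for vntr_id, diff, padj in volcano_points(toy, "AFR", "EAS"):
        print(vntr_id, diff, -np.log10(padj))  # x and y coordinates of a volcano point
```

Repeating the call for every pair of superpopulations (10 pairs for five groups) would give one panel per comparison.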

Cited by

Ren W, Liu W, Fang Z, Dolzhenko E, Weisburd B, Cheng Z, Peltz G. A Tandem Repeat Atlas for the Genome of Inbred Mouse Strains: A Genetic Variation Resource. bioRxiv [Preprint]. 2025 May 24:2025.05.23.655792. doi: 10.1101/2025.05.23.655792. PMID: 40475611.
