Nat Commun. 2021 Apr 6;12(1):2075.
doi: 10.1038/s41467-021-22206-z.

Variable number tandem repeats mediate the expression of proximal genes

Mehrdad Bakhtiari et al. Nat Commun. 2021.

Abstract

Variable number tandem repeats (VNTRs) account for significant genetic variation in many organisms. In humans, VNTRs have been implicated in both Mendelian and complex disorders, but are largely ignored by genomic pipelines due to the complexity of genotyping and the computational expense. We describe adVNTR-NN, a method that uses shallow neural networks to genotype a VNTR in 18 seconds on 55X whole genome data, while maintaining high accuracy. We use adVNTR-NN to genotype 10,264 VNTRs in 652 GTEx individuals. Associating VNTR length with gene expression in 46 tissues, we identify 163 "eVNTRs". Of the 22 eVNTRs in blood where independent data is available, 21 (95%) are replicated in terms of significance and direction of association. 49% of the eVNTR loci show a strong and likely causal impact on the expression of genes and 80% have maximum effect size at least 0.3. The impacted genes are involved in diseases including Alzheimer's, obesity and familial cancers, highlighting the importance of VNTRs for understanding the genetic basis of complex diseases.

Conflict of interest statement

V.B. is a co-founder of, serves on the scientific advisory board of, and has an equity interest in Boundless Bio, Inc. (BB) and Digital Proteomics, LLC (DP), and receives income from DP and BB. The terms of this arrangement have been reviewed and approved by the University of California, San Diego in accordance with its conflict of interest policies. BB and DP were not involved in the research presented here. The remaining authors declare no competing interests.

Figures

Fig. 1. VNTR performance.
a Length distribution in base pairs of all known VNTRs (red) and the selected target VNTRs (blue) across the GRCh38 human genome. b The genotyping pipeline. c Neural network architecture for each VNTR, which uses a mapping of reads to a k-mer composition vector. d Improvement in running time after using the neural network and k-mer matching. e Accuracy and efficiency of read recruitment in simulated data. The scatter plot shows 1-efficiency ((TP + FP)/R) and recall (TP/(TP + FN)) of classification with different methods. Efficiency is directly related to running time. Each of the 10,264 points represents a VNTR locus (Methods) and is shown once for each method. The side and top panels show cumulative distributions of recall and 1-efficiency. f Base pairs (log scale) affected by VNTRs per individual in the GTEx cohort. Source data are provided as a Source Data file.
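The two ideas named in panels c and e, a k-mer composition vector for a read and the recall and (TP + FP)/R measures for read recruitment, can be illustrated with a minimal sketch. This is not the adVNTR-NN implementation: the function names, the choice of k = 2, and the reading of R as the total number of reads considered are assumptions made for illustration only.

```python
from itertools import product


def kmer_composition(read, k=2, alphabet="ACGT"):
    """Count how often each possible k-mer occurs in a read.

    Hypothetical helper: k and the alphabet are illustrative choices.
    K-mers containing characters outside the alphabet (e.g. N) are skipped.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    vector = [0] * len(kmers)
    for start in range(len(read) - k + 1):
        position = index.get(read[start:start + k])
        if position is not None:
            vector[position] += 1
    return vector


def recruitment_metrics(tp, fp, fn, total_reads):
    """Return (recall, selected_fraction) for a read-recruitment classifier.

    recall            = TP / (TP + FN)
    selected_fraction = (TP + FP) / R, with R assumed to be the total
                        number of reads considered (the caption's
                        "1-efficiency" quantity).
    """
    recall = tp / (tp + fn)
    selected_fraction = (tp + fp) / total_reads
    return recall, selected_fraction
```

With k = 2 the vector has 16 entries, one per dinucleotide; a classifier that passes fewer reads on to the expensive genotyping step has a smaller selected fraction and therefore less work to do.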
Fig. 2. Effect of VNTR genotypes on mediating gene expression.
a Location of target VNTRs and eVNTRs relative to the proximal genes. b Pipeline to identify eVNTRs and assign causality scores. Ancestry, sex, and PEER factors are included in C as covariates. We associate VNTR genotype with expression residuals after correcting for the effect of C. c Quantile-quantile plot showing p values of association signals separated by tissue. The green line represents the p values obtained using 100 permutations. d Number of unique and shared eVNTRs in each tissue. e Trend of RU count correlation with gene expression level. f Spearman correlation of eVNTR effect sizes for each pair of tissues. g Scatter plot of effect size versus minor allele frequency (MAF). Source data are provided as a Source Data file.
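The step described in panel b, removing the effect of the covariates C from expression and then associating VNTR genotype with the residuals, can be sketched as follows. This is an illustrative outline only, not the authors' pipeline: the function names are hypothetical, the residuals come from an ordinary least-squares fit (computed here by Gram-Schmidt orthogonalization), and a simple regression slope stands in for whatever association statistic the paper uses.

```python
def _dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def residualize(expression, covariates):
    """Return expression minus its least-squares fit on an intercept
    plus the given covariate columns (e.g. ancestry, sex, PEER factors).

    covariates is a list of columns, each the same length as expression.
    """
    basis = []
    columns = [[1.0] * len(expression)] + [list(c) for c in covariates]
    for column in columns:
        v = list(column)
        # Orthogonalize against the basis vectors found so far.
        for b in basis:
            coef = _dot(v, b) / _dot(b, b)
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        if _dot(v, v) > 1e-12:  # skip columns that are linearly dependent
            basis.append(v)
    residuals = list(expression)
    for b in basis:
        coef = _dot(residuals, b) / _dot(b, b)
        residuals = [ri - coef * bi for ri, bi in zip(residuals, b)]
    return residuals


def association_slope(genotype, residuals):
    """Slope of residual expression regressed on mean-centered genotype
    (assumed here to be a repeat-unit count per individual)."""
    mean = sum(genotype) / len(genotype)
    centered = [g - mean for g in genotype]
    return _dot(centered, residuals) / _dot(centered, centered)
```

A usage example would pass one expression value per individual, the covariate columns for the same individuals, and their VNTR genotypes; a slope that stays clearly non-zero under permutation of the genotypes is the kind of signal the quantile-quantile plot in panel c summarizes.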
Fig. 3. Effect of VNTR genotypes on mediating gene expression.
a Association of AS3MT VNTR genotype with gene expression in brain cortex (n = 148 samples, Fisher’s two-sided P: 2.78 × 10^-12). Box plots display the median, 25th and 75th percentiles. b Association with gene expression (upper panel) and CAVIAR causality probability of proximal SNPs, that is, all SNPs in a 100 kbp window on either side of the AS3MT VNTR (red star). c Location of AS3MT VNTR relative to known regulatory elements. d, e Association with gene expression of the POMC VNTR (n = 378 samples, Fisher’s two-sided P: 1.53 × 10^-9) and its causality probability relative to proximal SNPs. Box plots display the median, 25th and 75th percentiles. f Location of POMC VNTR relative to other regulatory regions and its spatial proximity with the promoter region revealed via Hi-C. g, h Association with gene expression of the ZNF232 VNTR (n = 114 samples, Fisher’s two-sided P: 5.47 × 10^-9) and its causality score relative to proximal SNPs. Box plots display the median, 25th and 75th percentiles. Source data are provided as a Source Data file.
