Plant Cell. 2021 Aug 13;33(7):2221-2234.
doi: 10.1093/plcell/koab107.

Length variation in short tandem repeats affects gene expression in natural populations of Arabidopsis thaliana

William B Reinar et al. Plant Cell.

Abstract

The genetic basis for the fine-tuned regulation of gene expression is complex and ultimately influences the phenotype and thus the local adaptation of natural populations. Short tandem repeats (STRs) consisting of repetitive DNA motifs have been shown to regulate gene expression. STRs are variable in length within a population and serve as a heritable, but semi-reversible, reservoir of standing genetic variation. For sessile organisms, such as plants, STRs could be of major importance in fine-tuning gene expression as a response to a shifting local environment. Here, we used a transcriptome dataset from natural accessions of Arabidopsis thaliana to investigate population-wide gene expression patterns in light of genome-wide STR variation. We empirically modeled gene expression as a response to the STR length within and around the gene and demonstrated that an association between gene expression and STR length variation is unequivocally present in the sampled population. To support our model, we explored the promoter activity in a transcriptional regulator involved in root hair formation and provided experimentally determined causality between coding sequence length variation and promoter activity. Our results support a general link between gene expression variation and STR length variation in A. thaliana.

Figures

Figure 1
Description of the STRs and the genes included in gene expression modeling. A, The lines show densities of STRs in relation to gene TSSs. Peaks are present upstream and within the gene space. Densities were smoothed using kernel density estimation. Different line colors denote different STR unit sizes (see legend). The average gene size in A. thaliana (2,500 bp) is indicated by the pink rectangle. B, Top ten genotyped STR motifs included in gene expression modeling. C, The bar charts show the distribution of STRs in relation to the gene features (linked to the gene cartoon). Here, a value of 0.5 denotes the middle of the feature, read from 5′ to 3′. D, GO enrichment of biological processes linked to genes with STRs in the gene space or up to 500-bp upstream of the TSS. The Venn diagram shows the overlap of genes in GO terms related to development (purple), hormone pathways (blue), and stress (yellow). The bar charts show the number of genes in subcategories related to the three primary categories. The bars are colored by fold enrichment of the GO term, ranging from 1.25 to 2.75 (see colorbar).
Figure 2
Results of modeling gene expression as a response to STR unit number variation. A, Distribution of A. thaliana accessions included in our modeling. B, Description of the model employed to test whether natural allelic variation in STRs could explain gene expression patterns. Quantile-normalized and log_e-transformed gene expression values served as a response (y) in a linear model with unit number variation in STRs (G, with effect size β) as an explanatory variable. Models with and without G were compared using log-likelihood ratio tests. In addition to ε, which captures noise, we also included a genetic covariance matrix based on intergenic, pruned SNPs (X), which captures expected variation in gene expression given the genetic covariance between individuals. See Supplemental Methods for further elaboration on the model and model validation. C, Results of modeling gene expression as a function of the number of units in STRs within 100 kb of the gene. Both the statistical significance and the effect size peak when STRs are in close proximity to the TSS. Note that P-values are −log10-transformed for clarity. Each blue dot shows the P-value resulting from a log-likelihood ratio test between models with and without STRs as an explanatory variable (665,364 tests). Orange dots show the P-values when modeling mock STR genotypes (665,330 tests), none of which reaches the Bonferroni threshold. The x-axis shows the distance in base pairs (bp) between the STR and the gene TSS. In 2,000-bp windows, the centered and standardized percentage of tests below the global Bonferroni threshold is shown as a dark blue line, and the centered and standardized mean effect size is shown as an orange dashed line. D, Effect sizes conditioned on unit sizes. The higher the unit size, the larger the effect observed. Average A. thaliana gene size is denoted as a pink rectangle. E, Example associations between the number of units in STRs and the expression of genes. The examples illustrate local effects of eSTRs, such as the A(n)-eSTR 4,644 bp upstream of RPM1. They also show that just a single unit increase in a protein-coding eSTR can significantly influence expression levels, such as the GGA(n)-eSTR in METHYL ESTERASE 14 (MES14). Distal effects of eSTRs are also present, such as a T(n)-eSTR 31,289 bp upstream of the TSS of RFO1. Finally, ALFIN-LIKE 6 (AL6) expression levels are influenced by the most overrepresented protein-coding eSTR motif, GAA(n). A complete list of named genes influenced by eSTRs is available as Supplemental Data Set S10.
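The model comparison described in panels B and C can be illustrated with a minimal sketch. It is not the authors' implementation: it deliberately omits the SNP-based genetic covariance term (X), uses simulated rather than real data, and assumes a family-wise alpha of 0.05 for the Bonferroni threshold, which the caption does not state. The function names (`gaussian_loglik`, `lrt_str_effect`) and all simulation parameters are hypothetical. Only the number of tests (665,364) is taken from the caption.

```python
import math
import random


def gaussian_loglik(residuals):
    """Maximised Gaussian log-likelihood of a linear model, given its residuals."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)


def lrt_str_effect(expression, unit_counts):
    """Likelihood ratio test of y = mu + beta*G + e against y = mu + e.

    Simplification: the genetic covariance term used in the paper is omitted,
    so this is ordinary least squares with one explanatory variable.
    """
    n = len(expression)
    mean_y = sum(expression) / n
    mean_g = sum(unit_counts) / n
    sxx = sum((g - mean_g) ** 2 for g in unit_counts)
    sxy = sum((g - mean_g) * (y - mean_y)
              for g, y in zip(unit_counts, expression))
    beta = sxy / sxx  # least-squares estimate of the STR effect size

    null_resid = [y - mean_y for y in expression]
    full_resid = [y - mean_y - beta * (g - mean_g)
                  for g, y in zip(unit_counts, expression)]

    stat = 2 * (gaussian_loglik(full_resid) - gaussian_loglik(null_resid))
    # Chi-square survival function with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2))
    return beta, stat, p_value


# Simulated accessions (hypothetical values, not data from the study).
rng = random.Random(0)
unit_counts = [rng.choice([3, 5, 7]) for _ in range(400)]
expression = [0.5 * g + rng.gauss(0, 1) for g in unit_counts]

beta_hat, lrt_stat, p_value = lrt_str_effect(expression, unit_counts)

# Bonferroni threshold over the 665,364 tests; alpha = 0.05 is an assumption.
bonferroni_threshold = 0.05 / 665_364

print(f"beta = {beta_hat:.3f}, -log10(P) = {-math.log10(p_value):.1f}")
print(f"passes Bonferroni: {p_value < bonferroni_threshold}")
```

Because the two models differ by one parameter, the test statistic is compared against a chi-square distribution with one degree of freedom. A mixed model with a covariance structure, as described in the caption, would need a dedicated fitting routine in place of the closed-form least-squares step used here.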
Figure 3
Functional relevance of the GAA-encoded poly-E tract in ALFIN-LIKE 6. A, The nucleotide and amino acid alignment shows three different natural variants of the AL6 glutamate tract (poly-E) present in the 472 A. thaliana accessions. The poly-E tract is located immediately upstream of the start of the PHD zinc finger DNA binding domain. B and C, Transient expression of AL6 from Col-0 (AL6-7E-GFP) and AL6 from accession CS77246 (AL6-3E-GFP) in N. benthamiana leaves. B, AL6-7E-GFP and AL6-3E-GFP localize to the nuclei in N. benthamiana. A protein known to localize to the plasma membrane (PM) was used to outline the PM of the cells. C, Staining with 1 µM 4′,6-Diamidine-2′-phenylindole dihydrochloride shows that AL6 localizes to cell nuclei. D, Expression of AL6-7E-GFP, AL6-3E-GFP, and RING1A-mCherry fusion proteins prior to FRET analysis. E, Boxplots show the promoter activity of AL6 measured by a fluorescent GUS assay. There was a significant difference in AL6 promoter activity in tissue expressing AL6-7E-GFP compared with tissue expressing AL6-3E-GFP. See also Supplemental Figure S2. F, Boxplots showing the results from FRET analysis of protein–protein interaction between AL6 and RING1A. AL6 with three repeated glutamates (AL6-3E) interacts significantly more strongly with RING1A than AL6 with seven glutamates (AL6-7E). Expression of AL6-7E-GFP alone served as a negative control. The dashed line indicates the common threshold for significant interaction in FRET experiments (Bleckmann et al., 2010). In (E) and (F), the measurements below the upper whisker and above the lower whisker fall within 1.5 × the interquartile range. Measurements above or below the whiskers are indicated with diamond symbols. The asterisks indicate a statistically significant difference between groups.

References

    1. 1001 Genomes Consortium (2016) 1,135 Genomes reveal the global pattern of polymorphism in Arabidopsis thaliana. Cell 166: 481–491
    2. Bates D, Mächler M, Bolker B, Walker S (2015) Fitting linear mixed-effects models using lme4. J Stat Softw 67: 1–48
    3. Benson G (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27: 573–580
    4. Blázquez M (2007) Quantitative GUS activity assay in intact plant tissue. CSH Protocols 2007: db.prot4688
    5. Bleckmann A, Weidtkamp-Peters S, Seidel CAM, Simon R (2010) Stem cell signaling in Arabidopsis requires CRN to localize CLV2 to the plasma membrane. Plant Physiol 152: 166–176
