PLoS Genet. 2015 Nov 6;11(11):e1005639.
doi: 10.1371/journal.pgen.1005639. eCollection 2015 Nov.

Dynamics of Transcription Factor Binding Site Evolution

Murat Tuğrul et al. PLoS Genet. 2015.

Abstract

Evolution of gene regulation is crucial for our understanding of the phenotypic differences between species, populations and individuals. Sequence-specific binding of transcription factors to the regulatory regions on the DNA is a key regulatory mechanism that determines gene expression and hence heritable phenotypic variation. We use a biophysical model for directional selection on gene expression to estimate the rates of gain and loss of transcription factor binding sites (TFBS) in finite populations under both point and insertion/deletion mutations. Our results show that these rates are typically slow for a single TFBS in an isolated DNA region, unless the selection is extremely strong. These rates decrease drastically with increasing TFBS length or increasingly specific protein-DNA interactions, making the evolution of sites longer than ∼ 10 bp unlikely on typical eukaryotic speciation timescales. Similarly, evolution converges to the stationary distribution of binding sequences very slowly, making the equilibrium assumption questionable. The availability of longer regulatory sequences in which multiple binding sites can evolve simultaneously, the presence of "pre-sites" or partially decayed old sites in the initial sequence, and biophysical cooperativity between transcription factors, can all facilitate gain of TFBS and reconcile theoretical calculations with timescales inferred from comparative genomics.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Biophysics of transcription regulation.
A) TFs bind to regulatory DNA regions (promoters and enhancers) in a sequence-specific manner to regulate transcriptional gene expression (mRNA production) level via different mechanisms, such as recruiting RNA polymerase (RNA-pol). B) A schematic of two types of mutational processes that we model: point mutations (left) and indel mutations (right). C) The mismatch binding model results in redundancy of genotype classes, with a binomial distribution (red) of genotypes in each mismatch class (some examples of degenerate sequences shown). D) The mapping from the TFBS regulatory sequence to gene expression level is determined by the thermodynamic occupancy (binding probability) of the binding site. If each of the k mismatches from the consensus sequence decreases the binding energy by ϵ, the occupancy of the binding site is $\pi_{TD}(k) = \left(1 + e^{\beta(\epsilon k - \mu)}\right)^{-1}$, where μ is the chemical potential (related to free TF concentration). A typical occupancy curve is shown in black ($\epsilon = 2\,k_BT$ and $\mu = 4\,k_BT$); the gray curves show the effect of perturbation to these parameters ($\epsilon = 1\,k_BT$, $\epsilon = 3\,k_BT$ and $\mu = 6\,k_BT$); the orange curve illustrates the case of two cooperatively binding TFs ($k_c = 0$ and $E_c = -3\,k_BT$, see text for details). We pick two thresholds, shown in dashed lines, to define discrete binding classes: strong 𝓢 ($\pi_{TD} > 2/3$) and weak 𝓦 ($\pi_{TD} < 1/3$).
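The occupancy formula and the two thresholds in panel D can be evaluated directly. The following is a minimal sketch, not the authors' code: it assumes energies are expressed in units of k_BT (so β = 1), and the function names `occupancy` and `binding_class` and the label "intermediate" for sites between the two thresholds are introduced here for illustration only.

```python
import math

def occupancy(k, eps=2.0, mu=4.0):
    """Thermodynamic occupancy pi_TD(k) = 1 / (1 + exp(eps*k - mu)).

    k   : number of mismatches from the consensus sequence
    eps : energy cost per mismatch, in units of k_B T (assumed beta = 1)
    mu  : chemical potential, in units of k_B T
    """
    return 1.0 / (1.0 + math.exp(eps * k - mu))

def binding_class(k, eps=2.0, mu=4.0):
    """Classify a site as strong (S), weak (W), or in between (hypothetical label)."""
    p = occupancy(k, eps, mu)
    if p > 2.0 / 3.0:
        return "S"
    if p < 1.0 / 3.0:
        return "W"
    return "intermediate"

# Occupancy and class for each mismatch count of a 7 bp site,
# using the parameters of the black curve (eps = 2, mu = 4).
for k in range(8):
    print(k, round(occupancy(k), 3), binding_class(k))
```

With these parameters the occupancy is exactly 1/2 at k = 2 (where ϵk = μ), so sites with at most one mismatch fall in the strong class and sites with three or more mismatches fall in the weak class.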
Fig 2. Single TF binding site gain rates at an isolated DNA region.
A) The dependence of the gain rate, $1/\langle t \rangle_{\mathcal{S} \leftarrow k}$, shown in units of point mutation rate, from sequences in different initial mismatch classes k (blue: k = 2, red: k = 5), as a function of selection strength. Results with point mutations only (θ = 0) are shown by a dashed line; with admixture of indel mutations (θ = 0.15) by a solid line. For strong selection, $Ns \gg n \log(2)/2$, the rates scale with Ns, which is captured well by the “shortest path” approximation (black dashed lines in the main figure) of Eq (24). The biophysical parameters are: site length n = 7 bp; binding specificity $\epsilon = 2\,k_BT$; chemical potential $\mu = 4\,k_BT$. Points correspond to Wright-Fisher simulations with Nu = 0.01, where error bars cover ±2 SEM (standard error of the mean). Inset shows the behavior of the gain rates as a function of the initial mismatch class k for Ns = 0 and Ns = 100. B, C) Gain rates from redundancy-rich classes (k ∼ 3n/4, typical of evolution from random “virgin” sequence) under strong selection, without (B) and with (C) indel mutations supplementing the point mutations. Red crosshairs denote the cases depicted in panel A. Contour lines show constant gain rates in units of Ns u as a function of biophysical parameters n and ϵ. Wiggles in the contour lines are not a numerical artefact but a consequence of discrete mismatch classes.
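The statement that random sequences sit in redundancy-rich classes near k ∼ 3n/4 follows from simple counting: with four nucleotides, each position mismatches the consensus in three of four cases. The sketch below illustrates this under the assumption of a uniform random sequence over a four-letter alphabet; the helper names `class_sizes` and `class_fractions` are hypothetical.

```python
import math

def class_sizes(n):
    """Number of distinct sequences of length n with exactly k mismatches.

    Choose which k positions mismatch (comb(n, k)), then one of the
    3 non-consensus nucleotides at each of them (3**k).
    """
    return [math.comb(n, k) * 3**k for k in range(n + 1)]

def class_fractions(n):
    """Fraction of all 4**n sequences in each mismatch class (binomial, p = 3/4)."""
    total = 4**n
    return [size / total for size in class_sizes(n)]

n = 7
sizes = class_sizes(n)
fractions = class_fractions(n)
# Mean mismatch count of a uniformly random sequence; equals 3n/4.
mean_k = sum(k * f for k, f in enumerate(fractions))

print("class sizes:", sizes)
print("mean mismatch count:", mean_k)
```

For n = 7 the consensus class (k = 0) contains a single sequence while the classes k = 5 and k = 6 contain 5103 sequences each, which is why a random starting sequence is almost always far from the strong-binding class.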
Fig 3. Convergence to the stationary distribution of TFBS sequences.
A) Evolutionary dynamics of the mismatch class distribution ψ(k) for an isolated TFBS under point and indel mutations (θ = 0.15), directional selection for stronger binding, and genetic drift are shown for initially well (k = 0, blue) and badly (k = 5, red) adapted populations. At left, no selection (Ns = 0); at right, strong selection (Ns = 100). Different curves show the distribution of genotype classes at different time points ($t = 0,\ 0.05\,u^{-1},\ 0.1\,u^{-1}$, with decreasing opacity); the stationary distribution is shown in green. Insets show the time evolution to convergence for initially well (k = 0, blue) and badly (k = 5, red) adapted populations, measured by the Kullback-Leibler divergence $D_{KL}[\psi(t) \,\|\, \psi(t = \infty)]$. The biophysical parameters are: n = 7 bp, $\epsilon = 2\,k_BT$, $\mu = 4\,k_BT$. B) Rate of convergence to the stationary distribution for different ϵ and n values under strong selection ($Ns \gg n \log(2)/2$; here specifically Ns = 100) and for θ = 0.15. Crosshairs represent the parameters used in A).
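The convergence measure used in the insets can be illustrated on a deliberately simplified case. The sketch below is not the paper's model: it assumes neutral evolution (Ns = 0) with point mutations only (θ = 0), time measured in units of 1/u, and a plain Euler integration of the master equation on mismatch classes. Under these assumptions the stationary distribution is the binomial with p = 3/4. All function names are hypothetical.

```python
import math

def stationary(n):
    """Neutral stationary distribution over mismatch classes: Binomial(n, 3/4)."""
    return [math.comb(n, k) * 0.75**k * 0.25**(n - k) for k in range(n + 1)]

def euler_step(psi, dt):
    """One Euler step of the neutral point-mutation master equation (u = 1).

    k -> k+1 at rate (n - k): any matching position mutates.
    k -> k-1 at rate k / 3:   a mismatched position mutates to the consensus base.
    """
    n = len(psi) - 1
    new = []
    for k in range(n + 1):
        outflow = ((n - k) + k / 3.0) * psi[k]
        inflow = 0.0
        if k > 0:
            inflow += (n - (k - 1)) * psi[k - 1]
        if k < n:
            inflow += ((k + 1) / 3.0) * psi[k + 1]
        new.append(psi[k] + dt * (inflow - outflow))
    return new

def evolve(psi, t_end, dt=0.001):
    """Integrate the distribution psi forward to time t_end (in units of 1/u)."""
    for _ in range(int(round(t_end / dt))):
        psi = euler_step(psi, dt)
    return psi

def kl_divergence(p, q):
    """D_KL[p || q] in nats; terms with p_i = 0 contribute zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

n = 7
target = stationary(n)
for k0 in (0, 5):  # initially well and badly adapted populations
    psi0 = [1.0 if k == k0 else 0.0 for k in range(n + 1)]
    for t in (0.0, 0.05, 0.1, 1.0):
        print(k0, t, kl_divergence(evolve(psi0, t), target))
```

In this neutral toy case the divergence decays on a timescale of order 1/u; the paper's point is that this is already slow on evolutionary timescales, and the caption's panel B shows how the convergence rate depends on ϵ and n under selection.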
Fig 4. TF binding site evolution in a longer sequence of L = 30 base pairs.
The expected number of newly evolved TF binding sites with length n = 7 bp, under strong directional selection (Ns = 100) and both point and indel mutations (θ = 0.15). Time is measured in inverse mutation rates; the number of newly evolved sites is scaled to the selection strength and the sequence length. 1000 replicate simulations were performed with different initial sequences. The average number of sites is shown by a solid black line; the gray band shows the ±2 SEM (standard error of the mean) envelope. Dashed curves are analytical predictions based on single TFBS gain rates at an isolated DNA region, given by Eqs (27), (28) and (29). Biophysical parameters used: $\epsilon = 2\,k_BT$, $\mu = 4\,k_BT$. Insets: Expected number of newly evolved sites from a random sequence of length L at $t = 0.001\,u^{-1}$ (left) and $t = 0.1\,u^{-1}$ (right) for different binding length and specificity values, computed using the analytical predictions. Crosshairs denote the values used in the main panel.
Fig 5. Ancient sites and cooperativity can accelerate the emergence of TF binding sites in longer regulatory sequences.
A) The expected number of newly evolved TFBS in the presence (red and brown) or absence (black) of an ancient site, for binding site length n = 10 bp and specificity $\epsilon = 3\,k_BT$. In this example, the ancient site was a consensus site (k = 0) or two mismatches away from it (k = 2) that evolved under neutrality for t′ = 0.1/u prior to starting this simulation. Dashed lines show the predictions of a simple analytical model, Eq (30). The inset shows how the number of newly evolved TFBS at t = 0.001/u scales with the mismatch of the ancient site k (plot markers: simulation means; error bars: two standard errors of the mean; dashed curve: prediction). B) The expected number of newly evolved TFBS without (black) and with cooperative interactions (for different cooperativity strengths, magenta: $E_c = -2\,k_BT$, yellow: $E_c = -3\,k_BT$, cyan: $E_c = -4\,k_BT$, see Eq (11) in Methods and text) for binding site length n = 7 bp and specificity $\epsilon = 2\,k_BT$. Both panels use $\mu = 4\,k_BT$, strong selection (Ns = 100) and a combination of point and indel mutations (θ = 0.15), acting on a regulatory sequence of length L = 30 bp. Thick solid lines show an average over 1000 simulation replicates; shading denotes ±2 SEM.
