Genome Biol Evol. 2009 May 25;1:85-98.
doi: 10.1093/gbe/evp010.

Measuring transcription factor-binding site turnover: a maximum likelihood approach using phylogenies

Wolfgang Otto et al. Genome Biol Evol. 2009.

Abstract

A major mode of gene expression evolution is based on changes in cis-regulatory elements (CREs), whose function critically depends on the presence of transcription factor-binding sites (TFBS). Because CREs experience extensive TFBS turnover even when their function is conserved, alignment-based studies of CRE sequence evolution are limited to very closely related species. Here, we propose an alternative approach based on a stochastic model of TFBS turnover. We implemented a maximum likelihood model that permits variable turnover rates in different parts of the species tree. This model can be used to detect changes in turnover rate as a proxy for differences in the selective pressures acting on TFBS in different clades. We applied this method to five TFBS in the fungal methionine biosynthesis pathway and three TFBS in the HoxA clusters of vertebrates. We find that the estimated turnover rate is generally high, with half-lives ranging between approximately 5 and 150 My and a mode around tens of millions of years. This rate is consistent with the finding that even functionally conserved enhancers can show very low sequence similarity. We also detect statistically significant differences in the equilibrium densities of estrogen- and progesterone-response elements in the HoxA clusters between mammal and nonmammal vertebrates. Even more extreme clade-specific differences were found in the fungal data. We conclude that stochastic models of TFBS turnover enable the detection of shifts in the selective pressures acting on CREs in different organisms. The analysis tool, called CRETO (Cis-Regulatory Element Turn-Over), can be downloaded from http://www.bioinf.uni-leipzig.de/Software/creto/.

Keywords: cis-regulatory evolution; enhancer evolution; evolution of development; evolution of gene regulation; noncoding sequences; promoter evolution.
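
The half-lives quoted in the abstract can be related to a decay rate under one simple assumption: if each binding site is lost independently at a constant rate μ, the fraction surviving after time t is exp(−μt) and the half-life is ln 2 / μ. If new sites also arise at a constant rate λ, the expected equilibrium number of sites in such a model is λ/μ. The sketch below illustrates only this arithmetic; the function names are hypothetical and are not taken from CRETO, and the constant-rate model is an assumption rather than a description of the published implementation.

```python
import math


def half_life(mu):
    """Half-life (in My) of a site lost at constant rate mu (per My).

    Assumes simple exponential decay; this is an illustrative assumption.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    return math.log(2) / mu


def decay_rate(t_half):
    """Inverse of half_life: decay rate (per My) for a given half-life (My)."""
    if t_half <= 0:
        raise ValueError("t_half must be positive")
    return math.log(2) / t_half


def equilibrium_mean(lam, mu):
    """Expected equilibrium site number when sites arise at rate lam
    and each site decays at rate mu (assumed constant-rate model)."""
    if mu <= 0:
        raise ValueError("mu must be positive")
    return lam / mu
```

Under these assumptions, the abstract's range of roughly 5 to 150 My corresponds to decay rates of about 0.139 and 0.0046 per My, respectively.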

Figures

FIG. 1. Example of binding site number evolution for multiple turnover parameters: phylogenetic tree with six species (A–F, with 1–11 binding sites) and three parameter pairs. Each node is labeled with its unique label, followed by the mean binding site number of the corresponding subtree and the difference between that number and that of the ancestor node. Because node 1 and node 11 have the largest differences, their subtrees are chosen by the program to have different rates (gray). Note that the rates apply from the ancestor of the subtree onward.
FIG. 2. Likelihood surface of the model for the CBF1 binding site in the fungal data set. The likelihood is plotted as a function of the origination rate λ and the decay rate μ of the binding sites. The likelihood function has a distinct maximum with an extended ridge (black) between the λ/μ ratio corresponding to the mean binding site number (green) and the λ/μ ratio at the likelihood maximum (blue).
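
To make the idea of a likelihood in λ and μ concrete, here is a minimal sketch for a single branch. It assumes a standard immigration-death process: each existing site survives a branch of length t with probability exp(−μt), and the number of newly arisen sites still present at the end is Poisson distributed with mean (λ/μ)(1 − exp(−μt)). This is an illustrative model choice, not the published CRETO likelihood, which is computed on a whole phylogeny; the function names are hypothetical.

```python
import math


def transition_prob(n_start, n_end, lam, mu, t):
    """P(n_end sites at time t | n_start sites at time 0).

    Assumed immigration-death model: survivors are binomial,
    newly gained sites are Poisson, and the two are independent.
    """
    p_survive = math.exp(-mu * t)
    new_mean = (lam / mu) * (1.0 - p_survive)
    total = 0.0
    # k = number of the original sites that survive the branch
    for k in range(min(n_start, n_end) + 1):
        survivors = (math.comb(n_start, k)
                     * p_survive ** k
                     * (1.0 - p_survive) ** (n_start - k))
        gained = n_end - k
        arrivals = (math.exp(-new_mean) * new_mean ** gained
                    / math.factorial(gained))
        total += survivors * arrivals
    return total


def branch_log_likelihood(n_start, n_end, lam, mu, t):
    """Log-likelihood of one observed change in site number on one branch."""
    return math.log(transition_prob(n_start, n_end, lam, mu, t))
```

Evaluating `branch_log_likelihood` over a grid of λ and μ values would give a surface for one branch; a tree-based likelihood would additionally have to account for unobserved ancestral site numbers.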
FIG. 3. Influence of the mean binding site number on the estimated λ/μ ratio. Although a strong correlation exists between these values for four taxa and an RCA of 0.5 (Neyman–Pearson = 0.974) (A), the correlation for 16 taxa at the same RCA is virtually 0 (Neyman–Pearson = −0.009) (B). In general, this correlation falls with rising taxon numbers and increases strongly with higher RCA (C). Because the mean binding site number tends to differ more from the equilibrium number for few taxa than for many taxa (see also supplementary fig. S5, Supplementary Material online), this suggests that λ/μ estimates are more reliable with high taxon numbers and low RCA.
FIG. 4. Influence of the number of taxa on the accuracy of the individual parameter estimates: both μ (A) and λ (B) are estimated relatively accurately from six taxa onward. For smaller taxon numbers, the estimates become less accurate, which suggests that parameter estimation requires at least six taxa.
FIG. 5. Influence of the RCA on the V/m ratio of the binding site numbers. Although for large RCA the ratio corresponds to the expected equilibrium value V/m = 1, for small RCA we see V/m < 1, indicating a phylogenetic signal in the data for binary (blue) as well as for linear (red) trees. Hence, clades that are at least three times as old as the binding site half-life are expected to be in equilibrium. The ratios predicted by the BOE method (yellow, see text) correspond very well with the ML estimates.
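
The V/m ratio used in these captions is a variance-to-mean ratio of binding site numbers across taxa. A minimal sketch of that calculation follows. It assumes the unbiased sample variance (dividing by n − 1), which the captions do not specify, and the counts in the usage note are hypothetical rather than taken from the paper.

```python
import statistics


def variance_to_mean(counts):
    """Variance-to-mean ratio (V/m) of binding site counts across taxa.

    Uses the sample variance (n - 1 denominator); whether the original
    analysis uses this or the population variance is an assumption here.
    """
    if len(counts) < 2:
        raise ValueError("need at least two taxa")
    mean = statistics.fmean(counts)
    if mean == 0:
        raise ValueError("mean count is zero; V/m is undefined")
    return statistics.variance(counts) / mean
```

For the hypothetical counts `[2, 4, 4, 4, 5, 5, 7, 9]`, the mean is 5 and the sample variance is 32/7, so V/m is about 0.91, close to the value of 1 expected for Poisson-distributed counts.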
FIG. 6. Influence of the RCA on the accuracy of the estimates of individual parameters. For binary trees, the estimates of μ are relatively accurate up to an RCA of 3 (A) and then start to be biased toward higher values. The estimates of λ behave similarly, except for the negative bias at RCA 16 (C). Estimates of μ (B) and λ (D) on linear trees are similar to those on binary trees except.
FIG. 7. Relationship between the RCA and the V/m ratio of binding sites of the transcription factors in the methionine pathway of yeast. The letters in the graph refer to the clade labels in figure 8. Whereas for the youngest clade, F, the V/m ratios for all transcription factors are less than 1, the ratios tend to be larger for older clades but still vary considerably among binding site classes. Note that V/m ratios larger than 1 suggest heterogeneity of binding site number among taxa.
FIG. 8. Evolution of binding site number for transcription factors in the methionine pathway of yeast. Clades are marked by a letter at their root and a bracket with the mean binding site number and the V/m ratio above the corresponding species names. For some species, the binding site numbers are also given above the name. The whole-genome duplication of the E clade is marked by WGD. For CBF1 (A), the likelihood model estimates a half-life of 57.3 My. There also seems to be a significant decrease in the binding site density in the stem lineage of the G clade. GCN4 (B) has the longest estimated half-life, 84.1 My, and a significant loss of binding sites in the D clade. MET30/31 (C) has an estimated half-life of 24.2 My and, like CBF1, a significant binding site loss in the G clade. BAS1 (D) has an estimated half-life of 22.7 My. RTG1/3 (E) has a V/m ratio close to 1 (0.84) in the F clade, suggesting equilibrium within the 20 My time frame of the F clade. This makes the estimation of the rate parameters difficult and yields the shortest half-life, only 30,900 years.
FIG. 9. Evolution of binding site numbers in the HoxA clusters of vertebrates. Clades are marked by a bracket with the mean binding site number and the V/m ratio above the corresponding species names. The binding site numbers of PRE (A) and ERE (B) are significantly overdispersed owing to differences between the mammalian and nonmammalian taxa. The mammalian clade shows no evidence of heterogeneity (V/m of PRE = 0.97, V/m of ERE = 0.64), which suggests a 2-fold increase in equilibrium density, from about 5 to 10. The P values at the stem lineage of the mammalian clade are those of the LRT, showing that the mammalian and the nonmammalian lineages have different turnover rates and equilibrium densities of PRE and ERE. RARE (C) shows a very low density and variation, with a V/m ratio of 0.47, suggesting a low turnover rate. There is also no evidence for heterogeneity, which is consistent with the ancestral and conserved function of RARE.
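
The likelihood ratio test (LRT) mentioned in this caption compares a model with one set of rates against a model that allows a clade its own rates. A minimal sketch follows, assuming the models are nested, that the extra clade adds exactly one (λ, μ) pair (two free parameters), and that the usual chi-square approximation holds. For two degrees of freedom the chi-square tail probability has the closed form exp(−x/2). The function name and the log-likelihood values in the usage note are hypothetical, and the published test may differ in its details.

```python
import math


def lrt_two_extra_params(loglik_null, loglik_alt):
    """Likelihood ratio test for a nested model with two extra parameters.

    Returns (test statistic, approximate P value). The P value uses the
    chi-square survival function with 2 degrees of freedom, exp(-x / 2).
    """
    stat = 2.0 * (loglik_alt - loglik_null)
    if stat < 0:
        raise ValueError("alternative model fits worse than the nested null")
    return stat, math.exp(-stat / 2.0)
```

With hypothetical log-likelihoods of −103 for the single-rate model and −100 for the two-rate model, the statistic is 6 and the approximate P value is exp(−3), about 0.0498.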
