PLoS Biol. 2012;10(11):e1001420.
doi: 10.1371/journal.pbio.1001420. Epub 2012 Nov 6.

Adaptive evolution and the birth of CTCF binding sites in the Drosophila genome

Xiaochun Ni et al. PLoS Biol. 2012.

Abstract

Changes in the physical interaction between cis-regulatory DNA sequences and proteins drive the evolution of gene expression. However, it has proven difficult to accurately quantify evolutionary rates of such binding change or to estimate the relative effects of selection and drift in shaping binding evolution. Here we examine the genome-wide binding of CTCF in four species of Drosophila separated by between ∼2.5 and 25 million years. CTCF is a highly conserved protein known to be associated with insulator sequences in the genomes of human and Drosophila. Although the binding preference for CTCF is highly conserved, we find that CTCF binding itself is highly evolutionarily dynamic and has adaptively evolved. Between species, binding divergence increased linearly with evolutionary distance, and CTCF binding profiles are diverging rapidly at the rate of 2.22% per million years (Myr). At least 89 new CTCF binding sites have originated in the Drosophila melanogaster genome since the most recent common ancestor with Drosophila simulans. Comparing these data to genome sequence data from 37 different strains of Drosophila melanogaster, we detected signatures of selection in both newly gained and evolutionarily conserved binding sites. Newly evolved CTCF binding sites show a significantly stronger signature for positive selection than older sites. Comparative gene expression profiling revealed that expression divergence of genes adjacent to CTCF binding sites is significantly associated with the gain and loss of CTCF binding. Further, the birth of new genes is associated with the birth of new CTCF binding sites. Our data indicate that binding of Drosophila CTCF protein has evolved under natural selection, and CTCF binding evolution has shaped both the evolution of gene expression and genome evolution during the birth of new genes.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Conserved binding preference of CTCF.
(A) Topological illustration of the phylogenetic relationships between the four Drosophila species in our study. (B) The number of CTCF binding peaks identified in ChIP-seq experiments in the four Drosophila species. (C) Genomic distribution of CTCF binding sites in the four Drosophila species. The percentages of CTCF binding sites distributed in different genomic locations are shown in the four pie charts: intergenic (>1 kb to nearest TSS, purple), promoter (<1 kb to nearest TSS, light blue), intronic (light green), and exonic (white). In all four species, >90% of the binding sites reside in noncoding regions, with the highest percentages in promoter regions. (D) Species-specific binding motifs. The 9 bp core motif for each species was generated de novo by MEME using the top 2000 ChIP-seq-enriched CTCF binding site DNA sequences.
Figure 2. Diverged CTCF binding between Drosophila species.
(A) Evolutionary dynamics of CTCF binding profiles at the Bithorax complex region. The four colored wiggle file tracks show the ChIP CDP enrichment scores estimated from our quantitative analysis pipeline for the four species: D. melanogaster (blue), D. simulans (green), D. yakuba (orange), and D. pseudoobscura (purple). The four tracks are at the same scale, with the height of each curve at each coordinate denoting the enrichment score values. In the top panel, the blue arrows point to examples of conserved binding events across the four species, and the red arrows point to examples of diverged binding events between species. The fifth track shows the boundaries of previously identified insulator elements (in sky blue). The last track shows the genes in the genomic region. (B) Number of conserved and diverged binding events. From left to right, the three bar plots show the number of D. melanogaster–specific (pink), shared (blue), and non–D. melanogaster (D.xxx, yellow) specific binding events between each of the species pairs (D. melanogaster/D. simulans, D. melanogaster/D. yakuba, and D. melanogaster/D. pseudoobscura) for all binding events possibly identified (All, left), Two-Way Orthologous Binding events (TWOB, middle), and Four-Way Orthologous Binding events (FWOB, right). TWOB is defined as a binding event identified in regions where the sequence identity between the two compared species is >50%. FWOB is defined as a binding event identified in regions where the sequence identity across all four species is >50%. (C) Linear increase of pair-wise binding divergence with species divergence time. The binding divergence is calculated as the percent of D. melanogaster binding events not shared with the non–D. melanogaster species in each pair-wise comparison. Different shaped and colored points represent different groups of binding events as indicated by the legend. The red dashed line depicts the fitted linear regression line of TWOB binding divergence against divergence time. (D) Evolutionary groups of CTCF binding events. Top panel, representative dynamic binding profiles in the four Drosophila species (D. melanogaster, blue; D. simulans, green; D. yakuba, orange; D. pseudoobscura, purple) illustrating examples of 15 mutually exclusive evolutionary groups of binding status. The height of each binding curve denotes the ChIP CDP enrichment score estimated from our analysis pipeline. For each evolutionary group, the y-axes of the four binding curves are at the same scale. The first row of the lower table shows the Boolean conservation score corresponding to the binding profiles, where 0 indicates the absence of a binding event and 1 indicates the presence of a binding event. The second and third rows of the lower table summarize the number of all binding events (second row) and FWOB events (third row) falling into each evolutionary group. The last row of the lower table shows the inferred evolutionary age for different groups of D. melanogaster binding events using parsimony methods. * No instance of the evolutionary group with Boolean conservation score 0,1,1,1 was identified in our analyses, so the representative binding profile in the figure was generated by artificially modifying another binding profile to represent this category.
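The Boolean conservation score and the pair-wise binding divergence described in this caption can be illustrated with a minimal sketch. It assumes each binding event is encoded as a 4-tuple of 0/1 flags in the order D. melanogaster, D. simulans, D. yakuba, D. pseudoobscura; the function names and the toy events are hypothetical and are not taken from the authors' pipeline or data.

```python
from itertools import product

# Species order assumed for the Boolean conservation score.
SPECIES = ("D. melanogaster", "D. simulans", "D. yakuba", "D. pseudoobscura")


def evolutionary_groups(n_species=len(SPECIES)):
    """Return every presence/absence pattern with at least one bound species.

    With four species there are 2**4 - 1 = 15 such patterns, matching the
    15 mutually exclusive evolutionary groups named in the caption.
    """
    return [bits for bits in product((1, 0), repeat=n_species) if any(bits)]


def binding_divergence(events, other_index):
    """Percent of D. melanogaster binding events absent in another species.

    `events` is an iterable of 0/1 tuples ordered as in SPECIES;
    `other_index` selects the species compared against D. melanogaster.
    """
    mel_events = [e for e in events if e[0] == 1]
    if not mel_events:
        raise ValueError("no D. melanogaster binding events supplied")
    not_shared = sum(1 for e in mel_events if e[other_index] == 0)
    return 100.0 * not_shared / len(mel_events)


# Hypothetical toy calls for illustration only, not data from the study.
toy_events = [(1, 1, 0, 0), (1, 0, 0, 0), (0, 1, 1, 0), (1, 1, 1, 1)]
groups = evolutionary_groups()
divergence_vs_simulans = binding_divergence(toy_events, other_index=1)
```

In this toy input, three events are bound in D. melanogaster and one of them is missing in D. simulans, so the divergence for that pair is one third of the D. melanogaster events.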
Figure 3. Selection on CTCF motif sites.
(A) Proportion of binding sites with conserved motifs. The bar plots show proportions of D. melanogaster–specific (pink) and shared (green) binding sites that have conserved motifs between each species pair. A binding site is defined as having conserved motifs if there is at least one species-specific motif identified in the corresponding orthologous sequences. The p value cutoff for FIMO motif searching here is 0.005. For any species pair, the proportion of conserved (here shared) binding sites having conserved motifs is significantly higher than that of the diverged (here D. melanogaster–specific) binding sites. Significance levels: * p<0.05; ** p<0.01, two-sided Fisher's exact test. (B) Mean Tajima's D values for CTCF-motif sites. Tajima's D values were calculated using polymorphism data from 37 North American D. melanogaster strains for various groups of CTCF-motif sites, the synonymous and nonsynonymous sites of nearest genes, and randomly sampled 3′UTR, 5′UTR, and intergenic 9 bp sites. The center of each filled circle depicts the mean Tajima's D value for each group, with the error bar indicating 2 standard deviations. (C and D) Estimated shared proportion of adaptation with neutral reference to nearest gene synonymous sites (C) and a set of small introns (D). D. yakuba sequences were used as an out-group for estimating alpha values for different groups of CTCF-motif sites using an extension of the MK test framework. The filled colored circles depict the shared alpha value estimated within each group, with the error bar indicating the 95% confidence interval. Label abbreviations: Syn, synonymous sites of nearest genes of CTCF binding sites; Nonsyn, non-synonymous sites of nearest genes of CTCF binding sites; TWOB, CTCF-motif sites associated with two-way orthologous binding events between D. melanogaster and the out-group; Conserved TWOB, CTCF-motif sites associated with conserved two-way orthologous binding events; Diverged TWOB, CTCF-motif sites associated with D. melanogaster–specific two-way orthologous binding events; FWOB binding, sites associated with four-way orthologous binding events; Young FWOB, sites associated with FWOBs, for which the age is estimated to be <2.5 Myr; Old FWOB, sites associated with FWOBs, for which the age is estimated to be >6 Myr.
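For readers unfamiliar with the statistic in panel B, the following is a minimal sketch of the standard Tajima's D calculation (Tajima 1989) from an alignment of equal-length sequences. It is a generic textbook implementation, not the authors' code; the function names are hypothetical, and the handling of missing data, gaps, and multi-allelic sites that a real analysis of 9 bp motif sites across 37 strains would need is deliberately left out.

```python
from itertools import combinations
from math import sqrt


def tajimas_d_from_stats(n, seg_sites, pi):
    """Tajima's D from sample size, segregating sites, and mean pairwise differences.

    Returns None when there is no variation or the variance term is not positive.
    """
    if n < 4:
        raise ValueError("this sketch requires at least 4 sequences")
    if seg_sites == 0:
        return None
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    if variance <= 0:
        return None
    # Difference between the two estimators of theta, scaled by its standard deviation.
    return (pi - seg_sites / a1) / sqrt(variance)


def tajimas_d(sequences):
    """Tajima's D for a list of aligned, equal-length sequences (no gaps assumed)."""
    n = len(sequences)
    if n < 4:
        raise ValueError("this sketch requires at least 4 sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    # A site is segregating if more than one allele is observed in its column.
    seg_sites = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    pair_diffs = [
        sum(x != y for x, y in zip(s1, s2))
        for s1, s2 in combinations(sequences, 2)
    ]
    pi = sum(pair_diffs) / len(pair_diffs)
    return tajimas_d_from_stats(n, seg_sites, pi)
```

Negative values indicate an excess of rare variants relative to the neutral expectation, which is why the statistic is commonly used as one line of evidence for selection or demographic change.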
Figure 4. Functional consequences of CTCF binding evolution.
(A–B) CTCF binding evolution is associated with gene expression evolution. The bar plots show the proportion of genes with diverged expression between (A) D. melanogaster/D. simulans and (B) D. melanogaster/D. yakuba comparisons associated with different groups of CTCF binding sites: Genome-wide (black), Conserved TWOB (pink), Diverged TWOB (green), Old FWOB (orange), and Young FWOB (light purple). The table below each bar plot shows the number of genes with diverged and conserved gene expression in the corresponding comparisons and associated with the corresponding CTCF binding sites. For each group of CTCF binding sites, the associated genes are the union of the nearest gene to each binding site. The evolutionary status of gene expression (conserved or diverged) is determined using triplicate WPP mRNA-seq data through a generalized linear regression framework. Label abbreviations are the same as described in Figure 3. Significance levels: * p<0.05; ** p<0.01; one-sided Fisher's exact test. (C–E) CTCF binding evolution is correlated with new gene origination. The four colored wiggle tracks in each of the plots show the ChIP CDP enrichment scores of the four species (D. melanogaster, blue; D. simulans, green; D. yakuba, orange; D. pseudoobscura, purple) across different genomic regions. CTCF binding peaks are observed in D. melanogaster, D. simulans, and D. yakuba at flanking genomic regions of the newly evolved genes TFII-A-S2 (C) and CheB93a (D). Both genes originated after the split of the melanogaster group from the pseudoobscura group. A CTCF binding peak is observed only in the D. melanogaster genome in the flanking genomic regions of the D. melanogaster lineage-specific gene sphinx (E).
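The one-sided Fisher's exact test named in panels A and B can be sketched with the hypergeometric distribution, as below. This is a generic illustration using only the Python standard library; the function name and the 2x2 layout (rows: a binding-site group versus a reference set; columns: diverged versus conserved expression) are assumptions made for the example, and no counts from the study are used.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, i.e. the probability of
    seeing at least `a` diverged-expression genes in the first row when the
    row and column totals are held fixed.
    """
    row1 = a + b          # genes associated with the binding-site group
    col1 = a + c          # genes with diverged expression overall
    total = a + b + c + d
    denominator = comb(total, row1)
    upper = min(row1, col1)
    numerator = sum(
        comb(col1, x) * comb(total - col1, row1 - x)
        for x in range(a, upper + 1)
    )
    return numerator / denominator


# Classic small example: 3 of 4 in the first row versus 1 of 4 in the second.
p_value = fisher_exact_greater(3, 1, 1, 3)
```

For larger tables or two-sided tests, such as the one used in Figure 3A, an established statistics library would normally be preferred over a hand-rolled version.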
