Nucleic Acids Res. 2013 Feb 1;41(4):2105-20.
doi: 10.1093/nar/gks1456. Epub 2013 Jan 8.

Functional characterization of motif sequences under purifying selection

De-Hua Chen et al. Nucleic Acids Res.

Abstract

Diverse life forms are driven by the evolution of gene regulatory programs, including changes in regulator proteins and cis-regulatory elements. Alterations of cis-regulatory elements are likely to dominate the evolution of gene regulatory networks, as they are subject to weaker selective constraints than proteins and hence may evolve quickly to adapt to the environment. Prior studies on cis-regulatory element evolution focus primarily on sequence substitutions of known transcription factor-binding motifs. However, evolutionary models for the dynamics of motif occurrence are relatively rare, and comprehensive characterization of the evolution of all possible motif sequences has not been pursued. In the present study, we propose an algorithm to estimate the strength of purifying selection of a motif sequence based on an evolutionary model capturing the birth and death of motif occurrences on promoters. We term this measure the 'evolutionary retention coefficient', as it is related to, yet distinct from, the canonical definition of selection coefficient in population genetics. Using this algorithm, we estimate and report the evolutionary retention coefficients of all possible 10-nucleotide sequences from the aligned promoter sequences of 27 748 orthologous gene families in 34 mammalian species. Intriguingly, the evolutionary retention coefficients of motifs are intimately associated with their functional relevance. Top-ranking motifs (sorted by evolutionary retention coefficients) are significantly enriched with transcription factor-binding sequences according to the curated knowledge from the TRANSFAC database and the ChIP-seq data generated by the ENCODE Consortium. Moreover, genes harbouring high-scoring motifs on their promoters retain significantly coherent expression profiles, and those genes are over-represented in the functional classes involved in gene regulation. The validation results reveal the dependencies between natural selection and functions of cis-regulatory elements and shed light on the evolution of gene regulatory networks.
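To make the idea of a birth and death model of motif occurrences concrete, here is a minimal sketch of a generic birth-death chain whose state is the number of occurrences of a motif on a promoter. It is not the estimator proposed in the paper: the function name `stationary_distribution`, the constant birth rate, and the death rate proportional to the current count are all assumptions chosen for illustration. Under those assumed rates the long-run distribution of the count is approximately Poisson, which the example checks.

```python
from math import exp

def stationary_distribution(birth, death, max_count):
    """Stationary distribution of a birth-death chain on states 0..max_count.

    birth(n) is the rate of moving from n to n + 1 occurrences and
    death(n) the rate of moving from n to n - 1. Both rate functions are
    hypothetical inputs, not quantities taken from the paper.
    """
    # Detailed balance: pi[n + 1] * death(n + 1) = pi[n] * birth(n).
    weights = [1.0]
    for n in range(max_count):
        weights.append(weights[-1] * birth(n) / death(n + 1))
    total = sum(weights)
    return [w / total for w in weights]

# Assumed toy rates: new occurrences arise at a constant rate 0.5 and each
# existing occurrence is lost independently at rate 1.0.
pi = stationary_distribution(lambda n: 0.5, lambda n: 1.0 * n, max_count=30)
print(round(pi[0], 4), round(pi[1], 4))
```

In this toy setting, lowering the per-occurrence death rate shifts probability mass toward promoters that keep at least one occurrence, which is the intuition behind treating retention as a signal of purifying selection.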

Figures

Figure 1.
Left: A sequence space of fixed length as a graph. A node denotes a sequence, and an edge denotes two sequences differing at one position. Black nodes are members of a motif and white nodes are non-motifs. Dotted edges denote transitions between motifs and non-motifs. Solid edges denote transitions within motifs and non-motifs. Right: The state transition diagram of a birth–death model. Each state denotes the number of motif occurrences on a promoter, and the labelled transitions denote the birth and death rates emanating from that state.
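The sequence-space graph in the left panel can be sketched in a few lines. The example below enumerates all sequences of a fixed length, connects sequences that differ at exactly one position, and counts edges that cross the motif boundary (dotted in the figure) versus edges that stay on one side (solid). The toy motif `{"ACG", "ACT"}`, the length of 3, and the function names are hypothetical and chosen only to keep the graph small.

```python
from itertools import product

ALPHABET = "ACGT"

def neighbours(seq):
    """Yield every sequence that differs from seq at exactly one position."""
    for i, base in enumerate(seq):
        for alt in ALPHABET:
            if alt != base:
                yield seq[:i] + alt + seq[i + 1:]

def classify_edges(motif, length):
    """Count boundary (dotted) and internal (solid) edges of the graph."""
    boundary = internal = 0
    for chars in product(ALPHABET, repeat=length):
        seq = "".join(chars)
        for nb in neighbours(seq):
            if seq < nb:  # visit each undirected edge once
                if (seq in motif) != (nb in motif):
                    boundary += 1  # one endpoint in the motif, one outside
                else:
                    internal += 1  # both endpoints on the same side
    return boundary, internal

motif = {"ACG", "ACT"}  # hypothetical toy motif of two member sequences
boundary, internal = classify_edges(motif, length=3)
print(boundary, internal)
```

For the 10-mers studied in the paper the same enumeration would cover 4^10 nodes, so a full traversal like this one is only practical for small illustrative lengths.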
Figure 2.
Left: The scatter plot of canonical selection coefficients and evolutionary retention coefficients on simulated data. Each point denotes the scores obtained from 100 simulated sequences derived from one common ancestor over 100 generations. Right: Empirical distribution of selection coefficients among the 10-mer sequences. The probabilities are displayed on a log scale.
Figure 3.
Conservation of motif occurrence between humans and another species [P(motif occurs in a species | motif occurs in humans)] for the top 231 motifs and 231 control motifs from the middle of the ranked list. The horizontal axis denotes the species index with an increasing distance from humans (same as the species order in Supplementary Table S1). The vertical axis denotes the motif index from high selection coefficients (top) to low selection coefficients (bottom). The top-ranking and control motifs are separated by a white line. Colours in the heat map denote the levels of conditional probabilities between 0 (black) and 1 (bright red).
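The conditional probability shown in each heat-map cell can be illustrated with a small sketch. It assumes that, for one motif, we already know the set of orthologous gene families whose promoter contains the motif in each species; the data layout, the family identifiers, and the function name `conservation_given_reference` are hypothetical.

```python
def conservation_given_reference(occurrence, reference="human"):
    """Estimate P(motif occurs in species | motif occurs in reference).

    occurrence maps a species name to the set of gene-family ids whose
    promoter contains the motif in that species (assumed input format).
    """
    ref_sites = occurrence[reference]
    if not ref_sites:
        raise ValueError("motif never occurs in the reference species")
    return {
        species: len(ref_sites & sites) / len(ref_sites)
        for species, sites in occurrence.items()
        if species != reference
    }

# Toy data for a single motif; real input would span 34 species.
toy = {
    "human": {"fam1", "fam2", "fam3", "fam4"},
    "chimp": {"fam1", "fam2", "fam3"},
    "mouse": {"fam1", "fam5"},
}
row = conservation_given_reference(toy)
print(row)
```

One such row per motif, with species ordered by distance from humans, would give a matrix of the kind the caption describes.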
Figure 4.
Enrichment of TRANSFAC motifs in high-scoring sequences. The blue curve shows the distribution of TRANSFAC motif occurrences along the normalized rank of the sorted 10-mer sequences (see equation 10). The red curve shows the CDF of a uniform distribution.
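A comparison of this kind can be sketched as follows. Equation 10 is not reproduced here, so the sketch assumes the normalized rank of a sequence is simply its 1-based position in the sorted list divided by the list length, and it summarizes enrichment as the largest amount by which the empirical CDF of annotated sequences exceeds the uniform CDF. Both the definition and the function name `max_enrichment_gap` are assumptions, not the paper's statistic.

```python
def max_enrichment_gap(hit_ranks, total):
    """Largest excess of the empirical CDF of hits over the uniform CDF.

    hit_ranks are the 1-based ranks of annotated sequences in a list of
    `total` sequences sorted from highest to lowest score.
    """
    hits = sorted(hit_ranks)
    n = len(hits)
    best = 0.0
    for i, rank in enumerate(hits, start=1):
        x = rank / total          # assumed normalized rank in (0, 1]
        best = max(best, i / n - x)  # empirical CDF minus uniform CDF at x
    return best

# Toy list of 10 ranked sequences with annotated hits near the top.
gap = max_enrichment_gap([1, 2, 3, 6], total=10)
print(round(gap, 2))
```

A gap near zero would mean the annotated sequences are spread evenly along the ranking, while a large positive gap means they concentrate among the high-scoring sequences.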
Figure 5.
Enrichment of four functional classes (regulators, enzymes, structural proteins and transporters) among the genes harbouring the top-ranking and control motifs. The horizontal axis denotes the four functional classes. The vertical axis denotes the motif index from high selection coefficients (top) to low selection coefficients (bottom). The top-ranking and control motifs are separated by a yellow line. Colours in the heat map denote the magnitudes of log10(hypergeometric P-values) from −6 (bright red) to 0 (black).
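The hypergeometric P-value behind each heat-map cell can be computed with the standard library alone. The sketch below uses the usual upper-tail form, P(X >= observed), and clips log10(P) to the range [−6, 0] that the caption describes. The argument names and the toy counts are hypothetical, and the choice of the upper tail is an assumption about how the test was applied.

```python
from math import comb, log10

def hypergeom_pvalue(population, successes, draws, observed):
    """Upper-tail hypergeometric P-value, P(X >= observed).

    population: all genes considered; successes: genes in the functional
    class; draws: genes harbouring the motif; observed: genes in both.
    """
    upper = min(successes, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / comb(population, draws)

def clipped_log10(p, floor=-6.0):
    """log10(p) limited to [floor, 0], matching the heat-map colour range."""
    if p <= 0.0:
        return floor
    return max(floor, min(0.0, log10(p)))

# Toy counts: 10 genes, 4 in the class, 3 harbour the motif, 2 overlap.
p = hypergeom_pvalue(population=10, successes=4, draws=3, observed=2)
print(round(p, 4), round(clipped_log10(p), 3))
```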
