Comparative Study
Genome Res. 2004 Aug;14(8):1530-6.
doi: 10.1101/gr.2662504. Epub 2004 Jul 15.

cis-Regulatory and protein evolution in orthologous and duplicate genes

Cristian I Castillo-Davis et al. Genome Res. 2004 Aug.

Abstract

The relationship between protein and regulatory sequence evolution is a central question in molecular evolution. It is currently not known to what extent changes in gene expression are coupled with the evolution of protein coding sequences, or whether these changes differ among orthologs (species homologs) and paralogs (duplicate genes). Here, we develop a method to measure the extent of functionally relevant cis-regulatory sequence change in homologous genes, and validate it using microarray data and experimentally verified regulatory elements in different eukaryotic species. By comparing the genomes of Caenorhabditis elegans and C. briggsae, we found that protein and regulatory evolution is weakly coupled in orthologs but not paralogs, suggesting that selective pressure on gene expression and protein evolution is quite similar and persists for a significant amount of time following speciation but not gene duplication. Additionally, duplicates of both species exhibit a dramatic acceleration of both regulatory and protein evolution compared to orthologs, suggesting increased directional selection and/or relaxed selection on both gene expression patterns and protein function in duplicate genes.

Figures

Figure 1
Illustration of the shared motif method (SMM). The SMM discovers regions of local similarity between DNA sequences without respect to their order, orientation, or spacing. In this example, two 500-bp noncoding sequences, upstream from homologous coding sequences (CDS), are compared. After iterative local alignment in both their native and inverted sequence orientations (Methods), two regions of significant local similarity between the sequences were discovered. One region is 150 bp long but has been inverted in one of the sequences. The other is 20 bp long but has been translocated. The fraction of “shared motifs” between these sequences is simply (20 + 150) / 500, or 0.34. We define shared motif divergence (dSM) as one minus this fraction, or 1 – 0.34 = 0.66. Shared motif divergence is thus the fraction of the two sequences that does not contain a region of significant local alignment without respect to order, orientation, or spacing. Note, this example is a simplified caricature. Real sequence comparisons often exhibit more complex patterns of shared motif conservation (Supplemental Fig. 1).
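The arithmetic in the Figure 1 example can be sketched in a few lines. This is only an illustration of the caption's definition, not the authors' sharmot implementation: the function name and its arguments are hypothetical, and it assumes both sequences have the same length and that the shared regions do not overlap.

```python
def shared_motif_divergence(motif_lengths, sequence_length):
    """Return dSM = 1 - (total shared-motif length / sequence length).

    Hypothetical helper: assumes equal-length sequences and
    non-overlapping regions of significant local similarity.
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    shared = sum(motif_lengths)
    if shared > sequence_length:
        raise ValueError("shared motifs cannot exceed the sequence length")
    return 1.0 - shared / sequence_length


# Figure 1 example: a 150-bp inverted region and a 20-bp translocated
# region shared between two 500-bp upstream sequences.
d_sm = shared_motif_divergence([150, 20], 500)
print(round(d_sm, 2))  # 0.66
```

A dSM of 0 would mean the whole upstream region falls within shared motifs, and a dSM of 1 would mean no significant local similarity was found.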
Figure 2
Correlation between dSM and difference in magnitude of gene expression. We found a significant positive correlation between expression difference and shared motif divergence (dSM) in sequences 0–500 bp upstream of translation start between duplicate genes in duplicate families with two to five members (n = 76, rs = 0.47, P < 10^-3; Spearman rank correlation). The log(expression difference) is linearly correlated with dSM (R = 0.46; P ≪ 10^-3; Pearson linear correlation); a linear fit of the data is also plotted (y = 0.85 + 1.37x). Similar results were obtained with strict duplicate pairs or duplicate gene families of up to 10 members (data not shown).
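As a rough sketch of the two statistics named in the Figure 2 caption, the following computes a Spearman rank correlation (Pearson correlation of average ranks) and evaluates the reported linear fit. The data values are made-up toy numbers for illustration only, not the study's measurements; the function names are hypothetical; and reading x as dSM and y as log(expression difference) in the fitted line is an assumption based on the caption's wording.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def fitted_log_expression_difference(d_sm):
    """Evaluate the fit reported in the caption, y = 0.85 + 1.37x (x assumed to be dSM)."""
    return 0.85 + 1.37 * d_sm


# Toy values, invented purely to exercise the functions.
toy_d_sm = [0.1, 0.3, 0.5, 0.7, 0.9]
toy_log_diff = [0.9, 1.4, 1.2, 1.9, 2.1]
toy_rs = spearman(toy_d_sm, toy_log_diff)
toy_r = pearson(toy_d_sm, toy_log_diff)
```

Spearman's coefficient depends only on rank order, so it is unaffected by the log transform of expression difference, whereas the Pearson coefficient and the linear fit are not.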
Figure 3
Correlation between protein evolution (dN) and regulatory evolution (dSM) in (A) paralogous genes in C. elegans (rs = 0.24), (B) orthologous genes (rs = 0.16), and (C) paralogous genes in C. briggsae (rs = 0.21); P ≪ 10^-4 for all tests. Multiple regressions that included dS revealed that the correlation between dN and dSM in paralogs is primarily a function of dS (i.e., duplicate age). No such effect was found in orthologs. Note that for some orthologs and duplicates, regulatory and protein evolution appear to be completely uncoupled.
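The idea that an apparent dN–dSM correlation can be "primarily a function of dS" can be illustrated with a first-order partial correlation. This is a simpler stand-in technique, not the multiple regression the caption describes, and the input coefficients below are hypothetical values chosen for illustration, not results from the study.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z.

    Here x, y, z would stand for dN, dSM, and dS respectively
    (an illustrative mapping, not the authors' analysis).
    """
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical case: the raw correlation (0.3) equals the product of the
# correlations with the third variable (0.6 * 0.5), so nothing remains
# once that variable is controlled for.
controlled = partial_correlation(0.3, 0.6, 0.5)

# Hypothetical case: the third variable is unrelated to both, so the raw
# correlation is unchanged.
unchanged = partial_correlation(0.5, 0.0, 0.0)
```

In the first case the association between x and y is fully accounted for by their shared dependence on z, which is the pattern the caption reports for paralogs but not for orthologs.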
Figure 4
Rates of protein evolution (dN/dS) in (A) paralogous genes in C. elegans, (B) orthologous genes between C. elegans and C. briggsae, and (C) paralogous genes in C. briggsae. Rates of regulatory evolution (dSM) in (D) paralogous genes in C. elegans, (E) orthologous genes between C. elegans and C. briggsae, and (F) paralogous genes in C. briggsae. In comparison with orthologs, duplicate genes in both the C. briggsae and C. elegans genomes exhibit a higher rate of amino-acid substitution and proximal cis-regulatory sequence evolution for the same amount of synonymous divergence.
Figure 5
Histogram of the rate of (A) protein evolution and (B) regulatory evolution in orthologs between C. elegans and C. briggsae vs. paralogs within C. elegans and C. briggsae. Both protein (dN) and regulatory evolution (dSM) are accelerated in paralogs compared to orthologs for the same amount of synonymous divergence. (C) Histogram of protein vs. regulatory evolution in orthologs and paralogs. For the same amount of regulatory divergence, paralogs have an accelerated rate of protein evolution compared to orthologs.

WEB SITE REFERENCES

    1. http://wormbase.org; Wormbase.
    2. http://www.oeb.harvard.edu/faculty/wakeley/; C source code of the SMM software (sharmot).
