Nucleic Acids Res. 2010 Mar;38(4):1071-85.
doi: 10.1093/nar/gkp1124. Epub 2009 Dec 6.

Exonic remnants of whole-genome duplication reveal cis-regulatory function of coding exons

Xianjun Dong et al. Nucleic Acids Res. 2010 Mar.

Abstract

Using a comparative genomics approach to reconstruct the fate of genomic regulatory blocks (GRBs) and identify exonic remnants that have survived the disappearance of their host genes after whole-genome duplication (WGD) in teleosts, we discover a set of 38 candidate cis-regulatory coding exons (RCEs) with predicted target genes. These elements demonstrate evolutionary separation of overlapping protein-coding and regulatory information after WGD in teleosts. We present evidence that the corresponding mammalian exons are still under both coding and non-coding selection pressure, are more conserved than other protein coding exons in the host gene and several control sets, and share key characteristics with highly conserved non-coding elements in the same regions. Their dual function is corroborated by existing experimental data. Additionally, we show examples of human exon remnants stemming from the vertebrate 2R WGD. Our findings suggest that long-range cis-regulatory inputs for developmental genes are not limited to non-coding regions, but can also overlap the coding sequence of unrelated genes. Thus, exonic regulatory elements in GRBs might be functionally equivalent to those in non-coding regions, calling for a re-evaluation of the sequence space in which to look for long-range regulatory elements and experimentally test their activity.

Figures

Figure 1.
The GRB model, the evolutionary scenario to define RCEs and an example of an RCE. (A) A GRB is defined as a genomic region where a target gene (red) receives long-range regulatory inputs from an array of HCNEs (green ovals) that span the entire GRB and often intertwine with exons of unrelated bystander genes (orange). The regulatory elements need to stay in cis to their target gene to function, leading to the conservation of synteny between the target and its long-range regulatory inputs. In the evolutionary scenario illustrated, teleost WGD (red circle) and subsequent rediploidization (yellow fork) resulted in each gene being retained in a single functional copy. However, one exon fragment (blue dashed frame) that overlaps a regulatory element was retained in duplicate, with one copy remaining in conserved synteny with the target gene just like the HCNEs, and the other remaining as part of a functional gene elsewhere in the genome. We named such zebrafish exonic remnants and their vertebrate orthologs RCEs, and named the genes they are or were part of ‘RCE host genes’ (blue). (B) The PROX1-RPS6KC1 locus. The prospero homeobox protein PROX1, which is essential for early development of the central nervous system (CNS), is an example of a 1-to-1 GRB orthology scenario. PROX1 has a bystander gene, RPS6KC1, in the synteny block defined by PROX1 and the HCNEs spanning the locus. RPS6KC1 encodes a ribosomal protein kinase for which there is no evidence of involvement in CNS development or of tight regulation in general. In this case, RPS6KC1, as the bystander gene, was lost in the zebrafish synteny block, leaving several human–zebrafish HCNEs in the gene desert created by its disappearance. Interestingly, three out of 15 exons were also kept as highly conserved remnants in the zebrafish (referred to as RCEs 9, 10 and 11 in Supplementary Table S1).
Figure 2.
Comparison of sequence conservation of RCEs versus ancestral repeats and random CDS. (A) Cumulative distribution of nucleotide substitution rates for 38 pairs of RCE regions and local ancient repeats, and 38 randomly selected CDS regions from the same host genes. (B) Cumulative distribution of conservation scores for 38 pairs of RCE regions and local ancient repeats, and randomly selected CDS regions from the same host genes.
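As an illustration of the kind of curve shown in Figure 2, the following minimal Python sketch computes an empirical cumulative distribution from a list of per-region values. The function name `ecdf` and the toy numbers are assumptions made for illustration only; they are not the authors' code or data.

```python
def ecdf(values):
    """Return (value, cumulative fraction) pairs for an empirical CDF.

    Hypothetical helper: `values` could be per-region substitution rates
    or conservation scores for one category of regions.
    """
    xs = sorted(values)
    n = len(xs)
    # The i-th smallest value (0-based) has cumulative fraction (i + 1) / n.
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


# Toy values, invented purely for illustration.
toy_rates = [0.3, 0.1, 0.2]
for value, fraction in ecdf(toy_rates):
    print(value, fraction)
```

Computing one such curve per category (RCE regions, local repeats, random CDS regions) and overlaying them would give a plot of the same form as the panels described above.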
Figure 3.
Nucleotide distance of 4D sites. Histogram of nucleotide distances of RCE 4D sites (red line), RCE host gene 4D sites (green line), RCE host gene 4D sites excluding the RCE (blue line), and the 4D sites from 1000 randomly selected human:mouse orthologous gene pairs (grey line). The P-value in the legend gives the significance of the difference between the corresponding set and the random background set.
Figure 4.
Fraction of exons overlapping with enhancer markers. Histogram of percentages of exons in GRBs (red), exons outside of GRBs (light blue) and all exons (grey) that overlap with enhancer marks (p300 and/or H3K4me1). The percentages were calculated based on 10 000 sets of 1000 randomly sampled exons for each category. The percentage of RCEs overlapping with enhancer marks is indicated by a vertical dotted line.
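The resampling described in the Figure 4 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function name, the boolean input format, the fixed seed and the choice to sample without replacement are all assumptions, and the toy data are invented.

```python
import random


def resampled_overlap_percentages(overlaps, n_sets=10_000, sample_size=1_000, seed=0):
    """Percentage of enhancer-mark-overlapping exons in each random sample.

    `overlaps` is a hypothetical list of booleans, one per exon in a
    category, True if that exon overlaps an enhancer mark.
    Sampling without replacement is an assumption; the caption does not say.
    """
    rng = random.Random(seed)
    percentages = []
    for _ in range(n_sets):
        sample = rng.sample(overlaps, sample_size)
        percentages.append(100.0 * sum(sample) / sample_size)
    return percentages


# Toy category: 5000 exons, of which 20% overlap a mark (invented numbers).
toy_exons = [True] * 1_000 + [False] * 4_000
toy_percentages = resampled_overlap_percentages(toy_exons, n_sets=100)
print(min(toy_percentages), max(toy_percentages))
```

A histogram of the returned percentages for each exon category would correspond to the distributions described in the caption, against which a single observed percentage (such as that of the RCEs) can be compared.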
Figure 5.
Transgenic experimental evidence for one RCE element. (A) Screenshot from the UCSC browser (hg18) showing sequences tested, and results from the zebrafish enhancer assay (PAX6_hsE2L—specific, PAX6_hsE2—variable, PAX6_hs4—unspecific). Other tracks visualize UCRs (51), enhancer test results from the VISTA Enhancer browser (4) and an in silico PCR mapping of the sequence E60A tested by Kleinjan et al. (43). (B–E) Zebrafish transgenic lines expressing EGFP driven by PAX6_hsE2L. (B) Lateral view, 1dpf; (C) ventral, 1dpf; (D) lateral, 2dpf; (E) ventral, 2dpf.

References

    1. Sandelin A, Bailey P, Bruce S, Engstrom PG, Klos JM, Wasserman WW, Ericson J, Lenhard B. Arrays of ultraconserved non-coding regions span the loci of key developmental genes in vertebrate genomes. BMC Genomics. 2004;5:99.
    2. Woolfe A, Goodson M, Goode DK, Snell P, McEwen GK, Vavouri T, Smith SF, North P, Callaway H, Kelly K, et al. Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol. 2005;3:e7.
    3. Kimura-Yoshida C, Kitajima K, Oda-Ishii I, Tian E, Suzuki M, Yamamoto M, Suzuki T, Kobayashi M, Aizawa S, Matsuo I. Characterization of the pufferfish Otx2 cis-regulators reveals evolutionarily conserved genetic mechanisms for vertebrate head specification. Development. 2004;131:57–71.
    4. Pennacchio LA, Ahituv N, Moses AM, Prabhakar S, Nobrega MA, Shoukry M, Minovitsky S, Dubchak I, Holt A, Lewis KD, et al. In vivo enhancer analysis of human conserved non-coding sequences. Nature. 2006;444:499–502.
    5. Kikuta H, Laplante M, Navratilova P, Komisarczuk AZ, Engstrom PG, Fredman D, Akalin A, Caccamo M, Sealy I, Howe K, et al. Genomic regulatory blocks encompass multiple neighboring genes and maintain conserved synteny in vertebrates. Genome Res. 2007;17:545–555.
