Comparative Study
Genome Res. 2006 Feb;16(2):190-6.
doi: 10.1101/gr.4246506. Epub 2005 Dec 19.

A genome-wide study of dual coding regions in human alternatively spliced genes

Han Liang et al. Genome Res. 2006 Feb.

Abstract

Alternative splicing is a major mechanism for gene product regulation in many multicellular organisms. By using different exon combinations, some coding regions can encode amino acids in multiple reading frames in different transcripts. Here we performed a systematic search through a set of high-quality human transcripts and show that approximately 7% of alternatively spliced genes contain dual (multiple) coding regions. Using a conservative criterion, we found that most secondary reading frames in these regions evolved recently in mammals, and a significant proportion of them may be specific to primates. Based on the presence of in-frame stop codons in orthologous sequences in other animals, we further classified the reading frames in these regions as ancestral or derived. Our results indicate that ancestral reading frames are usually under stronger selection than derived reading frames, and that ancestral reading frames mainly determine the coding properties of these dual coding regions. Compared with coding regions of the whole genome, ancestral reading frames largely maintain similar nucleotide composition at each codon position and similar amino acid usage, while derived reading frames are significantly different. Our results also indicate that, prior to acquisition of a new reading frame, the suppression of in-frame stop codons in the ancestral state is mainly achieved by one-step transition substitutions at the first or second codon position. Finally, we discuss the selective forces acting on these dual coding regions.
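The stop-codon criterion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `has_in_frame_stop` and `classify_frames` are hypothetical, and the sketch assumes an ungapped orthologous sequence that shares coordinates with the human dual coding region. The idea is only that a frame interrupted by a stop codon in the ortholog is a candidate derived reading frame, while the frame that stays open is a candidate ancestral one.

```python
# Minimal sketch (assumption: not the authors' actual implementation).
STOP_CODONS = {"TAA", "TAG", "TGA"}

def has_in_frame_stop(seq, frame):
    """Return True if reading seq from offset frame (0, 1, or 2) meets a stop codon."""
    seq = seq.upper()
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return True
    return False

def classify_frames(ortholog_seq, frame_a, frame_b):
    """Label two frames as ancestral/derived from stops in one orthologous sequence.

    Returns None when the ortholog is uninformative, i.e. both frames are
    open or both are interrupted.
    """
    stop_a = has_in_frame_stop(ortholog_seq, frame_a)
    stop_b = has_in_frame_stop(ortholog_seq, frame_b)
    if stop_a and not stop_b:
        return {"ancestral": frame_b, "derived": frame_a}
    if stop_b and not stop_a:
        return {"ancestral": frame_a, "derived": frame_b}
    return None

# Toy ortholog: frame 0 reads ATG AAA CCC (open); frame 1 starts with TGA (stop).
print(classify_frames("ATGAAACCC", 0, 1))
```

A real analysis would need aligned orthologs from several species and a rule for combining them; the abstract does not give those details, so they are left out here.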

Figures

Figure 1.
Schematic representation of a dual coding region in the human ITGB4BP gene. Exons are represented by boxes, and introns are represented by connecting lines. Numbers inside the boxes refer to base pairs. Roman numerals indicate intron phases. The dual coding region is marked by a black horizontal arrow. Orthologous sequences for this region are shown in other species, and in-frame stop codons are marked by an underlined X. Based on this alignment, the table on the right summarizes the presence of stop codons in two reading frames. Bioinformatic supporting evidence for both reading frames is shown in the table on the left. White arrows indicate direction of data flow. NM_181466 and NM_181467 are RefSeq accession numbers.
Figure 2.
Evolutionary origin of dual coding regions in the human genome. The frequency that one of two reading frames contains stop codons in orthologous sequences in each species is shown on each lineage. Two numbers are shown in parentheses: One is the number of the sequences in which one reading frame contains stop codons, and the other is the total sequence number in comparison. Only dual coding regions >100 nucleotides are used in this analysis.
Figure 3.
Selection on ARFs and DRFs in dual coding regions. The black bars represent the frequency of amino acid substitution rate in ARFs; the striped bars represent the frequency of amino acid substitution rate in DRFs in dual coding regions. For all regions included in this analysis, the amino acid substitution rates between human and chimpanzee sequences are <0.03 substitutions per amino acid in both reading frames.
Figure 4.
(A) GC% at each codon position in dual coding regions. The gray bars represent GC% at each codon position in the whole-genome coding regions, the black bars represent GC% in ARFs, and the striped bars represent GC% in DRFs in dual coding regions. (B) Amino acid usage in dual coding regions. The gray bars represent the frequency of amino acids in the whole-genome coding regions, the black bars represent the frequency of amino acids in ARFs, and the striped bars represent the frequency of amino acids in DRFs in dual coding regions. (C) The correlation between amino acid usage and nucleotides at codon positions in DRFs in dual coding regions. The overrepresented amino acids are colored in blue, the underrepresented amino acids are colored in green, and the stop codons are in red. The statistically significantly overrepresented amino acids (Phe, Ile, Val, Tyr, Asn, Lys, Asp, and Glu) are colored dark blue (P < 10^-3); the statistically significantly underrepresented amino acids (Pro, Ala, His, Gln, Cys, Trp, Arg, and Gly) are colored dark green (P < 10^-3).
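
The quantity plotted in Figure 4A, GC% at each codon position, can be computed along the following lines. This is a minimal sketch under stated assumptions: `gc_by_codon_position` is a hypothetical helper, the input is taken to be an ungapped sequence that starts at the first position of a codon in the frame of interest, and trailing bases that do not complete a codon are ignored.

```python
# Minimal sketch (assumption: not the authors' actual implementation).
def gc_by_codon_position(cds):
    """Return [GC% at position 1, GC% at position 2, GC% at position 3]."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop an incomplete trailing codon
    n_codons = usable // 3
    if n_codons == 0:
        raise ValueError("sequence shorter than one codon")
    counts = [0, 0, 0]
    for i in range(usable):
        if cds[i] in "GC":
            counts[i % 3] += 1
    return [100.0 * c / n_codons for c in counts]

# Toy region with codons ATG and GCC.
print(gc_by_codon_position("ATGGCC"))  # [50.0, 50.0, 100.0]
```

For a dual coding region, the same nucleotides would be passed twice, once starting at the ARF codon boundary and once starting at the DRF codon boundary. Because the two frames are offset, a base at position 1 in one frame sits at a different codon position in the other, which is why the two frames cannot both match the genome-wide composition at every position.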

