Genome Res. 2012 Nov;22(11):2219-29.
doi: 10.1101/gr.133249.111. Epub 2012 May 16.

Observation of dually decoded regions of the human genome using ribosome profiling data

Audrey M Michel et al. Genome Res. 2012 Nov.

Abstract

The recently developed ribosome profiling technique (Ribo-Seq) allows mapping of the locations of translating ribosomes on mRNAs with subcodon precision. When ribosome protected fragments (RPFs) are aligned to mRNA, a characteristic triplet periodicity pattern is revealed. We utilized the triplet periodicity of RPFs to develop a computational method for detecting transitions between reading frames that occur during programmed ribosomal frameshifting or in dual coding regions where the same nucleotide sequence codes for multiple proteins in different reading frames. Application of this method to ribosome profiling data obtained for human cells allowed us to detect several human genes where the same genomic segment is translated in more than one reading frame (from different transcripts as well as from the same mRNA) and revealed the translation of hitherto unpredicted coding open reading frames.

Figures

Figure 1.
Utilization of triplet periodicity for detecting translated reading frames. (A) A plot of the number of RPFs aligning to particular mRNA positions between the 30th and the 47th nucleotide downstream from the start codon aggregated over 6000 human RefSeq mRNAs. In each codon, subcodon position 2 is shown as a red bar, while subcodon positions 1 and 3 are shown as blue and green bars, respectively. (B) A schematic representation of the generation of a subcodon profile from the corresponding RPF profile. Each subcodon position (blue indicates 1; red, 2; green, 3) is shown on separated plots. (C) The absolute number of RPFs aligning to each subcodon position is shown for the coding region of human Antizyme 1 (OAZ1) mRNA. The location of the programmed ribosomal frameshift site is indicated by a broken black line. (D) The distribution of the number of RPFs aligning to different subcodon positions, upstream of the frameshift site (left) and downstream (right). It can be seen that the subcodon position with the lowest RPF count shifts from the second to the third upon ribosomal frameshifting.
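The subcodon tally described in panels B and D can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names, the convention of assigning each RPF to a single mRNA coordinate, and the toy read weights (3, 1, 2) are all assumptions chosen so that the second subcodon position is the least covered in the translated frame, as in the legend.

```python
def subcodon_counts(rpf_positions, cds_start, region_start, region_end):
    """Tally RPFs by subcodon position (1, 2, 3) relative to cds_start,
    considering only reads inside [region_start, region_end)."""
    counts = {1: 0, 2: 0, 3: 0}
    for pos in rpf_positions:
        if region_start <= pos < region_end:
            counts[(pos - cds_start) % 3 + 1] += 1
    return counts


def lowest_subcodon(counts):
    """Return the subcodon position with the fewest RPFs."""
    return min(counts, key=counts.get)


def simulate_reads(start, end, frame_offset, weights=(3, 1, 2)):
    """Hypothetical toy data: emit weights[i] reads at subcodon index i of the
    frame that is actually translated (frame_offset relative to the CDS)."""
    reads = []
    for pos in range(start, end):
        reads.extend([pos] * weights[(pos - frame_offset) % 3])
    return reads


# Toy mRNA: CDS starts at 0, with an assumed +1 frameshift at position 30.
shift_site = 30
reads = simulate_reads(0, shift_site, 0) + simulate_reads(shift_site, 60, 1)

upstream = subcodon_counts(reads, 0, 0, shift_site)      # {1: 30, 2: 10, 3: 20}
downstream = subcodon_counts(reads, 0, shift_site, 60)   # {1: 20, 2: 30, 3: 10}

print(lowest_subcodon(upstream), lowest_subcodon(downstream))  # 2 3
```

In this toy case the least covered position moves from the second to the third subcodon position across the shift site, which is the signature panel D describes for OAZ1.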
Figure 2.
Computational approach for detecting transitions between reading frames and their performance on simulated dual coding. (A) Segments of pie charts represent the number of CDS codons with the specific number of RPFs aligning to them for the top 10 (left) and for the top 1000 (right) most covered mRNAs from the Guo et al. (2010) data set. It can be seen that, even for the most RPF-covered mRNAs, many CDS codons have no RPFs aligning to them. (B) Calculation of cumulative RPF subcodon proportion differences (CSCPD) upstream of and downstream from a sliding point x. Position a represents the annotated CDS start, while position b denotes the annotated CDS stop. Vertical lines represent RPFs that align at given CDS coordinates. (C) Principle of the automated scoring scheme, Periodicity Transition Score (PTS). PTS is calculated as the area (shaded in pink) where CSCPD over the examined CDS exceeds the expected level as estimated from the 95th quantile CSCPDs of the 1000 mRNA transcripts with the highest RPF coverage. For details, see Results. (D) Boxplots representing the distributions of PTS scores (y-axis) obtained for real ribosome profiles for mRNAs with artificially introduced frameshifts at different locations relative to the ends of CDS (x-axis). (E) Distribution of PTS for ribosome profiles on simulated mRNAs containing simultaneously translated dual coding regions of different lengths. The simulations were carried out for three sets of mRNAs with different RPF density as indicated in the figure. The shaded areas represent the lower and upper quartile intervals for each RPF density. (F) Distribution of PTS for simulated mRNAs containing dual coding regions with varying densities of RPFs in the alternative frame. Shading is as in E.
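The idea behind panels B and C can be illustrated with a rough Python sketch. The legend does not give the exact formula, so everything specific here is an assumption: the difference at each sliding point x is taken as the sum of absolute differences between upstream and downstream subcodon proportions, x is restricted to codon boundaries, and the expected level is a single hypothetical constant rather than the 95th-quantile estimate from the 1000 most covered transcripts.

```python
def proportions(counts):
    """Convert three subcodon counts to proportions (None if no reads)."""
    total = sum(counts)
    return [c / total for c in counts] if total else None


def cscpd_profile(rpf_counts, a, b):
    """For each codon-boundary point x with a < x < b, compare the subcodon
    proportions of RPFs in [a, x) with those in [x, b).
    rpf_counts[i] is the number of RPFs assigned to mRNA position i.
    The difference measure (sum of absolute differences) is an assumption."""
    profile = []
    for x in range(a + 3, b, 3):
        up = proportions([sum(rpf_counts[a + k:x:3]) for k in range(3)])
        down = proportions([sum(rpf_counts[x + k:b:3]) for k in range(3)])
        diff = 0.0 if up is None or down is None else sum(
            abs(u - d) for u, d in zip(up, down))
        profile.append((x, diff))
    return profile


def periodicity_transition_score(profile, expected):
    """Area of the profile lying above the expected level."""
    return sum(max(0.0, diff - expected) for _, diff in profile)


# Hypothetical toy profiles over a 60-nt CDS (a = 0, b = 60).
unshifted = [3, 1, 2] * 20                 # one frame throughout
shifted = [3, 1, 2] * 10 + [2, 3, 1] * 10  # periodicity changes at position 30

expected = 0.1  # placeholder threshold, not derived from real data
pts_unshifted = periodicity_transition_score(cscpd_profile(unshifted, 0, 60), expected)
pts_shifted = periodicity_transition_score(cscpd_profile(shifted, 0, 60), expected)
best_x = max(cscpd_profile(shifted, 0, 60), key=lambda t: t[1])[0]
print(pts_unshifted, pts_shifted, best_x)
```

With these toy inputs the unshifted profile scores zero, while the shifted profile scores above zero and its largest difference falls at the position where the periodicity changes. Real profiles are sparse (panel A), which is why the legend describes an empirically estimated expected level rather than a fixed constant.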
Figure 3.
Classification of dual coding regions. (A) Classification and PTS of 108 candidates. (B) Schematic organization of three major classes of dual coding. pORFs are shown as light blue bars and alternative frames as light pink bars. Splicing organization: green bars correspond to exons included in transcript variants, and lines indicate intronic regions excised during splicing.
Figure 4.
Dual coding in NPAS2 mRNA due to the presence of a translated nonupstream ORF and in THAP7 mRNA due to the overlap of the main ORF with an uORF. (A) Subcodon profile (top three rows) and mRNA-seq (fourth row) for NPAS2 mRNA (left; NM_002518) and THAP7 mRNA (right; NM_001008695). CDS coordinates are marked with dotted vertical lines. (B) ORF organization of NPAS2 mRNA (left) and THAP7 mRNA (right). The three reading frames are indicated as 1, 2, 3. Blue vertical lines indicate stop codons and start codons are indicated in red. Annotated CDS is shaded in light blue. The areas where translation in alternative frames is detected are shaded in light pink. (C) Comparative analysis of orthologous genomic sequences from 23 vertebrate species for NPAS2 (left) and from 19 vertebrate species for THAP7 (right). Colored bars represent codon substitutions within multiple sequence alignments for the standard (top) and alternative (bottom) reading frames (detailed alignments are in Supplemental Figs. S121, S122). Dark green and light green boxes correspond to synonymous and positive (in the BLOSUM62 matrix) substitutions, respectively; red boxes correspond to negative (in BLOSUM62 matrix) nonsynonymous substitutions. Gaps are shown in yellow and stop codons are in black. Stop codons are also aggregated across the entire alignment beneath each bar. Plots of coding likelihood are shown underneath the colored bars for both reading frames as calculated with MLOGD. Synonymous position conservation for the standard translation phase (pORF) is shown above the colored bar. (D) Exon organization of the NPAS2 locus (left) and the THAP7 locus (right). CCDS and RefSeq gene tracks from the UCSC Genome Browser are shown in green and blue bars, respectively. Alternatively decoded regions are indicated in red.
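An ORF organization plot like the one in panel B can be derived by scanning each of the three reading frames for start and stop codons. The following Python sketch assumes ATG as the only start codon and the standard stop codons; the function name and the toy sequence are hypothetical, and the frame labels 1, 2, 3 simply follow the offset from the first nucleotide.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def frame_map(seq):
    """Return 0-based start and stop codon coordinates for frames 1, 2, 3."""
    seq = seq.upper().replace("U", "T")
    result = {frame: {"starts": [], "stops": []} for frame in (1, 2, 3)}
    for frame in (1, 2, 3):
        # Step through complete codons only, beginning at the frame offset.
        for i in range(frame - 1, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == START_CODON:
                result[frame]["starts"].append(i)
            elif codon in STOP_CODONS:
                result[frame]["stops"].append(i)
    return result


# Hypothetical toy sequence with two short overlapping ORFs in different frames.
toy = "ATGGCATGACCTAAGTGA"
orf_map = frame_map(toy)
for frame, sites in orf_map.items():
    print(frame, sites)
```

In the toy sequence, frame 1 has a start at 0 and a stop at 6, while frame 3 has a start at 5 and a stop at 11, so nucleotides 5 to 8 fall inside ORFs in two frames, a miniature version of the overlap shaded in the figure.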
Figure 5.
Dual coding in C11orf48 locus. (A) Subcodon profile and mRNA-seq for RefSeq mRNA NM_024099 (left) and predicted Ensembl transcript ENST00000524958 (right). (B) ORF organization of NM_024099 mRNA (left) and ENST00000524958 (right). (C) Comparative sequence analysis of corresponding genomic alignments from 15 vertebrate species for RefSeq mRNA NM_024099. (D) Exon organization of the C11orf48 locus. For detailed description, see legend to Figure 4. The higher density of mRNA-seq reads for NM_024099 (fourth row panel A, left) in the shaded pink area indicates that RNA-seq reads are being generated from an additional transcript variant corresponding to Ensembl transcript ENST00000524958. In panel C, it can be seen that for most of the predicted CDS, codon substitutions are consistent with RefSeq CDS predictions (the area is greener in the zero-frame). However, for the pink shaded area, substitutions are consistent with protein coding evolutionary signatures in the +1 frame. It can be seen that the coding likelihood for the +1 frame exceeds the threshold in the area of dual decoding. The conservation plot of synonymous codon positions, shown above the 0 frame, shows that conservation of synonymous positions is significantly higher in the shaded pink area. This is consistent with the purifying selection acting on protein coding sequences in two frames in this region.
Figure 6.
Dual coding in alternatively spliced PHPT1 exon. (A) Subcodon profile and mRNA-seq for PHPT1 mRNA variant NM_001135861 (left) and variant NM_014172 (right). (B) ORF organization for NM_001135861 (left) and NM_014172 (right). (C) Analysis of codon substitutions within the multiple alignments of orthologous genomic sequences for NM_001135861. (D) Exon organization of the two PHPT1 mRNA variants. For notations, see legend to Figure 4. Subcodon profiles for variant NM_001135861 (panel A, left), which is the longest isoform (see Methods), indicate that while the translated frame is the same as the CDS for most of the CDS region (low RPFs density for the second [red] position), the sequence is translated in the +1 frame relative to the CDS frame at its end and downstream (pink shaded area). In addition, there is an evident gap in translation in the subcodon profile and mRNA-seq just prior to the pink shaded area, which corresponds to the third exon in PHPT1 mRNA variant NM_001135861 (panel D). As a result, the fourth exon in the NM_001135861 mRNA is in an alternative frame relative to the CDS start codon. Codon substitution analysis of multiple sequence alignments (panel C) is consistent with the dual decoding of the 5′ end of the fourth exon. Synonymous and positive nonsynonymous substitutions are predominant in both the zero and +1 frames in the locations where RPFs are found.
