EMBO J. 2014 May 2;33(9):981-93.
doi: 10.1002/embj.201488411. Epub 2014 Apr 4.

Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation

Ariel A Bazzini et al. EMBO J.

Abstract

Identification of the coding elements in the genome is a fundamental step to understanding the building blocks of living systems. Short peptides (< 100 aa) have emerged as important regulators of development and physiology, but their identification has been limited by their size. We have leveraged the periodicity of ribosome movement on the mRNA to define actively translated ORFs by ribosome footprinting. This approach identifies several hundred translated small ORFs in zebrafish and human. Computational prediction of small ORFs from codon conservation patterns corroborates and extends these findings and identifies conserved sequences in zebrafish and human, suggesting functional peptide products (micropeptides). These results identify micropeptide-encoding genes in vertebrates, providing an entry point to define their function in vivo.

Figures

Figure 1. Ribosome profiling in zebrafish
  A Schematic representation of ribosome profiling: 28 to 29-nt-long ribosome-protected fragments (RPFs) are generated from nuclease digestion, where the P-site of the ribosome is in position 13.

  B Developmental stages at which ribosome profiling was performed.

  C Subcodon position of the ribosome footprints (position 13) for the RPF and input reads. Plot shows the proportion of RPFs or input reads aligned to the coding sequence of RefSeq genes at each position relative to the codon. Input reads were obtained after poly-(A) fractionation and random fragmentation of the naked RNA.

  D RPFs and input reads mapped to a composite RefSeq transcript. RPFs mainly map to the CDS with a 3-nucleotide periodicity. RPF reads are colored as in (C) based on the position with respect to the frame of the CDS. Input reads map to both the UTRs and CDS (gray).

  E Subcodon profile plot showing RPF and input reads aligned to actinb1. Reads are colored based on the frame (1, 2 or 3) position relative to the transcript (Michel et al, 2012). All putative ORFs (distal AUG-Stop) were also colored for each respective frame (blue, pink and green boxes). Note that most of the RPFs from the annotated ORF match the color of the box, consistent with a strong in-frame distribution of reads within individual transcripts.
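
The subcodon tally described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes reads are given as 0-based 5′-end coordinates on a transcript and that "position 13" corresponds to a 0-based offset of 12 from the 5′ end; the function name and input layout are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Assumption: "position 13" of the footprint (1-based) is the P-site,
# i.e. an offset of 12 nt from the read's 5' end in 0-based coordinates.
P_SITE_OFFSET = 12


def subcodon_proportions(read_starts, cds_start, offset=P_SITE_OFFSET):
    """Return the fraction of reads whose P-site falls in frame 0, 1 and 2.

    read_starts: 0-based transcript coordinates of read 5' ends (hypothetical input).
    cds_start:   0-based transcript coordinate of the first nucleotide of the start codon.
    """
    counts = Counter()
    for start in read_starts:
        p_site = start + offset
        # Frame 0 means the P-site sits on the first nucleotide of a codon.
        counts[(p_site - cds_start) % 3] += 1
    total = sum(counts.values())
    if total == 0:
        return (0.0, 0.0, 0.0)
    return tuple(counts[frame] / total for frame in range(3))


if __name__ == "__main__":
    # Toy data: three in-frame footprints and one shifted by a single nucleotide.
    print(subcodon_proportions([88, 91, 94, 89], cds_start=100))
```

Under these assumptions, footprints should concentrate in one frame, whereas randomly fragmented input reads should spread roughly evenly over the three frames.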

Figure 2. Defining actively translated regions by ribosome profiling
  A Workflow to define the ORFscore: Top diagram represents a transcript, below solid bars represent all possible ORFs (Distal AUG-Stop) identified in each frame (+1, +2, +3). The RPF distribution in each frame is compared to an equally sized uniform distribution using a modified chi-squared statistic (see Materials and Methods). The resulting ORFscore is assigned a negative value when the distribution of RPFs is inconsistent with the frame of the CDS.

  B Coverage is determined by measuring the proportion of in-frame CDS positions with ≥ 1 reads.
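
The two metrics in the Figure 2 caption can be sketched as follows. The caption only states that a "modified chi-squared statistic" is compared against a uniform distribution and that the sign is negative for out-of-frame signal; the log2 transform, the +1 pseudocount and the exact sign rule below are assumptions of this sketch, and the function names are hypothetical. The Materials and Methods of the paper remain the authority for the actual definition.

```python
import math


def orf_score(frame_counts):
    """Signed, log-scaled chi-squared of RPF counts against a uniform expectation.

    frame_counts: (f1, f2, f3), where f1 counts reads in the frame of the ORF.
    Assumed form: log2(chi2 + 1), negated when f1 is not the dominant frame.
    """
    f1, f2, f3 = frame_counts
    total = f1 + f2 + f3
    if total == 0:
        return 0.0
    expected = total / 3  # equally sized uniform distribution
    chi2 = sum((f - expected) ** 2 / expected for f in frame_counts)
    score = math.log2(chi2 + 1)
    # Negative value when reads are inconsistent with the ORF's frame.
    if f1 < f2 or f1 < f3:
        score = -score
    return score


def coverage(p_site_positions, orf_length_nt):
    """Proportion of in-frame ORF positions (first codon nucleotides) with >= 1 read.

    p_site_positions: 0-based P-site coordinates relative to the ORF start.
    """
    n_codons = orf_length_nt // 3
    if n_codons == 0:
        return 0.0
    covered = {p for p in p_site_positions if p % 3 == 0 and 0 <= p < n_codons * 3}
    return len(covered) / n_codons


if __name__ == "__main__":
    print(orf_score((90, 5, 5)), orf_score((5, 90, 5)))
    print(coverage([0, 3, 3, 4, 6], orf_length_nt=12))
```

In this form a strongly in-frame ORF gets a large positive score, an ORF whose reads sit mostly in another frame gets the mirror-image negative score, and a flat distribution scores zero.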

Figure 3. ORFscore discriminates translated from non-translated regions
A–D Scatterplot of the ORFscore and coverage for all ORFs (A), the subset of ORFs with the highest ORFscore per transcript (B) and short (20–100 aa) annotated CDS (D). Relative density plots (scaled to the maximum value for each group) of the ORFscore and coverage are shown for each ORF type. Note the separation of annotated ORFs from the rest of the ORFs, even for short (20–100 aa) annotated CDSs. (C) Color code used to label different ORF types found in RefSeq protein-coding transcripts: annotated CDS (green), 5′UTR ORFs (purple), 3′UTR ORFs (red) and ORFs overlapping the annotated CDS (orange).
E Bar plots representing the number of ORFs identified on the basis of their ORFscore and coverage and defined as translated for each ORF type as in (C). Among all putative ORFs, the distribution of annotated ORFs was significantly different from the overall set (P = 2.2e-16, chi-squared test), with long and short CDS showing the highest fold-change enrichment in translated ORFs compared to other ORF types.
Figure 4. Identification of small coding ORFs (smORFs) in non-coding RNAs
A Scatterplot of ORFscore and coverage for the ORF with highest ORFscore per transcript. Shown are annotated short ORF (20–100 aa) (green), annotated lincRNA and “processed transcripts” from Ensembl (orange), non-coding RNAs described by Ulitsky et al (2011) (set 1, dark blue) and by Pauli et al (2012) (set 2, light blue) and ORFs in annotated 3′UTR used as negative control (red). Note that several ORFs in non-coding annotated transcripts score at comparable levels to annotated CDSs. Inset shows the scatter plot for annotated smORFs and 3′UTR ORFs. Relative density plots (scaled to the maximum value for each group) of the ORFscore and coverage are shown for each ORF type.
B Subcodon profile plot showing a known non-coding RNA, cyrano, depleted of ribosome footprints.
C Stacked plot showing the proportion of genes in which a translated ORF was defined by ORFscore and 10% coverage (*, stringent) or only ORFscore (**, permissive) and transcripts with low ORFscore (undetermined). The number of transcripts in each fraction is indicated.
D Pie chart of BLASTp results against several organisms for the 241 newly defined translated regions, collapsed on amino acid sequence.
E Bar plot showing the number of unique novel smORFs and Ensembl-predicted smORFs (≤ 100 aa), defined by ORFscore and 10% coverage (*, stringent and predicted) or only ORFscore (**, permissive).
F Bar plot displaying the number of novel and Ensembl-predicted smORFs identified by tandem mass spectrometry (MS-MS).
G Box plot representing the size distribution of the ORFs defined by ORFscore and MS-MS.
H Bar plot showing the number of genes with translated ORFs in the 5′ or 3′ UTR defined by ORFscore or detected by MS-MS.
I, J Subcodon profile plots showing individual examples of identified smORFs: Ribosome profiling data show the translated ORF and fragmentation spectra identifying the encoded peptides.
K Heat-map showing dynamic expression of novel smORF-containing genes during zebrafish embryogenesis (n = 190).
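
The three categories in the Figure 4 stacked plot (stringent, permissive, undetermined) can be sketched as a simple decision rule. The caption gives the 10% coverage requirement but not the ORFscore cutoff, so the cutoff is left as a required parameter here; the function name, the use of `>=` comparisons and the choice to make the categories mutually exclusive are assumptions of this sketch.

```python
from collections import Counter


def classify_orf(orfscore, coverage, orfscore_cutoff, min_coverage=0.10):
    """Assign an ORF to one of the three categories named in the caption.

    orfscore_cutoff is NOT given in the caption; the caller must supply it.
    min_coverage defaults to the 10% stated in the caption.
    """
    if orfscore < orfscore_cutoff:
        return "undetermined"  # low ORFscore
    if coverage >= min_coverage:
        return "stringent"     # ORFscore and coverage both pass
    return "permissive"        # only the ORFscore passes


if __name__ == "__main__":
    # Toy (orfscore, coverage) pairs; the cutoff of 6.0 is an arbitrary illustrative value.
    toy = [(8.0, 0.50), (7.2, 0.05), (1.3, 0.40), (-4.0, 0.90)]
    print(Counter(classify_orf(s, c, orfscore_cutoff=6.0) for s, c in toy))
```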
Figure 5. Computational identification of evolutionarily conserved smORFs (MicPDP)
A Number of smORFs detected within putative non-coding RNA transcripts in zebrafish and human.
B,C Scatterplot of ORFscore and phyloCSF score for 686 zebrafish and 45,079 human smORFs with sufficient alignment coverage. The predictions of the two methods have small but significant overlap (light blue dots; P < 2e-22 and P < 6.3e-9, respectively, Fisher's exact test), and zebrafish experimental and computational results are correlated (Spearman's ρ = 0.49, P < 4e-42).
D Scatterplot of ORFscore and coverage for 2,000 randomly selected human Ensembl-annotated coding ORFs (green), 2,000 ORFs in the 3′UTR and the set of coding ORFs from human lincRNAs as defined by ORFscore (blue, best ORFscore per unique genomic locus).
E Subcodon profile plot, showing a smORF in the human predicted non-coding RNA ENST00000426713 (LINC00116-002) that presented high phyloCSF score and ORFscore.
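
The overlap test mentioned for panels B and C can be illustrated with a small sketch of Fisher's exact test on a 2×2 table (called by ORFscore or not, against called by phyloCSF or not). The caption does not say whether the test was one- or two-sided; the version below is the one-sided enrichment test, and both the function name and the toy counts are hypothetical, not values from the paper.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: called by both methods      b: called by method 1 only
    c: called by method 2 only     d: called by neither
    Returns P(overlap >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b           # total called by method 1
    col1 = a + c           # total called by method 2
    n = a + b + c + d      # all ORFs tested
    upper = min(row1, col1)
    # Sum hypergeometric probabilities for all overlaps at least as large as a.
    numerator = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return numerator / comb(n, col1)


if __name__ == "__main__":
    # Toy counts for illustration only.
    print(fisher_exact_greater(3, 1, 1, 3))
```

Exact integer arithmetic with `math.comb` avoids floating-point underflow for large tables; only the final division produces a float.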
