Nucleic Acids Res. 2017 Apr 7;45(6):2960-2972.
doi: 10.1093/nar/gkw1350.

Bayesian prediction of RNA translation from ribosome profiling

Brandon Malone et al. Nucleic Acids Res.

Abstract

Ribosome profiling via high-throughput sequencing (ribo-seq) is a promising new technique for characterizing the occupancy of ribosomes on messenger RNA (mRNA) at base-pair resolution. The ribosome is responsible for translating mRNA into proteins, so information about its occupancy offers a detailed view of ribosome density and position which could be used to discover new translated open reading frames (ORFs), among other things. In this work, we propose Rp-Bp, an unsupervised Bayesian approach to predict translated ORFs from ribosome profiles. We use state-of-the-art Markov chain Monte Carlo techniques to estimate posterior distributions of the likelihood of translation of each ORF. Hence, an important feature of Rp-Bp is its ability to incorporate and propagate uncertainty in the prediction process. A second novel contribution is automatic Bayesian selection of read lengths and ribosome P-site offsets (BPPS). We empirically demonstrate that our read length selection technique modestly improves sensitivity by identifying more canonical and non-canonical ORFs. Proteomics- and quantitative translation initiation sequencing-based validation verifies the high quality of all of the predictions. Experimental comparison shows that Rp-Bp results in more peptide identifications and proteomics-validated ORF predictions compared to another recent tool for translation prediction.

Figures

Figure 1.
(A) Metagene profiles from a HEK293 dataset for reads with length 20 bp (top) and 21 bp (bottom). The reads of length 21 bp show a clear 3-nt periodicity, while those of length 20 bp do not. (B) Simplified view of graphical models for estimating the periodicity of metagene profiles. (Top) The periodic model, H_p, is a two-component mixture model in which the count of the first nucleotide of each codon is drawn from a ‘high’ component h while the other two nucleotides’ counts in the codon are drawn from a ‘low’ component l. That is, this model fits the ‘high-low-low’ pattern of translating ribosomes. (Bottom) The non-periodic model, H_n, is a naïve Bayes model in which all nucleotide counts are drawn from the same distribution.
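The contrast between the periodic and non-periodic models in panel B can be illustrated with a much simpler stand-in. The sketch below is not the paper's Bayesian mixture model: it assumes Poisson-distributed counts, fits rates by maximum likelihood, and reports a log-likelihood ratio. The function names and the Poisson assumption are illustrative choices, and the profile is assumed to be non-empty and to start at the first nucleotide of a codon.

```python
import math


def poisson_loglik(counts, rate):
    """Log-likelihood of integer counts under a Poisson with the given rate (> 0)."""
    return sum(c * math.log(rate) - rate - math.lgamma(c + 1) for c in counts)


def periodicity_log_ratio(profile):
    """Log-likelihood ratio of a 'high-low-low' model against a single-rate model.

    profile: read counts per nucleotide, with index 0 at the first
    nucleotide of a codon (an assumption of this sketch).
    Returns 0.0 when the first codon position is not the highest on average,
    because the constrained 'high > low' fit then collapses to a single rate.
    """
    eps = 1e-9  # guards against log(0) for all-zero profiles
    high = profile[0::3]
    low = [c for i, c in enumerate(profile) if i % 3 != 0]
    rate_high = max(sum(high) / len(high), eps)
    rate_low = max(sum(low) / len(low), eps)
    rate_all = max(sum(profile) / len(profile), eps)
    if rate_high <= rate_low:
        return 0.0
    ll_periodic = poisson_loglik(high, rate_high) + poisson_loglik(low, rate_low)
    ll_flat = poisson_loglik(profile, rate_all)
    return ll_periodic - ll_flat


# Made-up profiles: one with a clear 3-nt pattern, one without.
periodic_profile = [90, 10, 12] * 10
flat_profile = [30, 31, 29] * 10
print(periodicity_log_ratio(periodic_profile) > periodicity_log_ratio(flat_profile))
```

A large ratio favors the periodic pattern, as for the 21 bp reads in panel A; a ratio near zero corresponds to the behavior described for the 20 bp reads.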
Figure 2.
The translation prediction workflow. For each identified ORF, posterior likelihoods of the two competing models are estimated from its smoothed profile using Hamiltonian Markov chain Monte Carlo. The posterior distribution of the Bayes factor is calculated (in closed form) from these estimates. The posterior Bayes factor distribution is used to label each ORF as ‘translated’ (✓) or ‘untranslated’ (✗).
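One way to picture the labeling step is the sketch below. It assumes the MCMC output for each model is a list of log-likelihood draws, that the two sets of draws are independent and roughly normal, and that an ORF is labeled by the probability that the log Bayes factor exceeds a cutoff. The normal approximation, the function names, and the default thresholds are all assumptions made for illustration, not the closed form or the cutoffs used by Rp-Bp.

```python
from math import erf, sqrt
from statistics import mean, variance


def log_bf_normal_approx(loglik_a, loglik_b):
    """Mean and variance of the log Bayes factor (model a over model b).

    Treats the two lists of log-likelihood draws as independent and
    approximately normal, so their difference is normal as well.
    Each list needs at least two draws.
    """
    mu = mean(loglik_a) - mean(loglik_b)
    var = variance(loglik_a) + variance(loglik_b)
    return mu, var


def prob_log_bf_exceeds(mu, var, threshold):
    """P(log BF > threshold) under the normal approximation."""
    if var == 0:
        return 1.0 if mu > threshold else 0.0
    z = (threshold - mu) / sqrt(var)
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))


def label_orf(loglik_a, loglik_b, threshold=5.0, min_prob=0.95):
    """True if the evidence for model a clears both hypothetical thresholds."""
    mu, var = log_bf_normal_approx(loglik_a, loglik_b)
    return prob_log_bf_exceeds(mu, var, threshold) >= min_prob


# Made-up draws: model a fits this ORF far better than model b.
draws_a = [-100.0, -101.0, -99.0, -100.0]
draws_b = [-120.0, -121.0, -119.0, -120.0]
print(label_orf(draws_a, draws_b))
```

Working with a distribution rather than a point estimate means an ORF whose Bayes factor is large on average but highly variable can still be left unlabeled, which is the sense in which uncertainty is propagated into the prediction.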
Figure 3.
The read lengths and P-site offsets selected by BPPS compared to manual selection.
Figure 4.
The number of Rp-Bp ORFs using BPPS and manual length and P-site offset selection for the human and mouse datasets. The ORF types are described in the Supplementary Data.
Figure 5.
The number of peptide sequences identified with (A) in silico digestion of the annotated proteins from GENCODE, and the ORFs predicted by Rp-Bp and RiboTaper for HEK293, (B) in silico digestion of the annotated proteins from WBcel235 and Rp-Bp for Caenorhabditis elegans, (C and D) MaxQuant for the respective datasets. The in silico digestion and MaxQuant details are given in the Supplementary Data.
Figure 6.
The percentage of each type of Rp-Bp ORF (≥300 nt) from the HEK293 and Caenorhabditis elegans datasets with proteomics support. An ORF is considered to have proteomics support if at least one peptide detected by MaxQuant exactly aligns to the translated protein sequence for the ORF. Furthermore, we require that the peptide align uniquely to that ORF. The numbers on the bars show the number of ORFs with and without proteomics support, as indicated.
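The support criterion in this caption (an exact match that is unique to one ORF) can be sketched as a substring search. The data structures and the function name below are hypothetical, and the sketch ignores practical details such as isoleucine/leucine ambiguity or how MaxQuant reports peptides.

```python
def orfs_with_proteomics_support(orf_proteins, peptides):
    """Return the IDs of ORFs supported by at least one uniquely matching peptide.

    orf_proteins: dict mapping an ORF ID to its translated protein sequence.
    peptides: iterable of detected peptide sequences.
    A peptide counts only if it is an exact substring of exactly one ORF.
    """
    supported = set()
    for peptide in peptides:
        hits = [orf_id for orf_id, protein in orf_proteins.items() if peptide in protein]
        if len(hits) == 1:
            supported.add(hits[0])
    return supported


# Made-up sequences for illustration only.
orfs = {
    "orfA": "MKTAYIAKQR",
    "orfB": "MKTLLVAGQR",
    "orfC": "MSSPEPTIDEK",
}
detected = ["MKT", "AYIAK", "WWWW"]
print(orfs_with_proteomics_support(orfs, detected))
```

Here "MKT" matches two ORFs and is discarded, "WWWW" matches none, and only "AYIAK" contributes support, to orfA.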
Figure 7.
(A and B) The percentage of Rp-Bp and RiboTaper micropeptides of different lengths (<100 aa) with proteomics support in HEK293. Proteomics support is described in the caption of Figure 6. All ORF types are grouped based on bin sizes of 20 bp. The numbers on the bars show the number of micropeptides with and without proteomics support, as indicated. (C and D) The percentage of all Rp-Bp and RiboTaper ORFs with unique proteomics support in HEK293. All ORF types are grouped based on bin sizes of 300 bp. The counts are also available in Supplementary File 8.
Figure 8.
The percentage of Rp-Bp Caenorhabditis elegans (A) micropeptides and (B) all ORFs with unique proteomics support, as described in Figure 7. The counts are also available in Supplementary Table S8.
Figure 9.
The overlap of transcripts with a QTI-seq peak within 50 bp of the annotated start codon and an Rp-Bp ORF (of any type). The P-values are calculated using a hypergeometric test.
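A hypergeometric test for an overlap of this kind can be sketched with the standard library alone. The framing below (one-sided, probability of an overlap at least as large as the one observed) and the parameter names are assumptions; the caption does not state which tail or background set was used.

```python
from math import comb


def hypergeom_upper_tail(overlap, population, set_a, set_b):
    """P(X >= overlap) when set_b items are drawn without replacement from a
    population that contains set_a 'marked' items.

    population: total number of transcripts considered (assumed background).
    set_a: number of transcripts with a QTI-seq peak.
    set_b: number of transcripts with a predicted ORF.
    overlap: number of transcripts in both sets.
    """
    total = comb(population, set_b)
    max_overlap = min(set_a, set_b)
    favorable = sum(
        comb(set_a, i) * comb(population - set_a, set_b - i)
        for i in range(overlap, max_overlap + 1)
    )
    return favorable / total


# Toy numbers: 10 transcripts, 5 in each set, all 5 shared.
print(hypergeom_upper_tail(5, 10, 5, 5))
```

A small value indicates that the observed overlap is larger than expected if the two sets of transcripts were unrelated.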
