Cell Syst. 2018 Feb 28;6(2):180-191.e4.

doi: 10.1016/j.cels.2017.12.007. Epub 2018 Jan 17.

Scikit-ribo Enables Accurate Estimation and Robust Modeling of Translation Dynamics at Codon Resolution

Han Fang¹, Yi-Fei Huang², Aditya Radhakrishnan³, Adam Siepel², Gholson J Lyon⁴, Michael C Schatz⁵

Affiliations

¹ Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA; Department of Applied Mathematics & Statistics, Stony Brook University, Stony Brook, NY 11794, USA.
² Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA.
³ Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD 21205, USA.
⁴ Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA.
⁵ Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA; Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD 21211, USA. Electronic address: mschatz@cs.jhu.edu.

PMID: 29361467
PMCID: PMC5832574
DOI: 10.1016/j.cels.2017.12.007

Han Fang et al. Cell Syst. 2018.

Abstract

Ribosome profiling (Ribo-seq) is a powerful technique for measuring protein translation; however, sampling errors and biological biases are prevalent and poorly understood. Addressing these issues, we present Scikit-ribo (https://github.com/schatzlab/scikit-ribo), an open-source analysis package for accurate genome-wide A-site prediction and translation efficiency (TE) estimation from Ribo-seq and RNA sequencing data. Scikit-ribo accurately identifies A-site locations and reproduces codon elongation rates using several digestion protocols (r = 0.99). Next, we show that the commonly used reads per kilobase of transcript per million mapped reads-derived TE estimation is prone to biases, especially for low-abundance genes. Scikit-ribo introduces a codon-level generalized linear model with ridge penalty that correctly estimates TE, while accommodating variable codon elongation rates and mRNA secondary structure. This corrects the TE errors for over 2,000 genes in S. cerevisiae, which we validate using mass spectrometry of protein abundances (r = 0.81), and allows us to determine the Kozak-like sequence directly from Ribo-seq. We conclude with an analysis of coverage requirements needed for robust codon-level analysis and quantify the artifacts that can occur from cycloheximide treatment.
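To make the low-abundance bias concrete, the following minimal sketch computes an RPKM-derived TE as the ratio of footprint RPKM to mRNA RPKM. The function names and all read counts are hypothetical illustrations, not values or code from the paper.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count / (length_bp / 1_000) / (total_mapped_reads / 1_000_000)


def rpkm_te(ribo_count, rna_count, length_bp, ribo_total, rna_total):
    """RPKM-derived TE: footprint density divided by mRNA density."""
    ribo = rpkm(ribo_count, length_bp, ribo_total)
    rna = rpkm(rna_count, length_bp, rna_total)
    return ribo / rna


# Hypothetical library sizes and gene length (assumptions for illustration).
LIB = dict(length_bp=1500, ribo_total=10_000_000, rna_total=10_000_000)

# A well-covered gene and a low-abundance gene with the same underlying ratio.
high = rpkm_te(ribo_count=2000, rna_count=1000, **LIB)
low = rpkm_te(ribo_count=2, rna_count=1, **LIB)

# One extra footprint read on the low-abundance gene changes its TE by 50%.
low_shifted = rpkm_te(ribo_count=3, rna_count=1, **LIB)

print(high, low, low_shifted)
```

With so few reads, a single sampled footprint moves the ratio substantially, which is the kind of sampling error the abstract attributes to low-abundance genes.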

Keywords: Ribo-seq; bioinformatics; machine learning; statistical method; translation.

Figures

**Figure 1. Sources of bias when using ribosome density per mRNA (RPKM-derived TE) as a proxy for TE**
***(A)*** Sampling biases toward low-abundance genes (left) and biological biases due to paused ribosomes (right). ***(B)*** Idealized ribosome footprint distribution without biases (left), or with downstream mRNA secondary structure and low cognate tRNA availability for the A-site codon (right). ***(C)*** Confounding effects of translation initiation and elongation on Ribo-seq profiles; figure adapted from Quax *et al.* (2013). The initiation rate should be proportional to the actual protein yield.

**Figure 2. Overview of the analysis workflow in Scikit-ribo**
The complete workflow consists of ribosome A-site classifier training, A-site codon prediction and mapping, and translation efficiency inference. ***(A)*** Ribosome A-site training and prediction; gray text boxes denote the major steps. ***(B)*** Illustration of the covariates in the codon-level generalized linear model. In the model, mRNA abundance (in TPM) is treated as an offset with a fixed coefficient of one. Codon dwell time and mRNA secondary structure are covariates shared across genes. Translation efficiencies are gene-specific covariates.
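The model structure described in panel B can be sketched as a penalized count regression: log expected footprints at a codon equal log(TPM) as a fixed offset, plus a gene-specific log TE, a shared codon dwell-time effect, and a shared secondary-structure effect. The sketch below is an illustration under stated assumptions, not the package's implementation: it assumes a Poisson likelihood with a log link, fits by Newton's method with a ridge penalty, and uses simulated data with hypothetical dimensions (20 genes, 10 codon types).

```python
import numpy as np


def fit_ridge_poisson(X, y, offset, lam=1.0, n_iter=25):
    """Ridge-penalized Poisson regression (log link) with a fixed offset,
    fitted by Newton's method. The offset enters with coefficient one."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(offset + X @ beta)                  # expected counts
        grad = X.T @ (y - mu) - lam * beta              # penalized score
        hess = X.T @ (X * mu[:, None]) + lam * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# Simulated data; every number below is a hypothetical illustration.
rng = np.random.default_rng(0)
n_genes, codons_per_gene, n_codon_types = 20, 200, 10
true_log_te = rng.normal(0.0, 0.5, n_genes)        # gene-specific covariates
true_log_dt = rng.normal(0.0, 0.3, n_codon_types)  # shared codon dwell times
true_struct = 0.2                                  # shared structure effect
tpm = rng.uniform(20.0, 200.0, n_genes)            # mRNA abundance

gene_idx = np.repeat(np.arange(n_genes), codons_per_gene)
codon_idx = rng.integers(0, n_codon_types, size=gene_idx.size)
struct = rng.normal(size=gene_idx.size)

offset = np.log(tpm[gene_idx])
eta = offset + true_log_te[gene_idx] + true_log_dt[codon_idx] + true_struct * struct
y = rng.poisson(np.exp(eta))

# Design matrix: one-hot gene columns, one-hot codon columns, structure score.
X = np.hstack([
    np.eye(n_genes)[gene_idx],
    np.eye(n_codon_types)[codon_idx],
    struct[:, None],
])

beta = fit_ridge_poisson(X, y, offset, lam=1.0)
est_log_te = beta[:n_genes]
est_log_dt = beta[n_genes:n_genes + n_codon_types]
est_struct = beta[-1]

print(np.corrcoef(est_log_te, true_log_te)[0, 1], est_struct)
```

Because the gene and codon indicator columns are collinear, the gene and codon effects are only determined up to a constant shift; the ridge penalty makes the fit well defined, so the estimates are best compared by correlation or after centering.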

**Figure 3. Accurate inference of codon elongation rates and mRNA secondary structure**
***(A)*** Codon dwell times (DT, the inverse of elongation rate) from Weinberg *et al.* are reproduced almost perfectly (r = 0.99). ***(B)*** Correlation with the codon’s adaptiveness value (RAV, r = 0.5). ***(C)*** Correlation with tRNA abundance (r = 0.47). In A–C, the gray dashed line denotes the diagonal line, y = x. The RAV scales from 0 to 1. A lower RAV means the codon is less optimal for translation elongation, i.e., slower. ***(D)*** Metagene analysis of the log ratio of adjusted DT (ADT), divided by the mean adjusted DT. The solid line denotes the average ADT in a five-codon sliding window. A log ratio greater than zero means ribosomes at this position are faster than average. The log ratios on the left were significantly higher than the ones on the right (t-test, p-value = 5 × 10⁻³). Distances are measured in codons.

**Figure 4. Pair-wise comparisons of estimates between Scikit-ribo and RPKM-derived TE**
***(A)*** Scatter plot of Scikit-ribo and RPKM-derived *log*2(TE). The difference in *log*2(TE) is denoted Δ *log*2(TE): Δ *log*2(TE) > 0.5, previously underestimated (green); Δ *log*2(TE) < −0.5, previously overestimated (orange); and other genes in between (gray). Genes with Δ *log*2(TE) less than −8 are indicated by triangles. ***(B)*** Histograms of Scikit-ribo and RPKM-derived *log*2(TE); *log*2(TE) values less than −10 are adjusted to −10. ***(C)*** Histograms of ribosome TPM in all genes (blue) and region 1 (green). ***(D)*** Violin plots of Δ *log*2(TE) by the number of stem loops. ***(E)*** Violin plots of tAI for genes in the six regions, left: Δ *log*2(TE) < 0, right: *log*2(TE) > 0. ***(F)*** The Kozak consensus sequence, AAAATGTCT, found with the TE estimates from Scikit-ribo (p-value = 1 × 10⁻²¹). The lower panel is adapted from the original paper, Hamilton *et al.* (1987).

**Figure 5. Large-scale validation with mass spectrometry data confirmed Scikit-ribo’s accurate TE estimates, especially for low-abundance genes**
***(A)*** Scikit-ribo derived protein abundance (PA) for all genes in the validation set (r = 0.81, β = 0.83). ***(B)*** Scikit-ribo derived PA for genes with TPM less than 100 (r = 0.6, β = 0.48). ***(C)*** RPKM-derived PA for all genes in the validation set (r = 0.77, β = 0.75). ***(D)*** RPKM-derived PA for genes with TPM less than 100 (r = 0.35, β = 0.29). The black dashed line denotes the identity line; y=x.

**Figure 6. Practical considerations for using Scikit-ribo for Ribo-seq analysis**
Pearson correlations between the down-sampled data and the original data (Weinberg et al.) for ***(A)*** log2(TE); the gray dashed horizontal line denotes Pearson r = 0.95. ***(B)*** The same down-sampling comparison for the codon relative dwell time (DT). ***(C)*** Scatter plot of log2(TE) for Ribo-seq experiments treated with cycloheximide (CHX) versus CHX-free data. ***(D)*** The same comparison for the codon relative dwell time (DT). The CHX-free data are from Weinberg et al., and the CHX-treated Ribo-seq data are from McManus et al. Both datasets are from *S. cerevisiae*. The black dashed line denotes the identity line, y = x.
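A down-sampling comparison of the kind shown in panels A and B can be sketched by thinning footprint counts binomially and correlating the resulting estimates with those from the full data. This is a rough illustration only: it uses a simple count ratio with a pseudocount rather than Scikit-ribo's model, and the simulated counts, sampling fractions, and function name are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical gene-level counts (simulated, not from any dataset).
n_genes = 500
rna = rng.poisson(rng.lognormal(5.0, 1.0, n_genes)) + 1
ribo = rng.poisson(rna * rng.lognormal(0.0, 0.5, n_genes)) + 1


def log2_te(ribo_counts, rna_counts, pseudo=1.0):
    """Simple ratio-based log2 TE with a pseudocount (a stand-in estimator)."""
    return np.log2((ribo_counts + pseudo) / (rna_counts + pseudo))


full = log2_te(ribo, rna)

# Keep each footprint read independently with probability `frac`, rescale,
# and compare the down-sampled estimates with the full-data estimates.
pearson_r = {}
for frac in (0.5, 0.1, 0.01):
    sub = rng.binomial(ribo, frac)
    pearson_r[frac] = np.corrcoef(full, log2_te(sub / frac, rna))[0, 1]

for frac, r in pearson_r.items():
    print(f"fraction kept = {frac:>4}: Pearson r = {r:.3f}")
```

Plotting the correlation against the fraction kept, with a reference line at r = 0.95, gives a curve analogous in spirit to panel A.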

Cited by

A critical period of translational control during brain development at codon resolution.
Harnett D, Ambrozkiewicz MC, Zinnall U, Rusanova A, Borisova E, Drescher AN, Couce-Iglesias M, Villamil G, Dannenberg R, Imami K, Münster-Wandowski A, Fauler B, Mielke T, Selbach M, Landthaler M, Spahn CMT, Tarabykin V, Ohler U, Kraushar ML. Nat Struct Mol Biol. 2022 Dec;29(12):1277-1290. doi: 10.1038/s41594-022-00882-9. Epub 2022 Dec 8. PMID: 36482253. Free PMC article.

What determines eukaryotic translation elongation: recent molecular and quantitative analyses of protein synthesis.
Neelagandan N, Lamberti I, Carvalho HJF, Gobet C, Naef F. Open Biol. 2020 Dec;10(12):200292. doi: 10.1098/rsob.200292. Epub 2020 Dec 9. PMID: 33292102. Free PMC article. Review.

Genome-Wide Analysis of Actively Translated Open Reading Frames Using RiboTaper/ORFquant.
Harnett D, Meerdink E, Calviello L, Sydow D, Ohler U. Methods Mol Biol. 2021;2252:331-346. doi: 10.1007/978-1-0716-1150-0_16. PMID: 33765284.

Accurate design of translational output by a neural network model of ribosome distribution.
Tunney R, McGlincy NJ, Graham ME, Naddaf N, Pachter L, Lareau LF. Nat Struct Mol Biol. 2018 Jul;25(7):577-582. doi: 10.1038/s41594-018-0080-2. Epub 2018 Jul 2. PMID: 29967537. Free PMC article.

Codon stabilization coefficient as a metric to gain insights into mRNA stability and codon bias and their relationships with translation.
Carneiro RL, Requião RD, Rossetto S, Domitrovic T, Palhano FL. Nucleic Acids Res. 2019 Mar 18;47(5):2216-2228. doi: 10.1093/nar/gkz033. PMID: 30698781. Free PMC article.
