PLoS Comput Biol. 2025 Jul 3;21(7):e1013179.
doi: 10.1371/journal.pcbi.1013179. eCollection 2025 Jul.

Expanding and improving analyses of nucleotide recoding RNA-seq experiments with the EZbakR suite


Isaac W Vock et al. PLoS Comput Biol.

Abstract

Nucleotide recoding RNA sequencing methods (NR-seq; TimeLapse-seq, SLAM-seq, TUC-seq, etc.) are powerful approaches for assaying transcript population dynamics. In addition, these methods have been extended to probe a host of regulated steps in the RNA life cycle. Current bioinformatic tools significantly constrain analyses of NR-seq data. To address this limitation, we developed EZbakR (https://github.com/isaacvock/EZbakR), an R package to facilitate a more comprehensive set of NR-seq analyses, and fastq2EZbakR (https://github.com/isaacvock/fastq2EZbakR), a Snakemake pipeline for flexible preprocessing of NR-seq datasets, collectively referred to as the EZbakR suite. Together, these tools generalize many aspects of the NR-seq analysis workflow. The fastq2EZbakR pipeline can assign reads to a diverse set of genomic features (e.g., genes, exons, splice junctions), and EZbakR can perform analyses on any combination of these features. EZbakR extends standard NR-seq mutational modeling to support multi-label analyses (e.g., s4U and s6G dual labeling), and implements an improved hierarchical model to better account for transcript-to-transcript variance in metabolic label incorporation. EZbakR also generalizes dynamical systems modeling of NR-seq data to support analyses of premature mRNA processing and flow between subcellular compartments. Finally, EZbakR implements flexible and well-powered comparative analyses of all estimated parameters via design matrix-specified generalized linear modeling. The EZbakR suite will thus allow researchers to make full, effective use of NR-seq data.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. The EZbakR suite generalizes and improves all steps of the NR-seq analysis pipeline.
The EZbakR suite: 1) Implements a flexible feature assignment strategy, 2) provides processed mutational data in a convenient, compressed format, 3) analyzes mutational data in a way that supports multi-label design and allows for feature-to-feature mutation rate variance, 4) fits any identifiable, linear dynamical systems model to NR-seq data, and 5) performs well-powered, design matrix-specified comparative analyses of all estimated kinetic parameters.
Fig 2
Fig 2. fastq2EZbakR generalizes the assignment of reads to features to support finer dissection of NR-seq data. A) Schematic of the 5 different feature assignment strategies implemented in fastq2EZbakR. If a read does not overlap with a given feature, it will be assigned a value of __no_feature for that feature assignment. If a read overlaps multiple features, all features will be reported, with names separated by +-signs. TEC = transcript equivalence class (set of transcript isoforms with which a read is compatible). Exon bins were introduced in DEXSeq (Anders et al., 2014). B) Schematic of the full fastq2EZbakR pipeline.
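The labeling convention described in panel A can be sketched as follows. This is a minimal illustration, not fastq2EZbakR's code: the function name `feature_label` is hypothetical, and sorting the names before joining is an assumption, since the caption does not say in what order overlapping features are reported.

```python
def feature_label(overlapping_features):
    """Build the label for one read under one feature assignment strategy.

    overlapping_features: names of the features the read overlaps
    (hypothetical input; the pipeline derives these from the annotation).
    """
    # De-duplicate and sort so the label is deterministic (an assumption).
    names = sorted(set(overlapping_features))
    if not names:
        # The read overlaps no feature of this type.
        return "__no_feature"
    # Report every overlapping feature, separated by plus signs.
    return "+".join(names)


if __name__ == "__main__":
    print(feature_label([]))
    print(feature_label(["geneB", "geneA"]))
```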
Fig 3
Fig 3. EZbakR generalizes NR-seq mixture modeling to support multi-label analyses.
A) Generalized mixture model likelihood. P = number of distinct mutational populations (e.g., high T-to-C and low G-to-A mutation rate). T = number of mutation types being analyzed (e.g., T-to-C and G-to-A). nM = number of mutations of a particular type in a given read. nN = number of mutable nucleotides of a given type in a given read. B) Example of a dual-label NR-seq experimental method: TILAC. In this experiment, s4U fed cells are mixed with s6G fed cells. C) Schematic for how generalized mixture modeling works in the setting of TILAC. In TILAC, there are no dually labeled reads, so the high T-to-C and high G-to-A population does not exist (see mutational populations table). D) Analyses of simulated TILAC data. θ1 = fraction s4U labeled; θ2 = fraction s6G labeled; θ3 = fraction unlabeled. X-axis is simulated ground truth. Y-axis is estimated value. The red dotted line represents perfectly accurate estimation.
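The symbols defined in panel A suggest a per-read likelihood that sums over the P mutational populations and, within each population, multiplies one binomial term per mutation type. The sketch below is one plausible reading of that description rather than EZbakR's implementation; the binomial form, the function names, and the mutation rates in the example are all assumptions.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of k mutations among n mutable nucleotides at rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def read_likelihood(n_mut, n_nuc, fractions, rates):
    """Mixture likelihood of a single read.

    n_mut[t]     -- nM: mutations of type t in the read
    n_nuc[t]     -- nN: mutable nucleotides of type t in the read
    fractions[p] -- theta_p: fraction of reads from population p
    rates[p][t]  -- assumed mutation rate of type t in population p
    """
    total = 0.0
    for theta, pop_rates in zip(fractions, rates):
        term = theta
        for k, n, p in zip(n_mut, n_nuc, pop_rates):
            term *= binom_pmf(k, n, p)
        total += term
    return total


if __name__ == "__main__":
    # TILAC-like setup: mutation types are (T-to-C, G-to-A).
    # Populations: s4U labeled, s6G labeled, unlabeled. No dually labeled
    # population exists. Rates are illustrative values only.
    rates = [(0.05, 0.002), (0.002, 0.05), (0.002, 0.002)]
    fractions = [0.3, 0.2, 0.5]
    print(read_likelihood([3, 0], [25, 30], fractions, rates))
```

Maximizing the product of these per-read likelihoods over the fractions (and, if unknown, the rates) would give the fraction estimates compared against simulated truth in panel D.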
Fig 4
Fig 4. EZbakR’s hierarchical NR-seq mixture modeling accounts for plabeled variation.
A) A schematic of the hierarchical modeling strategy to infer a plabeled for each feature (i.e., feature-specific plabeled). Strategy is designed to strongly regularize feature-specific plabeled estimates to reduce estimate variance. See Design and Implementation for details. B) Analyses of simulated data. Left: distribution of simulated feature-specific plabeled. Middle: assessment of feature-specific plabeled estimate accuracy. Right: Assessment of fraction labeled (θ) estimate accuracy. In Middle and Right plots, the red, dotted line represents perfect estimation. Points are colored by simulated read count. C) Estimated gene-specific plabeled (Y-axis) as a function of the estimated fraction labeled (on a logit-scale; X-axis) from analysis of TimeLapse-seq data from K562 cells (Ietswaart et al., 2024) [12]. Left: points colored by density. Right: points colored by whether the gene is on the mitochondrial chromosome (chrMT).
Fig 5
Fig 5. EZbakR generalizes kinetic modeling of NR-seq data.
A) Model assumed when performing standard analysis of mature mRNA synthesis and degradation. M = mature mRNA. B) Analysis of simulated data for model of pre-mRNA maturation. P = premature mRNA; M = mature mRNA. Scatter plots show comparison of true simulated parameter values to those estimated by EZbakR, for all three kinetic parameters in the model. Red dotted line represents perfect estimation. C) Analysis of simulated data for a model of nuclear-to-cytoplasmic trafficking of RNA. N = nuclear RNA; C = cytoplasmic RNA. Red dotted line represents perfect estimation. D) Left: Nuclear degradation rate constant accuracy scatterplot from C, colored by the model’s uncertainty in the rate constant estimate. Right: comparison of the true nuclear degradation and export rate constants, colored by the model’s uncertainty in the nuclear degradation rate constant. Red dotted line represents equal nuclear degradation and export kinetics. Estimating kNdeg is expected to get harder the further points are from this line, for reasons discussed in S1 Text.
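For the standard model in panel A, the usual NR-seq treatment assumes steady state, with zeroth-order synthesis and first-order degradation of M, so the fraction of labeled reads after a label time t is 1 - exp(-kdeg * t). The caption does not write these equations out, so the sketch below rests on that assumption; the function names are hypothetical and this is not EZbakR's API.

```python
import math


def fraction_labeled(kdeg, t_label):
    """Expected fraction of labeled RNA after labeling for t_label."""
    # 1 - exp(-kdeg * t), computed stably for small kdeg * t.
    return -math.expm1(-kdeg * t_label)


def kdeg_from_fraction(theta, t_label):
    """Invert the relationship above to get a degradation rate constant."""
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must be in [0, 1)")
    if t_label <= 0:
        raise ValueError("t_label must be positive")
    # kdeg = -ln(1 - theta) / t
    return -math.log1p(-theta) / t_label


def ksyn_from_abundance(kdeg, abundance):
    """At steady state, synthesis balances degradation: ksyn = kdeg * M."""
    return kdeg * abundance


if __name__ == "__main__":
    kdeg = kdeg_from_fraction(0.5, 2.0)  # half the reads labeled after 2 h
    print(kdeg, ksyn_from_abundance(kdeg, 100.0))
```

The multi-species models in panels B and C replace this closed form with a system of linear differential equations, which is why they need the more general fitting procedure.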
Fig 6
Fig 6. EZbakR improves and generalizes performing comparative analyses with NR-seq.
A) Simplified input to the generalized kinetic parameter linear model in EZbakR. Includes metadata for each sample analyzed and a model relating a given kinetic parameter to factors included in the metadata. Any identifiable model can be specified and fit. This approach allows for simple multi-condition comparisons (shown here) or more complicated analysis designs (e.g., multi-factor designs; Fig H, panel A, in S1 Text). B) Comparison of runtimes between two bakR implementations (Markov Chain Monte Carlo (MCMC) and Maximum Likelihood Estimation (MLE)), grandR, and EZbakR. C-E) Analysis of simulated data originally presented in [34]. We include two grandR assessments. In one case (CI) we use grandR’s provided 95% credible intervals to determine the significance of degradation rate constant changes. In the other (oracle), we identified a region of practical equivalence (ROPE) probability cutoff that yields FDR control on par with the highest power method (bakR MCMC). The latter, while infeasible in real data applications, gives grandR the best chance of maximizing its power. C) Comparison of statistical power (number of true positives / number of simulated positives) between bakR implementations, grandR, and EZbakR. D) Comparison of false discovery rates (FDRs; number of false positives / number of positives) between bakR implementations, grandR, and EZbakR. E) Comparison of Matthews correlation coefficients (MCC) between bakR implementations, grandR, and EZbakR. F) Application of the generalized linear model in EZbakR for analysis of a real, multi-perturbation dataset previously analyzed with bakR [48]. Schematics are adapted from those in that study.
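The three benchmark metrics in panels C-E can be computed from confusion-matrix counts as sketched below. Power and FDR follow the definitions given in the caption; the MCC formula is the standard one. The function names and the zero-denominator conventions (returning 0.0) are assumptions made for this illustration, and the counts in the example are invented.

```python
import math


def power(tp, fn):
    """True positives / simulated positives."""
    simulated_positives = tp + fn
    return tp / simulated_positives if simulated_positives else 0.0


def fdr(tp, fp):
    """False positives / all features called positive."""
    called_positives = tp + fp
    return fp / called_positives if called_positives else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Invented counts: 100 simulated positives among 1000 features.
    tp, fp, fn, tn = 80, 20, 20, 880
    print(power(tp, fn), fdr(tp, fp), mcc(tp, fp, fn, tn))
```

MCC is useful alongside power and FDR because it summarizes all four counts in one number, so a method cannot score well simply by calling very few or very many features significant.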
Fig 7
Fig 7. EZbakR identifies the effects of DDX3X knockdown on subcellular RNA kinetics.
A) Model to which nuclear, cytoplasmic, and total RNA NR-seq data from Ietswaart et al. was fit [12]. B) EZbakR comparative analyses of synthesis, export, and degradation kinetics with and without DDX3X knockdown. C) Changes in synthesis, export, and degradation kinetics for DDX3X target and non-target transcripts. Red dotted lines represent median changes in each kinetic parameter for the non-targets.

References

    1. Herzog VA, Reichholf B, Neumann T, Rescheneder P, Bhat P, Burkard TR, et al. Thiol-linked alkylation of RNA to assess expression dynamics. Nat Methods. 2017;14(12):1198–204. doi: 10.1038/nmeth.4435
    2. Riml C, Amort T, Rieder D, Gasser C, Lusser A, Micura R. Osmium‐mediated transformation of 4‐thiouridine to cytidine as key to study RNA dynamics by sequencing. Angewandte Chemie International Edition. 2017;56(43):13479–83.
    3. Schofield JA, Duffy EE, Kiefer L, Sullivan MC, Simon MD. TimeLapse-seq: adding a temporal dimension to RNA sequencing through nucleoside recoding. Nat Methods. 2018;15(3):221–5. doi: 10.1038/nmeth.4582
    4. Duffy EE, Schofield JA, Simon MD. Gaining insight into transcriptome-wide RNA population dynamics through the chemistry of 4-thiouridine. Wiley Interdiscip Rev RNA. 2019;10(1):e1513. doi: 10.1002/wrna.1513
    5. Erhard F, Saliba AE, Lusser A, Toussaint C, Hennig T, Prusty BK. Time-resolved single-cell RNA-seq using metabolic RNA labelling. Nature Reviews Methods Primers. 2022;2(1):77.
