This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2024 Oct 17:2024.10.14.617411.

doi: 10.1101/2024.10.14.617411.

Expanding and improving analyses of nucleotide recoding RNA-seq experiments with the EZbakR suite

Isaac W Vock¹,², Justin W Mabin³, Martin Machyna¹,²,⁴, Alexandra Zhang¹,², J Robert Hogg³, Matthew D Simon¹,²

Affiliations

¹ Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA.
² Institute of Biomolecular Design and Discovery, Yale University, West Haven, Connecticut 06516, USA.
³ Biochemistry and Biophysics Center, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA.
⁴ Present address: Paul-Ehrlich-Institut, Host-Pathogen-Interactions, 63225 Langen, Germany.

PMID: 39463977
PMCID: PMC11507695
DOI: 10.1101/2024.10.14.617411

Update in

Expanding and improving analyses of nucleotide recoding RNA-seq experiments with the EZbakR suite.
Vock IW, Mabin JW, Machyna M, Zhang A, Hogg JR, Simon MD. PLoS Comput Biol. 2025 Jul 3;21(7):e1013179. doi: 10.1371/journal.pcbi.1013179. PMID: 40609070.

Abstract

Nucleotide recoding RNA sequencing methods (NR-seq; TimeLapse-seq, SLAM-seq, TUC-seq, etc.) are powerful approaches for assaying transcript population dynamics. In addition, these methods have been extended to probe a host of regulated steps in the RNA life cycle. Current bioinformatic tools significantly constrain analyses of NR-seq data. To address this limitation, we developed EZbakR, an R package to facilitate a more comprehensive set of NR-seq analyses, and fastq2EZbakR, a Snakemake pipeline for flexible preprocessing of NR-seq datasets, collectively referred to as the EZbakR suite. Together, these tools generalize many aspects of the NR-seq analysis workflow. The fastq2EZbakR pipeline can assign reads to a diverse set of genomic features (e.g., genes, exons, and splice junctions), and EZbakR can perform analyses on any combination of these features. EZbakR extends standard NR-seq mutational modeling to support multi-label analyses (e.g., s⁴U and s⁶G dual labeling) and implements an improved hierarchical model to better account for transcript-to-transcript variance in metabolic label incorporation. EZbakR also generalizes dynamical systems modeling of NR-seq data to support analyses of premature mRNA processing and flow between subcellular compartments. Finally, EZbakR implements flexible and well-powered comparative analyses of all estimated parameters via design matrix-specified generalized linear modeling. The EZbakR suite will thus allow researchers to make full, effective use of NR-seq data.

Figures

**Figure 1:**
The EZbakR suite generalizes and improves upon all steps of the NR-seq analysis pipeline. The EZbakR suite: 1) Implements a flexible feature assignment strategy, 2) provides processed mutational data in a convenient, compressed format, 3) analyzes mutational data in a way that supports multi-label design and allows for feature-to-feature mutation rate variance, 4) fits any identifiable, linear dynamical systems model to NR-seq data, and 5) performs well-powered, design matrix-specified comparative analyses of all estimated kinetic parameters.

**Figure 2:**
fastq2EZbakR generalizes feature assignment to support finer dissection of NR-seq data. A) Schematic of the 5 different feature assignment strategies implemented in fastq2EZbakR. If a read does not overlap with a given feature, it will be assigned a value of __no_feature for that assignment. If a read overlaps multiple features, all features will be reported, with names separated by + signs. TEC = transcript equivalence class (set of transcript isoforms with which a read is compatible). Exon bins were introduced in DEXSeq (Anders et al., 2014). B) Schematic of the full fastq2EZbakR pipeline.
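
The labeling convention described in panel A can be sketched in a few lines of Python. This is only an illustration of the rule stated in the caption, not fastq2EZbakR's implementation: the function name `assignment_label` is hypothetical, and sorting the feature names before joining them is an assumption, since the caption does not say in what order names are reported.

```python
def assignment_label(overlapping_features):
    """Return the label for one read under one feature assignment strategy.

    Hypothetical helper illustrating the caption's convention only.
    """
    # De-duplicate and sort; the ordering is an assumption of this sketch.
    names = sorted(set(overlapping_features))
    if not names:
        # No overlap with any feature of this type.
        return "__no_feature"
    # Multiple overlaps are all reported, separated by + signs.
    return "+".join(names)


if __name__ == "__main__":
    print(assignment_label([]))
    print(assignment_label(["geneB", "geneA"]))
```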

**Figure 3:**
EZbakR generalizes NR-seq mixture modeling to support multi-label analyses. A) Generalized mixture model likelihood. P = number of distinct mutational populations (e.g., high T-to-C and low G-to-A mutation rate). T = number of mutation types being analyzed (e.g., T-to-C and G-to-A). nM = number of mutations of a particular type in a given read. nN = number of mutable nucleotides of a given type in a given read. B) Example of a dual-label NR-seq experimental method: TILAC. In this experiment, s⁴U-fed cells are mixed with s⁶G-fed cells. C) Schematic for how generalized mixture modeling works in the setting of TILAC. In TILAC, there are no dually labeled reads, so the high T-to-C and high G-to-A population does not exist (see mutational populations table). D) Analyses of simulated TILAC data. θ₁ = fraction s⁴U labeled; θ₂ = fraction s⁶G labeled; θ₃ = fraction unlabeled. X-axis is simulated ground truth. Y-axis is estimated value. Red dotted line is perfectly accurate estimation.
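
As a rough illustration of the kind of likelihood panel A describes, the sketch below evaluates a mixture over P populations in which each of the T mutation types contributes an independent term. The binomial form of each term, the function names, and the mutation rates are assumptions made for this example; the caption defines only the symbols, and EZbakR's actual likelihood may differ.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability of k mutations among n mutable nucleotides."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def read_likelihood(n_mut, n_nuc, fractions, rates):
    """Mixture likelihood of a single read.

    n_mut[t], n_nuc[t]: mutation and mutable-nucleotide counts for type t.
    fractions[p]: mixing fraction of population p (should sum to 1).
    rates[p][t]: mutation rate of type t in population p.
    """
    total = 0.0
    for theta, pop_rates in zip(fractions, rates):
        term = theta
        for k, n, p in zip(n_mut, n_nuc, pop_rates):
            term *= binom_pmf(k, n, p)  # types treated as independent (assumed)
        total += term
    return total


def log_likelihood(reads, fractions, rates):
    """Sum of per-read log-likelihoods; reads is a list of (n_mut, n_nuc)."""
    return sum(math.log(read_likelihood(m, n, fractions, rates)) for m, n in reads)


# TILAC-like setup with three populations and two mutation types
# (T-to-C, G-to-A). Rates are illustrative values, not from the paper.
TILAC_RATES = [
    (0.05, 0.002),   # s4U labeled: high T-to-C, low G-to-A
    (0.002, 0.05),   # s6G labeled: low T-to-C, high G-to-A
    (0.002, 0.002),  # unlabeled: low for both
]
```

Leaving out the fourth (high T-to-C, high G-to-A) row of the rate table mirrors the caption's point that dually labeled reads do not occur in TILAC.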

**Figure 4:**
EZbakR’s hierarchical NR-seq mixture modeling accounts for p_labeled variation. A) Schematic of hierarchical modeling strategy to infer a p_labeled for each feature (i.e., feature-specific p_labeled). Strategy is designed to strongly regularize feature-specific p_labeled estimates to reduce estimate variance. See Methods for details. B) Analyses of simulated data. Left: distribution of simulated feature-specific p_labeled. Middle: assessment of feature-specific p_labeled estimate accuracy. Right: Assessment of fraction labeled (θ) estimate accuracy. In Middle and Right plots, red, dotted line represents perfect estimation. Points are colored by simulated read count. C) Estimated feature-specific p_labeled (Y-axis) as a function of the estimated fraction labeled (on a logit-scale; X-axis) from analysis of TimeLapse-seq data from K562 cells (Ietswaart et al., 2024). Left: points colored by density. Right: points colored by whether RNA originated from the mitochondrial chromosome (chrMT).

**Figure 5:**
EZbakR generalizes kinetic modeling of NR-seq data. A) Model assumed when performing standard analysis of mature mRNA synthesis and degradation. B) Analysis of simulated data for model of pre-mRNA maturation. P = premature mRNA; M = mature mRNA. Scatter plots show comparison of true simulated parameter values to those estimated by EZbakR, for all three kinetic parameters in said model. Red dotted line represents perfect estimation. C) Analysis of simulated data for model of nuclear-to-cytoplasmic trafficking of RNA. N = nuclear RNA. C = cytoplasmic RNA. Red dotted line represents perfect estimation. D) Left: Nuclear degradation rate constant accuracy scatterplot from C, colored by model’s uncertainty in rate constant estimate. Right: comparison of the true nuclear degradation and export rate constants, colored by model’s uncertainty in nuclear degradation rate constant. Red dotted line represents equal nuclear degradation and export kinetics. Estimating k_Ndeg is expected to get harder the further points are from this line, for reasons discussed in Supplemental Methods.
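
For orientation, the standard model in panel A is commonly written with constant synthesis and first-order degradation, which at steady state gives a fraction labeled of θ = 1 - exp(-k_deg · t_label). The sketch below inverts that relationship. It assumes steady state and this specific functional form, neither of which is spelled out in the caption, and the function names are hypothetical rather than part of EZbakR.

```python
import math


def fraction_labeled(kdeg, label_time):
    """Expected fraction of labeled RNA after label_time (steady state assumed)."""
    return 1.0 - math.exp(-kdeg * label_time)


def kdeg_from_fraction_labeled(theta, label_time):
    """Invert theta = 1 - exp(-kdeg * t) to recover the degradation rate constant."""
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must be in [0, 1)")
    return -math.log(1.0 - theta) / label_time


def ksyn_from_abundance(kdeg, abundance):
    """At steady state, synthesis balances degradation: ksyn = kdeg * abundance."""
    return kdeg * abundance
```

The more general models in panels B and C add species (premature and mature, or nuclear and cytoplasmic RNA) and so require solving a system of linear differential equations rather than this closed-form inversion.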

**Figure 6:**
EZbakR improves and generalizes comparative analyses with NR-seq. A) Input to linear model of kinetic parameters in EZbakR. Includes metadata for each sample analyzed and a model relating a given kinetic parameter to factors included in metadata. Any identifiable model can be specified and fit. This approach allows for simple multi-condition comparisons (top potential model) or more complicated analysis designs (e.g., batch effect modeling; bottom potential model). B) Comparison of runtimes between two bakR implementations (Markov Chain Monte Carlo (MCMC) and Maximum Likelihood Estimation (MLE)) and EZbakR. **C-E)** Analysis of simulated data originally presented in Vock and Simon 2022. C) Comparison of statistical power (number of true positives / number of simulated positives) between bakR implementations and EZbakR. D) Comparison of false discovery rates (FDRs; number of false positives / number of positives) between bakR implementations and EZbakR. E) Comparison of Matthews correlation coefficients (MCC) between bakR implementations and EZbakR.
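
The three benchmark metrics in panels C-E can be computed from confusion-matrix counts as sketched below. Power and FDR follow the definitions given in the caption; the MCC expression is the standard textbook formula. The function names are hypothetical, and returning 0 when a denominator is zero is a convention chosen for this sketch.

```python
import math


def power(tp, fn):
    """True positives divided by all simulated positives (tp + fn)."""
    positives = tp + fn
    return tp / positives if positives else 0.0


def fdr(tp, fp):
    """False positives divided by all called positives (tp + fp)."""
    called = tp + fp
    return fp / called if called else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

MCC is a useful complement to power and FDR because it accounts for all four cells of the confusion matrix, so a method cannot score well simply by calling very few or very many features significant.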

References

1. Anders S., Reyes A. and Huber W. Detecting differential usage of exons from RNA-seq data. Nature Precedings 2012:1–1.
2. Berg K., et al. Correcting 4sU induced quantification bias in nucleotide conversion RNA-seq data. Nucleic Acids Research 2024;52(7):e35–e35.
3. Bonfield J.K., et al. HTSlib: C library for reading/writing high-throughput sequencing data. Gigascience 2021;10(2):giab007.
4. Cao J., et al. Sci-fate characterizes the dynamics of gene expression in single cells. Nature Biotechnology 2020;38(8):980–988.
5. Chen S., et al. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 2018;34(17):i884–i890.
