This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include in PMC and PubMed preprints that result from NIH-funded research.

[Preprint]. 2025 Jul 31:2025.07.25.666829.

doi: 10.1101/2025.07.25.666829.

Tranquillyzer: A Flexible Neural Network Framework for Structural Annotation and Demultiplexing of Long-Read Transcriptomes

Ayush Semwal¹, Jacob Morrison¹, Ian Beddows¹, Theron Palmer¹, Mary F Majewski¹, H Josh Jang², Benjamin K Johnson¹, Hui Shen¹

Affiliations

¹ Department of Epigenetics, Van Andel Research Institute, Grand Rapids, MI, USA.
² Department of Cell Biology, Van Andel Research Institute, Grand Rapids, MI, USA.

PMID: 40766630
PMCID: PMC12324178
DOI: 10.1101/2025.07.25.666829

Ayush Semwal et al. bioRxiv. 2025.

[Preprint]. 2025 Jul 31:2025.07.25.666829.

doi: 10.1101/2025.07.25.666829.

Authors

Ayush Semwal¹, Jacob Morrison¹, Ian Beddows¹, Theron Palmer¹, Mary F Majewski¹, H Josh Jang², Benjamin K Johnson¹, Hui Shen¹

Affiliations

¹ Department of Epigenetics, Van Andel Research Institute, Grand Rapids, MI, USA.
² Department of Cell Biology, Van Andel Research Institute, Grand Rapids, MI, USA.

PMID: 40766630
PMCID: PMC12324178
DOI: 10.1101/2025.07.25.666829

Abstract

Long-read single-cell RNA sequencing on platforms such as Oxford Nanopore Technologies (ONT) enables full-length transcriptome profiling at single-cell resolution. However, high sequencing error rates, diverse library architectures, and growing dataset sizes make it difficult to accurately identify cell barcodes (CBCs) and unique molecular identifiers (UMIs), which are prerequisites for reliable demultiplexing and deduplication, respectively. Existing pipelines rely on hard-coded heuristics or local transition rules that cannot fully capture the broader structural context of a read, and they often fail to robustly interpret reads with indel-induced shifts, truncated segments, or non-canonical element ordering. We introduce Tranquillyzer (TRANscript QUantification In Long reads-anaLYZER), a flexible, architecture-aware deep learning framework for processing long-read single-cell RNA-seq data. Tranquillyzer combines a hybrid neural network architecture (convolutional, bidirectional LSTM, and optional conditional random field layers) with a global, context-aware design, enabling precise identification of structural elements even when they are shifted, partially degraded, or repeated because of sequencing noise or variability in library construction. In addition to supporting established single-cell protocols, Tranquillyzer accommodates custom library formats through rapid, one-time model training on user-defined label schemas, typically completed within a few hours on standard GPUs. Together with its scalability to large datasets and comprehensive visualization capabilities, these features make Tranquillyzer a flexible and scalable framework for processing long-read single-cell transcriptomic datasets.

Keywords: Conditional Random Field; Convolutional Neural Network; Long Short-Term Memory; Long-Read; scRNA-seq.
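The abstract notes that high error rates, including indel-induced shifts, make cell barcode identification difficult. This page does not describe how Tranquillyzer matches an extracted barcode to the set of expected barcodes, so the following is only a minimal Python sketch of one common approach, assuming a known whitelist: assign the observed barcode to the unique whitelist entry within a small edit distance (edit distance rather than Hamming distance, so that insertions and deletions are tolerated), and leave it unassigned when no entry or more than one entry is equally close. The function names, the `max_dist` threshold, and the tie rule are illustrative assumptions.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion from a
                curr[j - 1] + 1,           # insertion into a
                prev[j - 1] + (ca != cb),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1]


def assign_barcode(observed: str, whitelist: Iterable[str],
                   max_dist: int = 2) -> Optional[str]:
    """Hypothetical helper, not Tranquillyzer's API.

    Returns the unique closest whitelist barcode within max_dist,
    or None when no candidate is close enough or the best match is tied.
    """
    best, best_dist, tied = None, max_dist + 1, False
    for candidate in whitelist:
        d = levenshtein(observed, candidate)
        if d < best_dist:
            best, best_dist, tied = candidate, d, False
        elif d == best_dist:
            tied = True
    if best is None or tied:
        return None  # unassigned or ambiguous
    return best
```

For example, `assign_barcode("AAAACCCGGGGTTTT", ["AAAACCCCGGGGTTTT", "TTTTGGGGCCCCAAAA"])` returns the first whitelist entry, since a single deletion separates the two strings. The brute-force scan over the whitelist is kept for clarity; a large whitelist would call for an index.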


Figures

**Figure 1. Overview of the *Tranquillyzer* framework.**
A) *Tranquillyzer* accepts raw sequencing data in compressed or uncompressed FASTQ/FASTA format and first preprocesses them into Parquet files binned by read length for downstream tasks such as annotation, demultiplexing, and visualization (e.g., read-length distributions and per-read annotation visualization). Read annotation and demultiplexing run concurrently, using multi-GPU inference and multi-threaded CPU processing, respectively. Structural annotations, comprising the start and end coordinates of each feature, the inferred cell barcode identity, and the filtering status, are stored in annotation metadata files. Successfully demultiplexed reads are written to a dedicated FASTA file, while ambiguous reads are written to a separate file. Demultiplexed reads are then aligned to a user-defined reference genome with splice-aware alignment, and only primary alignments are retained in the coordinate-sorted BAM output, which is subsequently used for duplicate marking. B) Read annotation is performed by a deep neural network model. Reads are retrieved from the Parquet files in chunks of user-defined or default size, then encoded and padded to accommodate variable lengths. The model begins with a series of convolutional neural network (**CNN**) blocks, each composed of 1D convolutions (**Conv1D**) with **ReLU** activation and optional batch normalization, stacked N times as needed for optimal performance. These are followed by a bidirectional long short-term memory (**BiLSTM**) module repeated over L layers, a time-distributed dense (**TD-Dense**) layer, and, optionally, a conditional random field (**CRF**) layer that enforces consistency in base-wise label predictions. The final output is a sequence of per-base annotations identifying structural elements.
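Panel B describes reads being encoded and padded before inference, and panel A describes annotations stored as start and end coordinates for each feature. The Python sketch below illustrates both steps under stated assumptions: an integer encoding with 0 reserved for padding (the encoding Tranquillyzer actually uses is not specified here), and a run-length collapse of per-base labels into 0-based, end-exclusive segments. The names `BASE_TO_INT`, `encode_and_pad`, and `labels_to_segments` are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

# Hypothetical integer encoding; 0 is reserved for padding, unknown bases map to N.
BASE_TO_INT = {"A": 1, "C": 2, "G": 3, "T": 4, "N": 5}


def encode_and_pad(reads: Sequence[str], pad_to: int = 0) -> List[List[int]]:
    """Integer-encode reads and right-pad them with 0 to a common length.

    The common length is the longest read in the chunk, or pad_to if larger.
    """
    target = max(pad_to, max((len(r) for r in reads), default=0))
    batch = []
    for read in reads:
        codes = [BASE_TO_INT.get(base, BASE_TO_INT["N"]) for base in read.upper()]
        batch.append(codes + [0] * (target - len(codes)))
    return batch


def labels_to_segments(labels: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Collapse per-base labels into (label, start, end) runs.

    Coordinates are 0-based and end-exclusive.
    """
    segments = []
    pos = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((label, pos, pos + length))
        pos += length
    return segments
```

Because padded positions also receive predictions, a caller would trim each predicted label sequence to its original read length (for example, `labels[:len(read)]`) before collapsing it into segments.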

**Figure 2. *Tranquillyzer* outperforms existing tools in recovering valid reads and resolving structural artifacts in simulated long-read single-cell datasets.**
A) Read retention across key processing steps (input, structural filtering, and demultiplexing) for datasets of 5–100 million reads with cDNA lengths of 100–500 bp. B) Structural filtering efficiency across increasing read lengths (500–2,500 bp), with a constant count of 10 million reads per dataset. C) Demultiplexing performance after structural filtering, stratified by molecule length. D) Proportion of sub-fragments recoverable from concatenated reads with known architectures (e.g., FWD_FWD, FWD_REV_FWD, FWD_REV_REV). *scNanoGPS* was excluded from this analysis because it does not explicitly model internal structural complexity.
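Panel D refers to concatenated reads summarized by the orientations of their sub-fragments (e.g., FWD_REV_FWD). Assuming a per-read list of segment labels in read order, a simplified Python sketch of such a summary pairs each cell barcode segment with its cDNA segment and calls the pair forward when the barcode comes first and reverse when it comes second. The label names and the one-barcode-and-one-cDNA-per-fragment rule are assumptions made for illustration, not Tranquillyzer's actual classification logic.

```python
from typing import List, Sequence

# Hypothetical label names; a real label schema may differ.
CBC, CDNA = "CBC", "cDNA"


def fragment_architecture(segment_labels: Sequence[str]) -> str:
    """Summarize a read as FWD/REV sub-fragments joined by '_', e.g. 'FWD_REV'.

    Assumes each sub-fragment carries exactly one CBC and one cDNA segment,
    with the CBC before the cDNA in a forward fragment and after it in a
    reverse fragment. Returns 'INVALID' when this pairing breaks down.
    """
    core = [lab for lab in segment_labels if lab in (CBC, CDNA)]
    if not core or len(core) % 2:
        return "INVALID"  # missing or unpaired element, e.g. a truncated fragment
    parts: List[str] = []
    for first, second in zip(core[0::2], core[1::2]):
        if (first, second) == (CBC, CDNA):
            parts.append("FWD")
        elif (first, second) == (CDNA, CBC):
            parts.append("REV")
        else:
            return "INVALID"  # two barcodes or two cDNAs in a row
    return "_".join(parts)
```

For example, `["CBC", "cDNA", "cDNA", "CBC", "CBC", "cDNA"]` yields `"FWD_REV_FWD"`. Greedy pairing cannot resolve reads in which a fragment has lost its barcode or its cDNA; the sketch reports such reads as `"INVALID"` rather than attempting to recover them.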

**Figure 3. *Tranquillyzer* provides high-fidelity structural classification and resolves complex artifacts in real single-cell long-read transcriptome datasets.**
A) Read composition stratified by length bin in a real dataset processed with *Tranquillyzer_CRF*. The left panel shows absolute read counts per bin; the right panel shows the relative proportions of read categories, including valid single fragments, concatenated molecules, and various classes of truncated or artifactual reads. B) Proportion of mapped reads containing supplementary alignments across tools. C) Read composition across read-length bins after processing with *Sicelore*, followed by structural reclassification with *Tranquillyzer_CRF*. D) Composition of supplementary alignments unique to each tool (i.e., not shared with *Tranquillyzer_CRF*).
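Panel B reports the proportion of mapped reads that carry supplementary alignments. One way to compute such a proportion from SAM text records, sketched here in Python, is to count the distinct read names that have a record with the supplementary bit (0x800) of the SAM FLAG field set, among read names with at least one record lacking the unmapped bit (0x4). The helper names are hypothetical, and the exact counting rules behind the figure may differ.

```python
from typing import Iterable, Iterator, Tuple

# FLAG bits as defined in the SAM format specification.
FLAG_UNMAPPED = 0x4
FLAG_SUPPLEMENTARY = 0x800


def parse_sam_lines(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (QNAME, FLAG) from SAM text lines, skipping '@' header and blank lines."""
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        yield fields[0], int(fields[1])


def supplementary_fraction(records: Iterable[Tuple[str, int]]) -> float:
    """Fraction of mapped read names with at least one supplementary record."""
    mapped, with_supp = set(), set()
    for name, flag in records:
        if flag & FLAG_UNMAPPED:
            continue  # unmapped records do not count toward the denominator
        mapped.add(name)
        if flag & FLAG_SUPPLEMENTARY:
            with_supp.add(name)
    return len(with_supp) / len(mapped) if mapped else 0.0
```

The text records could come from, for example, `samtools view` applied to an alignment file that still contains supplementary records.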

**Figure 4. Read annotation visualization reveals structural artifacts misclassified by other tools.**
A representative read (SRR21492159.31864199) that produced supplementary alignments in the outputs of *Sicelore*, *wf-single-cell*, and *scNanoGPS* is shown. All three tools aligned the upstream cDNA fragment as the primary alignment (green box), while the downstream fragment, which originates from a second sub-fragment of the same concatenated read, was misclassified as a valid supplementary alignment (red box). In contrast, *Tranquillyzer* annotated this read as a concatenated read composed of two distinct sub-fragments and flagged it as structurally artifactual. Color-coded labels denote the annotated elements, including the 5′ and 3′ adapters, polyT tail, cell barcode (CBC), UMI, and cDNA.
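Figure 4 relies on color-coded per-base labels to show why the read is a concatenation. A plain-text analogue, sketched in Python under the assumption that per-base labels are already available, prints the read with a one-character label track beneath it. The label names and character codes are hypothetical stand-ins for the figure's colors, not Tranquillyzer's own visualization output.

```python
from typing import Dict, Optional, Sequence

# Hypothetical one-character codes standing in for the figure's colors.
DEFAULT_CODES: Dict[str, str] = {
    "ADPT5": "5", "ADPT3": "3", "polyT": "T", "CBC": "B", "UMI": "U", "cDNA": "c",
}


def render_track(seq: str, labels: Sequence[str],
                 codes: Optional[Dict[str, str]] = None, width: int = 60) -> str:
    """Return the read with a label track under it, wrapped at `width` columns.

    Labels missing from `codes` are shown as '?'.
    """
    if len(seq) != len(labels):
        raise ValueError("sequence and label lengths differ")
    codes = DEFAULT_CODES if codes is None else codes
    track = "".join(codes.get(lab, "?") for lab in labels)
    rows = []
    for start in range(0, len(seq), width):
        rows.append(seq[start:start + width])    # sequence row
        rows.append(track[start:start + width])  # matching label row
    return "\n".join(rows)
```

In such a track, a concatenated read shows up as the barcode, UMI, and cDNA codes recurring after an adapter run, mirroring the two sub-fragments highlighted in the figure.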

Grants and funding

UM1 DA058219/DA/NIDA NIH HHS/United States
