[Preprint]. 2025 Jul 31:2025.07.25.666829.
doi: 10.1101/2025.07.25.666829.

Tranquillyzer: A Flexible Neural Network Framework for Structural Annotation and Demultiplexing of Long-Read Transcriptomes

Ayush Semwal et al. bioRxiv.

Abstract

Long-read single-cell RNA sequencing using platforms such as Oxford Nanopore Technologies (ONT) enables full-length transcriptome profiling at single-cell resolution. However, high sequencing error rates, diverse library architectures, and increasing dataset scale introduce major challenges for accurately identifying cell barcodes (CBCs) and unique molecular identifiers (UMIs), which are key prerequisites for reliable demultiplexing and deduplication, respectively. Existing pipelines rely on hard-coded heuristics or local transition rules that cannot fully capture the broader structural context of a read and often fail to robustly interpret reads with indel-induced shifts, truncated segments, or non-canonical element ordering. We introduce Tranquillyzer (TRANscript QUantification In Long reads-anaLYZER), a flexible, architecture-aware deep learning framework for processing long-read single-cell RNA-seq data. Tranquillyzer employs a hybrid neural network architecture and a global, context-aware design, and enables precise identification of structural elements, even when elements are shifted, partially degraded, or repeated due to sequencing noise or library construction variability. In addition to supporting established single-cell protocols, Tranquillyzer accommodates custom library formats through rapid, one-time model training on user-defined label schemas, typically completed within a few hours on standard GPUs. Additional features such as scalability across large datasets and comprehensive visualization capabilities further position Tranquillyzer as a flexible and scalable framework for processing long-read single-cell transcriptomic datasets.

Keywords: Conditional Random Field; Convolution Neural Network; Long Short-Term Memory; Long-Read; scRNA-seq.

Figures

Figure 1. Overview of the Tranquillyzer framework.
A) Tranquillyzer accepts raw sequencing data in compressed or uncompressed FASTQ/FASTA format, and first preprocesses them into read-length–based binned Parquet files, for downstream tasks such as annotation, demultiplexing, and visualization (e.g., read-length distributions and per-read annotation visualization). Read annotation and demultiplexing are executed concurrently using multi-GPU inference and multi-threaded CPU processing, respectively. Structural annotations, comprising the start and end coordinates of each feature, inferred cell barcode identity, and filtering status, are stored in annotation metadata files. Successfully demultiplexed reads are separated into a dedicated FASTA file, while ambiguous reads are stored in their own file. Demultiplexed reads are subsequently aligned to a user-defined reference genome using splice-aware alignment, and only primary alignments are retained in the coordinate-sorted BAM output. This file is then used for duplicate marking. B) Read annotation is performed using a deep neural network model. Reads are first retrieved from Parquet files in user-defined or default chunks, then encoded and padded to accommodate variable lengths. The model includes a series of convolutional neural network (CNN) blocks, each composed of 1D convolutions (Conv1D) with ReLU activation and optional batch normalization, stacked N times as needed for optimal performance. This is followed by a bidirectional long short-term memory (BiLSTM) module repeated over L layers, a time-distributed dense (TD-Dense) layer, and optionally a conditional random field (CRF) layer to enforce consistency in base-wise label prediction. The final output is a sequence of per-base annotations identifying structural elements.
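The two data transformations that bracket the model in panel B can be sketched in plain Python: encoding and padding reads before inference, and collapsing per-base labels into start and end coordinates afterwards. This is a minimal illustration, not Tranquillyzer's code. The integer encoding, the padding value, the label names, and the function names `encode_and_pad` and `labels_to_segments` are all assumptions made for this example.

```python
from itertools import groupby

# Hypothetical integer encoding (an assumption); 0 is reserved for padding.
BASE_TO_INT = {"A": 1, "C": 2, "G": 3, "T": 4, "N": 5}
PAD = 0


def encode_and_pad(reads, length=None):
    """Encode reads as integer lists and right-pad them to a common length."""
    length = length or max(len(read) for read in reads)
    encoded = []
    for read in reads:
        # Unknown characters are mapped to the "N" code.
        ids = [BASE_TO_INT.get(base, BASE_TO_INT["N"]) for base in read.upper()]
        encoded.append(ids[:length] + [PAD] * (length - len(ids)))
    return encoded


def labels_to_segments(labels):
    """Collapse per-base labels into (label, start, end) runs, end exclusive."""
    segments = []
    position = 0
    for label, run in groupby(labels):
        run_length = sum(1 for _ in run)
        segments.append((label, position, position + run_length))
        position += run_length
    return segments


if __name__ == "__main__":
    print(encode_and_pad(["ACG", "T"]))
    # Hypothetical per-base labels for a 9-base read.
    print(labels_to_segments(["adapter5"] * 3 + ["CBC"] * 2 + ["cDNA"] * 4))
```

Half-open coordinates (end exclusive) are used here only because they map directly onto Python slicing. The caption does not state which coordinate convention the annotation metadata files use.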
Figure 2. Tranquillyzer outperforms existing tools in recovering valid reads and resolving structural artifacts in simulated long-read single-cell datasets.
A) Read retention across key processing steps (input, structural filtering, and demultiplexing) for variable read counts (5–100 million reads) with 100–500 bp cDNA lengths. B) Structural filtering efficiency across increasing read lengths (500–2,500 bp), with a constant read count of 10 million reads per length. C) Demultiplexing performance after structural filtering, stratified by molecule length. D) Proportion of sub-fragments recoverable from concatenated reads with known architectures (e.g., FWD_FWD, FWD_REV_FWD, FWD_REV_REV). scNanoGPS was excluded from this analysis as it lacks explicit modeling of internal structural complexity.
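For readers unfamiliar with the demultiplexing step measured in panels A and C, the following sketch shows one common way to assign an error-containing barcode to a whitelist entry. The source does not describe Tranquillyzer's matching rule, so everything here is an assumption: the use of Levenshtein distance, the `max_distance` threshold, the tie-handling rule, and the function names `edit_distance` and `assign_barcode`.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def assign_barcode(observed, whitelist, max_distance=2):
    """Return the unique closest whitelist barcode, or None if ambiguous or too far.

    max_distance is an illustrative threshold, not a value from the source.
    """
    if not whitelist:
        return None
    scored = sorted((edit_distance(observed, barcode), barcode) for barcode in whitelist)
    best_distance, best_barcode = scored[0]
    if best_distance > max_distance:
        return None
    if len(scored) > 1 and scored[1][0] == best_distance:
        return None  # two barcodes are equally close: treat the read as ambiguous
    return best_barcode


if __name__ == "__main__":
    whitelist = ["ACGTACGT", "TTTTGGGG", "CCCCAAAA"]
    print(assign_barcode("ACGACGT", whitelist))   # observed barcode with one deleted base
    print(assign_barcode("GGGGGGGG", whitelist))  # too far from every whitelist entry
```

Edit distance is used rather than Hamming distance because insertions and deletions shift the barcode, which is the error mode the abstract highlights for long reads.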
Figure 3. Tranquillyzer provides high-fidelity structural classification and resolves complex artifacts in real single-cell long-read transcriptome datasets.
A) Read composition stratified by length bin in a real dataset processed with Tranquillyzer_CRF. Left panel shows absolute read counts per bin; right panel displays the relative proportions of read categories, including valid single fragments, concatenated molecules, and various classes of truncated or artifactual reads. B) Proportion of mapped reads containing supplementary alignments across tools. C) Read composition across read-length bins after processing with Sicelore, with subsequent structural reclassification using Tranquillyzer_CRF. D) Composition of supplementary alignments unique to each tool (i.e., not shared with Tranquillyzer_CRF).
Figure 4. Read annotation visualization reveals misclassified structural artifacts across other tools.
A representative read (SRR21492159.31864199) exhibiting supplementary alignments in the outputs of Sicelore, wf-single-cell, and scNanoGPS is shown. All three tools aligned the upstream cDNA fragment as the primary alignment (green box), while the downstream fragment, which originates from a concatenated molecule, was misclassified as a valid supplementary alignment (red box). In contrast, Tranquillyzer annotated this read as a concatenated read composed of two distinct sub-fragments, flagging it as structurally artifactual. Color-coded labels denote annotated elements including adapters (5′, 3′), polyT tail, cell barcode (CBC), UMI, and cDNA.
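The idea of flagging a read as concatenated from its ordered annotations can be illustrated with a small sketch. It is not the authors' classification logic. The expected element order, the label names, the category names, and the functions `classify_read` and `split_subfragments` are hypothetical, and only forward-oriented fragments are handled.

```python
# Hypothetical element order for one valid forward fragment (an assumption).
EXPECTED = ("adapter5", "CBC", "UMI", "polyT", "cDNA", "adapter3")


def classify_read(segment_labels, expected=EXPECTED):
    """Classify a read from the ordered labels of its annotated segments."""
    labels = tuple(segment_labels)
    if labels == tuple(expected):
        return "valid"
    if labels.count("cDNA") > 1:
        return "concatenated"  # more than one cDNA insert in a single read
    return "other_artifact"    # e.g. truncated or missing elements


def split_subfragments(segment_labels, start_label="adapter5"):
    """Start a new sub-fragment at every occurrence of start_label."""
    fragments = []
    for label in segment_labels:
        if label == start_label or not fragments:
            fragments.append([])
        fragments[-1].append(label)
    return fragments


if __name__ == "__main__":
    read = EXPECTED + EXPECTED  # two forward fragments joined end to end
    print(classify_read(read))
    print(len(split_subfragments(read)))
```

A real implementation would also need to handle reverse-complemented sub-fragments, such as the FWD_REV architectures named in Figure 2D, which this sketch does not attempt.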
