Bioinformatics. 2026 Jan 2;42(1):btaf680.
doi: 10.1093/bioinformatics/btaf680.

isoSeQL: comparing long-read isoforms across multiple datasets

Christine S Liu et al. Bioinformatics.

Abstract

Motivation: Long-read sequencing has made RNA isoform detection and characterization more accessible. While several bioinformatics tools have been developed to examine the data generated by these approaches, a major challenge in the field has been comparing isoform profiles across several samples.

Results: We developed isoSeQL, a tool for compiling long-read transcriptomic data, identifying common and unique isoforms across multiple samples, and extracting and visualizing various metrics. isoSeQL will augment approaches that utilize long-read sequencing to discover novel isoforms and to examine how isoforms vary across different experimental and biological conditions and cell types. We demonstrate how to use isoSeQL with publicly available datasets.

Availability and implementation: isoSeQL is available on GitHub (https://github.com/christine-liu/isoSeQL) and Zenodo (https://doi.org/10.5281/zenodo.15717809).

Figures

Figure 1.
Generalized workflow for processing long-read isoform sequencing data to run through isoSeQL. Every sample is processed individually and added to the database using two SQANTI3 output files and user-supplied config files with sample and experiment information. Additional steps and files for single-cell analysis are indicated in green (“Link isoforms to cell type of origin” and “cell type info”).
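As a rough illustration of the per-sample loading step in Figure 1, the Python sketch below uses the standard-library sqlite3 module. It assumes an SQLite backend, which the tool's name suggests but this page does not state, and the table layout (sample, isoform, expression), the Transcript record, and the add_sample helper are all hypothetical rather than taken from isoSeQL. Reading the two SQANTI3 output files and the config files is omitted, so transcripts are passed in already parsed. Isoforms are keyed by chromosome, strand, and intron chain, so a later sample reuses existing isoform rows instead of creating duplicates.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical schema for illustration; isoSeQL's real tables and columns may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sample (
    sample_id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    condition TEXT
);
CREATE TABLE IF NOT EXISTS isoform (
    isoform_id INTEGER PRIMARY KEY,
    chrom TEXT NOT NULL,
    strand TEXT NOT NULL,
    intron_chain TEXT NOT NULL,
    structural_category TEXT,
    UNIQUE (chrom, strand, intron_chain)
);
CREATE TABLE IF NOT EXISTS expression (
    sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
    isoform_id INTEGER NOT NULL REFERENCES isoform(isoform_id),
    tx_start INTEGER NOT NULL,
    tx_end INTEGER NOT NULL,
    reads INTEGER NOT NULL,
    PRIMARY KEY (sample_id, isoform_id, tx_start, tx_end)
);
"""


@dataclass
class Transcript:
    """One already-parsed transcript (hypothetical record; SQANTI3 parsing not shown)."""
    chrom: str
    strand: str
    introns: tuple  # ((donor, acceptor), ...) genomic coordinates
    start: int
    end: int
    reads: int
    category: str  # structural category, e.g. "full-splice_match"


def add_sample(conn, name, condition, transcripts):
    """Insert one sample; isoforms are shared across samples by intron chain."""
    cur = conn.cursor()
    cur.execute("INSERT INTO sample (name, condition) VALUES (?, ?)", (name, condition))
    sample_id = cur.lastrowid
    for t in transcripts:
        chain = ",".join(f"{d}-{a}" for d, a in t.introns)
        key = (t.chrom, t.strand, chain)
        # Create the isoform row only if this intron chain has not been seen before.
        cur.execute(
            "INSERT OR IGNORE INTO isoform (chrom, strand, intron_chain, structural_category) "
            "VALUES (?, ?, ?, ?)",
            (*key, t.category),
        )
        cur.execute(
            "SELECT isoform_id FROM isoform WHERE chrom = ? AND strand = ? AND intron_chain = ?",
            key,
        )
        isoform_id = cur.fetchone()[0]
        # Keep each start/end variant as its own row, summing reads if it repeats.
        cur.execute(
            "INSERT INTO expression (sample_id, isoform_id, tx_start, tx_end, reads) "
            "VALUES (?, ?, ?, ?, ?) "
            "ON CONFLICT (sample_id, isoform_id, tx_start, tx_end) "
            "DO UPDATE SET reads = reads + excluded.reads",
            (sample_id, isoform_id, t.start, t.end, t.reads),
        )
    conn.commit()
    return sample_id
```

For example, after conn = sqlite3.connect(":memory:") and conn.executescript(SCHEMA), adding two samples whose transcripts share an intron chain leaves a single isoform row, with one expression row per sample and start/end pair. The upsert clause needs SQLite 3.24 or later.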
Figure 2.
Collapsing redundant isoforms and tracking transcript start/end variability in isoSeQL. (A) Simulated transcripts of LPAR3 for two samples (Sample A and Sample B). Reference exons are shown in black. Numbers indicate the number of base pairs truncated at each end. (B) Running isoseq collapse removes redundant transcripts within each sample. (C) Result of collapsing isoforms from Sample A together with isoforms from Sample B using isoseq collapse. (D) Isoforms produced by merging Sample A and Sample B isoforms with gffcompare. (E) Schematic of how isoforms from Sample A and Sample B would be stored in isoSeQL, with their read counts. Isoform 1 refers to all isoforms with the same intron chain, while 1_x_y, where x and y are the start and end coordinates, refers to isoforms that share that intron chain but have different start/end coordinates. For clarity, the coordinates shown are illustrative rather than actual genomic coordinates. (F and G) Start/end site variability plots showing, for each sample, the proportion of reads supporting the transcript at each start (F) and end (G) coordinate.
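The collapsing scheme in panel E can be sketched in a few lines of Python. This is an illustrative reimplementation under simple assumptions, not isoSeQL's code: each transcript is given as (intron chain, start, end, reads), isoform numbers are assigned in first-seen order, and the names collapse_by_intron_chain, variant_label, and site_proportions are hypothetical. site_proportions computes the per-sample quantities plotted in panels F and G.

```python
from collections import Counter, defaultdict


def collapse_by_intron_chain(samples):
    """samples: {sample: [(intron_chain, start, end, reads), ...]}.

    Transcripts with the same intron chain share one isoform number (assigned in
    first-seen order); each distinct (start, end) pair is kept as a variant.
    Returns {(isoform, start, end): Counter({sample: reads})}.
    """
    isoform_of = {}
    variants = defaultdict(Counter)
    for sample, transcripts in samples.items():
        for chain, start, end, reads in transcripts:
            iso = isoform_of.setdefault(tuple(chain), len(isoform_of) + 1)
            variants[(iso, start, end)][sample] += reads
    return variants


def variant_label(iso, start, end):
    """Label in the '1_x_y' style described for panel E."""
    return f"{iso}_{start}_{end}"


def site_proportions(variants, iso, site="start"):
    """Per sample, the fraction of an isoform's reads at each start (or end) coordinate."""
    if site not in ("start", "end"):
        raise ValueError("site must be 'start' or 'end'")
    index = 1 if site == "start" else 2
    reads = defaultdict(Counter)
    for key, by_sample in variants.items():
        if key[0] != iso:
            continue
        for sample, n in by_sample.items():
            reads[sample][key[index]] += n
    return {
        sample: {pos: n / sum(c.values()) for pos, n in sorted(c.items())}
        for sample, c in reads.items()
    }
```

Keeping the intron chain as the isoform identity and the start/end pair as a sub-key means truncated reads never create a new isoform, while their end-site spread is still recoverable for plots like F and G.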
Figure 3.
isoSeQL analysis of bulk Iso-Seq data. (A) Proportions of isoforms belonging to each structural category. (B) Proportions of reads from isoforms in each structural category. (C) UpSet plot showing the overlap of isoforms, defined by intron chain, across samples, colored by structural category. Only the top ten intersections are shown.
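A minimal sketch of how the intersections behind an UpSet plot such as panel C could be counted, assuming each sample is represented as a set of intron-chain keys and each key carries one SQANTI3 structural category. The function name and input layout are hypothetical. Intersections here are exclusive, as UpSet plots typically draw them: each isoform is counted once, in the exact combination of samples that contains it, and combinations are ranked by size with only the top ten kept by default.

```python
from collections import Counter


def upset_intersections(presence, categories, top=10):
    """presence: {sample: set of isoform keys}; categories: {isoform key: category}.

    Returns the `top` largest exclusive intersections as
    (tuple of samples, number of isoforms, Counter of structural categories).
    """
    # Find the exact set of samples in which each isoform appears.
    membership = {}
    for sample, isoforms in presence.items():
        for iso in isoforms:
            membership.setdefault(iso, set()).add(sample)
    # Count isoforms per sample combination, split by structural category (for coloring).
    by_combo = {}
    for iso, samples in membership.items():
        combo = tuple(sorted(samples))
        by_combo.setdefault(combo, Counter())[categories[iso]] += 1
    # Largest intersections first; ties broken by combination name for stable output.
    ranked = sorted(by_combo.items(), key=lambda kv: (-sum(kv[1].values()), kv[0]))
    return [(combo, sum(cats.values()), cats) for combo, cats in ranked[:top]]
```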
Figure 4.
isoSeQL analysis of single-cell Iso-Seq data. (A) Read proportion of known IKZF1 isoforms expressed in each sample, analyzed as a pseudobulk. (B) Read proportion of known IKZF1 isoforms by cell type. Numbers above the bars indicate the total read count for each cell type.
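The read proportions in panels A and B amount to grouping reads and normalizing within each group. The sketch below assumes per-cell records of (cell barcode, isoform, reads) for a single gene, already restricted to known isoforms, plus a mapping from barcode to cell type; the function name, the input layout, and the placeholder isoform names are hypothetical. Passing a barcode-to-sample mapping instead gives a pseudobulk view like panel A, and the returned totals correspond to the counts printed above the bars in panel B.

```python
from collections import defaultdict


def isoform_proportions(records, group_of):
    """records: iterable of (cell_barcode, isoform, reads) for one gene.
    group_of: maps a cell barcode to its group (a cell type, or a sample for pseudobulk).
    Returns {group: (total_reads, {isoform: proportion_of_group_reads})}.
    """
    reads = defaultdict(lambda: defaultdict(int))
    for barcode, isoform, n in records:
        group = group_of.get(barcode)
        if group is None:  # cell without a group assignment; skip it
            continue
        reads[group][isoform] += n
    out = {}
    for group, by_iso in reads.items():
        total = sum(by_iso.values())
        out[group] = (total, {iso: n / total for iso, n in sorted(by_iso.items())})
    return out
```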