PLoS Comput Biol. 2025 Dec 1;21(12):e1013692.

doi: 10.1371/journal.pcbi.1013692. eCollection 2025 Dec.

Long-read sequencing transcriptome quantification with lr-kallisto

Rebekah K Loving¹, Delaney K Sullivan¹,², Fairlie Reese³,⁴, Elisabeth Rebboah³,⁴, Jasmine Sakr³,⁴, Narges Rezaie³,⁴, Heidi Y Liang³,⁴, Ghassan Filimban³,⁴, Shimako Kawauchi³, A Sina Booeshaghi⁵, Páll Melsted⁶,⁷, Conrad Oakes¹, Diane Trout¹, Brian A Williams¹, Grant R MacGregor³, Barbara J Wold¹, Ali Mortazavi³,⁴, Lior Pachter¹,⁸

Affiliations

¹ Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, United States of America.
² UCLA-Caltech Medical Scientist Training Program, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, United States of America.
³ Developmental and Cell Biology, University of California Irvine, Irvine, California, United States of America.
⁴ Center for Complex Biological Systems, University of California Irvine, Irvine, California, United States of America.
⁵ Department of Bioengineering, University of California, Berkeley, Berkeley, California, United States of America.
⁶ deCODE Genetics/Amgen Inc., Sturlugata Reykjavík, Iceland.
⁷ Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, School of Engineering and Natural Sciences, University of Iceland, Sæmundargata Reykjavík, Iceland.
⁸ Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California, United States of America.

PMID: 41325434
PMCID: PMC12680354
DOI: 10.1371/journal.pcbi.1013692

Rebekah K Loving et al. PLoS Comput Biol. 2025.

Abstract

RNA abundance quantification has become routine and affordable thanks to high-throughput "short-read" technologies that provide accurate molecule counts at the gene level. Similarly accurate and affordable quantification of definitive, full-length transcript isoforms has remained a stubborn challenge, despite its obvious biological significance across a wide range of problems. "Long-read" sequencing platforms now produce data types that can, in principle, drive routine definitive isoform quantification. However, some particulars of contemporary long-read data types, together with isoform complexity and genetic variation, present bioinformatic challenges. We show here, using ONT data, that fast and accurate quantification of long-read data is possible and that it is improved by exome capture. To perform quantifications, we developed lr-kallisto, which adapts the kallisto bulk and single-cell RNA-seq quantification methods for long-read technologies.

Copyright: © 2025 Loving et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. lr-kallisto demonstrates high concordance between Illumina and ONT.**
(a) Experimental overview for comparison of exome capture vs. non-exome capture LR-Split-seq libraries. (b) Kernel density estimates of read length distributions by capture strategy. (c) Percentage of demultiplexed reads by number of exons in each read, for exome and non-exome capture. (d-g) Each point is a hexbin representing the number of transcripts in the bin, with expression in log2(TPM); the x-coordinate is quantified from long reads and the y-coordinate from short reads. The total number of points is the total number of annotated transcripts in the reference transcriptome. CCC measures how close the data are to the line x = y, while Pearson R and Spearman ρ measure correlation between x and y. (d) lr-kallisto pseudobulk quantifications of exome capture for the C57BL/6J sample. (e) lr-kallisto pseudobulk quantifications of exome capture for the CAST/EiJ sample. (f) lr-kallisto pseudobulk quantifications of non-exome capture for the C57BL/6J sample. (g) lr-kallisto pseudobulk quantifications of non-exome capture for the CAST/EiJ sample. Concordance Correlation Coefficient (CCC), Pearson, and Spearman correlations are shown for each comparison. Created with https://BioRender.com
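
The distinction the caption draws between CCC and correlation can be illustrated with a minimal sketch. It assumes Lin's standard definition of the concordance correlation coefficient with population (1/n) moments; the function names and the toy values are hypothetical and are not taken from the paper or from lr-kallisto.

```python
from math import sqrt
from statistics import fmean


def _moments(x, y):
    """Return means, population variances, and population covariance."""
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = fmean([(a - mean_x) ** 2 for a in x])
    var_y = fmean([(b - mean_y) ** 2 for b in y])
    cov_xy = fmean([(a - mean_x) * (b - mean_y) for a, b in zip(x, y)])
    return mean_x, mean_y, var_x, var_y, cov_xy


def concordance_correlation(x, y):
    """Lin's CCC: penalizes both scatter and systematic offset from x = y."""
    mean_x, mean_y, var_x, var_y, cov_xy = _moments(x, y)
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


def pearson_correlation(x, y):
    """Pearson R: measures linear association only, blind to offsets."""
    _, _, var_x, var_y, cov_xy = _moments(x, y)
    return cov_xy / sqrt(var_x * var_y)


# Hypothetical log2(TPM) values: the second platform reads one unit higher.
long_read = [1.0, 2.0, 3.0, 4.0]
short_read = [value + 1.0 for value in long_read]

ccc = concordance_correlation(long_read, short_read)
pearson = pearson_correlation(long_read, short_read)
```

In this toy case the two vectors are perfectly correlated, yet the constant offset pulls the CCC below 1, which is why CCC is the stricter measure of agreement between platforms.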

**Fig 2. Comparison of Bambu, IsoQuant, lr-kallisto, and Oarfish in (a) abundance estimates as measured by CCC of expression and (b) variability between isoforms as measured by CCC of isoform CV², with 90% CI to measure consistency and reproducibility among replicates between the tools.**

**Fig 3. lr-kallisto is highly accurate in simulations with error up to ∼3%.**
A comparison of performance of Bambu, IsoQuant, lr-kallisto, and Oarfish on PacBio (top) and ONT (bottom) simulations with Concordance Correlation Coefficient (CCC), Normalized Root Mean Squared Error, and Pearson’s and Spearman’s correlation coefficients reported.

**Fig 4. Overview of biosample to lr-kallisto pipeline for long read RNA sequencing.**
To study the complexity of life, we can study the genome, transcriptome, and proteome. Through long read sequencing, we can achieve greater insight into both the workings of the genome and the proteome at the individual level and even the functionality of RNA as a molecule. Therefore, improving our ability to analyze long read RNA sequences increases our understanding of biology itself. 1. RNA is extracted from cells and tissues in a single-cell, single-nucleus, or bulk preparation, creating an RNA sequencing library. 2. The RNA sequencing library is then sequenced with either PacBio or Oxford Nanopore Sequencing (Nanopore illustration shown). 3. The raw electrical signal from the nanopore or the raw fluorescent signal from PacBio is then basecalled to create the raw RNA sequenced reads. 4. The raw RNA sequenced reads are input to lr-kallisto, which outputs both the transcriptome quantification of the tissue, single cells, or nuclei and the pseudobam alignments for the reads. 5. lr-kallisto's outputs are then analyzed and visualized: single-cell or bulk transcript and gene count matrices and pseudobam (pseudoalignments are output in bam format). Created with https://BioRender.com

**Fig 5. Overview of lr-kallisto pseudoalignment algorithm.**
The input consists of a reference transcriptome and reads from a long read RNA sequencing experiment. (A) An example of two reads (blue and green, with unmapped regions in black and erroneously mapped regions in purple) and three overlapping transcripts (pink, blue, and green). (B) An index is constructed by creating the transcriptome de Bruijn Graph (T-DBG), where nodes are k-mers, each transcript corresponds to a colored path as shown, and the path cover of the transcriptome induces a transcript compatibility class (TCC) for each k-mer. (C) Conceptually, the k-mers of a read are hashed (black nodes) to find the TCC of a read. (D) The TCC of the read is determined by taking the intersection of the transcript compatibility classes of its constituent k-mers, if it is non-empty; otherwise, the mode of the TCCs of the k-mers of the read is taken. Created with https://BioRender.com
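
The rule in panel (D) can be sketched in a few lines. This is a toy illustration only, not lr-kallisto's implementation: it assumes a plain dictionary from k-mers to transcript sets in place of the T-DBG, ignores reverse complements, and breaks ties in the mode by first occurrence. The names `build_index` and `pseudoalign`, the sequences, and the choice of k = 3 are all hypothetical.

```python
from collections import Counter


def build_index(transcripts, k):
    """Map each k-mer to the set of transcripts containing it (its TCC)."""
    index = {}
    for name, sequence in transcripts.items():
        for i in range(len(sequence) - k + 1):
            index.setdefault(sequence[i:i + k], set()).add(name)
    return {kmer: frozenset(names) for kmer, names in index.items()}


def pseudoalign(read, index, k):
    """Return the read's TCC: intersection of k-mer TCCs, else their mode."""
    kmers = (read[i:i + k] for i in range(len(read) - k + 1))
    # k-mers absent from the index (e.g. sequencing errors) are skipped.
    classes = [index[kmer] for kmer in kmers if kmer in index]
    if not classes:
        return frozenset()  # no k-mer matched: the read is unmapped
    common = frozenset.intersection(*classes)
    if common:
        return common
    # Empty intersection: fall back to the most frequent k-mer TCC.
    return Counter(classes).most_common(1)[0][0]


transcripts = {"t1": "AAACCCGGG", "t2": "AAACCCTTT"}
index = build_index(transcripts, k=3)

clean_read = pseudoalign("ACCCGG", index, k=3)    # consistent k-mers
noisy_read = pseudoalign("CGGGTTT", index, k=3)   # one stray t2 k-mer
unmapped_read = pseudoalign("GTGTGT", index, k=3)
```

Here `clean_read` resolves to `t1` by intersection, while `noisy_read` has an empty intersection (two k-mers point to `t1`, one to `t2`) and so falls back to the mode, which is again `t1`. The fallback is what lets a read with a few erroneous k-mers still receive an assignment.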

Update of

Long-read sequencing transcriptome quantification with lr-kallisto.
Loving RK, Sullivan DK, Booeshaghi AS, Reese F, Rebboah E, Sakr J, Rezaie N, Liang HY, Filimban G, Kawauchi S, Oakes C, Trout D, Williams BA, MacGregor G, Wold BJ, Mortazavi A, Pachter L. bioRxiv [Preprint]. 2025 Jan 29:2024.07.19.604364. doi: 10.1101/2024.07.19.604364. Update in: PLoS Comput Biol. 2025 Dec 1;21(12):e1013692. doi: 10.1371/journal.pcbi.1013692. PMID: 39071335.
