Nat Methods. 2024 Jul;21(7):1349-1363.

doi: 10.1038/s41592-024-02298-3. Epub 2024 Jun 7.

Systematic assessment of long-read RNA-seq methods for transcript identification and quantification

Francisco J Pardo-Palacios^#¹, Dingjie Wang^#^{2,3}, Fairlie Reese^#^{4,5}, Mark Diekhans^#⁶, Sílvia Carbonell-Sala^#⁷, Brian Williams^#⁸, Jane E Loveland^#⁹, Maite De María^#^{10,11}, Matthew S Adams¹², Gabriela Balderrama-Gutierrez^{4,5}, Amit K Behera¹³, Jose M Gonzalez Martinez⁹, Toby Hunt⁹, Julien Lagarde^{7,14}, Cindy E Liang¹², Haoran Li^{2,3}, Marcus Jerryd Meade¹⁵, David A Moraga Amador¹⁶, Andrey D Prjibelski^{17,18}, Inanc Birol¹⁹, Hamed Bostan²⁰, Ashley M Brooks²⁰, Muhammed Hasan Çelik^{4,5}, Ying Chen²¹, Mei R M Du²², Colette Felton¹³, Jonathan Göke^{21,23}, Saber Hafezqorani¹⁹, Ralf Herwig²⁴, Hideya Kawaji²⁵, Joseph Lee²¹, Jian-Liang Li²⁰, Matthias Lienhard²⁴, Alla Mikheenko²⁶, Dennis Mulligan¹³, Ka Ming Nip¹⁹, Mihaela Pertea^{27,28}, Matthew E Ritchie^{22,29}, Andre D Sim²¹, Alison D Tang¹³, Yuk Kei Wan^{21,30}, Changqing Wang²², Brandon Y Wong^{27,28}, Chen Yang¹⁹, If Barnes⁹, Andrew E Berry⁹, Salvador Capella-Gutierrez³¹, Alyssa Cousineau³², Namrita Dhillon¹³, Jose M Fernandez-Gonzalez³¹, Luis Ferrández-Peral¹, Natàlia Garcia-Reyero³³, Stefan Götz³⁴, Carles Hernández-Ferrer³¹, Liudmyla Kondratova³⁵, Tianyuan Liu³⁶, Alessandra Martinez-Martin¹, Carlos Menor³⁴, Jorge Mestre-Tomás¹, Jonathan M Mudge⁹, Nedka G Panayotova¹⁶, Alejandro Paniagua¹, Dmitry Repchevsky³¹, Xingjie Ren³⁷, Eric Rouchka³⁸, Brandon Saint-John¹³, Enrique Sapena³⁹, Leon Sheynkman¹⁵, Melissa Laird Smith³⁸, Marie-Marthe Suner⁹, Hazuki Takahashi⁴⁰, Ingrid A Youngworth⁴¹, Piero Carninci^{40,42}, Nancy D Denslow^{10,43}, Roderic Guigó^{7,44}, Margaret E Hunter⁴⁵, Rene Maehr³², Yin Shen⁴⁶, Hagen U Tilgner⁴⁷, Barbara J Wold⁸, Christopher Vollmers⁴⁸, Adam Frankish⁴⁹, Kin Fai Au^{50,51}, Gloria M Sheynkman^{52,53,54}, Ali Mortazavi^{55,56}, Ana Conesa^{57,58}, Angela N Brooks^{59,60}

Affiliations

¹ Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain.
² Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA.
³ Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
⁴ Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA.
⁵ Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA.
⁶ UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA.
⁷ Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain.
⁸ Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA.
⁹ European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus Hinxton, Cambridge, UK.
¹⁰ Department of Physiological Sciences, College of Veterinary Medicine, Gainesville, FL, USA.
¹¹ Cherokee Nation System Solutions, contractor to the US Geological Survey-Wetland and Aquatic Research Center, Gainesville, FL, USA.
¹² Department of Molecular Cell and Developmental Biology, University of California, Santa Cruz, Santa Cruz, CA, USA.
¹³ Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA.
¹⁴ Flomics Biotech, SL, Barcelona, Spain.
¹⁵ Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA.
¹⁶ Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, FL, USA.
¹⁷ Department of Computer Science, University of Helsinki, Helsinki, Finland.
¹⁸ Center for Bioinformatics and Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia.
¹⁹ Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada.
²⁰ Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, NC, USA.
²¹ Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore.
²² Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia.
²³ Department of Statistics and Data Science, National University of Singapore, Singapore, Singapore.
²⁴ Department Computational Molecular Biology, Max-Planck-Institute for Molecular Genetics, Berlin, Germany.
²⁵ Research Center for Genome & Medical Sciences, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan.
²⁶ Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK.
²⁷ Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.
²⁸ Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
²⁹ Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia.
³⁰ Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
³¹ Barcelona Supercomputing Center, Barcelona, Spain.
³² Program in Molecular Medicine, Diabetes Center of Excellence, University of Massachusetts Chan Medical School, Worcester, MA, USA.
³³ Energy, Installations & Environment, Office of the Assistant Secretary of Defense, Washington, DC, USA.
³⁴ Biobam Bioinformatics, Valencia, Spain.
³⁵ Genetics Institute, University of Florida, Gainesville, FL, USA.
³⁶ Cardiff University, Cardiff, UK.
³⁷ Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA.
³⁸ Department of Biochemistry & Molecular Genetics, University of Louisville, Louisville, KY, USA.
³⁹ European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK.
⁴⁰ Center for Integrative Medical Sciences, Laboratory for Transcriptome Technology, RIKEN, Yokohama, Japan.
⁴¹ Department of Genetics, Stanford University, Palo Alto, CA, USA.
⁴² Human Technopole, Milano, Italy.
⁴³ Center for Environmental and Human Toxicology, Department of Physiological Sciences, University of Florida, Gainesville, FL, USA.
⁴⁴ Universitat Pompeu Fabra (UPF), Barcelona, Spain.
⁴⁵ US Geological Survey, Wetland and Aquatic Research Center, Gainesville, FL, USA.
⁴⁶ Institute for Human Genetics, Department of Neurology, University of California, San Francisco, San Francisco, CA, USA.
⁴⁷ Brain and Mind Research Institute and Center for Neurogenetics, Weill Cornell Medicine, New York City, NY, USA.
⁴⁸ Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA. vollmers@ucsc.edu.
⁴⁹ European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus Hinxton, Cambridge, UK. frankish@ebi.ac.uk.
⁵⁰ Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA. kinfai@med.umich.edu.
⁵¹ Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA. kinfai@med.umich.edu.
⁵² Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA. gs9yr@virginia.edu.
⁵³ Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA. gs9yr@virginia.edu.
⁵⁴ UVA Cancer Center, University of Virginia, Charlottesville, VA, USA. gs9yr@virginia.edu.
⁵⁵ Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA. ali.mortazavi@uci.edu.
⁵⁶ Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA. ali.mortazavi@uci.edu.
⁵⁷ Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain. ana.conesa@csic.es.
⁵⁸ Microbiology and Cell Science Department, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, FL, USA. ana.conesa@csic.es.
⁵⁹ UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA. anbrooks@ucsc.edu.
⁶⁰ Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA. anbrooks@ucsc.edu.

^# Contributed equally.

PMID: 38849569
PMCID: PMC11543605
DOI: 10.1038/s41592-024-02298-3

Francisco J Pardo-Palacios et al. Nat Methods. 2024 Jul.

Abstract

The Long-read RNA-Seq Genome Annotation Assessment Project Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. Using different protocols and sequencing platforms, the consortium generated over 427 million long-read sequences from complementary DNA and direct RNA datasets, encompassing human, mouse and manatee species. Developers utilized these data to address challenges in transcript isoform detection, quantification and de novo transcript detection. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. Incorporating additional orthogonal data and replicate samples is advised when aiming to detect rare and novel transcripts or using reference-free approaches. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.

Conflict of interest statement

The design of the project was discussed with ONT, PacBio and Lexogen. ONT provided partial support for flow cells and reagents. H.U.T. and A. Conesa have, in the past, presented at events organized by PacBio and have received reimbursement or support for travel, accommodation and conference fees. H.U.T. has also spoken at local ONT events during the duration of this project and received food. Unrelated to this project, the laboratory of H.U.T. has purchased reagents from Illumina, PacBio and ONT at discounted prices. S.C.-S., A.N.B. and J.G. have received reimbursement for travel, accommodation and conference fees to speak at events organized by ONT. A.N.B. is a consultant for Remix Therapeutics. A. Conesa is the founder of Biobam Bioinformatics. The other authors declare no competing interests.

Figures

**Fig. 1. Overview of the LRGASP.**
a, Data produced for LRGASP. b, Distribution of read lengths, identity Q score and sequencing depth (per biological replicate) for the WTC11 sample. c, The collaborative design of the LRGASP organizers and participants. d, Number of isoforms reported by each tool on different data types for the human WTC11 sample for Challenge 1. Number of submissions per tool, in order, n = 6, 6, 4, 1, 6, 1, 6, 3, 1, 1 and 12. e, Median TPM value reported by each tool on different data types for the human WTC11 sample for Challenge 2. Number of submissions per tool, in order, n = 11, 3, 4, 6, 1, 6 and 1. f, Number of isoforms reported by each tool on different data types for the mouse ES data for Challenge 3. Number of submissions per tool, in order, n = 6, 5, 2 and 4. g, Pairwise relative overlap of unique junction chains (UJCs) reported by each submission. The UJCs reported by a submission are used as a reference set for each row. The fraction of overlap of UJCs from the column submission is shown as a heatmap. For example, a submission that has a small subset of many other UJCs from other submissions will have a high fraction shown in the rows but a low fraction by column for that submission. Data are only shown for WTC11 submissions. h, Spearman correlation of TPM values between submissions to Challenge 2. i, Pairwise relative overlap of UJCs reported by each submission. The UJCs reported by a submission are used as a reference set for each row. The fraction of overlap of UJCs from the column submission is shown as a heatmap. Ba, Bambu; Bl, RNA-Bloom; FM, FLAMES; FR, FLAIR; IB, Iso_IB; IQ, IsoQuant; IT, IsoTools; Ly, LyRic; Ma, Mandalorion; rS, rnaSPAdes; Sp, Spectra; ST, StringTie2; TL, TALON-LAPA. The figure was partially created with BioRender.com.
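
The row-versus-column asymmetry described for panels g and i can be illustrated with a minimal Python sketch. It assumes each UJC has been reduced to a hashable string key and that the cell for (row, column) equals |row ∩ column| / |row|, which is one reading consistent with the caption's example. The function name `relative_overlap` and the toy submissions are hypothetical and are not taken from the LRGASP code.

```python
from typing import Dict, Set


def relative_overlap(submissions: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """Return matrix[row][col] = |row ∩ col| / |row| (assumed definition)."""
    matrix: Dict[str, Dict[str, float]] = {}
    for row_name, row_ujcs in submissions.items():
        matrix[row_name] = {}
        for col_name, col_ujcs in submissions.items():
            if not row_ujcs:
                # An empty reference set has no defined fraction; report 0.0.
                matrix[row_name][col_name] = 0.0
            else:
                shared = len(row_ujcs & col_ujcs)
                matrix[row_name][col_name] = shared / len(row_ujcs)
    return matrix


# Toy data: "small" reports a subset of the UJCs that "large" reports.
toy = {
    "small": {"ujc_a", "ujc_b"},
    "large": {"ujc_a", "ujc_b", "ujc_c", "ujc_d"},
}
overlap = relative_overlap(toy)
print(overlap["small"]["large"])  # row = small: all of its UJCs are in large
print(overlap["large"]["small"])  # row = large: only half of its UJCs are in small
```

Under this assumed definition the matrix is not symmetric, which matches the caption's remark that a small subset submission scores high along its row and low down its column.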

**Fig. 2. Evaluation of transcript identification with a reference annotation for Challenge 1.**
a, Percentage of transcript models fully supported at 5′ ends either by reference annotation or same-sample CAGE data (left), 3′ ends either by reference annotation or same-sample QuantSeq data (middle) and splice junctions (SJ) by short-read coverage or a canonical site (right). b, Agreement in transcript detection as a function of the number of detecting pipelines. c, Performance of tools based on spliced-short (top) and unspliced long SIRVs (bottom). d, Performance of tools based on simulated data. e, Performance of tools on known and novel transcripts of 50 genes manually annotated by GENCODE. f, Summary of performance metrics of tools for the cDNA-PacBio and cDNA-ONT benchmarking datasets. The color scale represents the performance value ranging from worse (dark blue) to better (light yellow). The graphic symbol indicates the ranking position of the tool for the metric represented in each row. LO, long (reads) only; LS, long and short (reads); Sen_kn, sensitivity for known transcripts; Pre_kn, precision for known transcripts; Sen_no, sensitivity for novel transcripts; Pre_no, precision for novel transcripts; 1/Red, inverse of redundancy.

**Fig. 3. Evaluation of transcript isoform quantification for Challenge 2.**
a, Cartoon diagrams to explain evaluation metrics without or with a ground truth. b–e, Overall evaluation results of eight quantification tools and seven protocols-platforms on real data with multiple replicates (b), cell mixing experiment (c), SIRV-Set 4 data (d) and simulation data (e). Box plots of evaluation metrics across various datasets, depicting the minimum, lower quartile, median, upper quartile and maximum values. Bar plots represent the mean values of evaluation metrics across diverse datasets, with error bars indicating the s.d. b, Number of submissions per tool or protocol-platform, in order, n = 36, 12, 16, 24, 4, 24, 6, and 4 per tool or n = 22, 24, 26, 18, 18, 14 and 4 per protocol-platform. c, Number of submissions per tool or per protocol-platform, in order, n = 6, 3, 4, 6, 1, 6, 1 and 1 per tool or n = 5, 5, 6, 4, 4, 3 and 1 per protocol-platform. d, Number of submissions per tool or per protocol-platform, in order, n = 36, 12, 16, 24, 4, 24, 6 and 4 per tool or n = 22, 24, 26, 18, 18, 14 and 4 per protocol-platform. e, Number of submissions per tool or per protocol-platform, in order, n = 8, 4, 2, 4, 2, 4, 1 and 2 per tool or n = 12, 6, 7, 0, 0, 0 and 2 per protocol-platform. f, Quantification tool scores under common cDNA-ONT and cDNA-PacBio platforms across various evaluation metrics, with the top three performers highlighted for each metric. g, Based on the average values of each metric across all quantification tools, scores for protocols-platforms are displayed, along with the top three performers for each metric. Blank spaces denote instances where the tool or protocols-platforms did not have participants submitting the corresponding quantitative results. h, Evaluation of quantification tools with respect to multiple transcript features, including the number of isoforms, number of exons, isoform length and a customized statistic K-value representing the complexity of exon-isoform structures. 
Here, the normalized MRD metric is used to evaluate the performance of quantification tools on human cDNA-PacBio simulation data. Additionally, RSEM evaluation results with respect to transcript features based on human short-read simulation data are shown as a control.

**Fig. 4. Evaluation of transcript identification without a reference annotation for Challenge 3.**
a, Number of detected transcripts and distribution of SQANTI structural categories, mouse ES cell sample. b, Number of detected transcripts and distribution of transcripts per locus, manatee sample. c, Length distribution of mouse ES cell transcript predictions. Number of transcripts reported by each pipeline, in order, n = 23,540, 15,054, 21,312, 27,215, 21,913, 27,056, 85,720, 107,832, 192,324, 144,752, 164,117, 91,833, 28,293, 75,106, 52,944, 29,458 and 44,079. d, Length distribution of manatee transcript predictions. Number of transcripts reported by each pipeline, in order, n = 1,911, 179,258, 176,895, 695,167, 535,845, 288,958, 63,000 and 25,643. e, Support by orthogonal data. f, BUSCO metrics. g, Performance metrics based on SIRVs. Sen, sensitivity; PDR, positive detection rate; Pre, precision; nrPred, non-redundant precision; SO, short only.

**Fig. 5. Experimental validation of known and novel isoforms.**
a, Schematic for the experimental validation pipeline. QC, quality control. b, Example of a consistently detected NIC isoform (detected in over half of all LRGASP pipeline submissions), which was successfully validated by targeted PCR. The primer set amplifies a new event of exon skipping (NIC). Only transcripts above ~5 CPM and any part of the GENCODE Basic annotation are shown. c, Example of a successfully validated new terminal exon, with ONT amplicon reads shown in the IGV track (PacBio produces similar results). d, Recovery rates for GENCODE-annotated isoforms that are reference matched (known), novel and rejected. e, Recovery rates for consistently versus rarely detected isoforms for known and novel isoforms. f, Recovery rates between isoforms that are more frequently identified in ONT versus PacBio pipelines. g–i, Relationship between estimated transcript abundances (calculated as the sum of reads across all WTC11 sequencing samples) and validation success for GENCODE (g), consistent versus rare (h) and platform-preferential (i) isoforms. NV, not validated; V, validated. The number of transcripts in each category is shown in d–f. j, Fraction of validated transcripts as a function of the number of WTC11 samples in which supportive reads were observed. k, Example of two de novo isoforms in manatee validated through isoform-specific PCR amplification. Purple corresponds to the designed primers, orange to the possible amplification product associated with one isoform and black to the predicted isoforms. l, PCR validation results for manatee isoforms for seven target genes. Blue corresponds to supported transcripts and red to unsupported transcripts. The figure was partially created with BioRender.com.

**Extended Data Fig. 1. SQANTI3 classifications of LRGASP submissions on the WTC11 dataset.**
a) Comparison of the number of known genes to transcripts in those genes for the WTC11 dataset. b) Percentage of FSM (Full Splice Match) vs ISM (Incomplete Splice Match). c) Percentage of NIC (Novel In Catalog) vs NNC (Novel Not in Catalog). d) Percentage of known and novel transcripts with full support at junctions and end positions. Ba: Bambu, FM: FLAMES, FL: FLAIR, IQ: IsoQuant, IT: IsoTools, IB: Iso_IB, Ly: LyRic, Ma: Mandalorion, TL: TALON-LAPA, Sp: Spectra, ST: StringTie2.

**Extended Data Fig. 2. Percentage of transcript models with different ranges of sequence coverage by long reads.**
a) WTC11. b) H1-mix. c) Mouse ES. Ba: Bambu, FM: FLAMES, FL: FLAIR, IQ: IsoQuant, IT: IsoTools, IB: Iso_IB, Ly: LyRic, Ma: Mandalorion, TL: TALON-LAPA, Sp: Spectra, ST: StringTie2.

**Extended Data Fig. 3. Positional coverage of long unspliced SIRV transcript sequences by long reads for each sample type.**
The coverage of bases of long unspliced SIRV transcript by long reads for each sample type, grouped by sequence length range.

**Extended Data Fig. 4. Properties of GENCODE manually annotated loci for WTC11 sample.**
a) Distribution of gene expression. b) Distribution of SQANTI categories. c) Intersection of Unique Intron Chains (UIC) among experimental protocols.

**Extended Data Fig. 5. Properties of GENCODE manually annotated loci for mouse ES sample.**
a) Distribution of gene expression. b) Distribution of SQANTI categories. c) Intersection of Unique Intron Chains (UIC) among experimental protocols.

**Extended Data Fig. 6. Overall evaluation results of eight quantification tools.**
Evaluation results from seven protocols-platforms on four data scenarios: real data with multiple replicates, cell mixing experiment, SIRV-set 4 data, and simulation data. To display the evaluation results more effectively, we normalized all metrics to 0–1 range: 0 corresponds to the worst performance, and 1 corresponds to the best performance.
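
The caption does not state how the 0–1 normalization was computed. As a hedged illustration only, the sketch below assumes plain min-max scaling per metric, flipped for metrics where a lower raw value is better so that 1 always marks the best performer. The function name `normalize_metric`, the `higher_is_better` flag and the handling of ties are assumptions made for this example.

```python
from typing import List


def normalize_metric(values: List[float], higher_is_better: bool = True) -> List[float]:
    """Min-max scale one metric to [0, 1], with 1 = best (assumed scheme)."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # All tools tie on this metric; returning 1.0 for each is an arbitrary choice.
        return [1.0 for _ in values]
    scaled = [(value - lowest) / (highest - lowest) for value in values]
    if higher_is_better:
        return scaled
    # For error-like metrics, the smallest raw value should map to 1.
    return [1.0 - s for s in scaled]


# Hypothetical raw scores for three tools on two metrics.
correlation_like = normalize_metric([2.0, 4.0, 6.0], higher_is_better=True)
error_like = normalize_metric([2.0, 4.0, 6.0], higher_is_better=False)
print(correlation_like)
print(error_like)
```

Scaling each metric independently makes panels comparable on a shared axis, at the cost of hiding absolute differences between metrics.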

**Extended Data Fig. 7. Top three performance on quantification tools.**
Quantification results under six different protocols-platforms for each metric. Here, quantification tools showcase scores under six different protocols-platforms across various evaluation metrics, with the top three performers highlighted for each metric. Blank spaces denote instances where the tool or protocols-platforms did not have participants submitting the corresponding quantitative results.

**Extended Data Fig. 8. SQANTI category classification of transcript models.**
Results on transcript models detected by the same tools in Challenge 1 and Challenge 3. Challenge 1 predictions used the reference annotation, whereas Challenge 3 predictions did not. Ba = Bambu, IQ = StringTie2/IsoQuant.

**Extended Data Fig. 9. Fraction of experimentally validated WTC11 transcripts.**
Experimental validation of WTC11 transcripts as a function of the total numbers of long reads that were observed across the 21 library preparations (for example, PacBio cDNA, ONT cDNA, PacBio CapTrap).

Update of

Systematic assessment of long-read RNA-seq methods for transcript identification and quantification.
Pardo-Palacios FJ, Wang D, Reese F, Diekhans M, Carbonell-Sala S, Williams B, Loveland JE, De María M, Adams MS, Balderrama-Gutierrez G, Behera AK, Gonzalez JM, Hunt T, Lagarde J, Liang CE, Li H, Jerryd Meade M, Moraga Amador DA, Prjibelski AD, Birol I, Bostan H, Brooks AM, Hasan Çelik M, Chen Y, Du MRM, Felton C, Göke J, Hafezqorani S, Herwig R, Kawaji H, Lee J, Liang Li J, Lienhard M, Mikheenko A, Mulligan D, Ming Nip K, Pertea M, Ritchie ME, Sim AD, Tang AD, Kei Wan Y, Wang C, Wong BY, Yang C, Barnes I, Berry A, Capella S, Dhillon N, Fernandez-Gonzalez JM, Ferrández-Peral L, Garcia-Reyero N, Goetz S, Hernández-Ferrer C, Kondratova L, Liu T, Martinez-Martin A, Menor C, Mestre-Tomás J, Mudge JM, Panayotova NG, Paniagua A, Repchevsky D, Rouchka E, Saint-John B, Sapena E, Sheynkman L, Laird Smith M, Suner MM, Takahashi H, Youngworth IA, Carninci P, Denslow ND, Guigó R, Hunter ME, Tilgner HU, Wold BJ, Vollmers C, Frankish A, Fai Au K, Sheynkman GM, Mortazavi A, Conesa A, Brooks AN. bioRxiv [Preprint]. 2023 Jul 27:2023.07.25.550582. doi: 10.1101/2023.07.25.550582. Update in: Nat Methods. 2024 Jul;21(7):1349-1363. doi: 10.1038/s41592-024-02298-3. PMID: 37546854.
