Review

Philos Trans R Soc Lond B Biol Sci. 2019 Nov 25;374(1786):20190097.

doi: 10.1098/rstb.2019.0097. Epub 2019 Oct 7.

Realizing the potential of full-length transcriptome sequencing

Ashley Byrne¹, Charles Cole², Roger Volden², Christopher Vollmers²

Affiliations

¹ Department of Molecular, Cellular, and Developmental Biology, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
² Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA.

PMID: 31587638
PMCID: PMC6792442
DOI: 10.1098/rstb.2019.0097

Ashley Byrne et al. Philos Trans R Soc Lond B Biol Sci. 2019.

Abstract

Long-read sequencing holds great potential for transcriptome analysis because it offers researchers an affordable method to annotate the transcriptomes of non-model organisms. This, in turn, will greatly benefit future work on less-researched organisms like unicellular eukaryotes that cannot rely on large consortia to generate these transcriptome annotations. However, to realize this potential, several remaining molecular and computational challenges will have to be overcome. In this review, we have outlined the limitations of short-read sequencing technology and how long-read sequencing technology overcomes these limitations. We have also highlighted the unique challenges still present for long-read sequencing technology and provided some suggestions on how to overcome these challenges going forward. This article is part of a discussion meeting issue 'Single cell ecology'.

Keywords: Oxford Nanopore Technologies; Pacific Biosciences; long-read sequencing; transcriptome analysis.

Conflict of interest statement

Some of the methods we discuss in this review include our own, and we have filed patent applications on aspects of them.

Figures

**Figure 1.**
Fundamental difference between short- and long-read sequencing of transcripts. Short RNA-seq reads only capture small fragments of transcripts. RNA-seq data, therefore, lacks unambiguous isoform data leading to the inference of many erroneous isoforms. Long-read full-length cDNA data captures transcripts end-to-end making isoform inference unambiguous.

**Figure 2.**
Long-read transcriptome sequencing approaches do not cover long transcripts. Swarmplots of length distributions of 1000 randomly sampled PacBio [9], ONT dRNA and cDNA [28] reads covering the GM12878 (human lymphoblast cell line) transcriptome. These distributions are not representative of the length distribution of the human transcriptome as annotated by GENCODE. *While we show the most recent dataset on GM12878 we could find for PacBio technology, it is several years old and might not be fully representative of current platform performance.

**Figure 3.**
Error-prone reads pose analysis challenge. Representative alignments of ONT cDNA [28] reads. Thirty read alignments (grey) to the first two exons of the CD19 gene (dark blue) are shown. Read alignments contain many insertions (orange), mismatches (red) and deletions (thin line) within exons. These errors complicate the detection of exact transcript sequences and exact positions of splice sites, TSSs and polyA sites.

**Figure 4.**
Analysis challenges of long-read full-length sequencing. A simplified schematic shows the steps required to extract information out of long-read sequencing data. Each read has to be aligned, ideally in an allele-aware manner, to the genome it originated from. Read alignments then have to be analysed to identify RNA modifications as well as new isoform features that are missing in the current transcriptome annotation. For each allele, reads then have to be grouped into isoforms, which allows isoform identification and quantification. For real datasets, all these steps have to take into account the often substantial rates of sequencing errors and incomplete reads in long-read sequencing. These will complicate all steps of the analysis.
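
The grouping and quantification step described in the Figure 4 caption can be illustrated with a minimal sketch. It is not the authors' method or any published tool: it assumes each read has already been aligned and reduced to a sorted list of exon blocks, and it treats two reads as the same isoform when their splice-junction chains match after snapping each splice site to the nearest annotated site within a small tolerance. The function names (`snap`, `junction_chain`, `quantify_isoforms`), the 5-base tolerance and the toy coordinates are all hypothetical. Alleles, RNA modifications, transcript ends and incomplete reads are ignored.

```python
from bisect import bisect_left
from collections import Counter


def snap(pos, known_sites, tol=5):
    """Return the nearest annotated splice site within `tol` bases of `pos`.

    `known_sites` must be sorted. If no annotated site is close enough,
    `pos` is returned unchanged (a candidate novel splice site).
    """
    i = bisect_left(known_sites, pos)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = known_sites[max(i - 1, 0):i + 1]
    best = min(candidates, key=lambda s: abs(s - pos), default=None)
    if best is not None and abs(best - pos) <= tol:
        return best
    return pos


def junction_chain(exons, known_sites, tol=5):
    """Turn one read's sorted (start, end) exon blocks into a tuple of introns.

    Each intron is (end of left exon, start of right exon), with both
    boundaries snapped to annotated sites to absorb alignment errors.
    """
    return tuple(
        (snap(left_end, known_sites, tol), snap(right_start, known_sites, tol))
        for (_, left_end), (right_start, _) in zip(exons, exons[1:])
    )


def quantify_isoforms(reads, known_sites, tol=5):
    """Count reads per junction chain; `reads` maps read name -> exon blocks."""
    known_sites = sorted(known_sites)
    return Counter(
        junction_chain(exons, known_sites, tol) for exons in reads.values()
    )


# Toy data (hypothetical coordinates): r2 is a noisy copy of r1, r3 skips an exon.
annotated_sites = [200, 300, 400, 500]
toy_reads = {
    "r1": [(100, 200), (300, 400), (500, 600)],
    "r2": [(98, 203), (298, 401), (502, 610)],
    "r3": [(100, 200), (500, 600)],
}
isoform_counts = quantify_isoforms(toy_reads, annotated_sites)
```

Snapping to annotated sites is one simple way to tolerate the insertions and deletions near exon boundaries shown in Figure 3, at the cost of biasing results toward the existing annotation. Splice sites farther than the tolerance from any annotated site are kept as they are, so genuinely new junctions still form their own groups.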

Grants and funding

T32 HG008345/HG/NHGRI NIH HHS/United States
