Review
Philos Trans R Soc Lond B Biol Sci. 2019 Nov 25;374(1786):20190097.
doi: 10.1098/rstb.2019.0097. Epub 2019 Oct 7.

Realizing the potential of full-length transcriptome sequencing

Ashley Byrne et al. Philos Trans R Soc Lond B Biol Sci.

Abstract

Long-read sequencing holds great potential for transcriptome analysis because it offers researchers an affordable method to annotate the transcriptomes of non-model organisms. This, in turn, will greatly benefit future work on less-researched organisms like unicellular eukaryotes that cannot rely on large consortia to generate these transcriptome annotations. However, to realize this potential, several remaining molecular and computational challenges will have to be overcome. In this review, we have outlined the limitations of short-read sequencing technology and how long-read sequencing technology overcomes these limitations. We have also highlighted the unique challenges still present for long-read sequencing technology and provided some suggestions on how to overcome these challenges going forward. This article is part of a discussion meeting issue 'Single cell ecology'.

Keywords: Oxford Nanopore Technologies; Pacific Biosciences; long-read sequencing; transcriptome analysis.

Conflict of interest statement

Some of the methods we discuss in this review are our own, and we have filed patent applications on aspects of them.

Figures

Figure 1.
Fundamental difference between short- and long-read sequencing of transcripts. Short RNA-seq reads only capture small fragments of transcripts. RNA-seq data therefore lack unambiguous isoform information, which leads to the inference of many erroneous isoforms. Long-read full-length cDNA data capture transcripts end-to-end, making isoform inference unambiguous.
Figure 2.
Long-read transcriptome sequencing approaches do not cover long transcripts. Swarmplots of length distributions of 1000 randomly sampled PacBio [9], ONT dRNA and cDNA [28] reads covering the GM12878 (human lymphoblast cell line) transcriptome. These distributions are not representative of the length distribution of the human transcriptome as annotated by GENCODE. *While we show the most recent GM12878 dataset we could find for PacBio technology, it is several years old and might not be fully representative of current platform performance.
Figure 3.
Error-prone reads pose an analysis challenge. Representative alignments of ONT cDNA [28] reads. Thirty read alignments (grey) to the first two exons of the CD19 gene (dark blue) are shown. Read alignments contain many insertions (orange), mismatches (red) and deletions (thin line) within exons. These errors complicate the detection of exact transcript sequences and exact positions of splice sites, TSSs and polyA sites.
Figure 4.
Analysis challenges of long-read full-length sequencing. A simplified schematic shows the steps required to extract information from long-read sequencing data. Each read has to be aligned, ideally in an allele-aware manner, to the genome it originated from. Read alignments then have to be analysed to identify RNA modifications as well as new isoform features that are missing from the current transcriptome annotation. For each allele, reads then have to be grouped into isoforms, which allows isoform identification and quantification. For real datasets, all these steps have to take into account the often substantial rates of sequencing errors and incomplete reads in long-read sequencing, which complicate every step of the analysis.
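
The grouping and quantification step described in the Figure 4 caption can be illustrated with a minimal Python sketch. It is not the authors' method or any published tool. It assumes that each aligned read has already been reduced to its ordered chain of splice junctions, and it treats two reads as the same isoform when every splice site agrees within a small tolerance, as a crude way of absorbing error-shifted splice positions. The names `same_isoform`, `group_reads` and `wiggle`, and all coordinates, are hypothetical. Alleles, strand, transcript ends and incomplete reads are deliberately ignored.

```python
from collections import Counter
from typing import List, Tuple

Junction = Tuple[int, int]    # (intron start, intron end) in genome coordinates
Chain = Tuple[Junction, ...]  # ordered splice junctions of one aligned read


def same_isoform(a: Chain, b: Chain, wiggle: int) -> bool:
    """Return True if both chains agree at every splice site within `wiggle` bp."""
    if len(a) != len(b):
        return False
    return all(
        abs(s1 - s2) <= wiggle and abs(e1 - e2) <= wiggle
        for (s1, e1), (s2, e2) in zip(a, b)
    )


def group_reads(chains: List[Chain], wiggle: int = 5) -> List[Tuple[Chain, int]]:
    """Greedily group reads into isoforms and count the reads per isoform.

    Exact chains are visited from most to least frequent, so the best
    supported splice positions become the representative of each group.
    """
    exact_counts = Counter(chains)
    isoforms: List[Chain] = []
    counts: List[int] = []
    for chain, n_reads in exact_counts.most_common():
        for i, representative in enumerate(isoforms):
            if same_isoform(chain, representative, wiggle):
                counts[i] += n_reads
                break
        else:
            # No existing group matched: this chain starts a new isoform.
            isoforms.append(chain)
            counts.append(n_reads)
    return list(zip(isoforms, counts))


# Hypothetical junction chains for five reads from one locus.
reads = [
    ((100, 200), (300, 400)),
    ((100, 200), (300, 400)),
    ((102, 200), (300, 398)),  # same isoform, splice sites shifted by errors
    ((100, 200),),             # isoform with a single junction
    ((100, 250), (300, 400)),  # alternative splice site 50 bp away
]
result = group_reads(reads, wiggle=5)
for isoform, n in result:
    print(n, isoform)
```

The tolerance is the main design choice here: too small and sequencing errors split one isoform into several, too large and genuinely distinct nearby splice sites are merged.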
