F1000Res. 2017 Feb 3;6:100.

doi: 10.12688/f1000research.10571.2. eCollection 2017.

Comprehensive comparison of Pacific Biosciences and Oxford Nanopore Technologies and their applications to transcriptome analysis

Jason L Weirather^#¹, Mariateresa de Cesare^#², Yunhao Wang^#¹,³,⁴, Paolo Piazza², Vittorio Sebastiano⁵,⁶, Xiu-Jie Wang³, David Buck², Kin Fai Au¹,⁷

Affiliations

¹ Department of Internal Medicine, University of Iowa, Iowa City, IA, USA.
² Oxford Genomics Centre, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.
³ Key laboratory of Genetics Network Biology, Collaborative Innovation Center of Genetics and Development, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China.
⁴ University of Chinese Academy of Sciences, Beijing, China.
⁵ Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, CA, USA.
⁶ Department of Obstetrics and Gynecology, Stanford University, Stanford, CA, USA.
⁷ Department of Biostatistics, University of Iowa, Iowa City, USA.

^# Contributed equally.

PMID: 28868132
PMCID: PMC5553090
DOI: 10.12688/f1000research.10571.2

Jason L Weirather et al. F1000Res. 2017.

Abstract

Background: Given the demonstrated utility of Third Generation Sequencing [Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT)] long reads in many studies, a comprehensive analysis and comparison of their data quality and applications is in high demand.

Methods: Based on the transcriptome sequencing data from human embryonic stem cells, we analyzed multiple data features of PacBio and ONT, including error pattern, length, mappability and technical improvements over previous platforms. We also evaluated their application to transcriptome analyses, such as isoform identification and quantification and characterization of transcriptome complexity, by comparing the performance of size-selected PacBio, non-size-selected ONT and their corresponding Hybrid-Seq strategies (PacBio+Illumina and ONT+Illumina).

Results: PacBio shows overall better data quality, while ONT provides a higher yield. As with data quality, PacBio performs marginally better than ONT in most aspects for both long reads only and Hybrid-Seq strategies in transcriptome analysis. In addition, Hybrid-Seq shows superior performance over long reads only in most transcriptome analyses.

Conclusions: Both PacBio and ONT sequencing are suitable for full-length single-molecule transcriptome analysis. As this first use of ONT reads in a Hybrid-Seq analysis has shown, both PacBio and ONT can benefit from a combined Illumina strategy. The tools and analytical methods developed here provide a resource for future applications and evaluations of these rapidly-changing technologies.

Keywords: Oxford Nanopore Technologies; PacBio; Third Generation Sequencing; Transcriptome.

Conflict of interest statement

Competing interests: No competing interests were disclosed.

Figures

**Figure 1. Length distribution of reads.**
The length distribution of Oxford Nanopore Technologies (ONT) 2D and 1D reads (top) and Pacific Biosciences (PacBio) CCS and subreads (bottom). Aligned reads are color-coded to indicate fraction of reads that are: single best alignments (gray), gapped alignments consisting of multiple paths (red), self-chimeric alignments (purple) where different read segments map to overlapping sequences, and trans-chimeric alignments (blue) where read segments map to different loci; white color represents unaligned reads. The leftmost bar represents all reads, the middle portion reads from 0–4kb in length, and the rightmost are reads greater than 4kb. PacBio libraries were size-selected, while ONT libraries were not; this provides PacBio with a larger proportion of longer reads. The total number of reads sequenced and the number of aligned reads from each sequencing platform are available in Supplementary Table 2.

**Figure 2. Mappability of different length bins.**
The leftmost bar represents the fraction of the mappable read length out of the total read length for all reads. The middle section shows the mappable fraction for 500bp increments ranging from 0–4kb read lengths, and the rightmost bar represents the mappable fraction of reads greater than 4kb. ONT: non-size-selected Oxford Nanopore Technologies reads; PacBio: size-selected Pacific Biosciences reads. The numbers of aligned reads contributing to the box plots in each panel are listed above each panel: total aligned reads, aligned reads <4kb and aligned reads >4kb (from left to right).
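The binning described in this caption can be sketched as follows. This is only an illustration, not the authors' code: the function names, the `(read_length, aligned_bases)` input format, and the handling of bin edges are all assumptions.

```python
from collections import defaultdict


def length_bin(read_length, bin_size=500, cap=4000):
    """Label a read by its 500 bp length bin up to 4 kb; longer reads share one bin."""
    if read_length > cap:
        return ">4kb"
    # A read of exactly 4000 bp is placed in the last regular bin (assumed edge rule).
    start = min((read_length // bin_size) * bin_size, cap - bin_size)
    return f"{start}-{start + bin_size}"


def mappable_fractions(reads):
    """Group per-read mappable fractions by length bin.

    reads: iterable of (read_length, aligned_bases) pairs (hypothetical input format).
    Returns {bin_label: [aligned_bases / read_length, ...]}, i.e. the values a
    box plot per bin would summarize.
    """
    by_bin = defaultdict(list)
    for read_length, aligned_bases in reads:
        if read_length <= 0:
            continue  # skip empty reads rather than divide by zero
        by_bin[length_bin(read_length)].append(aligned_bases / read_length)
    return dict(by_bin)
```

For example, `mappable_fractions([(1000, 900), (5000, 5000)])` places the first read in the `1000-1500` bin and the second in the `>4kb` bin.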

**Figure 3. Context-specific errors.**
Context-specific errors are shown for Oxford Nanopore Technologies (ONT) 2D and 1D reads (top), and Pacific Biosciences (PacBio) CCS and subreads (bottom). The error types shown are insertions, deletions and mismatches. For insertions, the large base above the plot indicates the inserted base, and for deletions, the deleted base. For mismatch errors, the large base to the left indicates the expected reference base, and the large base above indicates the base observed in the read. A block of color tiles shows the error frequency within specific contexts for each error; the small base to the left of the tiles indicates the base preceding the error, and the small base above is the base following the error. Error frequency is plotted on separate scales for insertions, deletions, and mismatches. Homopolymer error patterns are highlighted with bold cross- or L-shaped outlines in the ONT 2D, PacBio CCS and PacBio subreads plots. Context-specific insertions and mismatches of interest in the ONT 1D, 2D and PacBio CCS reads are highlighted by bold outlines. For better contrast of the lower error rates in PacBio CCS reads and ONT 2D reads, Supplementary Figure S4 displays each result with its own scale.
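One way to tally errors by their flanking bases, as the tiles in this figure do, is sketched below. It is a minimal illustration under stated assumptions, not the authors' tool: the input is assumed to be a pair of equal-length gapped alignment strings with `-` marking gaps, the function name is hypothetical, and it returns raw counts rather than the normalized frequencies plotted in the figure.

```python
from collections import Counter


def context_errors(ref_aln, read_aln):
    """Count errors keyed by (kind, base(s), preceding_ref_base, following_ref_base).

    ref_aln, read_aln: equal-length gapped alignment strings ('-' = gap).
    """
    if len(ref_aln) != len(read_aln):
        raise ValueError("alignment strings must have equal length")
    counts = Counter()
    for i, (r, q) in enumerate(zip(ref_aln, read_aln)):
        if r == q:
            continue  # matching column, no error
        # Nearest reference bases on either side of the error, skipping gaps.
        before = next((b for b in reversed(ref_aln[:i]) if b != "-"), None)
        after = next((b for b in ref_aln[i + 1:] if b != "-"), None)
        if before is None or after is None:
            continue  # no full context at the alignment ends
        if r == "-":
            counts[("insertion", q, before, after)] += 1
        elif q == "-":
            counts[("deletion", r, before, after)] += 1
        else:
            counts[("mismatch", r + ">" + q, before, after)] += 1
    return counts
```

Dividing each count by the number of times its context occurs in the reference would turn these tallies into frequencies; that normalization is left out here because the source does not specify it.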

**Figure 4. Isoform identification in human embryonic stem cells.**
(a) Length distribution of full-length isoforms identified by long reads only and Hybrid-Seq strategies. (b) Numbers of identified isoforms with a single exon (singleton isoform) and multiple exons (multi-exon isoform). (c) Overlap between isoforms identified by two Hybrid-Seq strategies. (d) Accuracy of splice sites detected by four strategies. Perfect means the detected splice sites exactly match known splice sites annotated by Gencode (version 24). Imperfect means the detected splice sites are shorter or longer than known splice sites annotated by Gencode (version 24). (e) Overlap between novel isoforms identified by two Hybrid-Seq strategies. (f) Numbers of identified isoforms with different ratios of repetitive elements. ONT: Oxford Nanopore Technologies; PacBio: Pacific Biosciences.

**Figure 5. Errors of isoform abundance estimation in Spike-in RNA Variant (SIRV) data.**
The X axis shows 7 strategies. The labels “correct”, “insufficient” and “over-annotated” in parentheses represent three different SIRV annotation libraries, respectively. The Y axis shows the Euclidean distance between the real relative expression percentage (1/68 ≈ 0.0147, i.e. about 1.47%) and the estimated relative expression percentage (for more details see Methods). ONT: Oxford Nanopore Technologies; PacBio: Pacific Biosciences.
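The distance on the Y axis can be illustrated with a short sketch. It assumes an equimolar mix in which every isoform has the same true relative expression (1/68 for 68 isoforms) and that estimates are normalized to sum to 1; the exact normalization and units are defined in the paper's Methods, which are not reproduced here, so the function below is hypothetical.

```python
import math


def estimation_error(estimated_abundances):
    """Euclidean distance between estimated relative expression and a uniform truth.

    estimated_abundances: non-negative abundance estimates, one per isoform.
    The true relative expression is assumed to be 1/n for each of the n isoforms.
    """
    total = sum(estimated_abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    n = len(estimated_abundances)
    truth = 1.0 / n
    # Normalize each estimate to a fraction of the total before comparing.
    return math.sqrt(sum((x / total - truth) ** 2 for x in estimated_abundances))
```

A perfectly uniform estimate, such as `estimation_error([1.0] * 68)`, gives a distance of zero; any skew toward a subset of isoforms increases it.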

**Figure 6. Numbers of different alternative splicing (AS) events in the human embryonic stem cell transcriptome.**
A5SS: alternative 5’ splicing site; A3SS: alternative 3’ splicing site; ES: exon skipping; RI: retained intron; MXE: mutually exclusive exons; ONT: Oxford Nanopore Technologies; PacBio: Pacific Biosciences.

**Figure 7. Functional analysis of identified isoforms.**
(a) Feature statistics of isoforms annotated by Gencode (version 24). (b) Length distribution of open reading frames (ORFs) of novel isoforms identified by two Hybrid-Seq strategies. (c) Gene enrichment analysis of genes with at least one novel isoform identified by two Hybrid-Seq strategies. (d) Five novel isoforms (red tracks) of the human embryonic stem cell-relevant gene *ESRG* were identified by two Hybrid-Seq strategies. The topmost isoform (blue track) is annotated by Gencode (version 24). ESRG: Embryonic Stem Cell Related Gene; ONT: Oxford Nanopore Technologies; PacBio: Pacific Biosciences.
