Performance of neural network basecalling tools for Oxford Nanopore sequencing
- PMID: 31234903
- PMCID: PMC6591954
- DOI: 10.1186/s13059-019-1727-y
Abstract
Background: Basecalling, the computational process of translating raw electrical signal to nucleotide sequence, is of critical importance to the sequencing platforms produced by Oxford Nanopore Technologies (ONT). Here, we examine the performance of different basecalling tools, looking at accuracy at the level of bases within individual reads and at majority-rule consensus basecalls in an assembly. We also investigate some additional aspects of basecalling: training using a taxon-specific dataset, using a larger neural network model and improving consensus basecalls in an assembly by additional signal-level analysis with Nanopolish.
Results: Training basecallers on taxon-specific data results in a significant boost in consensus accuracy, mostly due to the reduction of errors in methylation motifs. A larger neural network is able to improve both read and consensus accuracy, but at a cost to speed. Improving consensus sequences ('polishing') with Nanopolish somewhat negates the accuracy differences in basecallers, but pre-polish accuracy does have an effect on post-polish accuracy.
Conclusions: Basecalling accuracy has seen significant improvements over the last 2 years. The current version of ONT's Guppy basecaller performs well overall, with good accuracy and fast performance. If higher accuracy is required, users should consider producing a custom model using a larger neural network and/or training data from the same species.
Keywords: Basecalling; Long-read sequencing; Oxford Nanopore.
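The two accuracy measures named in the abstract, per-read accuracy and majority-rule consensus accuracy, can be illustrated with a toy sketch. It is not the authors' pipeline: it assumes the reads are already aligned to the reference column by column with equal lengths, and the function names `majority_consensus` and `identity` and the example sequences are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Return the most common symbol in each alignment column.

    Assumes every read has the same length (already aligned).
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")
    consensus = []
    for column in zip(*aligned_reads):
        # most_common(1) gives the (symbol, count) pair with the highest count
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


def identity(seq, reference):
    """Fraction of positions where seq matches the reference."""
    if not reference or len(seq) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq, reference))
    return matches / len(reference)


# Made-up example data: each read carries at most one error.
reference = "ACGTACGTAC"
reads = [
    "ACGTACGTAC",
    "ACGAACGTAC",
    "ACGTACCTAC",
    "TCGTACGTAC",
]

consensus = majority_consensus(reads)
read_identities = [identity(read, reference) for read in reads]
consensus_identity = identity(consensus, reference)
```

In this toy case the errors fall at different positions in different reads, so the majority vote removes them and the consensus is more accurate than most individual reads. Errors that recur at the same position across reads, such as those the abstract attributes to methylation motifs, would survive the vote.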
Conflict of interest statement
In July 2018, Ryan Wick attended a hackathon in Bermuda at ONT’s expense. ONT also paid his travel, accommodation and registration to attend the London Calling (2017) and Nanopore Community Meeting (2017) events as an invited speaker.