Genome Biol. 2019 Jun 24;20(1):129.

doi: 10.1186/s13059-019-1727-y.

Performance of neural network basecalling tools for Oxford Nanopore sequencing

Ryan R Wick¹, Louise M Judd², Kathryn E Holt²,³

Affiliations

¹ Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, 3004, Australia. rrwick@gmail.com.
² Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, 3004, Australia.
³ London School of Hygiene & Tropical Medicine, London, WC1E 7HT, UK.

PMID: 31234903
PMCID: PMC6591954
DOI: 10.1186/s13059-019-1727-y

Ryan R Wick et al. Genome Biol. 2019.

Abstract

Background: Basecalling, the computational process of translating raw electrical signal into nucleotide sequence, is of critical importance to the sequencing platforms produced by Oxford Nanopore Technologies (ONT). Here, we examine the performance of different basecalling tools, measuring accuracy both at the base level within individual reads and at the level of majority-rule consensus basecalls in an assembly. We also investigate some additional aspects of basecalling: training using a taxon-specific dataset, using a larger neural network model and improving consensus basecalls in an assembly by additional signal-level analysis with Nanopolish.
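The distinction between read accuracy and consensus accuracy can be illustrated with a simplified majority vote. The sketch below assumes the reads have already been aligned so that each string holds the bases the reads report at one reference position ('-' marks a deletion); it ignores insertions, and the function name `majority_consensus` is hypothetical. It is not how an assembler or Nanopolish builds a consensus, only an illustration of the principle.

```python
from collections import Counter


def majority_consensus(columns):
    """Majority-rule base for each alignment column (simplified sketch).

    Each string in `columns` holds the bases that the aligned reads report at
    one reference position; '-' marks a deletion in that read. Ties go to the
    base seen first in the column (Counter.most_common ordering).
    """
    consensus = []
    for column in columns:
        base, _votes = Counter(column).most_common(1)[0]
        if base != "-":  # a majority deletion removes the position
            consensus.append(base)
    return "".join(consensus)


# Five reads over a 5-base reference "ACGTA"; four of them carry one error each.
columns = ["AAAAA", "CCGCC", "GGG-G", "TATTT", "AAAAC"]
print(majority_consensus(columns))  # ACGTA
```

Random, independent read errors are outvoted, so the consensus matches the reference even though most reads are imperfect. An error shared by most reads (for example, a systematic miscall at a particular sequence motif) would survive the vote, which is why such errors set the limit on consensus accuracy.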

Results: Training basecallers on taxon-specific data results in a significant boost in consensus accuracy, mostly due to the reduction of errors in methylation motifs. A larger neural network is able to improve both read and consensus accuracy, but at a cost to speed. Improving consensus sequences ('polishing') with Nanopolish partially offsets the accuracy differences between basecallers, but pre-polish accuracy still affects post-polish accuracy.

Conclusions: Basecalling accuracy has seen significant improvements over the last 2 years. The current version of ONT's Guppy basecaller performs well overall, with good accuracy and fast performance. If higher accuracy is required, users should consider producing a custom model using a larger neural network and/or training data from the same species.

Keywords: Basecalling; Long-read sequencing; Oxford Nanopore.

Conflict of interest statement

In July 2018, Ryan Wick attended a hackathon in Bermuda at ONT’s expense. ONT also paid his travel, accommodation and registration to attend the London Calling (2017) and Nanopore Community Meeting (2017) events as an invited speaker.

Figures

**Fig. 1**
Read accuracy, consensus accuracy and speed performance for each basecaller version, plotted against the release date (version numbers specified in Additional file 2: Table S3). Accuracies are expressed as qscores (also known as Phred quality scores) on a logarithmic scale where Q10 = 90%, Q20 = 99%, Q30 = 99.9%, etc. Each basecaller was run using its default model, except for Guppy v2.2.3, which was also run with its included flip-flop model and our two custom-trained models.
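The qscore scale in Fig. 1 follows the standard Phred definition, Q = -10 log10(error rate) with error rate = 1 - accuracy, which reproduces the Q10 = 90%, Q20 = 99% and Q30 = 99.9% anchors in the caption. A minimal Python sketch of the conversion in both directions (the function names are illustrative, not from any basecalling tool):

```python
import math


def accuracy_to_qscore(accuracy):
    """Phred-scaled qscore for a fractional accuracy: Q = -10 * log10(1 - accuracy)."""
    error_rate = 1.0 - accuracy
    if error_rate <= 0.0:
        return math.inf  # a perfect result has no finite qscore
    return -10.0 * math.log10(error_rate)


def qscore_to_accuracy(qscore):
    """Inverse conversion: accuracy = 1 - 10 ** (-Q / 10)."""
    return 1.0 - 10.0 ** (-qscore / 10.0)


for accuracy in (0.9, 0.99, 0.999):
    print(f"{accuracy:.1%} -> Q{accuracy_to_qscore(accuracy):.1f}")
```

Each 10-point step in qscore corresponds to a tenfold drop in error rate, so moving from 99% to 99.9% accuracy, which looks small as a percentage, removes nine out of every ten errors.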

**Fig. 2**
Read and consensus accuracy from Guppy v2.2.3 for a variety of genomes using different models: the default RGRGR model, the included flip-flop model and the two custom models we trained for this study. Both custom models used the same training set, which focused primarily on *K. pneumoniae*, secondarily on the Enterobacteriaceae family and lastly on the Proteobacteria phylum.

**Fig. 3**
Consensus errors per basecaller for the *K. pneumoniae* benchmarking set, broken down by type. Dcm refers to errors occurring in the CCAGG/CCTGG Dcm motif. Homopolymer errors are changes in the length of a homopolymer that is three or more bases long in the reference. This plot is limited to basecallers/versions with less than 1.2% consensus error and excludes redundant results from similar versions.
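The error categories in Fig. 3 depend on the reference context of each error. The sketch below shows one way to locate Dcm motifs (CCAGG/CCTGG, i.e. CCWGG) and homopolymers of three or more bases in a reference sequence, and to label a 0-based error position accordingly. The helper names, the purely position-based labeling and the rule that a Dcm match takes precedence over a homopolymer match are assumptions made for illustration, not the procedure used in the study.

```python
import re
from itertools import groupby

# CCAGG or CCTGG (CCWGG); the zero-width lookahead also reports overlapping matches.
DCM_PATTERN = re.compile(r"(?=CC[AT]GG)")
DCM_LENGTH = 5


def dcm_positions(reference):
    """Return 0-based start positions of CCAGG/CCTGG motifs in `reference`."""
    return [match.start() for match in DCM_PATTERN.finditer(reference)]


def homopolymer_runs(reference, min_length=3):
    """Return (start, length, base) for each run of one base >= min_length long."""
    runs, position = [], 0
    for base, group in groupby(reference):
        length = sum(1 for _ in group)
        if length >= min_length:
            runs.append((position, length, base))
        position += length
    return runs


def classify_error(reference, position):
    """Hypothetical labeling of an error at a 0-based reference position.

    Dcm motifs are checked before homopolymers (an assumed precedence).
    """
    if any(start <= position < start + DCM_LENGTH
           for start in dcm_positions(reference)):
        return "Dcm"
    if any(start <= position < start + length
           for start, length, _base in homopolymer_runs(reference)):
        return "homopolymer"
    return "other"


reference = "TTCCAGGAAAACCTGGT"
print(dcm_positions(reference))      # [2, 11]
print(homopolymer_runs(reference))   # [(7, 4, 'A')]
print(classify_error(reference, 8))  # homopolymer
```

Because CCAGG and CCTGG are reverse complements of each other, scanning a single strand finds Dcm sites on both strands.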

**Fig. 4**
Consensus accuracy before (red) and after Nanopolish (blue) for the assemblies of the *K. pneumoniae* benchmarking set.
