Genome Biol. 2019 Jun 24;20(1):129.
doi: 10.1186/s13059-019-1727-y.

Performance of neural network basecalling tools for Oxford Nanopore sequencing

Ryan R Wick et al. Genome Biol. 2019.

Abstract

Background: Basecalling, the computational process of translating raw electrical signal to nucleotide sequence, is of critical importance to the sequencing platforms produced by Oxford Nanopore Technologies (ONT). Here, we examine the performance of different basecalling tools, looking at accuracy at the level of bases within individual reads and at majority-rule consensus basecalls in an assembly. We also investigate some additional aspects of basecalling: training using a taxon-specific dataset, using a larger neural network model and improving consensus basecalls in an assembly by additional signal-level analysis with Nanopolish.

Results: Training basecallers on taxon-specific data results in a significant boost in consensus accuracy, mostly due to the reduction of errors in methylation motifs. A larger neural network is able to improve both read and consensus accuracy, but at a cost to speed. Improving consensus sequences ('polishing') with Nanopolish somewhat negates the accuracy differences in basecallers, but pre-polish accuracy does have an effect on post-polish accuracy.

Conclusions: Basecalling accuracy has seen significant improvements over the last 2 years. The current version of ONT's Guppy basecaller performs well overall, with good accuracy and fast performance. If higher accuracy is required, users should consider producing a custom model using a larger neural network and/or training data from the same species.

Keywords: Basecalling; Long-read sequencing; Oxford Nanopore.

Conflict of interest statement

In July 2018, Ryan Wick attended a hackathon in Bermuda at ONT’s expense. ONT also paid his travel, accommodation and registration to attend the London Calling (2017) and Nanopore Community Meeting (2017) events as an invited speaker.

Figures

Fig. 1
Read accuracy, consensus accuracy and speed performance for each basecaller version, plotted against the release date (version numbers specified in Additional file 2: Table S3). Accuracies are expressed as qscores (also known as Phred quality scores) on a logarithmic scale where Q10 = 90%, Q20 = 99%, Q30 = 99.9%, etc. Each basecaller was run using its default model, except for Guppy v2.2.3 which was also run with its included flip-flop model and our two custom-trained models
Fig. 2
Read and consensus accuracy from Guppy v2.2.3 for a variety of genomes using different models: the default RGRGR model, the included flip-flop model and the two custom models we trained for this study. Both custom models used the same training set which focused primarily on K. pneumoniae, secondarily on the Enterobacteriaceae family and lastly on the Proteobacteria phylum
Fig. 3
Consensus errors per basecaller for the K. pneumoniae benchmarking set, broken down by type. Dcm refers to errors occurring in the CCAGG/CCTGG Dcm motif. Homopolymer errors are changes in the length of a homopolymer three or more bases in length (in the reference). This plot is limited to basecallers/versions with less than 1.2% consensus error and excludes redundant results from similar versions
Fig. 4
Consensus accuracy before (red) and after (blue) Nanopolish for the assemblies of the K. pneumoniae benchmarking set
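
The qscore scale quoted in the Fig. 1 caption (Q10 = 90%, Q20 = 99%, Q30 = 99.9%) follows the standard Phred relation Q = -10 log10(1 - accuracy). The short Python sketch below illustrates that conversion in both directions. The function names are hypothetical and chosen for illustration only; they are not taken from the paper or from any basecaller.

```python
import math


def accuracy_to_qscore(accuracy: float) -> float:
    """Convert a fractional accuracy (e.g. 0.99) to a Phred-style qscore."""
    # Accuracy of exactly 1.0 would give an infinite qscore, so reject it.
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in the range [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


def qscore_to_accuracy(qscore: float) -> float:
    """Convert a Phred-style qscore back to a fractional accuracy."""
    return 1.0 - 10.0 ** (-qscore / 10.0)


if __name__ == "__main__":
    # Reproduce the three reference points given in the Fig. 1 caption.
    for acc in (0.90, 0.99, 0.999):
        print(f"accuracy {acc:.3f} -> Q{accuracy_to_qscore(acc):.1f}")
```

Under the same relation, the 1.2% consensus error cut-off mentioned in the Fig. 3 caption corresponds to roughly Q19.2.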
