NanoReviser: An Error-Correction Tool for Nanopore Sequencing Based on a Deep Learning Algorithm
- PMID: 32903372
- PMCID: PMC7434944
- DOI: 10.3389/fgene.2020.00900
Abstract
Nanopore sequencing is regarded as one of the most promising third-generation sequencing (TGS) technologies. Since 2014, Oxford Nanopore Technologies (ONT) has developed a series of nanopore-based devices that produce very long reads, with an expected impact on genomics. However, nanopore reads suffer from a fairly high error rate because DNA bases are difficult to identify from the complex electrical signals. Although several basecalling tools have been developed for nanopore sequencing in recent years, correcting the sequences after basecalling remains challenging. In this study, we developed an open-source DNA basecalling reviser, NanoReviser, based on a deep learning algorithm, to correct the basecalling errors introduced by the default basecallers. In our module, we re-segmented the raw electrical signals based on the basecalled sequences provided by the default basecallers. By employing convolutional neural networks (CNNs) and bidirectional long short-term memory (Bi-LSTM) networks, we took advantage of the information in both the raw electrical signals and the basecalled sequences. Our results showed that NanoReviser, as a post-basecalling reviser, significantly improved the basecalling quality. After being trained on standard ONT sequencing reads from public E. coli and human NA12878 datasets, NanoReviser reduced the sequencing error rate by over 5% for both the E. coli dataset and the human dataset. The performance of NanoReviser was found to be better than that of all current basecalling tools. Furthermore, we analyzed the modified bases of the E. coli dataset and added the methylation information to train our module. With the methylation annotation, NanoReviser reduced the error rate by 7% for the E. coli dataset and, specifically, by over 10% for sequence regions rich in methylated bases.
To the best of our knowledge, NanoReviser is the first post-basecalling tool that accurately corrects nanopore sequences without the time-consuming step of building a consensus sequence. The NanoReviser package is freely available at https://github.com/pkubioinformatics/NanoReviser.
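The abstract reports reductions in sequencing error rate but does not define the metric. As a minimal sketch, assuming the error rate is taken as the edit distance (substitutions, insertions and deletions) between a read and its reference, divided by the reference length, the comparison of a basecalled read with a revised read could look as follows. The function names and the toy sequences are hypothetical and are not taken from the NanoReviser package.

```python
def edit_distance(read: str, reference: str) -> int:
    """Levenshtein distance: minimum number of substitutions,
    insertions and deletions needed to turn `read` into `reference`."""
    # prev[j] holds the distance between the processed prefix of `read`
    # and the first j characters of `reference`.
    prev = list(range(len(reference) + 1))
    for i, read_base in enumerate(read, start=1):
        curr = [i]
        for j, ref_base in enumerate(reference, start=1):
            cost = 0 if read_base == ref_base else 1
            curr.append(min(prev[j] + 1,          # delete a base from the read
                            curr[j - 1] + 1,      # insert a base into the read
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def error_rate(read: str, reference: str) -> float:
    """Assumed metric: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(read, reference) / len(reference)


# Toy data (hypothetical): one substitution and one deletion in the
# basecalled read; only the deletion remains after revision.
reference = "ACGTACGTAC"
basecalled = "ACGTTCGAC"
revised = "ACGTACGAC"

print(f"basecalled error rate: {error_rate(basecalled, reference):.2f}")
print(f"revised error rate:    {error_rate(revised, reference):.2f}")
```

Published benchmarks typically derive such rates from alignments against a reference genome rather than from a direct pairwise comparison, so this sketch only illustrates the arithmetic.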
Keywords: DNA methylation; convolution neural network; deep learning; long short-term memory networks; nanopore sequencing; sequencing revising.
Copyright © 2020 Wang, Qu, Yang, Wang and Zhu.
Similar articles
- Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol. 2019 Jun 24;20(1):129. doi: 10.1186/s13059-019-1727-y. PMID: 31234903. Free PMC article.
- SACall: A Neural Network Basecaller for Oxford Nanopore Sequencing Data Based on Self-Attention Mechanism. IEEE/ACM Trans Comput Biol Bioinform. 2022 Jan-Feb;19(1):614-623. doi: 10.1109/TCBB.2020.3039244. Epub 2022 Feb 3. PMID: 33211664.
- Estimated Nucleotide Reconstruction Quality Symbols of Basecalling Tools for Oxford Nanopore Sequencing. Sensors (Basel). 2023 Jul 29;23(15):6787. doi: 10.3390/s23156787. PMID: 37571570. Free PMC article.
- Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions. Brief Bioinform. 2019 Jul 19;20(4):1542-1559. doi: 10.1093/bib/bby017. PMID: 29617724. Free PMC article. Review.
- Beyond sequencing: machine learning algorithms extract biology hidden in Nanopore signal data. Trends Genet. 2022 Mar;38(3):246-257. doi: 10.1016/j.tig.2021.09.001. Epub 2021 Oct 25. PMID: 34711425. Review.
Cited by
- Comprehensive comparison of the third-generation sequencing tools for bacterial 6mA profiling. Nat Commun. 2025 Apr 28;16(1):3982. doi: 10.1038/s41467-025-59187-2. PMID: 40295502. Free PMC article.
- Tracing Viral Transmission and Evolution of Bovine Leukemia Virus through Long Read Oxford Nanopore Sequencing of the Proviral Genome. Pathogens. 2021 Sep 14;10(9):1191. doi: 10.3390/pathogens10091191. PMID: 34578223. Free PMC article.
- Nanopore sequencing technology, bioinformatics and applications. Nat Biotechnol. 2021 Nov;39(11):1348-1365. doi: 10.1038/s41587-021-01108-x. Epub 2021 Nov 8. PMID: 34750572. Free PMC article. Review.
- Analyzing Modern Biomolecules: The Revolution of Nucleic-Acid Sequencing - Review. Biomolecules. 2021 Jul 28;11(8):1111. doi: 10.3390/biom11081111. PMID: 34439777. Free PMC article. Review.
- Phylogenomic Evidence of Reinfection and Persistence of SARS-CoV-2: First Report from Colombia. Vaccines (Basel). 2021 Mar 19;9(3):282. doi: 10.3390/vaccines9030282. PMID: 33808687. Free PMC article.