SACall: A Neural Network Basecaller for Oxford Nanopore Sequencing Data Based on Self-Attention Mechanism
- PMID: 33211664
- DOI: 10.1109/TCBB.2020.3039244
Abstract
The highly portable Oxford Nanopore sequencer, which produces long reads in real time at low cost, has enabled many breakthroughs in genomics studies. However, a major limitation of nanopore sequencing is its high error rate when deciphering DNA sequences from noisy and complex raw data. In this paper, we developed an end-to-end basecaller, SACall, based on convolution layers, transformer self-attention layers, and a CTC decoder. In SACall, the convolution layers downsample the signals and capture local patterns. To capture the contextual relevance of signals, self-attention layers calculate the similarity of the signals at any two positions in the raw signal sequence. Finally, the CTC decoder generates the DNA sequence with a beam search algorithm. We use a benchmark consisting of nine isolated genomes to test the quality of different basecallers, including SACall, Albacore, and Guppy. The performance of the basecallers is evaluated in terms of read accuracy, assembly quality, and consensus accuracy. For most of the genomes in the test benchmark, the reads basecalled by SACall have fewer errors than the reads basecalled by the other basecallers. When the basecalled reads of each genome are assembled, the assembly from SACall-basecalled reads achieves a higher assembly identity. In addition, there are fewer errors in the polished assembly from reads basecalled by SACall than in those from reads basecalled by Albacore and Guppy. Overall, SACall outperforms the official Nanopore basecallers Albacore and Guppy on the benchmark. Moreover, SACall is an open-source and freely available basecaller, which allows researchers to train their own basecalling models on specific data and basecall Nanopore reads.
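The final decoding step mentioned in the abstract, CTC decoding with beam search, can be illustrated with a minimal sketch. This is not SACall's implementation. It is a generic CTC prefix beam search in plain Python, and the alphabet ordering (blank at index 0), the beam width, and all function names are assumptions chosen for illustration. It takes a matrix of per-timestep label probabilities, such as a network's softmax output, and returns the most probable collapsed base sequence among the prefixes it keeps.

```python
from collections import defaultdict

# Assumed label set: index 0 is the CTC blank, followed by the four bases.
ALPHABET = "-ACGT"
BLANK = 0


def ctc_collapse(path):
    """Apply the CTC collapsing rule: merge repeated labels, then drop blanks."""
    out = []
    prev = None
    for k in path:
        if k != prev and k != BLANK:
            out.append(ALPHABET[k])
        prev = k
    return "".join(out)


def ctc_greedy_decode(probs):
    """Pick the most likely label at each timestep, then collapse the path."""
    best_path = [max(range(len(row)), key=row.__getitem__) for row in probs]
    return ctc_collapse(best_path)


def ctc_beam_search(probs, beam_width=3):
    """CTC prefix beam search over a T x len(ALPHABET) probability matrix.

    Each beam entry maps a label prefix to (p_blank, p_non_blank): the total
    probability of all alignments producing that prefix that end in a blank
    or in a non-blank label, respectively.
    """
    beams = {(): (1.0, 0.0)}
    for row in probs:
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            total = p_b + p_nb
            # Emitting a blank leaves the prefix unchanged.
            next_beams[prefix][0] += total * row[BLANK]
            for k in range(1, len(row)):
                p = row[k]
                if p == 0.0:
                    continue
                if prefix and prefix[-1] == k:
                    # Repeating the last label without a blank merges into it.
                    next_beams[prefix][1] += p_nb * p
                    # A blank in between makes the repeat a new symbol.
                    next_beams[prefix + (k,)][1] += p_b * p
                else:
                    next_beams[prefix + (k,)][1] += total * p
        # Keep only the beam_width most probable prefixes.
        ranked = sorted(next_beams.items(),
                        key=lambda kv: kv[1][0] + kv[1][1],
                        reverse=True)
        beams = {pfx: (pb, pnb) for pfx, (pb, pnb) in ranked[:beam_width]}
    best_prefix = max(beams.items(), key=lambda kv: kv[1][0] + kv[1][1])[0]
    return "".join(ALPHABET[k] for k in best_prefix)


# Toy example: two timesteps where blank is the single most likely label.
toy_probs = [
    [0.6, 0.4, 0.0, 0.0, 0.0],
    [0.6, 0.4, 0.0, 0.0, 0.0],
]
greedy_result = ctc_greedy_decode(toy_probs)
beam_result = ctc_beam_search(toy_probs, beam_width=3)
```

In this toy input, greedy decoding follows the blank-blank path and returns an empty sequence, while the beam search sums the probability of every alignment that collapses to the same output and therefore prefers "A". That difference is the usual motivation for beam search over greedy decoding with a CTC output layer.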