BaseNet: A transformer-based toolkit for nanopore sequencing signal decoding
- PMID: 39391372
- PMCID: PMC11465205
- DOI: 10.1016/j.csbj.2024.09.016
Abstract
Nanopore sequencing provides a rapid, convenient and high-throughput solution for nucleic acid sequencing. Accurate basecalling in nanopore sequencing is crucial for downstream analysis. Traditional approaches such as Hidden Markov Models (HMMs), Recurrent Neural Networks (RNNs), and Convolutional Neural Networks (CNNs) have improved basecalling accuracy, but there is a continuing need for higher accuracy and reliability. In this study, we introduce BaseNet (https://github.com/liqingwen98/BaseNet), an open-source toolkit that uses transformer models for advanced signal decoding in nanopore sequencing. BaseNet incorporates both autoregressive and non-autoregressive transformer-based decoding mechanisms, making state-of-the-art algorithms freely accessible for future improvement. Our results indicate that cross-attention weights effectively map the relationship between current signals and base sequences, that joint-loss training with an added pair of forward and reverse decoders facilitates model convergence, and that large-scale pre-trained models achieve superior decoding accuracy. This study helps advance nanopore sequencing signal decoding, contributing to technological progress and providing new concepts and tools for researchers and practitioners.
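Two ideas in the abstract can be illustrated with a small sketch: cross-attention weights that relate decoder positions (bases) to encoder frames (current-signal chunks), and a joint loss that combines a left-to-right decoder trained on the target with a right-to-left decoder trained on the reversed target. The following is a minimal pure-Python sketch under stated assumptions, not BaseNet's code or API: it assumes standard scaled dot-product attention, a weighted sum of the two decoder losses, a hypothetical reverse weight of 0.3, and a hypothetical integer encoding A=0, C=1, G=2, T=3. All function names are illustrative.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_weights(queries: Matrix, keys: Matrix) -> Matrix:
    """Scaled dot-product attention weights (assumed standard form).

    One row per decoder position (base), one column per encoder frame
    (signal chunk); each row is a probability distribution over frames.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    return [
        softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys])
        for q in queries
    ]


def attention_alignment(weights: Matrix) -> List[int]:
    """Hypothetical helper: most-attended signal frame for each base."""
    return [max(range(len(row)), key=row.__getitem__) for row in weights]


def sequence_nll(log_probs: Matrix, target: Sequence[int]) -> float:
    """Mean negative log-likelihood of target tokens, one row of log-probs per step."""
    return -sum(step[tok] for step, tok in zip(log_probs, target)) / len(target)


def joint_loss(fwd_log_probs: Matrix, rev_log_probs: Matrix,
               target: Sequence[int], reverse_weight: float = 0.3) -> float:
    """Assumed joint loss: weighted sum of the forward decoder's loss on the
    target and the reverse decoder's loss on the reversed target."""
    fwd = sequence_nll(fwd_log_probs, target)
    rev = sequence_nll(rev_log_probs, list(reversed(target)))
    return (1.0 - reverse_weight) * fwd + reverse_weight * rev
```

Usage: for a target such as `[0, 1, 2, 3]` (ACGT under the assumed encoding), a forward decoder that assigns log-probability 0 to each correct base and a uniform reverse decoder give a joint loss of `0.3 * log(4)`. Training the reverse decoder on the reversed sequence gives the shared encoder a second, right-to-left supervision signal; whether BaseNet weights the two terms this way is an assumption here.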
Keywords: Basecall; Machine learning algorithm; Nanopore sequencing; Transformer.
© 2024 The Authors.
Conflict of interest statement
Daqian Wang and Jizhong Lou are co-founders and shareholders of Beijing Polyseq Biotech Co. Ltd. Beijing Polyseq Biotech Co. Ltd. and Institute of Biophysics, Chinese Academy of Sciences have filed a patent using materials described in this article.
Similar articles
- Lokatt: a hybrid DNA nanopore basecaller with an explicit duration hidden Markov model and a residual LSTM network. BMC Bioinformatics. 2023 Dec 7;24(1):461. doi: 10.1186/s12859-023-05580-x. PMID: 38062356. Free PMC article.
- End-to-end simulation of nanopore sequencing signals with feed-forward transformers. Bioinformatics. 2024 Dec 26;41(1):btae744. doi: 10.1093/bioinformatics/btae744. PMID: 39710838. Free PMC article.
- Decoding bacterial methylomes in four public health-relevant microbial species: nanopore sequencing enables reproducible analysis of DNA modifications. BMC Genomics. 2025 Apr 23;26(1):394. doi: 10.1186/s12864-025-11592-z. PMID: 40269718. Free PMC article.
- Beyond sequencing: machine learning algorithms extract biology hidden in Nanopore signal data. Trends Genet. 2022 Mar;38(3):246-257. doi: 10.1016/j.tig.2021.09.001. Epub 2021 Oct 25. PMID: 34711425. Review.
- Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions. Brief Bioinform. 2019 Jul 19;20(4):1542-1559. doi: 10.1093/bib/bby017. PMID: 29617724. Free PMC article. Review.
Cited by
- Next-generation sequencing-based tools or nanopore-based tools: which is more suitable for short tandem repeats genotyping of nanopore sequencing? Bioinform Adv. 2025 Jun 12;5(1):vbaf119. doi: 10.1093/bioadv/vbaf119. eCollection 2025. PMID: 40521380. Free PMC article.
- GCRTcall: a transformer based basecaller for nanopore RNA sequencing enhanced by gated convolution and relative position embedding via joint loss training. Front Genet. 2024 Nov 22;15:1443532. doi: 10.3389/fgene.2024.1443532. eCollection 2024. PMID: 39649096. Free PMC article.