Sigmoni: classification of nanopore signal with a compressed pangenome index
- PMID: 38940135
- PMCID: PMC11211819
- DOI: 10.1093/bioinformatics/btae213
Abstract
Summary: Improvements in nanopore sequencing necessitate efficient classification methods, including pre-filtering and adaptive sampling algorithms that enrich for reads of interest. Signal-based approaches circumvent the computational bottleneck of basecalling, but past methods for signal-based classification do not scale efficiently to large, repetitive references like pangenomes, which limits their utility to partial references or individual genomes. We introduce Sigmoni: a rapid, multiclass classification method based on the r-index that scales to references of hundreds of gigabase pairs (Gbps). Sigmoni quantizes nanopore signal into a discrete alphabet of picoamp ranges. It performs rapid, approximate matching using matching statistics, classifying reads based on distributions of picoamp matching statistics and co-linearity statistics, all in linear query time without the need for seed-chain-extend. Sigmoni is 10-100× faster than previous methods for adaptive sampling in host depletion experiments with improved accuracy, and can query reads against large microbial or human pangenomes. Sigmoni is the first signal-based tool to scale to a complete human genome and pangenome while remaining fast enough for adaptive sampling applications.
Availability and implementation: Sigmoni is implemented in Python, and is available open-source at https://github.com/vshiv18/sigmoni.
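The two core ideas in the summary, quantizing signal into picoamp ranges and scoring a read by its matching statistics, can be illustrated with a toy sketch. This is not Sigmoni's implementation: the bin edges, the function names, and the mean-based scoring rule are all assumptions made for illustration, and the matching statistics are computed by brute force rather than with an r-index, so the sketch does not run in linear query time.

```python
from bisect import bisect_right


def quantize(signal_pa, bin_edges):
    """Map each picoamp value to the index of the range it falls in.

    bin_edges is a sorted list of hypothetical range boundaries; n edges
    define n + 1 symbols (0 .. n).
    """
    return [bisect_right(bin_edges, x) for x in signal_pa]


def matching_statistics(query, reference):
    """Brute-force matching statistics.

    ms[i] is the length of the longest prefix of query[i:] that occurs
    anywhere in reference. A compressed index would replace this
    quadratic scan; it is kept naive here for readability.
    """
    ms = []
    for i in range(len(query)):
        best = 0
        for j in range(len(reference)):
            k = 0
            while (i + k < len(query) and j + k < len(reference)
                   and query[i + k] == reference[j + k]):
                k += 1
            best = max(best, k)
        ms.append(best)
    return ms


def classify(query, references):
    """Assign the query to the class with the highest mean matching statistic.

    Assumption: the mean is a stand-in for the distribution-based and
    co-linearity statistics described in the summary.
    """
    if not query:
        raise ValueError("query must contain at least one symbol")
    scores = {label: sum(matching_statistics(query, ref)) / len(query)
              for label, ref in references.items()}
    return max(scores, key=scores.get), scores


# Hypothetical picoamp boundaries giving a 4-symbol alphabet.
bin_edges = [80.0, 100.0, 120.0]
read = quantize([75.2, 95.0, 110.3, 130.1, 99.9], bin_edges)

# Hypothetical, already-quantized references for two classes.
references = {
    "target": [0, 1, 2, 3, 1, 0],
    "host": [3, 3, 0, 0, 2, 2],
}
label, scores = classify(read, references)
print(read, label, scores)
```

Long matching statistics against one class and short ones against the other are what separate the two classes in this toy example; a real pangenome index would hold many genomes per class.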
© The Author(s) 2024. Published by Oxford University Press.
Conflict of interest statement
S.K. has received travel funding from Oxford Nanopore Technologies Limited.
Update of
- Sigmoni: classification of nanopore signal with a compressed pangenome index. bioRxiv [Preprint]. 2023 Aug 30:2023.08.15.553308. doi: 10.1101/2023.08.15.553308. Update in: Bioinformatics. 2024 Jun 28;40(Suppl 1):i287-i296. doi: 10.1093/bioinformatics/btae213. PMID: 37645873.