Bioinformatics. 2024 Jun 28;40(Suppl 1):i287-i296.
doi: 10.1093/bioinformatics/btae213.

Sigmoni: classification of nanopore signal with a compressed pangenome index

Vikram S Shivakumar et al. Bioinformatics.

Abstract

Summary: Improvements in nanopore sequencing necessitate efficient classification methods, including pre-filtering and adaptive sampling algorithms that enrich for reads of interest. Signal-based approaches circumvent the computational bottleneck of basecalling, but past methods for signal-based classification do not scale efficiently to large, repetitive references like pangenomes, limiting their utility to partial references or individual genomes. We introduce Sigmoni: a rapid, multiclass classification method based on the r-index that scales to references of hundreds of gigabase pairs (Gbp). Sigmoni quantizes nanopore signal into a discrete alphabet of picoamp ranges. It performs rapid, approximate matching using matching statistics, classifying reads based on distributions of picoamp matching statistics and co-linearity statistics, all in linear query time without the need for seed-chain-extend. Sigmoni is 10-100× faster than previous methods for adaptive sampling in host depletion experiments with improved accuracy, and can query reads against large microbial or human pangenomes. Sigmoni is the first signal-based tool to scale to a complete human genome and pangenome while remaining fast enough for adaptive sampling applications.
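The quantization step described above can be illustrated with a minimal sketch. It is not Sigmoni's actual code: the function name `quantize_signal`, the equal-width binning, and the fixed picoamp range are assumptions made for illustration, since the abstract does not specify the binning scheme.

```python
from bisect import bisect_right


def quantize_signal(signal_pa, lo, hi, n_bins=4):
    """Map picoamp samples to letters 'A', 'B', ... using equal-width bins.

    Hypothetical helper: the binning scheme (equal width over [lo, hi]) is an
    assumption for illustration, not the scheme used by Sigmoni.
    """
    width = (hi - lo) / n_bins
    # Interior bin edges; samples outside [lo, hi] fall into the first or last bin.
    edges = [lo + width * i for i in range(1, n_bins)]
    # bisect_right counts how many edges are <= x, which is the bin index.
    return "".join(chr(ord("A") + bisect_right(edges, x)) for x in signal_pa)


if __name__ == "__main__":
    print(quantize_signal([62, 80, 95, 110, 118], lo=60, hi=120))
```

Once the signal is a string over a small alphabet, exact string-matching machinery can be applied to it, which is the property the abstract relies on.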

Availability and implementation: Sigmoni is implemented in Python, and is available open-source at https://github.com/vshiv18/sigmoni.

Conflict of interest statement

S.K. has received travel funding from Oxford Nanopore Technologies Limited.

Figures

Figure 1.
Overview of Sigmoni mapping procedure: (top) Query is discretized into “bins,” which are further converted into arbitrary characters from a small alphabet for exact matching. The reference is digested into k-mers and converted to the same alphabet based on the expected current level. (bottom) The matching length profile (left) defines the exact match length at each position along the query with respect to the reference. Using a “shredded” sampled document array, matches are mapped back to reference regions to identify a cluster of matches. Here, a read maps to Ref 5, which is the predicted reference.
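The matching length profile in this caption can be sketched with a naive computation. This is an illustrative stand-in only: Sigmoni uses an r-index and a sampled document array to do this in linear query time, whereas the quadratic substring search below, the function names `matching_lengths` and `classify`, and the sum-of-lengths scoring rule are all assumptions chosen for brevity.

```python
def matching_lengths(query, reference):
    """For each query position i, return the length of the longest prefix
    of query[i:] that occurs somewhere in reference (naive search)."""
    lengths = []
    for i in range(len(query)):
        length = 0
        # Extend the match one character at a time while it still occurs.
        while i + length < len(query) and query[i:i + length + 1] in reference:
            length += 1
        lengths.append(length)
    return lengths


def classify(query, references):
    """Assign the query to the reference with the largest total match length.

    Hypothetical scoring rule; the paper classifies on distributions of
    matching statistics and co-linearity, which this sketch does not model.
    """
    scores = {name: sum(matching_lengths(query, seq))
              for name, seq in references.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    refs = {"ref1": "XABCX", "ref2": "CCCC"}
    print(matching_lengths("ABCAB", refs["ref1"]))
    print(classify("ABCAB", refs))
```

Long runs of high values in the profile indicate regions where the quantized read agrees with a reference, which is the signal the document array then attributes to a specific reference.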
Figure 2.
(A) Comparison of mock community multi-class classification confusion matrices for each signal-based method. The diagonal (TP) is omitted, with proportion of reads provided instead to highlight off-diagonal (misclassified) reads. (B) Confusion matrix of human chromosome-level classification of NA12878 reads against CHM13. As the donor individual is female, ChrY was omitted from the reference. NC, not classified.
Figure 3.
Figure 3.
(A–C) Binary classification of yeast-origin reads from a mock community on “chunks” of signal. Each chunk represents 1 s of sequencing, ~420 bp; (A) F1 score (unclassified reads are considered bacterial in origin), (B) classification speed for signal chunks of increasing length, (C) proportion of reads classified by each method. (D–F) Binary classification on a hybrid dataset of human-origin (NA12878) reads and Zymo mock community reads; (D) F1 score (unclassified reads are considered human-origin, as in the case of a “host depletion” experiment), (E) classification speed, (F) read classification rate.
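The scoring convention in this caption, where unclassified reads are assigned a default class before computing F1, can be sketched as follows. The function name `f1_with_default`, the use of `None` for an unclassified read, and the choice of which class counts as positive are assumptions for illustration; the caption does not state them.

```python
def f1_with_default(truth, predicted, positive, default):
    """F1 score for the `positive` class, treating unclassified reads
    (predicted as None) as belonging to the `default` class."""
    resolved = [default if p is None else p for p in predicted]
    pairs = list(zip(truth, resolved))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly called
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    truth = ["yeast", "yeast", "bacteria", "bacteria", "yeast"]
    predicted = ["yeast", None, "yeast", None, "yeast"]
    print(f1_with_default(truth, predicted, positive="yeast", default="bacteria"))
```

Under this convention a method that leaves many reads unclassified is penalized through recall whenever those reads truly belong to the positive class.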
