Bioinformatics. 2025 Jul 1;41(7):btaf368.
doi: 10.1093/bioinformatics/btaf368.

Fast and flexible minimizer digestion with digest

Alan Zheng et al. Bioinformatics.

Abstract

Summary: Minimizer digestion is an increasingly common component of bioinformatics tools, including tools for de Bruijn graph assembly and sequence classification. We describe Digest, a new open-source tool and library for efficient digestion of genomic sequences. It can produce digests based on the related ideas of minimizers, modimizers, or syncmers. Digest uses efficient data structures, scales well to many threads, and produces digests with expected spacings between digested elements.
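
To make the three schemes named in the summary concrete, the following is a minimal pure-Python sketch, not Digest's implementation or API. It assumes the usual textbook definitions: a window minimizer is the k-mer with the smallest hash among w consecutive k-mers, a modimizer is any k-mer whose hash is divisible by a chosen modulus, and a (closed) syncmer is a window whose smallest hash sits at its first or last position. Digest's exact definitions and tie-breaking may differ. The function names are hypothetical, and CRC32 is only a stand-in for whatever hash the library uses.

```python
import random
import zlib


def kmer_hashes(seq: str, k: int) -> list[int]:
    """Hash every k-mer of seq (CRC32 is a stand-in hash, an assumption)."""
    return [zlib.crc32(seq[i:i + k].encode()) for i in range(len(seq) - k + 1)]


def window_minimizers(hashes: list[int], w: int) -> list[int]:
    """Positions of the smallest hash in each window of w consecutive k-mers."""
    picked: list[int] = []
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        pos = start + window.index(min(window))  # leftmost position on ties
        if not picked or picked[-1] != pos:  # adjacent windows often agree
            picked.append(pos)
    return picked


def modimizers(hashes: list[int], mod: int) -> list[int]:
    """Positions of k-mers whose hash is divisible by mod."""
    return [i for i, h in enumerate(hashes) if h % mod == 0]


def syncmers(hashes: list[int], w: int) -> list[int]:
    """Start positions of windows whose smallest hash is at either end."""
    picked: list[int] = []
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        if window[0] == smallest or window[-1] == smallest:
            picked.append(start)
    return picked


if __name__ == "__main__":
    rng = random.Random(0)
    seq = "".join(rng.choice("ACGT") for _ in range(100_000))
    hashes = kmer_hashes(seq, k=16)
    for name, picked in [
        ("minimizer (w=11)", window_minimizers(hashes, 11)),
        ("modimizer (mod=6)", modimizers(hashes, 6)),
        ("syncmer (w=12)", syncmers(hashes, 12)),
    ]:
        # Average distance between selected positions.
        print(name, round(len(hashes) / len(picked), 2))
```

With a well-mixed hash, the average spacing is expected to be roughly (w + 1) / 2 for window minimizers, mod for modimizers, and w / 2 for closed syncmers, so the three parameter choices in the demo should all land near a spacing of 6.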

Availability and implementation: Digest is implemented in C++17 with a Python API and is available as open source at https://github.com/VeryAmazed/digest. The Python library is available on Bioconda. Rust bindings are available as a public crate at https://crates.io/crates/digest-rs.

Figure 1.
(A) Comparison of min-query speed for different data structures as a function of window size. In this benchmark, each data structure performs 10 million queries on an array of uniformly distributed 32-bit hash values. (B) Throughput of the different digestion schemes in Digest (using a segment tree data structure) when computing the digest of a 62M human chromosome Y sequence consisting of only A/C/G/T characters. Benchmarks for both (A) and (B) were performed on a 48-core 3 GHz Intel Xeon Gold Cascade Lake 6248R CPU with 192 GB RAM.
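
As an illustration of the kind of min query benchmarked in panel (A), here is a minimal sketch of a segment tree that answers range-minimum queries over an array of hash values. It is a generic bottom-up segment tree written for this note, not the data structure code used in Digest. The class and function names are hypothetical, and the array size and window size in the demo are arbitrary.

```python
import random


class MinSegmentTree:
    """Bottom-up segment tree answering range-minimum queries."""

    def __init__(self, values):
        self.n = len(values)
        # Leaves live at tree[n:2n]; tree[0] is unused.
        self.tree = [0] * self.n + list(values)
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = min(self.tree[2 * i], self.tree[2 * i + 1])

    def update(self, pos, value):
        """Set values[pos] = value and refresh the ancestors of that leaf."""
        i = pos + self.n
        self.tree[i] = value
        i //= 2
        while i >= 1:
            self.tree[i] = min(self.tree[2 * i], self.tree[2 * i + 1])
            i //= 2

    def query(self, lo, hi):
        """Minimum of values[lo:hi] (half-open range, requires lo < hi)."""
        result = float("inf")
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:  # lo is a right child: take it and step right
                result = min(result, self.tree[lo])
                lo += 1
            if hi & 1:  # hi is exclusive: step left and take that node
                hi -= 1
                result = min(result, self.tree[hi])
            lo //= 2
            hi //= 2
        return result


def sliding_window_min(values, w):
    """Minimum of every window of w consecutive values, via the tree."""
    tree = MinSegmentTree(values)
    return [tree.query(i, i + w) for i in range(len(values) - w + 1)]


if __name__ == "__main__":
    rng = random.Random(0)
    values = [rng.getrandbits(32) for _ in range(10_000)]
    w = 16
    naive = [min(values[i:i + w]) for i in range(len(values) - w + 1)]
    print(sliding_window_min(values, w) == naive)
```

Each query touches O(log n) nodes regardless of window size, whereas the naive scan costs O(w) per window. That trade-off is one reason query speed can vary with window size across data structures.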
