Conway-Bromage-Lyndon (CBL): an exact, dynamic representation of k-mer sets
- PMID: 38940123
- PMCID: PMC11211824
- DOI: 10.1093/bioinformatics/btae217
Conway-Bromage-Lyndon (CBL): an exact, dynamic representation of k-mer sets
Abstract
Summary: In this article, we introduce the Conway-Bromage-Lyndon (CBL) structure, a compressed, dynamic and exact method for representing k-mer sets. Originating from Conway and Bromage's concept, CBL innovatively employs the smallest cyclic rotations of k-mers, akin to Lyndon words, to leverage lexicographic redundancies. In order to support dynamic operations and set operations, we propose a dynamic bit vector structure that draws a parallel with Elias-Fano's scheme. This structure is encapsulated in a Rust library, demonstrating a balanced blend of construction efficiency, cache locality, and compression. Our findings suggest that CBL outperforms existing dynamic k-mer set methods. Unique to this work, CBL stands out as the only known exact k-mer structure offering in-place set operations. Its different combined abilities position it as a flexible Swiss knife structure for k-mer set management.
Availability and implementation: https://github.com/imartayan/CBL.
© The Author(s) 2024. Published by Oxford University Press.
Conflict of interest statement
None declared.
Figures










References
-
- Agret C, Chateau A, Droc G. et al. RedOak: A reference-free and alignment-free structurefor indexing a collection of similar genomes. JOSS 2022;7:4363.
-
- Alanko JN, Puglisi SJ, Vuohtoniemi J.. Small searchable κ-spectra via subset rank queries on the spectral Burrows–Wheeler transform. In: SIAM Conference on Applied and Computational Discrete Algorithms (ACDA23). SIAM, 2023a, 225–36.
Publication types
MeSH terms
Grants and funding
LinkOut - more resources
Full Text Sources