b-move: faster lossless approximate pattern matching in a run-length compressed index
- PMID: 40796877
- PMCID: PMC12345024
- DOI: 10.1186/s13015-025-00281-x
b-move: faster lossless approximate pattern matching in a run-length compressed index
Abstract
Background: Due to the increasing availability of high-quality genome sequences, pan-genomes are gradually replacing single consensus reference genomes in many bioinformatics pipelines to better capture genetic diversity. Traditional bioinformatics tools using the FM-index face memory limitations with such large genome collections. Recent advancements in run-length compressed indices like Gagie et al.'s r-index and Nishimoto and Tabei's move structure, alleviate memory constraints but focus primarily on backward search for MEM-finding. Arakawa et al.'s br-index initiates complete approximate pattern matching using bidirectional search in run-length compressed space, but with significant computational overhead due to complex memory access patterns.
Results: We introduce b-move, a novel bidirectional extension of the move structure, enabling fast, cache-efficient, lossless approximate pattern matching in run-length compressed space. It achieves bidirectional character extensions up to 7 times faster than the br-index, closing the performance gap with FM-index-based alternatives. For locating occurrences, b-move performs and operations up to 7 times faster than the br-index. At the same time, it maintains the favorable memory characteristics of the br-index, for example, all available complete E. coli genomes on NCBI's RefSeq collection can be compiled into a b-move index that fits into the RAM of a typical laptop.
Conclusions: b-move proves practical and scalable for pan-genome indexing and querying. We provide a C++ implementation of b-move, supporting efficient lossless approximate pattern matching including locate functionality, available at https://github.com/biointec/b-move under the AGPL-3.0 license.
Keywords: Approximate pattern matching; Bidirectional search; Cache efficiency; FM-index; Lossless alignment; Move structure; Pan-genomics; r-index.
© 2025. The Author(s).
Conflict of interest statement
Declarations. Ethics Approval and Consent to Participate: Not applicable. Consent for Publication: Not applicable. Competing Interests: The authors declare that they have no Conflict of interest.
Figures











Update of
-
b-move: Faster Lossless Approximate Pattern Matching in a Run-Length Compressed Index.Res Sq [Preprint]. 2024 Nov 18:rs.3.rs-5367343. doi: 10.21203/rs.3.rs-5367343/v1. Res Sq. 2024. Update in: Algorithms Mol Biol. 2025 Aug 12;20(1):15. doi: 10.1186/s13015-025-00281-x. PMID: 39606487 Free PMC article. Updated. Preprint.
References
-
- Ferragina P, Manzini G. Opportunistic data structures with applications. In: Proceedings 41st annual symposium on Foundations of Computer Science; 2000. p. 390-8. 10.1109/SFCS.2000.892127.
-
- Burrows M, Wheeler D. A Block-sorting lossless data compression algorithm. 130 Lytton Avenue, Palo Alto, California 94301: Digital Equipment Corporation Systems Research Center; 1994. p. 124.
Grants and funding
LinkOut - more resources
Full Text Sources
Miscellaneous