Finding Maximal Exact Matches Using the r-Index

Massimiliano Rossi¹, Marco Oliva¹, Paola Bonizzoni², Ben Langmead³, Travis Gagie⁴, Christina Boucher¹

Affiliations

¹ Department of Computer and Information Science and Engineering, University of Florida, Gainesville, Florida, USA.
² Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano, Italy.
³ Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA.
⁴ Faculty of Computer Science, Dalhousie University, Halifax, Canada.

PMID: 35041518
PMCID: PMC8902461
DOI: 10.1089/cmb.2021.0445

Finding Maximal Exact Matches Using the r-Index

Massimiliano Rossi et al. J Comput Biol. 2022 Feb.

. 2022 Feb;29(2):188-194.

doi: 10.1089/cmb.2021.0445. Epub 2022 Jan 17.

Authors

Massimiliano Rossi¹, Marco Oliva¹, Paola Bonizzoni², Ben Langmead³, Travis Gagie⁴, Christina Boucher¹

Affiliations

¹ Department of Computer and Information Science and Engineering, University of Florida, Gainesville, Florida, USA.
² Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano, Italy.
³ Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA.
⁴ Faculty of Computer Science, Dalhousie University, Halifax, Canada.

PMID: 35041518
PMCID: PMC8902461
DOI: 10.1089/cmb.2021.0445

Abstract

Efficiently finding maximal exact matches (MEMs) between a sequence read and a database of genomes is a key first step in read alignment. But until recently, it was unknown how to build a data structure in [Formula: see text] space that supports efficient MEM finding, where r is the number of runs in the Burrows-Wheeler Transform. In 2021, Rossi et al. showed how to build a small auxiliary data structure called thresholds in addition to the r-index in [Formula: see text] space. This addition enables efficient MEM finding using the r-index. In this article, we present the tool that implements this solution, which we call MONI. Namely, we give a high-level view of the main components of the data structure and show how the source code can be downloaded, compiled, and used to find MEMs between a set of sequence reads and a set of genomes.

Keywords: MEM finding; r-index; run-length-encoded Burrows–Wheeler transform; thresholds.

PubMed Disclaimer

Conflict of interest statement

The authors declare they have no competing financial interests.

References

1. Au, K.F., Underwood, J.G., Lee, L., et al. . 2012. Improving PacBio long read accuracy by short read alignment. PLoS ONE 7, e46679. - PMC - PubMed
1. Bannai, H., Gagie, T., and Tomohiro, I.. 2020. Refining the r-index. Theor. Comput. Sci. 812, 96–108.
1. Boucher, C., Gagie, T., Kunhle, A., et al. . 2019. Prefix-free parsing for building big BWTs. Algorithms Mol. Biol. 14, 13:1–13:15. - PMC - PubMed
1. Ferragina, P., and Manzini, G.. 2005. Indexing compressed text. J. ACM 52, 552–581.
1. Gagie, T., Navarro, G., and Prezza N.. 2020a. Fully functional suffix trees and optimal text searching in BWT-Runs Bounded Space. J. ACM 67, 2:1–2:54.

Publication types

Actions
Actions
Actions

MeSH terms

Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions

Grants and funding

LinkOut - more resources

Full Text Sources
Miscellaneous
- NCI CPTAC Assay Portal

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

Finding Maximal Exact Matches Using the r-Index

Affiliations

Finding Maximal Exact Matches Using the r-Index

Authors

Affiliations

Abstract

Conflict of interest statement

References

Publication types

MeSH terms

Grants and funding

LinkOut - more resources

Full Text Sources

Miscellaneous