Genome Biol. 2025 Jun 17;26(1):169.
doi: 10.1186/s13059-025-03644-0.

Mumemto: efficient maximal matching across pangenomes

Vikram S Shivakumar et al. Genome Biol. 2025.

Abstract

Aligning genomes into common coordinates is central to pangenome construction but is computationally expensive. Multi-sequence maximal unique matches (multi-MUMs) help to frame and solve the multiple alignment problem. We introduce Mumemto, a tool that computes multi-MUMs and other match types across large pangenomes. Mumemto allows for visualization of synteny, reveals aberrant assemblies and scaffolds, and highlights pangenome conservation and structural variation. Mumemto computes multi-MUMs across 320 human assemblies (960 GB) in 25.7 h with 800 GB of memory, and across hundreds of fungal assemblies in minutes. Mumemto is implemented in C++ and Python and is available open-source at https://github.com/vikshiv/mumemto (v1.1.1 at doi.org/10.5281/zenodo.15053447).
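To make the multi-MUM definition concrete, here is a minimal brute-force sketch in Python. It is not Mumemto's algorithm and would not scale beyond toy inputs; it only illustrates the definition: a substring that occurs exactly once in every sequence and cannot be extended left or right while still matching in all of them. The function names and the toy sequences are hypothetical, and the sketch assumes forward-strand matches only.

```python
def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def extendable(seqs, positions, offset):
    """True if every sequence has the same character at position + offset."""
    chars = set()
    for seq, pos in zip(seqs, positions):
        q = pos + offset
        if q < 0 or q >= len(seq):
            return False  # the match touches the end of one sequence
        chars.add(seq[q])
    return len(chars) == 1


def multi_mums(seqs, min_len=2):
    """Brute-force multi-MUMs: (match, start position in each sequence)."""
    ref = seqs[0]  # every multi-MUM must be a substring of the first sequence
    results = []
    for i in range(len(ref)):
        for j in range(i + min_len, len(ref) + 1):
            cand = ref[i:j]
            # Unique: exactly one occurrence in every sequence.
            if any(count_overlapping(s, cand) != 1 for s in seqs):
                continue
            positions = [s.find(cand) for s in seqs]
            # Maximal: no shared character immediately left or right.
            if extendable(seqs, positions, -1):
                continue
            if extendable(seqs, positions, len(cand)):
                continue
            results.append((cand, positions))
    return results


toy = ["GATTACA", "CATTAGA", "TATTACG"]  # hypothetical toy sequences
print(multi_mums(toy))
```

On the toy input, "ATTA" is the only substring of length two or more that is both unique in all three sequences and maximal; shorter shared substrings such as "TT" are unique but extendable, and "TA" occurs twice in the third sequence.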

Conflict of interest statement

Declarations. Ethical approval and consent to participate: Not applicable. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Exact match types that Mumemto can compute. Two flags to control how many sequences a match appears in (-k) and how many times a match may appear in any given sequence (-f)
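The two controls described in this caption can be read as a filter over per-sequence occurrence counts. The sketch below is a hedged illustration in Python, not Mumemto's code or command-line interface: it assumes that the first parameter acts as a minimum number of sequences and the second as a maximum number of occurrences per sequence, and the function name `keep_match` and its argument encoding are hypothetical.

```python
def keep_match(counts, k, f=None):
    """Decide whether an exact match passes the two thresholds.

    counts: number of occurrences of the match in each sequence.
    k: assumed minimum number of sequences that must contain the match.
    f: assumed maximum occurrences allowed in any one sequence
       (None means no limit).
    """
    present = [c for c in counts if c > 0]
    if len(present) < k:
        return False
    return f is None or all(c <= f for c in present)


# Four sequences, match found once in each: unique in all (multi-MUM-like).
print(keep_match([1, 1, 1, 1], k=4, f=1))
# Missing from one sequence: passes only if k is relaxed (partial-MUM-like).
print(keep_match([1, 1, 0, 1], k=4, f=1), keep_match([1, 1, 0, 1], k=3, f=1))
# Repeated in one sequence: passes only if f is relaxed (multi-MEM-like).
print(keep_match([1, 2, 1, 1], k=4, f=1), keep_match([1, 2, 1, 1], k=4))
```

How the real tool encodes these thresholds on the command line is not stated here, so consult the Mumemto documentation before relying on specific flag values.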
Fig. 2
(A, B) Comparison of runtime and peak memory usage (measured as maximum resident set size) between multi-MUM finders. Only the initial multi-MUM computation was considered for ProgressiveMauve and Parsnp2. Note: Parsnp2 took >48 h for chromosomes 1 and 2, so these are omitted. Parsnp2 and ProgressiveMauve were run with 48 threads, while Mumemto was run single-threaded. (C, D) Time and memory scaling comparison for increasing sequence collection sizes of chr19. (E, F) Comparison of time and memory for a Mumemto-seeded Parsnp2 alignment pipeline and the original Parsnp2 pipeline, and (G) a comparison of the alignments from each pipeline. (H) Regions excluded from Minigraph-Cactus (MC) while aligning chr19 assemblies, compared to regions excluded by a Mumemto-seeded MC pipeline (overlaid on a MUM synteny plot in gray). (I, J) Syntenic view of MUMs vs. tube map [15] view of the equivalent graph
Fig. 3
(A, B) MUM synteny visualization of collinear MUM blocks (red) and MUMs that break collinearity in a single sequence (gray (+)/green (−) based on orientation). (C) Large (>4 Mbp) syntenic region lost in HG02080.1, but recovered by partial MUMs (in gray). Evidence from non-collinear MUMs (D, E) and missing sequence present in partial MUMs (F) points to a potential aberrant assembly artifact in the HG02080 paternal haplotype. (G, H) Genome-wide multi-MUMs reveal an interchromosomal join (confirmed to be a misassembly by HPRC [1]) in the aberrant regions
Fig. 4
(A) HPRC chr8 assemblies visualized with multi-MUM synteny. Regions of high multi-MEM density are shown in red. (Zoom panels) Two examples of incorrectly oriented contigs during scaffolding, with contig breakpoints represented by diamond markers. Inversions are shown in green. (B) Assemblies of chr3 across the potato family, shown with multi-MUM synteny and MEM density colored in red. (Top) Density of gene and LTR retrotransposon annotations for potato accession A6-26 (shown in the top row of the syntenic view)
Fig. 5
Aggregate length of partial MUMs that are not present in each genome assembly. (A) A. thaliana accessions, grouped by geographical region, and (B) potato (Solanum section Petota) accessions, grouped by species
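The quantity plotted in this figure can be sketched as a simple tally. The Python example below is an illustration under stated assumptions, not the authors' procedure: it assumes each partial MUM is given as a length plus the set of assemblies that contain it, and it sums lengths without correcting for overlapping matches. The function name, assembly names, and input layout are hypothetical.

```python
def missing_partial_mum_length(assemblies, partial_mums):
    """Sum, per assembly, the lengths of partial MUMs that exclude it.

    assemblies: list of assembly names.
    partial_mums: iterable of (length, set of assemblies containing the match).
    """
    totals = {name: 0 for name in assemblies}
    for length, present_in in partial_mums:
        for name in assemblies:
            if name not in present_in:
                totals[name] += length
    return totals


# Hypothetical input: three assemblies and three partial MUMs.
assemblies = ["asm1", "asm2", "asm3"]
partial_mums = [
    (100, {"asm1", "asm2"}),  # absent from asm3
    (40, {"asm2", "asm3"}),   # absent from asm1
    (25, {"asm1", "asm3"}),   # absent from asm2
]
print(missing_partial_mum_length(assemblies, partial_mums))
```

An assembly with a large total is missing sequence that most of the collection shares, which is the kind of outlier the figure is meant to expose.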
Algorithm 1
Find multi-MEMs/MUMs

References

    1. Liao WW, Asri M, Ebler J, Doerr D, Haukness M, Hickey G, et al. A draft human pangenome reference. Nature. 2023;617(7960):312–24.
    2. Marçais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A. MUMmer4: a fast and versatile genome alignment system. PLoS Comput Biol. 2018;14(1):e1005944.
    3. Abouelhoda MI, Kurtz S, Ohlebusch E. Replacing suffix trees with enhanced suffix arrays. J Discret Algoritm. 2004;2(1):53–86.
    4. Darling AC, Mau B, Blattner FR, Perna NT. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 2004;14(7):1394–403.
    5. Kille B, Nute MG, Huang V, Kim E, Phillippy AM, Treangen TJ. Parsnp 2.0: scalable core-genome alignment for massive microbial datasets. Bioinformatics. 2024;40(5):btae311.