Compressive pangenomics using mutation-annotated networks
- PMID: 41526696
- DOI: 10.1038/s41588-025-02478-7
Abstract
Pangenomics is an emerging field that uses collections of genomes, rather than a single reference, to reduce bias and capture intra-species diversity. However, existing pangenomic data formats face challenges in scaling to millions of genomes and primarily emphasize variation, often neglecting the underlying mutational events and evolutionary relationships. This work introduces Pangenome Mutation-Annotated Network (PanMAN), a lossless pangenome representation that achieves file-size compression ratios ranging from 3.5× to 1,391× compared to existing variation-preserving formats, with performance generally improving on larger datasets. In addition to compression, PanMAN increases representational capacity by encoding detailed mutational and evolutionary histories inferred across genomes, thereby enabling new biological insights. Using PanMAN, a comprehensive SARS-CoV-2 pangenome was constructed from 8 million publicly available sequences, requiring only 366 MB of disk space. We also present 'panmanUtils', a toolkit that supports common analyses and ensures interoperability with existing software. PanMAN is poised to greatly improve the scale, speed, resolution and scope of pangenomic analysis and data sharing.
© 2026. The Author(s), under exclusive licence to Springer Nature America, Inc.
Conflict of interest statement
Competing interests: All authors declare no competing interests.
Grants and funding
- 75D30123C17463/U.S. Department of Health & Human Services | Centers for Disease Control and Prevention (CDC)
- Amazon Research Award/Amazon Web Services (AWS)
- U01HG013755/U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (NHGRI)
- AMD AI & HPC fund/Advanced Micro Devices (AMD)
- R35GM128932/U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS)
