Seamless, rapid, and accurate analyses of outbreak genomic data using split k-mer analysis
- PMID: 39406504
- PMCID: PMC11529842
- DOI: 10.1101/gr.279449.124
Abstract
Sequence variation observed in populations of pathogens can be used for important public health and evolutionary genomic analyses, especially outbreak analysis and transmission reconstruction. Identifying this variation is typically achieved by aligning sequence reads to a reference genome, but this approach is susceptible to reference biases and requires careful filtering of called genotypes. There is a need for tools that can process the growing volume of bacterial genome data, providing rapid results, but that remain simple so they can be used without highly trained bioinformaticians, expensive data analysis, and long-term storage and processing of large files. Here we describe split k-mer analysis (SKA2), a method that supports both reference-free and reference-based mapping to quickly and accurately genotype populations of bacteria using sequencing reads or genome assemblies. SKA2 is highly accurate for closely related samples, and in outbreak simulations, we show superior variant recall compared with reference-based methods, with no false positives. SKA2 can also accurately map variants to a reference and be used with recombination detection methods to rapidly reconstruct vertical evolutionary history. SKA2 is many times faster than comparable methods and can be used to add new genomes to an existing call set, allowing sequential use without the need to reanalyze entire collections. With an inherent absence of reference bias, high accuracy, and a robust implementation, SKA2 has the potential to become the tool of choice for genotyping bacteria. SKA2 is implemented in Rust and is freely available as open-source software.
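The reference-free idea behind split k-mers can be illustrated with a toy sketch. It is not SKA2's implementation; it assumes that a split k-mer is a pair of flanking k-mers around a single middle base, and that two samples are compared by looking up shared flanks and reporting positions where the middle base differs. The function names (`split_kmers`, `middle_base_differences`) and the `flank` parameter are hypothetical, and the sketch ignores reverse complements, read errors, and coverage filtering.

```python
from typing import Dict, List, Tuple

Flanks = Tuple[str, str]


def split_kmers(seq: str, flank: int = 3) -> Dict[Flanks, str]:
    """Map each (left flank, right flank) pair to the base between them.

    Flank pairs seen with more than one middle base within the same
    sequence are dropped, as they cannot be genotyped unambiguously here.
    """
    table: Dict[Flanks, str] = {}
    ambiguous = set()
    width = 2 * flank + 1
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        key = (window[:flank], window[flank + 1:])
        middle = window[flank]
        if key in table and table[key] != middle:
            ambiguous.add(key)
        table[key] = middle
    for key in ambiguous:
        del table[key]
    return table


def middle_base_differences(
    a: str, b: str, flank: int = 3
) -> List[Tuple[str, str, str, str]]:
    """Return (left flank, base in a, base in b, right flank) for every
    flank pair shared by both sequences whose middle base differs."""
    ka, kb = split_kmers(a, flank), split_kmers(b, flank)
    shared = ka.keys() & kb.keys()
    return sorted(
        (key[0], ka[key], kb[key], key[1])
        for key in shared
        if ka[key] != kb[key]
    )


if __name__ == "__main__":
    # Two toy sequences differing by one substitution (C versus G).
    sample_a = "ACGTACCTGAA"
    sample_b = "ACGTAGCTGAA"
    print(middle_base_differences(sample_a, sample_b, flank=3))
```

Because the comparison is keyed on flanking sequence rather than on coordinates in a reference genome, no alignment step is needed in this toy version, which is the property the abstract refers to as an absence of reference bias.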
© 2024 Derelle et al.; Published by Cold Spring Harbor Laboratory Press.