Bioinformatics. 2024 May 2;40(5):btae311.
doi: 10.1093/bioinformatics/btae311.

Parsnp 2.0: scalable core-genome alignment for massive microbial datasets

Bryce Kille et al. Bioinformatics. 2024.

Abstract

Motivation: Since 2016, the number of microbial species with available reference genomes in NCBI has more than tripled. Multiple genome alignment, the process of identifying nucleotides across multiple genomes that share a common ancestor, provides the input to numerous downstream comparative analysis methods. Parsnp is one of the few multiple genome alignment methods able to scale to the current era of genomic data; however, there has been no major release since its initial release in 2014.

Results: To address this gap, we developed Parsnp v2, which significantly improves on the original release. Parsnp v2 gives users more control over how the program executes, allowing Parsnp to be better tailored to different use cases. We introduce a partitioning option that splits the input into multiple alignment processes, which run in parallel and are then combined into a final alignment. The partitioning option can reduce memory usage by over 4× and runtime by over 2×, all while maintaining a precise core-genome alignment. The partitioning workflow is also less susceptible to complications caused by assembly artifacts and minor variation, because alignment anchors only need to be conserved within their partition rather than across the entire input set. We highlight the performance on datasets involving thousands of bacterial and viral genomes.
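The partition-then-combine idea can be illustrated with a minimal sketch. This is not Parsnp's implementation: the function names (`partition_genomes`, `intersect_intervals`, `combine_core`) are hypothetical, and the sketch assumes that each partition reports its core regions as half-open intervals on a shared reference genome, so that the final core is the set of reference positions covered in every partition.

```python
def partition_genomes(genomes, partition_size):
    """Split a list of genome identifiers into consecutive chunks (hypothetical helper)."""
    if partition_size < 1:
        raise ValueError("partition_size must be positive")
    return [genomes[i:i + partition_size]
            for i in range(0, len(genomes), partition_size)]


def intersect_intervals(a, b):
    """Intersect two sorted lists of non-overlapping half-open (start, end) intervals."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def combine_core(per_partition_intervals):
    """Assumption: the combined core is the intersection of all partitions' core regions."""
    core = per_partition_intervals[0]
    for intervals in per_partition_intervals[1:]:
        core = intersect_intervals(core, intervals)
    return core


# 828 genomes with a partition size of 250, matching the Figure 1 setting.
genomes = [f"genome_{i}" for i in range(828)]
parts = partition_genomes(genomes, 250)
print([len(p) for p in parts])  # [250, 250, 250, 78]

# Made-up reference-coordinate core regions from two partitions.
print(combine_core([[(0, 100), (200, 300)], [(50, 250)]]))  # [(50, 100), (200, 250)]
```

Because each partition is aligned independently, the per-partition work in a scheme like this can run in parallel, and only the comparatively small interval lists need to be merged at the end.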

Availability and implementation: Parsnp v2 is available at https://github.com/marbl/parsnp.

Conflict of interest statement

None declared.

Figures

Figure 1.
Gingr visualization of a Parsnp (p=250) alignment on 828 M. tuberculosis genomes. When zoomed out, the visualization displays a heatmap of single-nucleotide variants. Base-level resolution views of the alignments become visible when inspecting smaller regions.
