Bioinformatics. 2015 Jun 15;31(12):2032-4.
doi: 10.1093/bioinformatics/btv098. Epub 2015 Feb 19.

Sambamba: fast processing of NGS alignment formats

Artem Tarasov et al. Bioinformatics.

Abstract

Sambamba is a high-performance, robust tool and library for working with SAM, BAM and CRAM sequence alignment files, the most common file formats for aligned next-generation sequencing data. Sambamba is a faster alternative to samtools that exploits multi-core processing and dramatically reduces processing time. Sambamba is being adopted at sequencing centers, not only because of its speed, but also because of additional functionality, including coverage analysis and powerful filtering capability.

Availability and implementation: Sambamba is free and open source software, available under a GPLv2 license. Sambamba can be downloaded and installed from http://www.open-bio.org/wiki/Sambamba. Sambamba v0.5.0 was released with doi:10.5281/zenodo.13200.

Figures

Fig. 1.
Processing speed comparison of samtools and sambamba. Wall-clock time (s) versus number of threads to convert an 11-GB CRAM (1000 Genomes HG00110) to 108-GB SAM. With samtools, view is bound to a single thread at 90% CPU. With sambamba, I/O gets saturated at approximately 250% CPU. When using a faster RAM-disk, I/O gets saturated at approximately 350% CPU. For samtools, a RAM-disk makes no difference. When adding more threads, performance reproducibly degrades because of CPU cache contention. All timings were performed on a server-class machine with 512 GB of RAM and 48 CPU cores (4 × 12-core AMD Opteron(tm) Processor 6174 @ 2.2 GHz with 6 MB L2 cache). The benchmark used samtools v1.0-15 with htslib v1.0-1 and sambamba v0.5.0 compiled with the LLVM D compiler v0.14.0.
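
A wall-clock comparison of the kind described in the caption could be sketched as follows. This is a minimal illustration, not the authors' benchmark script: the helper names `build_view_command` and `time_command` are hypothetical, and the sambamba flags shown (`-C`, `-h`, `-f`, `-t`, `-o`) are assumptions about the v0.5.x command line that should be checked against `sambamba view --help` for the installed version.

```python
import subprocess
import time


def build_view_command(tool, input_path, output_path, threads=1):
    """Build an argv list for a CRAM-to-SAM conversion (hypothetical helper)."""
    if tool == "sambamba":
        # Assumed flags: -C CRAM input, -h keep header, -f output format,
        # -t number of threads, -o output file. Verify with `sambamba view --help`.
        return ["sambamba", "view", "-C", "-h", "-f", "sam",
                "-t", str(threads), "-o", output_path, input_path]
    if tool == "samtools":
        # samtools view writes SAM by default; -h keeps the header.
        return ["samtools", "view", "-h", "-o", output_path, input_path]
    raise ValueError(f"unknown tool: {tool}")


def time_command(argv, runner=subprocess.run):
    """Run a command and return its wall-clock time in seconds."""
    start = time.perf_counter()
    runner(argv, check=True)  # raises if the tool exits with an error
    return time.perf_counter() - start
```

Calling `time_command(build_view_command("sambamba", "in.cram", "out.sam", threads=n))` for increasing `n` would give one wall-clock series comparable in spirit to the figure; the file names here are placeholders.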
