Bioinformatics. 2014 Oct;30(19):2818-9.
doi: 10.1093/bioinformatics/btu390. Epub 2014 Jun 14.

The Scramble conversion tool

James K Bonfield. Bioinformatics. 2014 Oct.

Abstract

Motivation: The reference CRAM file format implementation is in Java. We present 'Scramble': a new C implementation of SAM, BAM and CRAM file I/O.

Results: The C implementation of CRAM is 1.5-1.7× slower than BAM at decoding but 1.8-2.6× faster at encoding. We see file size savings of 34-55%.
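
To make the reported ranges concrete, the following is a minimal arithmetic sketch, not code from Scramble itself. It assumes that "N× slower" means CRAM takes N times the BAM time, that "N× faster" means CRAM takes 1/N of the BAM time, and that the 34-55% saving is measured against the BAM file size. The function names and the example inputs (a 100 GB file, a 60-minute run) are hypothetical.

```python
def cram_size_range(bam_size, savings=(0.34, 0.55)):
    """Return (smallest, largest) CRAM size implied by a savings range.

    Assumption: savings are fractions of the BAM size.
    """
    low_saving, high_saving = savings
    return bam_size * (1 - high_saving), bam_size * (1 - low_saving)


def relative_time_range(bam_time, ratio_range, faster):
    """Scale a BAM timing by a reported speed ratio range.

    faster=True  -> CRAM time is the BAM time divided by the ratio.
    faster=False -> CRAM time is the BAM time multiplied by the ratio.
    Returns (shortest, longest) time.
    """
    low_ratio, high_ratio = ratio_range
    if faster:
        return bam_time / high_ratio, bam_time / low_ratio
    return bam_time * low_ratio, bam_time * high_ratio


if __name__ == "__main__":
    # Hypothetical inputs chosen only for illustration.
    small, large = cram_size_range(100.0)
    print(f"100 GB BAM -> about {small:.1f} to {large:.1f} GB as CRAM")

    dec = relative_time_range(60.0, (1.5, 1.7), faster=False)
    print(f"60 min BAM decode -> about {dec[0]:.1f} to {dec[1]:.1f} min for CRAM")

    enc = relative_time_range(60.0, (1.8, 2.6), faster=True)
    print(f"60 min BAM encode -> about {enc[0]:.1f} to {enc[1]:.1f} min for CRAM")
```

Under these assumptions the trade-off reads as smaller files and quicker writes in exchange for slower reads; actual figures depend on the data set and hardware.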

Availability and implementation: Source code is available at http://sourceforge.net/projects/staden/files/io_lib/ under the BSD software licence.


Figures

Fig. 1.
Real time taken to convert from 230 Gb BAM to BAM (Scramble, Samtools) and BAM to CRAM (Scramble) formats. The system was a 16-core 2.2 GHz Intel Xeon E5-2660 with a local RAID XFS file system. Tests on slower disks and with smaller locally cached data files are in the Supplementary Material, including benchmarks of Sambamba (https://github.com/lomereiter/sambamba) and Biobambam (Tischler and Leonard, 2013).

Similar articles

  • Software support for SBGN maps: SBGN-ML and LibSBGN.
    van Iersel MP, Villéger AC, Czauderna T, Boyd SE, Bergmann FT, Luna A, Demir E, Sorokin A, Dogrusoz U, Matsuoka Y, Funahashi A, Aladjem MI, Mi H, Moodie SL, Kitano H, Le Novère N, Schreiber F. Bioinformatics. 2012 Aug 1;28(15):2016-21. doi: 10.1093/bioinformatics/bts270. Epub 2012 May 10. PMID: 22581176. Free PMC article.
  • Whiteboard: a framework for the programmatic visualization of complex biological analyses.
    Sundström G, Zamani N, Grabherr MG, Mauceli E. Bioinformatics. 2015 Jun 15;31(12):2054-5. doi: 10.1093/bioinformatics/btv078. Epub 2015 Feb 5. PMID: 25661541.
  • kSNP3.0: SNP detection and phylogenetic analysis of genomes without genome alignment or reference genome.
    Gardner SN, Slezak T, Hall BG. Bioinformatics. 2015 Sep 1;31(17):2877-8. doi: 10.1093/bioinformatics/btv271. Epub 2015 Apr 25. PMID: 25913206.
  • A library of efficient bioinformatics algorithms.
    Della Vedova G, Dondi R. Appl Bioinformatics. 2003;2(2):117-21. PMID: 15130828. Review.
  • Interoperability with Moby 1.0--it's better than sharing your toothbrush!
    BioMoby Consortium; Wilkinson MD, Senger M, Kawas E, Bruskiewich R, Gouzy J, Noirot C, Bardou P, Ng A, Haase D, Saiz Ede A, Wang D, Gibbons F, Gordon PM, Sensen CW, Carrasco JM, Fernández JM, Shen L, Links M, Ng M, Opushneva N, Neerincx PB, Leunissen JA, Ernst R, Twigger S, Usadel B, Good B, Wong Y, Stein L, Crosby W, Karlsson J, Royo R, Párraga I, Ramírez S, Gelpi JL, Trelles O, Pisano DG, Jimenez N, Kerhornou A, Rosset R, Zamacola L, Tarraga J, Huerta-Cepas J, Carazo JM, Dopazo J, Guigo R, Navarro A, Orozco M, Valencia A, Claros MG, Pérez AJ, Aldana J, Rojano M, Fernandez-Santa Cruz R, Navas I, Schiltz G, Farmer A, Gessler D, Schoof H, Groscurth A. Brief Bioinform. 2008 May;9(3):220-31. doi: 10.1093/bib/bbn003. Epub 2008 Jan 31. PMID: 18238804. Review.


