BMC Bioinformatics. 2020 Sep 9;21(1):397.
doi: 10.1186/s12859-020-03726-9.

IonCRAM: a reference-based compression tool for ion torrent sequence files

Moustafa Shokrof et al. BMC Bioinformatics. 2020.

Abstract

Background: Ion Torrent is one of the major next-generation sequencing (NGS) technologies and is frequently used in medical research and diagnosis. The built-in software of the Ion Torrent sequencing machines delivers the sequencing results in the BAM format. In addition to the usual SAM/BAM fields, the Ion Torrent BAM file includes technology-specific flow signal data. The flow signals occupy a large portion of the BAM file (about 75% for the human genome). Compressing SAM/BAM into the CRAM format significantly reduces the space needed to store the NGS results. However, the tools for generating CRAM files are not designed to handle the flow signals. This missing feature motivated us to develop a new program that improves the compression of Ion Torrent files for long-term archiving.

Results: In this paper, we present IonCRAM, the first reference-based compression tool for compressing Ion Torrent BAM files for long-term archiving. For the BAM files, IonCRAM achieved a space saving of about 43%. This space saving exceeds what is achieved with the CRAM format by about 8-9%.

Conclusions: Reducing the space consumption of NGS data reduces the cost of storage and data transfer. Developing efficient compression software for clinical NGS data is therefore of more than computational interest, as it ultimately contributes to reducing the overall cost of the clinical test. The space saving achieved by our tool is a practical step in this direction. The tool is open source and available at Code Ocean, GitHub, and http://ioncram.saudigenomeproject.com .

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Flow signals and their position in the SAM/BAM file. The upper part shows an example DNA fragment to be sequenced by an Ion Torrent machine. The key and barcode sequences are ligated (prepended) to the fragment. The key sequence (TCAG) is a control sequence to ensure correct sequencing. A barcode sequence is added to a certain group of fragments. The use of barcodes makes it possible to sequence the DNA of different samples/patients in one run. The lower part of the figure shows a schematic representation of the fields in the SAM file. The SAM file is the human-readable, non-binary version of the BAM file. The header part includes the flow cycle and the key sequence. Each of the remaining lines represents one read, aligned to the reference genome, and holds the read information in a tab-separated format. We show only the columns/fields of relevance to this paper: the read ID, the physical position, the CIGAR string, which represents the alignment, the bases of the DNA sequence in the read, the quality field, and the flow signals in the ZM field.
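The read layout described in the Fig. 1 caption can be illustrated with a minimal Python sketch that splits one tab-separated SAM line and pulls out the fields named there. The column positions follow the standard SAM layout; the example line is invented for illustration, and the assumption that the ZM field is stored as an integer array of the form `ZM:B:s,...` is ours, not something stated in the caption.

```python
def parse_sam_read(line):
    """Extract the fields discussed in Fig. 1 from one SAM alignment line."""
    cols = line.rstrip("\n").split("\t")
    read = {
        "id": cols[0],          # QNAME: read ID
        "pos": int(cols[3]),    # POS: 1-based leftmost mapping position
        "cigar": cols[5],       # CIGAR string describing the alignment
        "seq": cols[9],         # bases of the read
        "qual": cols[10],       # per-base qualities
        "flow_signals": None,   # filled in if a ZM field is present
    }
    # Optional fields come after the 11 mandatory columns as TAG:TYPE:VALUE.
    for field in cols[11:]:
        tag, typ, value = field.split(":", 2)
        if tag == "ZM" and typ == "B":
            # Assumed layout: array subtype letter, then comma-separated integers.
            _subtype, *numbers = value.split(",")
            read["flow_signals"] = [int(n) for n in numbers]
    return read


# Hypothetical read, made up for this example only.
example = "\t".join([
    "READ_0001", "0", "chr1", "1000", "60", "6M", "*", "0", "0",
    "TCGAAG", "IIIIII", "ZM:B:s,264,5,251,269,26,538,0,230",
])
read = parse_sam_read(example)
print(read["id"], read["pos"], read["cigar"], len(read["flow_signals"]))
```

In this made-up read the ZM array holds more values than the read has bases, one per flow, which is consistent with the abstract's point that flow signals make up a large share of the file.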
Fig. 2
Base calling based on flow signals. The upper part shows an example DNA fragment to be sequenced. The second part shows the sequence of nucleotides in the flow cycle, together with the values of the sensed flow signals and the called bases. A flow signal value exceeding a certain threshold means that a base has hybridized to the template, and the corresponding base in the flow cycle is reported. If the flow signal value doubles, this indicates a polymer of identical bases. The base calling software calibrates the signal values and decides the length of the polymer.
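The idea in the Fig. 2 caption can be sketched as a toy base caller in Python. This is a simplification under stated assumptions, not the vendor's base calling software: the signals are assumed to be already calibrated so that a value near 1 means one base and a value near 2 means two identical bases, the flow order `TACG` is a hypothetical cycle chosen for the example, and rounding to the nearest integer stands in for the threshold and polymer-length decision.

```python
def call_bases(flow_order, signals):
    """Toy base caller: one signal per flow, rounded to a polymer length."""
    bases = []
    for i, signal in enumerate(signals):
        # The flow cycle repeats, so flow i delivers this nucleotide.
        nucleotide = flow_order[i % len(flow_order)]
        # Round half up; negative noise is clamped to zero bases.
        length = max(0, int(signal + 0.5))
        bases.append(nucleotide * length)
    return "".join(bases)


# Hypothetical calibrated signals for two passes through the flow cycle.
signals = [1.03, 0.02, 0.98, 1.05, 0.10, 2.10, 0.00, 0.90]
called = call_bases("TACG", signals)
print(called)
```

The sixth flow (nucleotide A, signal 2.10) is the doubled-signal case from the caption and yields two consecutive A bases.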
Fig. 3
Compression running times and RAM consumption. Average running times for compressing (a) gene panels in seconds, (c) in-house exomes in minutes, and (e) public exomes in minutes. The measurements are for Scramble and for IonCRAM with the gzip, xz, and Zstd options. The average running time for gene panels is the average over the 11 gene panel files; the averages for the exome sets are computed in the same way over the sets of three and four exome files. The average RAM consumption in GB for gene panels, in-house exomes, and public exomes is shown in (b), (d), and (f).
Fig. 4
Decompression running times and RAM consumption. Average running times for decompressing (a) gene panels in seconds, (c) in-house exomes in minutes, and (e) public exomes in minutes. The measurements are for Scramble and for IonCRAM with the gzip, xz, and Zstd options. The average running time for gene panels is the average over the 11 gene panel files; the averages for the exome sets are computed in the same way over the sets of three and four exome files. The average RAM consumption in GB for gene panels, in-house exomes, and public exomes is shown in (b), (d), and (f).
