IonCRAM: a reference-based compression tool for ion torrent sequence files
- PMID: 32907531
- PMCID: PMC7487613
- DOI: 10.1186/s12859-020-03726-9
Erratum in
- Correction to: IonCRAM: a reference-based compression tool for ion torrent sequence files. BMC Bioinformatics. 2020 Oct 6;21(1):435. doi: 10.1186/s12859-020-03766-1. PMID: 33023475.
Abstract
Background: Ion Torrent is one of the major next-generation sequencing (NGS) technologies and is frequently used in medical research and diagnosis. The built-in software of the Ion Torrent sequencing machines delivers the sequencing results in the BAM format. In addition to the usual SAM/BAM fields, the Ion Torrent BAM file includes technology-specific flow signal data. The flow signals occupy a large portion of the BAM file (about 75% for the human genome). Compressing SAM/BAM into the CRAM format significantly reduces the space needed to store the NGS results. However, the tools for generating the CRAM format are not designed to handle the flow signals. This missing feature motivated us to develop a new program to improve the compression of Ion Torrent files for long-term archiving.
Results: In this paper, we present IonCRAM, the first reference-based compression tool for compressing Ion Torrent BAM files for long-term archiving. For BAM files, IonCRAM achieved a space saving of about 43%. This space saving is superior to that achieved with the CRAM format by about 8-9%.
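The abstract does not define "space saving". A minimal sketch, assuming the common definition (one minus the ratio of compressed size to original size), shows how such a figure would be computed. The function name and the file sizes below are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
def space_saving(original_bytes: int, compressed_bytes: int) -> float:
    """Return the fraction of storage saved: 1 - compressed / original.

    Assumption: this is the usual definition of space saving; the
    abstract itself does not spell out the formula.
    """
    if original_bytes <= 0:
        raise ValueError("original_bytes must be positive")
    return 1.0 - compressed_bytes / original_bytes


# Hypothetical sizes for illustration only (not taken from the paper).
bam_size = 100_000_000_000        # original Ion Torrent BAM file, in bytes
compressed_size = 57_000_000_000  # archive produced by a compressor, in bytes

saving = space_saving(bam_size, compressed_size)
print(f"Space saving: {saving:.0%}")
```

Under this definition, a tool that shrinks a file to 57% of its original size has a space saving of 43%.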
Conclusions: Reducing the space consumption of NGS data reduces the cost of storage and data transfer. Developing efficient compression software for clinical NGS data is therefore of more than computational interest, as it ultimately contributes to reducing the overall cost of the clinical test. The space saving achieved by our tool is a practical step in this direction. The tool is open source and available at Code Ocean, GitHub, and http://ioncram.saudigenomeproject.com .
Conflict of interest statement
The authors declare that they have no competing interests.