ScanIndel: a hybrid framework for indel detection via gapped alignment, split reads and de novo assembly

Rendong Yang¹, Andrew C Nelson², Christine Henzler³, Bharat Thyagarajan⁴, Kevin A T Silverstein⁵

Affiliations

¹ Supercomputing Institute for Advanced Computational Research, University of Minnesota, 117 Pleasant St. SE, RM 541, Minneapolis, MN, 55455, USA. yang4414@umn.edu.
² Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN, 55455, USA. nels2055@umn.edu.
³ Supercomputing Institute for Advanced Computational Research, University of Minnesota, 117 Pleasant St. SE, RM 541, Minneapolis, MN, 55455, USA. chenzler@umn.edu.
⁴ Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN, 55455, USA. thya0003@umn.edu.
⁵ Supercomputing Institute for Advanced Computational Research, University of Minnesota, 117 Pleasant St. SE, RM 541, Minneapolis, MN, 55455, USA. kats@umn.edu.

PMID: 26643039
PMCID: PMC4671222
DOI: 10.1186/s13073-015-0251-2

Rendong Yang et al. Genome Med. 2015 Dec 7;7:127.

Abstract

Comprehensive identification of insertions/deletions (indels) across the full size spectrum from second generation sequencing is challenging due to the relatively short read length inherent in the technology. Different indel calling methods exist but are limited in detection to specific sizes with varying accuracy and resolution. We present ScanIndel, an integrated framework for detecting indels with multiple heuristics including gapped alignment, split reads and de novo assembly. Using simulation data, we demonstrate ScanIndel's superior sensitivity and specificity relative to several state-of-the-art indel callers across various coverage levels and indel sizes. ScanIndel yields higher predictive accuracy with lower computational cost compared with existing tools for both targeted resequencing data from tumor specimens and high coverage whole-genome sequencing data from the human NIST standard NA12878. Thus, we anticipate ScanIndel will improve indel analysis in both clinical and research settings. ScanIndel is implemented in Python, and is freely available for academic use at https://github.com/cauyrd/ScanIndel.

Figures

**Fig. 1**
The ScanIndel workflow. ScanIndel aligns the raw FASTQ reads with a gapped NGS aligner (BWA-MEM) and detects short indels from the initial mapping results. Soft-clipped reads with supporting breakpoint evidence are extracted and realigned with BLAT to refine their CIGAR strings and genomic positions; these realigned soft-clipped reads help identify large deletions and medium-sized insertions. In addition, ScanIndel performs *de novo* assembly with the Inchworm assembler from Trinity on unmapped reads and BLAT-realigned soft-clipped reads to detect large indels. All individual call sets are merged with vcfcombine (from vcflib) into one final VCF file containing all indel predictions.
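The first two stages of the workflow rely on information encoded in each read's CIGAR string: `I`/`D` operations give short indels directly from the gapped alignment, while `S` (soft-clip) operations at a read end mark candidates for split-read realignment. The sketch below shows how those two signals can be read from a CIGAR string using only the standard library. It is an illustration, not ScanIndel's code: the function names and the `MIN_CLIP` threshold are hypothetical, and ScanIndel's actual breakpoint-evidence criteria are not reproduced here.

```python
import re
from typing import List, Tuple

# CIGAR operations defined by the SAM specification.
_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Hypothetical minimum clip length; ScanIndel's actual criterion is not given here.
MIN_CLIP = 20


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string such as '50M3D47M' into (length, op) pairs."""
    ops = [(int(n), op) for n, op in _CIGAR_RE.findall(cigar)]
    # Reject strings containing characters the regex did not consume (e.g. '*').
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def gapped_indels(pos: int, cigar: str) -> List[Tuple[str, int, int]]:
    """Report (type, ref_pos, length) for every I/D operation in one alignment.

    pos is the 1-based leftmost mapped reference position (SAM POS). ref_pos is
    the first deleted base for a deletion, and the reference base immediately
    after the inserted sequence for an insertion.
    """
    indels = []
    ref = pos  # reference coordinate of the next base to be consumed
    for length, op in parse_cigar(cigar):
        if op == "D":
            indels.append(("DEL", ref, length))
            ref += length
        elif op == "I":
            indels.append(("INS", ref, length))
        elif op in "M=XN":
            ref += length
        # S, H and P consume no reference bases.
    return indels


def soft_clip_lengths(cigar: str) -> Tuple[int, int]:
    """Return (left, right) soft-clip lengths, ignoring any outer hard clips."""
    core = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return left, right


def needs_realignment(cigar: str, min_clip: int = MIN_CLIP) -> bool:
    """Flag a read for split-read realignment if either end is clipped by >= min_clip bases."""
    return max(soft_clip_lengths(cigar)) >= min_clip


print(gapped_indels(100, "50M3D47M"))  # [('DEL', 150, 3)]
print(needs_realignment("30S70M"))     # True
```

Reads flagged this way would then go to the realignment step (BLAT in ScanIndel), which can place the clipped portion elsewhere and thereby expose indels longer than a gapped aligner typically reports.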

**Fig. 2**
Effect of different strategies on indel detection. ScanIndel is run in three modes: (1) BWA-MEM alignment + soft-clipping realignment + FreeBayes indel calling (labeled ‘*scanindel_mapping_only*’); (2) BWA-MEM alignment + *de novo* assembly + FreeBayes indel calling (labeled ‘*scanindel_assembly_only*’); and (3) the complete ScanIndel procedure: BWA-MEM alignment + soft-clipping realignment + *de novo* assembly + FreeBayes indel calling (labeled ‘*scanindel*’). FreeBayes indel calling directly on the BWA-MEM alignment is also tested (labeled ‘*freebayes*’). Smoothed histograms (40-bp bins) compare the modes on simulated 100-bp and 200-bp reads at 10×, 20× and 50× mean coverage, for detecting 1000 deletions and 1000 insertions ranging evenly in size from 1 bp to 1 kb.
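The size-resolved comparisons in Fig. 2 group the simulated indels into fixed-width size bins and report the fraction detected per bin. A minimal sketch of that binning, assuming each simulated indel has an identifier and that calls have already been matched to truth indels (both hypothetical inputs; the matching rule is not described here), might look like this. No smoothing is applied.

```python
from collections import defaultdict
from typing import Dict, Iterable

BIN_WIDTH = 40  # bp, matching the 40-bp bins in Fig. 2


def size_bin(size: int, width: int = BIN_WIDTH) -> int:
    """Map an indel size in bp to the lower edge of its bin (1-40 -> 1, 41-80 -> 41, ...)."""
    if size < 1:
        raise ValueError("indel size must be >= 1 bp")
    return ((size - 1) // width) * width + 1


def recall_by_bin(truth_sizes: Dict[str, int], detected_ids: Iterable[str],
                  width: int = BIN_WIDTH) -> Dict[int, float]:
    """Fraction of simulated indels detected, per size bin (no smoothing applied)."""
    detected = set(detected_ids)
    total: Dict[int, int] = defaultdict(int)
    found: Dict[int, int] = defaultdict(int)
    for indel_id, size in truth_sizes.items():
        b = size_bin(size, width)
        total[b] += 1
        if indel_id in detected:
            found[b] += 1
    return {b: found[b] / total[b] for b in sorted(total)}


# One simulated deletion of each size from 1 bp to 1 kb.
truth = {f"del{s}": s for s in range(1, 1001)}
# Hypothetical caller that recovers only deletions up to 60 bp.
calls = [f"del{s}" for s in range(1, 61)]
rec = recall_by_bin(truth, calls)
print(len(rec), rec[1], rec[41], rec[81])  # 25 1.0 0.5 0.0
```

With 1000 sizes and 40-bp bins, each of the 25 bins holds exactly 40 simulated indels, so every bin's recall has the same denominator.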

**Fig. 3**
Performance comparison for indel detection with 100-bp simulated reads. Recall (*upper panels*) and precision (*lower panels*) are evaluated for ScanIndel, GATK HaplotypeCaller, Pindel, Platypus, Scalpel, Delly and FermiKit. Smoothed histograms (100-bp bins) compare the methods on simulated data at 10×, 20× and 50× mean coverage for detecting 1000 deletions and 1000 insertions, one of each size from 1 bp to 1 kb. Precision is not calculated when a method yields a zero denominator (TP + FP = 0), i.e. makes no calls.
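The rule that precision is left undefined when TP + FP = 0 can be made explicit by returning `None` rather than forcing a value such as 0 or 1. The sketch below applies that rule per size bin; the input layout (a mapping from bin to TP, FP and FN counts) and the counts themselves are hypothetical.

```python
from typing import Dict, Optional, Tuple


def recall(tp: int, fn: int) -> Optional[float]:
    """TP / (TP + FN); None if there were no true indels to find."""
    return tp / (tp + fn) if tp + fn else None


def precision(tp: int, fp: int) -> Optional[float]:
    """TP / (TP + FP); None (not calculated) when the method made no calls."""
    return tp / (tp + fp) if tp + fp else None


def metrics_by_bin(counts: Dict[int, Tuple[int, int, int]]
                   ) -> Dict[int, Tuple[Optional[float], Optional[float]]]:
    """Map each size bin's (TP, FP, FN) counts to (recall, precision)."""
    return {b: (recall(tp, fn), precision(tp, fp))
            for b, (tp, fp, fn) in sorted(counts.items())}


# Hypothetical counts for two 100-bp bins.
print(metrics_by_bin({1: (90, 10, 10), 901: (0, 0, 100)}))
# {1: (0.9, 0.9), 901: (0.0, None)}
```

Returning `None` keeps "no calls" distinct from "all calls wrong" (precision 0.0), which is why such points are simply absent from a precision plot.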

**Fig. 4**
Performance comparison of large indel detection on NIST standard NA12878. (a) 138 validated large deletions from the literature, ranging in size from 530 to 155,154 bp, are used as the reference standard set. (b) 105 novel sequence insertions previously identified by the 1000 Genomes Project, ranging in size from 37 to 8224 bp, are used as the reference standard.
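Benchmarking large deletions against a reference set such as the one in panel (a) requires a rule for deciding when a call matches a validated deletion. A widely used convention is 50% reciprocal overlap; the matching rule used for Fig. 4 is not stated in this summary, so the threshold and the function names below are assumptions for illustration only.

```python
from typing import List, Tuple

# (chromosome, start, end) in 0-based, half-open coordinates.
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Smaller of the two overlap fractions (overlap / len(a), overlap / len(b))."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def count_recovered(truth: List[Interval], calls: List[Interval],
                    min_ro: float = 0.5) -> int:
    """Number of reference-standard deletions matched by at least one call (assumed 50% rule)."""
    return sum(
        any(reciprocal_overlap(t, c) >= min_ro for c in calls) for t in truth
    )


truth = [("chr1", 1_000, 2_000), ("chr2", 5_000, 5_600)]
calls = [("chr1", 1_100, 2_050)]
print(count_recovered(truth, calls))  # 1
```

Requiring the overlap to be large relative to both intervals prevents a very long call from "matching" a short deletion it merely contains, and vice versa. Insertions such as those in panel (b) have no reference span, so they are usually matched by breakpoint distance instead.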

**Fig. 5**
Time and peak memory used by ScanIndel and Pindel on 50× WGS data from individual NA12878. ScanIndel's run time is broken down by module: split-read realignment (SR), *de novo* assembly (AS) and variant calling (VC). All measurements refer to the programs themselves and exclude BWA-MEM alignment.

Cited by

EBV-negative monomorphic B-cell post-transplant lymphoproliferative disorders are pathologically distinct from EBV-positive cases and frequently contain TP53 mutations.
Courville EL, Yohe S, Chou D, Nardi V, Lazaryan A, Thakral B, Nelson AC, Ferry JA, Sohani AR. Mod Pathol. 2016 Oct;29(10):1200-11. doi: 10.1038/modpathol.2016.130. Epub 2016 Jul 22. PMID: 27443517.
Primary mucosal melanomas of the head and neck are characterised by overexpression of the DNA mutating enzyme APOBEC3B.
Argyris PP, Naumann J, Jarvis MC, Wilkinson PE, Ho DP, Islam MN, Bhattacharyya I, Gopalakrishnan R, Li F, Koutlas IG, Giubellino A, Harris RS. Histopathology. 2023 Mar;82(4):608-621. doi: 10.1111/his.14843. Epub 2022 Dec 5. PMID: 36416305.
Convergent Loss of ABC Transporter Genes From Clostridioides difficile Genomes Is Associated With Impaired Tyrosine Uptake and p-Cresol Production.
Steglich M, Hofmann JD, Helmecke J, Sikorski J, Spröer C, Riedel T, Bunk B, Overmann J, Neumann-Schaal M, Nübel U. Front Microbiol. 2018 May 8;9:901. doi: 10.3389/fmicb.2018.00901. PMID: 29867812.
High-resolution mapping reveals the mechanism and contribution of genome insertions and deletions to RNA virus evolution.
Aguilar Rangel M, Dolan PT, Taguwa S, Xiao Y, Andino R, Frydman J. Proc Natl Acad Sci U S A. 2023 Aug;120(31):e2304667120. doi: 10.1073/pnas.2304667120. Epub 2023 Jul 24. PMID: 37487061.
Optimization of a microfluidics-based next generation sequencing assay for clinical oncology diagnostics.
Henzler C, Schomaker M, Yang R, Lambert AP, LaRue R, Kincaid R, Beckman K, Kemmer T, Wilson J, Yohe S, Thyagarajan B, Nelson AC. Ann Transl Med. 2018 May;6(9):162. doi: 10.21037/atm.2018.05.07. PMID: 29911110.
