Sci Rep. 2018 Apr 4;8(1):5608.
doi: 10.1038/s41598-018-23978-z.

IMSindel: An accurate intermediate-size indel detection tool incorporating de novo assembly and gapped global-local alignment with split read analysis

Daichi Shigemizu et al. Sci Rep. 2018.

Erratum in

Abstract

Insertions and deletions (indels) have been implicated in dozens of human diseases through the radical alteration of gene function by short frameshift indels as well as long indels. However, the accurate detection of these indels from next-generation sequencing data is still challenging. This is particularly true for intermediate-size indels (≥50 bp), due to the short DNA sequencing reads. Here, we developed a new method that predicts intermediate-size indels using BWA soft-clipped fragments (unmatched fragments in partially mapped reads) and unmapped reads. We report the performance comparison of our method, GATK, PINDEL and ScanIndel, using whole exome sequencing data from the same samples. False positive and false negative counts were determined through Sanger sequencing of all predicted indels across these four methods. The harmonic mean of the recall and precision, F-measure, was used to measure the performance of each method. Our method achieved the highest F-measure of 0.84 in one sample, compared to 0.56 for GATK, 0.52 for PINDEL and 0.46 for ScanIndel. Similar results were obtained in additional samples, demonstrating that our method was superior to the other methods for detecting intermediate-size indels. We believe that this methodology will contribute to the discovery of intermediate-size indels associated with human disease.
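Two ideas in the abstract can be illustrated with a short sketch: pulling soft-clipped fragments out of a partially mapped read, and computing the F-measure from validation counts. This is not IMSindel's implementation. The function names `soft_clipped_fragments` and `f_measure`, and all inputs shown, are hypothetical and chosen only for illustration. The sketch assumes a read is described by a SAM-style CIGAR string and its sequence.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_fragments(cigar, seq):
    """Return (side, fragment) pairs for the soft-clipped ends of a read.

    Hypothetical helper: soft-clipped bases (S) are present in the read
    sequence but were not aligned; hard-clipped bases (H) are absent from
    the sequence, so they are ignored here.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    fragments = []
    if ops and ops[0][1] == "S":
        fragments.append(("left", seq[:ops[0][0]]))
    if len(ops) > 1 and ops[-1][1] == "S":
        fragments.append(("right", seq[len(seq) - ops[-1][0]:]))
    return fragments


def f_measure(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up read: 60 aligned bases followed by 40 soft-clipped bases.
    read = "A" * 60 + "C" * 40
    print(soft_clipped_fragments("60M40S", read))
    # Made-up validation counts, not taken from the paper.
    print(f_measure(true_pos=8, false_pos=2, false_neg=2))
```

In this sketch the false positive and false negative counts would come from a validation step such as the Sanger sequencing described above. The counts passed in the example are placeholders.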

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
The workflow of intermediate-size indel prediction.
Figure 2
Time and peak memory used by four indel detection methods for NA18943.
Figure 3
Intermediate-size indels detected by the four methods. Venn diagrams showing the overlap of the indels detected by all four methods (IMSindel, GATK HaplotypeCaller, PINDEL and ScanIndel) in NA18943 (a), NA18948 (b) and NA12878 (c). The number of indels detected by each method, categorized by size, in NA18943 (d), NA18948 (e) and NA12878 (f).
Figure 4
Distribution of intermediate-size indels predicted by IMSindel. (a) The total number of deletions and insertions predicted in 478 WES datasets. The percentage of 12 functional groups in predicted deletions (b) and insertions (c). The number in parentheses indicates the number of predicted indels per sample.
Figure 5
Performance comparison for indel detection using simulation data. The indel size ranged from 100 bp to 1,000 bp at 100 bp intervals. Sequence reads were generated with several parameters: point mutation rate (0.001 and 0.005), read length (75 bp and 150 bp), and sequencing coverage (100× and 200×).
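The simulation settings listed for Figure 5 can be written out as a parameter grid. The sketch below assumes the parameters were fully crossed, which the caption does not state. The names `simulation_grid` and `reads_needed`, and the region length used in the example, are hypothetical. The read-count helper uses the standard relation coverage = reads × read length / region length.

```python
from itertools import product

# Values taken from the Figure 5 caption.
INDEL_SIZES_BP = range(100, 1001, 100)   # 100, 200, ..., 1000
POINT_MUTATION_RATES = (0.001, 0.005)
READ_LENGTHS_BP = (75, 150)
COVERAGES = (100, 200)


def simulation_grid():
    """Enumerate every combination of settings (assumes a full cross)."""
    return [
        {
            "indel_size_bp": size,
            "mutation_rate": rate,
            "read_length_bp": read_len,
            "coverage": cov,
        }
        for size, rate, read_len, cov in product(
            INDEL_SIZES_BP, POINT_MUTATION_RATES, READ_LENGTHS_BP, COVERAGES
        )
    ]


def reads_needed(region_length_bp, read_length_bp, coverage):
    """Reads required to reach a target mean coverage, rounded up."""
    total_bases = coverage * region_length_bp
    return (total_bases + read_length_bp - 1) // read_length_bp


if __name__ == "__main__":
    grid = simulation_grid()
    print(len(grid), "parameter combinations")
    # Hypothetical 10 kb simulated region, chosen only for illustration.
    for setting in grid[:3]:
        n = reads_needed(10_000, setting["read_length_bp"], setting["coverage"])
        print(setting, "reads:", n)
```

Under the full-cross assumption the grid has 10 × 2 × 2 × 2 = 80 combinations.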

