ClipCrop: a tool for detecting structural variations with single-base resolution using soft-clipping information
- PMID: 22373054
- PMCID: PMC3287472
- DOI: 10.1186/1471-2105-12-S14-S7
Abstract
Background: Structural variations (SVs) change the structure of the genome and can therefore cause various diseases. Next-generation sequencing (NGS) produces large volumes of sequence data, some of which can be used to infer the positions of SVs.
Methods: We developed a new method and implementation, named ClipCrop, for detecting SVs with single-base resolution using soft-clipping information. A soft-clipped sequence is the unmatched fragment of a partially mapped read. To compare the performance of ClipCrop with that of other SV-detection tools, we generated simulated data sets that varied SV length, read length and the depth of coverage of short reads, and that contained insertions, deletions, tandem duplications, inversions and single nucleotide alterations in a human chromosome. For comparison, we selected BreakDancer, CNVnator and Pindel, each of which takes a different approach to SV detection: the discordant-pair, depth-of-coverage and split-read approaches, respectively.
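ClipCrop's own implementation is not described in this abstract. The sketch below only illustrates the general idea behind soft-clip-based detection: a soft clip at either end of an alignment pins a candidate breakpoint to a single reference base. It assumes SAM-style records (1-based `POS`, CIGAR strings as defined in the SAM specification). The function names and the `min_support` threshold are hypothetical, and counting clipped reads by exact coordinate is a deliberate simplification, not ClipCrop's algorithm.

```python
import re
from collections import Counter

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '30M20S' into (length, op) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Reject strings that contain anything the regex did not consume (e.g. '*').
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def clipped_breakpoints(pos, cigar, seq):
    """Return (breakpoint, side, clipped_sequence) for each soft clip.

    pos is the 1-based leftmost aligned reference position (SAM POS).
    For a left clip, the breakpoint is reported as the first aligned base;
    for a right clip, as the last aligned base.
    """
    ops = parse_cigar(cigar)
    # Hard-clipped bases are absent from SEQ, so skip H when locating S blocks.
    core = [(n, op) for n, op in ops if op != "H"]
    results = []
    if core and core[0][1] == "S":
        results.append((pos, "left", seq[: core[0][0]]))
    if len(core) > 1 and core[-1][1] == "S":
        ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
        results.append((pos + ref_len - 1, "right", seq[-core[-1][0]:]))
    return results


def breakpoint_support(reads, min_support=3):
    """Count clipped reads per (breakpoint, side) and keep well-supported sites.

    reads: iterable of (pos, cigar, seq) tuples.
    min_support: hypothetical threshold, not a ClipCrop parameter.
    """
    counts = Counter()
    for pos, cigar, seq in reads:
        for bp, side, _ in clipped_breakpoints(pos, cigar, seq):
            counts[(bp, side)] += 1
    return {site: c for site, c in counts.items() if c >= min_support}


if __name__ == "__main__":
    reads = [
        (1001, "30M20S", "A" * 50),  # right clip after reference base 1030
        (991, "40M10S", "C" * 50),   # right clip after reference base 1030
        (1031, "15S35M", "G" * 50),  # left clip before reference base 1031
    ]
    print(breakpoint_support(reads, min_support=2))  # {(1030, 'right'): 2}
```

Because left clips at 1031 and right clips at 1030 describe the same junction, a fuller sketch would merge the two sides and also tolerate small coordinate jitter. Requiring several supporting reads is also why read length and depth of coverage matter: shorter reads and lower depth yield fewer clipped reads that can be aligned confidently at each breakpoint.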
Results: Our method outperformed BreakDancer and CNVnator in both discovery rate and call accuracy for every type of SV. Pindel performed similarly to our method, but our method performed markedly better at detecting small duplications. Our experiments show that ClipCrop infers reliable SVs from data sets with read lengths of more than 50 bases and depth of coverage of more than 20x, both of which are reasonable values for current NGS data sets.
Conclusions: On our simulated data sets, ClipCrop detected SVs with a higher discovery rate and call accuracy than any of the other tools tested.
Similar articles
- Robust and exact structural variation detection with paired-end and soft-clipped alignments: SoftSV compared with eight algorithms. Brief Bioinform. 2016 Jan;17(1):51-62. doi: 10.1093/bib/bbv028. Epub 2015 May 20. PMID: 25998133.
- Discovery of tandem and interspersed segmental duplications using high-throughput sequencing. Bioinformatics. 2019 Oct 15;35(20):3923-3930. doi: 10.1093/bioinformatics/btz237. PMID: 30937433. Free PMC article.
- Comparison of multiple algorithms to reliably detect structural variants in pears. BMC Genomics. 2020 Jan 20;21(1):61. doi: 10.1186/s12864-020-6455-x. PMID: 31959124. Free PMC article.
- Evaluation of the performance of copy number variant prediction tools for the detection of deletions from whole genome sequencing data. J Biomed Inform. 2019 Jun;94:103174. doi: 10.1016/j.jbi.2019.103174. Epub 2019 Apr 6. PMID: 30965134. Review.
- Next-generation sequencing as a tool for breakpoint analysis in rearrangements of the globin gene clusters. Int J Lab Hematol. 2017 May;39 Suppl 1:111-120. doi: 10.1111/ijlh.12680. PMID: 28447426. Review.
Cited by
- Structural variation discovery in the cancer genome using next generation sequencing: computational solutions and perspectives. Oncotarget. 2015 Mar 20;6(8):5477-89. doi: 10.18632/oncotarget.3491. PMID: 25849937. Free PMC article. Review.
- Comparative Genomics Supports That Brazilian Bioethanol Saccharomyces cerevisiae Comprise a Unified Group of Domesticated Strains Related to Cachaça Spirit Yeasts. Front Microbiol. 2021 Apr 15;12:644089. doi: 10.3389/fmicb.2021.644089. eCollection 2021. PMID: 33936002. Free PMC article.
- SIns: A Novel Insertion Detection Approach Based on Soft-Clipped Reads. Front Genet. 2021 Apr 30;12:665812. doi: 10.3389/fgene.2021.665812. eCollection 2021. PMID: 33995493. Free PMC article.
- SeqCNV: a novel method for identification of copy number variations in targeted next-generation sequencing data. BMC Bioinformatics. 2017 Mar 3;18(1):147. doi: 10.1186/s12859-017-1566-3. PMID: 28253855. Free PMC article.
- Identifying micro-inversions using high-throughput sequencing reads. BMC Genomics. 2016 Jan 11;17 Suppl 1(Suppl 1):4. doi: 10.1186/s12864-015-2305-7. PMID: 26818118. Free PMC article.