BMC Bioinformatics. 2011 Dec 14;12 Suppl 14(Suppl 14):S7.
doi: 10.1186/1471-2105-12-S14-S7.

ClipCrop: a tool for detecting structural variations with single-base resolution using soft-clipping information

Shin Suzuki et al. BMC Bioinformatics.

Abstract

Background: Structural variations (SVs) change the structure of the genome and can therefore cause various diseases. Next-generation sequencing allows us to obtain a multitude of sequence data, some of which can be used to infer the position of SVs.

Methods: We developed a new method and implementation named ClipCrop for detecting SVs with single-base resolution using soft-clipping information. A soft-clipped sequence is an unmatched fragment in a partially mapped read. To assess the performance of ClipCrop against other SV-detecting tools, we generated simulation data with various SV lengths, read lengths, and depths of coverage of short reads, containing insertions, deletions, tandem duplications, inversions and single nucleotide alterations in a human chromosome. For comparison, we selected BreakDancer, CNVnator and Pindel, each of which adopts a different approach to detecting SVs: the discordant pair approach, the depth of coverage approach and the split read approach, respectively.

Results: Our method outperformed BreakDancer and CNVnator in both discovery rate and call accuracy for every type of SV. Pindel performed similarly to our method, but our method clearly outperformed it in detecting small duplications. In our experiments, ClipCrop infers reliable SVs for data sets with more than 50-base read lengths and 20x depth of coverage, both of which are reasonable values for current NGS data sets.

Conclusions: ClipCrop can detect SVs with a higher discovery rate and call accuracy than any other tool tested on our simulation data set.
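
The Methods paragraph describes a soft-clipped sequence as the unmatched fragment of a partially mapped read. As a minimal sketch of what that looks like in practice, the following assumes the alignments are stored in SAM format, where the CIGAR operation `S` marks soft-clipped bases at either end of a read. The abstract does not name the alignment format, and the function name `soft_clips` is hypothetical; this is an illustration, not ClipCrop's own code.

```python
import re
from typing import Tuple

# One CIGAR element: a length followed by a single operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar: str, seq: str) -> Tuple[str, str]:
    """Return the (left, right) soft-clipped fragments of a read.

    Assumption: `cigar` and `seq` come from one SAM record. Hard clips (H)
    are ignored because hard-clipped bases are not present in `seq`.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar) if op != "H"]
    left = right = ""
    if ops and ops[0][1] == "S":
        left = seq[: ops[0][0]]
    if len(ops) > 1 and ops[-1][1] == "S":
        right = seq[len(seq) - ops[-1][0]:]
    return left, right


if __name__ == "__main__":
    # A 10-base read whose first 3 bases did not match the reference.
    print(soft_clips("3S7M", "AAACCCCCCC"))  # ('AAA', '')
```

In a pipeline of this kind, the clipped fragments would then be remapped to locate the opposite breakpoint, as the figure captions describe.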


Figures

Figure 1
Identification of deletion events. (A) Relation between two breakpoints in deletion events. Clipped sequences are inside the two breakpoints, and they are remapped to outside the opposite breakpoint. (B) The way two breakpoints and clipped sequences are generated from deletion events. Reads generated from deleted region in a donor’s genome are soft-clipped, and remapped.
Figure 2
Identification of inversion events. (A) Relation between two breakpoints in inversion events. Clipped sequences are inside the two breakpoints, and they are remapped to outside the opposite breakpoint reversely. (B) The way two breakpoints and clipped sequences are generated from inversion events. Reads generated from inverted region in a donor’s genome are soft-clipped, and remapped reversely.
Figure 3
Identification of tandem duplication events. (A) Relation between two breakpoints in tandem duplication events. Clipped sequences are outside the two breakpoints, and they are remapped to inside the opposite breakpoint. (B) The way two breakpoints and clipped sequences are generated from tandem duplication events. Soft-clipped sequences are generated from the marginal point of two duplicated sequences.
Figure 4
Identification of insertion/translocation events. (A) Relation between two breakpoints in insertion/translocation events. An L-breakpoint and an R-breakpoint are located in the same position. (B) The way two breakpoints and clipped sequences are generated from insertion/translocation events. If these clipped sequences are remapped to another region, it is a translocation event. If they remain unmapped, it is an insertion event.
Figure 5
Discovery rate and true call rate. (A) The discovery rate is the mean, over the real SVs, of the fraction of each real SV that overlaps a called SV, as determined by formula (2). In this case, the discovery rate is calculated as 0.797. When the discovery rate is high, the number of true positives increases; the discovery rate can thus be regarded as a concept similar to sensitivity. (B) The true call rate is the mean, over the called SVs, of the fraction of each called SV that overlaps a real SV, as determined by formula (3). In this case, the true call rate is calculated as 0.833. When the true call rate is high, the number of false positives decreases; the true call rate can thus be regarded as a concept similar to specificity.
Figure 6
Results 1: discovery rate and true call rate of each method. Discovery rates and true call rates for each data set with the four methods across various SV types. Numbers in the graphs indicate the mean length of the SVs. CNVnator calls only deletions and tandem duplications, and BreakDancer does not call tandem duplications.
Figure 7
Discovery rate / true call rate at different depths. Discovery rates and true call rates of ClipCrop at different depths of coverage (5, 10, 15, 20, 40). Numbers in the graphs indicate the mean depth of the data.
Figure 8
Discovery rate / true call rate with different read lengths. Discovery rates and true call rates of ClipCrop with different lengths of the read sequences (50, 75, 100, 108). Numbers in the graphs indicate the read lengths of the data.
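
The two metrics in the Figure 5 caption can be sketched as mean overlap fractions over intervals. Formulas (2) and (3) are not reproduced on this page, so the code below is only one plausible reading of the caption. It assumes that SVs are half-open `(start, end)` intervals on a single chromosome and that the intervals within each set do not overlap one another. The function names and the example coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), assumed non-empty


def overlap_len(a: Interval, b: Interval) -> int:
    """Number of bases shared by two intervals (0 if they are disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def mean_overlap_ratio(targets: List[Interval], others: List[Interval]) -> float:
    """Mean, over `targets`, of the fraction of each target covered by `others`."""
    if not targets:
        return 0.0
    ratios = []
    for t in targets:
        covered = sum(overlap_len(t, o) for o in others)
        ratios.append(min(1.0, covered / (t[1] - t[0])))
    return sum(ratios) / len(ratios)


def discovery_rate(real: List[Interval], called: List[Interval]) -> float:
    """Overlap measured relative to each real SV (sensitivity-like)."""
    return mean_overlap_ratio(real, called)


def true_call_rate(real: List[Interval], called: List[Interval]) -> float:
    """Overlap measured relative to each called SV."""
    return mean_overlap_ratio(called, real)


if __name__ == "__main__":
    real = [(100, 200), (300, 400)]
    called = [(150, 200), (300, 380), (500, 520)]
    print(discovery_rate(real, called))            # 0.65
    print(round(true_call_rate(real, called), 3))  # 0.667
```

In this made-up example, the two real SVs are covered at 50% and 80%, giving a discovery rate of 0.65. Two of the three calls lie entirely within real SVs and the third overlaps none, giving a true call rate of 2/3.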
