Genome Biol. 2020 Aug 3;21(1):189.
doi: 10.1186/s13059-020-02107-y.

Long-read-based human genomic structural variation detection with cuteSV

Tao Jiang et al. Genome Biol. 2020.

Abstract

Long-read sequencing is promising for the comprehensive discovery of structural variations (SVs). However, it is still non-trivial to achieve high yields and performance simultaneously due to the complex SV signatures implied by noisy long reads. We propose cuteSV, a sensitive, fast, and scalable long-read-based SV detection approach. cuteSV uses tailored methods to collect the signatures of various types of SVs and employs a clustering-and-refinement method to implement sensitive SV detection. Benchmarks on simulated and real long-read sequencing datasets demonstrate that cuteSV has higher yields and scaling performance than state-of-the-art tools. cuteSV is available at https://github.com/tjiangHIT/cuteSV.

Keywords: Long-read sequencing; Scaling performance; Structural variants detection.
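
The clustering-and-refinement idea mentioned in the abstract can be illustrated with a small sketch. This is not cuteSV's actual implementation: the `Signature` record, the function names, and the thresholds (`max_gap`, `length_ratio`, `min_support`) are all hypothetical and chosen only for illustration. The sketch first groups signatures whose positions are close together, then splits each group by SV length so that distinct alleles at the same locus are not merged.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Signature:
    """One SV signature from a read (hypothetical, simplified record)."""
    read_id: str
    pos: int      # approximate breakpoint position on the reference
    length: int   # approximate SV length implied by the read (assumed > 0)


def cluster_by_position(sigs: List[Signature], max_gap: int) -> List[List[Signature]]:
    """Greedy clustering: start a new cluster whenever the gap to the
    previous signature (in position order) exceeds max_gap."""
    clusters: List[List[Signature]] = []
    for sig in sorted(sigs, key=lambda s: s.pos):
        if clusters and sig.pos - clusters[-1][-1].pos <= max_gap:
            clusters[-1].append(sig)
        else:
            clusters.append([sig])
    return clusters


def refine_by_length(cluster: List[Signature], length_ratio: float) -> List[List[Signature]]:
    """Split a positional cluster into groups of similar SV length.
    In length order, a signature joins the current group when
    shorter / longer >= length_ratio."""
    groups: List[List[Signature]] = []
    for sig in sorted(cluster, key=lambda s: s.length):
        if groups and groups[-1][-1].length / sig.length >= length_ratio:
            groups[-1].append(sig)
        else:
            groups.append([sig])
    return groups


def call_candidates(
    sigs: List[Signature],
    max_gap: int = 200,          # illustrative threshold, not a cuteSV default
    length_ratio: float = 0.7,   # illustrative threshold, not a cuteSV default
    min_support: int = 2,        # minimum number of distinct supporting reads
) -> List[Tuple[int, int, int]]:
    """Return (position, length, supporting_reads) for each refined group."""
    calls = []
    for cluster in cluster_by_position(sigs, max_gap):
        for group in refine_by_length(cluster, length_ratio):
            reads = {s.read_id for s in group}
            if len(reads) >= min_support:
                pos = sum(s.pos for s in group) // len(group)
                length = sum(s.length for s in group) // len(group)
                calls.append((pos, length, len(reads)))
    return calls
```

With four nearby signatures of two clearly different lengths plus one isolated signature, the positional step yields one large cluster and one singleton; the refinement step then separates the large cluster into two candidate alleles, and the singleton is dropped for lack of support.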

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Schematic illustration of the cuteSV approach. cuteSV uses a sorted BAM file as input to detect SVs in 3 major steps. In step 1 (“discovering SV signatures”), cuteSV collects various types of SV signatures comprehensively from inter- and intra-alignments. In step 2 (“clustering of SV signatures”), a heuristic clustering-and-refinement method is employed to sensitively discover accurate SV alleles. In step 3 (“SV calling and genotyping”), cuteSV generates the SV callsets and assigns genotypes
Fig. 2
Benchmark results of the SV callers on various simulated datasets. F1 scores of a deletion, b insertion, c duplication, d inversion, e translocation at breakpoint level, and f translocation at breakend level, for the simulated datasets at various coverages, with and without genotyping. In the figure, “N×” and “N×-GT” indicate the statistics without and with genotyping, respectively
Fig. 3
Benchmark results of the SV callers on various HG002 PacBio sequencing datasets. a Precisions, recalls, and F1 scores on the whole and down-sampled HG002 PacBio CLR datasets. b Precisions, recalls, and F1 scores on the whole and down-sampled HG002 PacBio CCS datasets. c Recall rate of homozygous parental variants. d Mendelian-Discordance-Rates (MDRs) for the variants unique to the offspring
Fig. 4
Benchmark results of the SV callers on various HG002 ONT sequencing datasets. a Precisions, recalls, and F-scores on the whole and down-sampled HG002 ONT datasets. b The Venn diagram of SV calls produced by cuteSV from HG002 PacBio CLR, CCS, and ONT PromethION datasets (indicated by “CLR”, “CCS”, and “ONT”, respectively). c The Venn diagram of SV calls produced by different tools on HG002 PacBio CLR data. d The Venn diagram of SV calls produced by different tools on HG002 PacBio CCS data. e The Venn diagram of SV calls produced by different tools on HG002 ONT PromethION data
Fig. 5
Performance of the benchmarked SV callers. The a runtimes and b memory footprints of cuteSV, Sniffles, and PBSV with 1, 2, 4, 8, and 16 CPU threads. “Skip GT” indicates the statistics without genotyping. SVIM was benchmarked with a single CPU thread only since it does not support multi-threaded computing
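
The figure captions above report precision, recall, and F1 score. As a reminder of how these metrics relate, here is a minimal sketch that computes them from counts of true-positive, false-positive, and false-negative calls. The function name and the example counts are hypothetical and are not taken from the paper's benchmarks; how a call is matched to the truth set is outside the scope of this sketch.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from call counts.

    tp: calls that match a truth-set SV
    fp: calls with no matching truth-set SV
    fn: truth-set SVs with no matching call
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up counts for illustration only.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Because F1 is a harmonic mean, it stays low when either precision or recall is low, which is why it is commonly used as a single summary score for a caller.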
