Comparative Study
PLoS One. 2025 Feb 6;20(2):e0314982.
doi: 10.1371/journal.pone.0314982. eCollection 2025.

Comparisons of performances of structural variants detection algorithms in solitary or combination strategy

De-Min Duan et al. PLoS One.

Abstract

Structural variants (SVs) have been associated with changes in gene expression, which may contribute to alterations in phenotypes and disease development. However, the precise identification and characterization of SVs remain challenging. While long-read sequencing offers superior accuracy for SV detection, short-read sequencing remains essential due to practical and cost considerations, as well as the need to analyze existing short-read datasets. Numerous algorithms for short-read SV detection exist, but none are universally optimal, each having limitations for specific SV sizes and types. In this study, we evaluated the efficacy of six advanced SV detection algorithms, including the commercial software DRAGEN, using the GIAB v0.6 Tier 1 benchmark and HGSVC2 cell lines. We employed both individual and combination strategies, with systematic assessments of recall, precision, and F1 scores. Our results demonstrate that the union combination approach enhanced detection capabilities, surpassing single algorithms in identifying deletions and insertions, and delivered comparable recall and F1 scores to the commercial software DRAGEN. Interestingly, expanding the number of algorithms from three to five in the combination did not enhance performance, highlighting the efficiency of a well-chosen ensemble over a larger algorithmic pool.
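
The recall, precision, and F1 scores named in the abstract follow the standard definitions based on true positives (TP), false positives (FP), and false negatives (FN). The sketch below is a minimal illustration of those formulas, not the authors' evaluation code; the `Counts` class and the function names are hypothetical, and the numbers in the usage lines are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Benchmark comparison counts for one caller on one sample (hypothetical container)."""
    tp: int  # calls that match a truth-set SV
    fp: int  # calls with no matching truth-set SV
    fn: int  # truth-set SVs that were not called


def precision(c: Counts) -> float:
    """Fraction of reported calls that are correct: TP / (TP + FP)."""
    total = c.tp + c.fp
    return c.tp / total if total else 0.0


def recall(c: Counts) -> float:
    """Fraction of truth-set SVs that were recovered: TP / (TP + FN)."""
    total = c.tp + c.fn
    return c.tp / total if total else 0.0


def f1(c: Counts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Invented counts, for illustration only.
example = Counts(tp=80, fp=20, fn=20)
print(precision(example), recall(example), f1(example))
```

Returning 0.0 when a denominator is zero is a convention chosen for this sketch; an evaluation tool may instead report such cases as undefined.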

Conflict of interest statement

No authors have competing interests.

Figures

Fig 1. SV detection workflow.
The SV detection workflow starts with benchmark set collection from GIAB and HGSVC2, followed by sequence alignment using BWA-MEM to produce BAM files. These BAM files are then processed by the SV callers (DRAGEN, Manta, DELLY, LUMPY, GRIDSS, SvABA) to generate VCF files, which are subsequently used for performance assessment of single tools and combination strategies, based on recall, precision, and F1 score.
Fig 2. The SVs detected by six SV callers.
(A) SVs of size ≥ 50 bp and CTX detected by individual SV callers in samples HG002, HG00514, HG00733, and NA19240. (B) Length distribution of different variants for all samples detected by individual SV callers. The maximum, minimum, and median are based on the integrated values from all sample sets. DEL: deletion, INS: insertion, DUP: duplication, INV: inversion, and CTX: complex translocation.
Fig 3. The distribution of SVs in truth sets and the performance of individual algorithms.
(A) The distribution of SV sizes and types in truth sets. (B) The comparison of false negative (FN) and false positive (FP) numbers among individual algorithms. (C) The precision, recall, and F1 score of each individual algorithm in detecting DELs and INSs of size ≥ 50 bp. DEL: deletion, INS: insertion.
Fig 4. Concordance of neighbor SVs detected by member callers in different combination strategies.
(A) The agreement among three SV detection tools (Manta, DELLY, and GRIDSS). (B) The agreement among five SV detection tools (Manta, DELLY, GRIDSS, LUMPY, and SvABA).
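
A union combination keeps every SV reported by at least one member caller and treats neighboring calls of the same type as a single event. The following is a minimal sketch of that idea under stated assumptions: the `SVCall` record, the `union_merge` function, and the 500 bp breakpoint window are all hypothetical choices for illustration and are not taken from the study's methods.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SVCall:
    """One SV call, as it might be parsed from a caller's VCF (hypothetical record)."""
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "INS"
    caller: str  # e.g. "Manta", "DELLY", "GRIDSS"


def union_merge(calls: Iterable[SVCall], max_dist: int = 500) -> list[dict]:
    """Cluster neighboring calls of the same type and keep every cluster (union).

    Two calls are treated as neighbors when they share chromosome and SV type
    and both breakpoints lie within `max_dist` bp of the cluster's first call.
    The window size is an assumption made for this sketch.
    """
    clusters: list[dict] = []
    ordered = sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start, c.end))
    for call in ordered:
        last = clusters[-1] if clusters else None
        if (
            last is not None
            and last["chrom"] == call.chrom
            and last["svtype"] == call.svtype
            and abs(call.start - last["start"]) <= max_dist
            and abs(call.end - last["end"]) <= max_dist
        ):
            # Same event seen by another caller: record the extra support.
            last["callers"].add(call.caller)
        else:
            clusters.append(
                {
                    "chrom": call.chrom,
                    "start": call.start,
                    "end": call.end,
                    "svtype": call.svtype,
                    "callers": {call.caller},
                }
            )
    return clusters


# Invented calls, for illustration only.
demo = [
    SVCall("chr1", 1000, 2000, "DEL", "Manta"),
    SVCall("chr1", 1010, 1995, "DEL", "DELLY"),
    SVCall("chr1", 5000, 5001, "INS", "GRIDSS"),
]
for cluster in union_merge(demo):
    print(cluster["svtype"], cluster["start"], sorted(cluster["callers"]))
```

The size of each cluster's `callers` set gives the kind of agreement count that a concordance plot summarizes; an intersection strategy would instead keep only clusters supported by several callers.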
Fig 5. Distribution and performance of combination strategies.
(A) The distribution of SVs in different combination strategies. (B) The comparison of false negative (FN) and false positive (FP) numbers among individual algorithms. (C) The precision, recall, and F1 score of combination strategies in the detection of DELs and INSs. DEL: deletion, INS: insertion, DUP: duplication, INV: inversion.
Fig 6. Combination performance of each single caller and combination strategy in detecting DELs and INSs.
The maximum, minimum, and macro average of recall, precision, and F1 score are based on the integrated values from all sample sets.
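
A macro average gives each sample equal weight: the metric is computed per sample and the per-sample values are then averaged, rather than pooling counts across samples. The sketch below illustrates that summary; the `summarize` function is hypothetical and the per-sample values are invented placeholders, not results from the study.

```python
from statistics import mean


def summarize(per_sample: dict[str, float]) -> dict[str, float]:
    """Return the maximum, minimum, and macro average of a per-sample metric."""
    values = list(per_sample.values())
    if not values:
        raise ValueError("at least one sample is required")
    return {
        "max": max(values),
        "min": min(values),
        # Macro average: unweighted mean of the per-sample values.
        "macro_avg": mean(values),
    }


# Placeholder F1 values keyed by sample name; these are NOT the paper's results.
placeholder_f1 = {"HG002": 0.5, "HG00514": 0.7, "HG00733": 0.6, "NA19240": 0.8}
print(summarize(placeholder_f1))
```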
