Sci Rep. 2025 Mar 13;15(1):8707.
doi: 10.1038/s41598-025-92750-x.

Benchmarking long-read structural variant calling tools and combinations for detecting somatic variants in cancer genomes

Safa Kerem Aydin et al. Sci Rep. 2025.

Abstract

Cancer genomes have a complicated landscape of mutations, including large-scale rearrangements known as structural variants (SVs). These SVs can disrupt genes or regulatory elements, playing a critical role in cancer development and progression. Despite their importance, accurate identification of somatic SVs remains a significant bottleneck in cancer genomics. Long-read sequencing technologies hold great promise for SV discovery, and a growing number of tools are being developed to detect SVs from long reads. In this study, we apply eight widely used SV callers to paired tumor and matched normal samples from both the NCI-H2009 lung cancer cell line and the COLO829 melanoma cell line, the latter of which has a well-established somatic SV truth set. Following separate variant detection in tumor and normal DNA, a VCF merging procedure and a subtraction method were used to identify candidate somatic SVs. Additionally, we explored different combinations of the tools to improve the accuracy of true somatic SV detection. Our analysis evaluates the performance of each SV caller across a spectrum of variant types and counts in finding cancer-related somatic SVs. By comparing eight tools and their combinations, this study not only reveals the benefits and limitations of the various techniques but also establishes a framework for developing more robust SV calling pipelines. Our findings suggest that combining multiple tools and testing different combinations can significantly enhance the validation of somatic alterations.
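The subtraction step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the `SV` record, the `same_event` matching rule, and the 500 bp breakpoint tolerance are all assumptions made for illustration, and the coordinates are invented. The idea is only that a tumor call with a matching call in the normal sample is treated as germline and dropped, leaving candidate somatic SVs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record (not a real VCF parser)."""
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "INS", "DUP", "INV", "TRA"


def same_event(a: SV, b: SV, max_dist: int = 500) -> bool:
    """Assumed matching rule: same chromosome and type, and both
    breakpoints within max_dist base pairs of each other."""
    return (
        a.chrom == b.chrom
        and a.svtype == b.svtype
        and abs(a.start - b.start) <= max_dist
        and abs(a.end - b.end) <= max_dist
    )


def subtract_normal(tumor, normal, max_dist: int = 500):
    """Keep tumor calls that have no matching call in the normal sample."""
    return [
        t for t in tumor
        if not any(same_event(t, n, max_dist) for n in normal)
    ]


# Invented example calls, for illustration only.
tumor_calls = [
    SV("chr1", 10_000, 12_000, "DEL"),
    SV("chr2", 50_000, 50_300, "INS"),
    SV("chr3", 7_000, 9_500, "INV"),
]
normal_calls = [
    SV("chr1", 10_050, 11_980, "DEL"),  # matches the chr1 tumor deletion
    SV("chr3", 7_000, 9_500, "DUP"),    # same span, different type: no match
]

somatic = subtract_normal(tumor_calls, normal_calls)
print([s.chrom for s in somatic])  # ['chr2', 'chr3']
```

The tolerance matters in practice because different runs rarely report identical breakpoints for the same event; the value used here is arbitrary.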

Keywords: Cancer genomics; Long-read sequencing; SV calling tools; Somatic structural variants; Tool benchmarking; Whole-genome sequencing data.

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests. Consent for publication: The authors declare that they consent to publication.

Figures

Fig. 1
Overall workflow of obtaining candidate somatic structural variants.
Fig. 2
Quality statistics for tumor and matched-normal samples. This figure compares the sequencing quality metrics, including mean sequence quality, coverage, and mean mapping quality, for tumor samples (H2009, COLO829) and their matched-normal counterparts (BL2009, COLO829BL).
Fig. 3
Tumor, normal, and somatic SV counts across all tools for both lung adenocarcinoma and melanoma datasets. The graph illustrates the number of structural variants (SVs) detected by each tool, split into tumor (red), normal (green), and somatic (blue) categories. These counts provide insight into the detection capabilities of the tools and their effectiveness in distinguishing somatic variants across both cancer types.
Fig. 4
Stacked bar plot of somatic structural variant counts across all SV calling tools and datasets. This figure illustrates the distribution of structural variant (SV) types, namely insertions (INS), deletions (DEL), duplications (DUP), inversions (INV), and translocations (TRA), across multiple SV calling tools for both the COLO829/COLO829BL and H2009/BL2009 datasets.
Fig. 5
Intersection of somatic structural variants detected by multiple tools in H2009/BL2009 and COLO829/COLO829BL pairs. The plot shows the overlap of somatic structural variants (SVs) detected by multiple tools; only 284 and 5 variants are common to all tools in the H2009/BL2009 and COLO829/COLO829BL datasets, respectively. These common variants represent the strongest somatic candidates, because consistent identification by multiple SV callers increases confidence in their validity.
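The intersection shown in Fig. 5 amounts to counting how many tools support each merged variant. A minimal sketch follows, assuming the calls have already been merged so that the same event carries the same identifier in every tool's output; the tool names (`tool_a`, ...) and variant identifiers (`sv1`, ...) are placeholders, not the tools or variants from the study.

```python
from collections import Counter


def support_counts(calls_by_tool):
    """Count, for each merged SV identifier, how many tools reported it."""
    counts = Counter()
    for calls in calls_by_tool.values():
        counts.update(set(calls))  # each tool contributes at most once per SV
    return counts


def consensus(calls_by_tool, min_tools):
    """Return SV identifiers reported by at least min_tools tools."""
    counts = support_counts(calls_by_tool)
    return {sv for sv, n in counts.items() if n >= min_tools}


# Placeholder data: three hypothetical tools, four merged SV identifiers.
calls = {
    "tool_a": {"sv1", "sv2", "sv3"},
    "tool_b": {"sv1", "sv3"},
    "tool_c": {"sv1", "sv4"},
}

print(sorted(consensus(calls, min_tools=3)))  # ['sv1']
print(sorted(consensus(calls, min_tools=2)))  # ['sv1', 'sv3']
```

Requiring support from every tool gives the strictest set, as in the all-tool intersection of the figure; lowering `min_tools` trades confidence for sensitivity.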
Fig. 6
Circos plots illustrating somatic structural variants detected by each tool, categorized by variant types (INS, DEL, INV, DUP, and TRA) across both datasets for H2009/BL2009 somatic structural variant counts (A), and for the COLO829/COLO829BL somatic structural variant counts (B). The color-coding and arc positioning indicate the SV type and location of each variant, providing a comprehensive way to observe structural rearrangements, the frequency of specific variants, and their distribution across different datasets. The outer side of the plot displays chromosome numbers, while the legend from inside to outside showcases duplication-translocation, inversion, deletion, and insertion events. Each red line represents a distinct insertion event, blue lines signify separate deletion events, and orange lines denote inversion events. Translocation events between chromosomes are displayed in the middle with connecting lines, while duplication variants are shown as individual lines in the center.
Fig. 7
F1 score, precision, and recall performance line plot. The plot shows the performance metrics (F1 score, precision, and recall) of different tools and their combinations in detecting structural variants (SVs).
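For reference, the three metrics in Fig. 7 follow the standard definitions based on true positives (TP), false positives (FP), and false negatives (FN) relative to a truth set. The sketch below uses invented counts that are not results from the study; how a call is matched to the truth set is a separate choice not shown here.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Standard definitions; returns 0.0 where a denominator would be zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented counts for illustration: 40 calls match the truth set,
# 10 calls do not, and 20 truth-set SVs were missed.
p, r, f1 = precision_recall_f1(tp=40, fp=10, fn=20)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.8 0.667 0.727
```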
