BMC Genomics. 2024 Sep 30;25(1):898.
doi: 10.1186/s12864-024-10792-3.

Performance of somatic structural variant calling in lung cancer using Oxford Nanopore sequencing technology

Lingchen Liu et al. BMC Genomics.

Abstract

Background: Lung cancer is a heterogeneous disease and the primary cause of cancer-related mortality worldwide. Somatic mutations, including large structural variants, are important biomarkers in lung cancer for selecting targeted therapy. Genomic studies in lung cancer have been conducted using short-read sequencing. Emerging long-read sequencing technologies are a promising alternative for studying somatic structural variants; however, there is currently no consensus on how to process the data and call somatic events. In this study, we performed whole genome sequencing of lung cancer and matched non-tumour samples using long- and short-read sequencing to comprehensively benchmark three sequence aligners and seven structural variant callers, comprising generic callers (SVIM, Sniffles2, DELLY in generic mode and cuteSV) and somatic callers (Severus, SAVANA, nanomonsv and DELLY in somatic modes).

Results: Different combinations of aligners and variant callers influenced somatic structural variant detection. The choice of caller had a significant influence on somatic structural variant detection in terms of variant type, size, sensitivity, and accuracy. The performance of each variant caller was assessed by comparison with somatic structural variants identified by short-read sequencing. Long-read sequencing detected more somatic structural variant events than short-read sequencing. The mean recall of somatic variant events identified by long-read sequencing was higher for the somatic callers (72%) than for the generic callers (53%). Among the somatic callers run on minimap2 alignments, SAVANA and Severus achieved the highest recall at 79.5% and 79.25% respectively, followed by nanomonsv with a recall of 72.5%.

Conclusion: Long-read sequencing can identify somatic structural variants in clinical samples. The longer reads have the potential to improve our understanding of cancer development and inform personalized cancer treatment.

Keywords: Benchmarking long read approaches; Long read sequencing; Small cell lung cancer; Somatic structural variants detection.
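
The recall values reported in the Results can be read as the fraction of short-read "truth" events that a long-read caller also reports. The following is a minimal sketch of that calculation, not the authors' pipeline: the `SV` record, the `matches` rule, the 500 bp breakpoint tolerance, and the toy events are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record: one breakpoint plus a type label."""
    chrom: str
    pos: int
    svtype: str


def matches(a: SV, b: SV, tolerance: int = 500) -> bool:
    # Assumed matching rule: same chromosome, same type, and breakpoints
    # within `tolerance` bp. The 500 bp default is illustrative only.
    return (
        a.chrom == b.chrom
        and a.svtype == b.svtype
        and abs(a.pos - b.pos) <= tolerance
    )


def recall(truth: list[SV], calls: list[SV], tolerance: int = 500) -> float:
    """Fraction of truth events recovered by at least one call."""
    if not truth:
        raise ValueError("truth set is empty")
    found = sum(
        1 for t in truth if any(matches(t, c, tolerance) for c in calls)
    )
    return found / len(truth)


# Toy data (invented): four short-read truth events, four long-read calls.
truth = [
    SV("chr1", 1000, "DEL"),
    SV("chr2", 5000, "INV"),
    SV("chr3", 7000, "DUP"),
    SV("chr4", 9000, "BND"),
]
calls = [
    SV("chr1", 1100, "DEL"),  # within tolerance of the chr1 truth event
    SV("chr2", 5200, "INV"),  # within tolerance of the chr2 truth event
    SV("chr3", 9000, "DUP"),  # too far from the chr3 truth event
    SV("chr5", 100, "INS"),   # no truth counterpart
]

print(recall(truth, calls))  # 0.5
```

Recall defined this way says nothing about events found only by long reads; those count neither for nor against a caller, which matters here because long-read sequencing detected more events than short-read sequencing.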

Conflict of interest statement

John V. Pearson and Nicola Waddell are co-founders of genomiQa. LL and NW were funded by Oxford Nanopore to present work from this study at meetings. The remaining authors declare that there are no competing interests.

Figures

Fig. 1
Overview of study design to benchmark alignment and somatic SV calling. The workflow includes multiple steps. a Sample collection. DNA was extracted from seven tumour lung samples collected from EBUS-TBNA and seven matching blood samples. b The normal and tumour DNA underwent whole genome sequencing (WGS) using ONT PromethION for long-read sequencing (LRS) and Illumina NovaSeq for short-read sequencing (SRS). The processing steps for identifying somatic SV events in both long-read and short-read data used different tools but followed a similar process: sequence base calling and alignment, followed by the application of various variant calling methods for SV detection. In LRS, three aligners (minimap2, Winnowmap, and NGMLR) were evaluated, in combination with two approaches (generic and somatic calling) for detecting somatic SVs. The four generic SV callers were cuteSV, DELLY (G), Sniffles2, and SVIM, which required manual subtraction to determine somatic SVs with Jasmine. The somatic callers were DELLY (S_Paired), DELLY (S_ConSet), nanomonsv, SAVANA, and Severus. c The performance of each approach in LRS was evaluated by comparing the somatic SVs identified to high-confidence somatic SV events (obtained from SRS of the same samples and called by two or more of these approaches: qSV, DELLY, and GRIDSS)
Fig. 2
Assessment of three sequence aligners for LRS. a Processing time in hours (y-axis) for three aligners (x-axis). Each box represents seven normal samples (shown on left) and seven tumour samples (shown on right). All data points are shown. The lines between boxes indicate the same samples. b RAM usage in Gb (y-axis) for three aligners (x-axis). Each box represents seven normal samples (shown on left) and seven tumour samples (shown on right). All data points are shown. c Mapping rate (y-axis) for three aligners (x-axis). Four tumour samples (shown on the right) and their paired normal samples (shown on the left) which passed sample quality control are included in the plot. d Genome Coverage (y-axis) for three aligners (x-axis). Four tumour samples (shown on the right) and their paired normal samples (shown on the left) which passed sample quality control are included in the plot
Fig. 3
Comparison of somatic structural variant (SV) detection approaches. a Bar charts display the counts of SVs (x-axis) with the four generic callers on the left (y-axis) and the five somatic SV callers on the right (y-axis). Denser colours on the chart signify somatic SV event counts, while lighter colours correspond to the subtracted germline SV events detected in the four tumour samples. Ridgeline plots showing distributions of b SV size and c SV supporting read counts on the x-axis for four generic and five somatic SV callers (y-axis). d The UpSet plot of SV events among different variant callers (y-axis) within each approach. The top section shows the counts of shared and unique somatic SV events among SV callers. The middle section displays the percentage distribution of detected SV types among unique and shared events, categorised and coloured into five distinct types: BND: translocations (dark green); DEL: deletions (green); DUP: duplications (pale pink); INS: insertions (red); and INV: inversions (dark red). The bottom panel illustrates the matching variant callers. Unique caller events are represented by single dots, while overlaps are indicated by linked dots. Variant callers are assigned colours: green for SVIM, dark blue for Sniffles2, dark yellow for DELLY (G), and pink for cuteSV among the generic callers; gold for DELLY (S_Paired), light brown for DELLY (S_ConSet), light blue for nanomonsv, light salmon for SAVANA, and light aqua for Severus among the somatic callers. The N values represent the count of somatic SV events from the total events discovered in four tumour samples. Median values are shown, labelled Median, under the name of each SV caller
Fig. 4
Characterization of concordant and unique somatic SV events identified by long and short read sequencing in four lung cancer patients. For long-read sequencing, the data were aligned with minimap2 and somatic SV events identified with nanomonsv. a The counts of SV breakpoints that overlap with various genomic region types, represented by distinct colors: dark purple indicates high signal regions, light purple represents low mappability regions, dark green denotes telomere regions, light green corresponds to centromere regions, and other regions (regions outside of the problematic regions, which short reads can align with high confidence) are shown in light grey. b Circos plot showing the somatic SV events from four patients. The outer ring shows chromosomes (GRCh38), while the inner track shows somatic SV events that are categorized into three groups: LR and SR overlaps (red), LR-specific (yellow), and SR-specific (blue)
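
The Fig. 1 caption describes two set operations: building a high-confidence short-read set from events called by two or more of qSV, DELLY, and GRIDSS, and subtracting normal-sample events from tumour calls for the generic callers. The sketch below illustrates both under strong simplifying assumptions: events are compared by exact `(chrom, pos, svtype)` tuples rather than merged with a distance tolerance, the function names are hypothetical, and the data are invented. It is a stand-in for the idea only and does not reproduce Jasmine or any of the named callers.

```python
from collections import Counter

# An event is an (chrom, pos, svtype) tuple; exact equality is assumed here,
# whereas real SV merging tolerates small breakpoint differences.
Event = tuple[str, int, str]


def high_confidence(
    callsets: dict[str, set[Event]], min_callers: int = 2
) -> set[Event]:
    """Keep events reported by at least `min_callers` of the callers."""
    counts = Counter(ev for events in callsets.values() for ev in events)
    return {ev for ev, n in counts.items() if n >= min_callers}


def subtract_germline(tumour: set[Event], normal: set[Event]) -> set[Event]:
    """Keep tumour events that are absent from the matched normal."""
    return tumour - normal


# Toy short-read call sets (invented), keyed by caller name.
srs_calls = {
    "qSV": {("chr1", 1000, "DEL"), ("chr2", 5000, "INV")},
    "DELLY": {("chr1", 1000, "DEL"), ("chr3", 7000, "DUP")},
    "GRIDSS": {
        ("chr2", 5000, "INV"),
        ("chr3", 7000, "DUP"),
        ("chr4", 9000, "BND"),
    },
}
truth = high_confidence(srs_calls)
print(sorted(truth))  # the chr4 event is dropped: only one caller reports it

# Toy generic-caller output (invented) for a tumour and its matched normal.
tumour_calls = {("chr1", 1000, "DEL"), ("chr6", 300, "INS")}
normal_calls = {("chr6", 300, "INS")}
somatic = subtract_germline(tumour_calls, normal_calls)
print(sorted(somatic))  # only the chr1 deletion remains
```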
