Cell Genom. 2022 Jun 8;2(6):100139.
doi: 10.1016/j.xgen.2022.100139.

A multi-platform reference for somatic structural variation detection

Jose Espejo Valle-Inclan et al. Cell Genom.

Abstract

Accurate detection of somatic structural variation (SV) in cancer genomes remains a challenging problem. This is in part due to the lack of high-quality, gold-standard datasets that enable the benchmarking of experimental approaches and bioinformatic analysis pipelines. Here, we performed somatic SV analysis of the paired melanoma and normal lymphoblastoid COLO829 cell lines using four different sequencing technologies. Based on the evidence from multiple technologies combined with extensive experimental validation, we compiled a comprehensive set of carefully curated and validated somatic SVs, comprising all SV types. We demonstrate the utility of this resource by determining the SV detection performance as a function of tumor purity and sequence depth, highlighting the importance of assessing these parameters in cancer genomics projects. The truth somatic SV dataset as well as the underlying raw multi-platform sequencing data are freely available and are an important resource for community somatic benchmarking efforts.

Keywords: benchmarking; cancer; long sequencing read; short sequencing read; structural variant; truth set; whole-genome sequencing.

Conflict of interest statement

A.M.W. is an employee and shareholder of Pacific Biosciences. W.P.K. is an employee and shareholder of Cyclomics B.V.

Figures

Graphical abstract
Figure 1
Overview of the COLO829 multi-technology genomic dataset. (A and B) Sequencing depth (A) and log-scaled molecular analysis length (B) distributions per technology dataset for COLO829 (blue) and COLO829BL (red). Means are indicated by horizontal black lines. (C) Copy number profile of COLO829 calculated independently for each of the datasets.
Figure 2
Generation of a validated somatic SV truth set. (A) State-of-the-art somatic SV calling pipelines were used independently for each technology dataset. The numbers of somatic SV candidates identified are indicated in boxes. Overlapping variant calls obtained by the different platforms were merged and independently validated using a combination of targeted enrichment with hybrid capture probes followed by next-generation sequencing, PCR, and Bionano Genomics. Validated somatic SV candidates and calls supported by more than one dataset were manually curated, leaving a total of 68 somatic SVs in the truth set. (B and C) Intersections between the 68 somatic SVs in the truth set and the original SV call sets (B) and the validation results (C) are shown. 10X, 10× Genomics; BN, Bionano; ILL, Illumina HiSeq X; MULT, support by multiple sequencing platforms; ONT, Oxford Nanopore; PB, PacBio.
Figure 3
Characterization of the somatic SV truth set. (A) Distribution of different types of SVs in the COLO829 truth set, divided into size bins. Translocations (BND) are assigned a size of 0 bp. (B) Correlation between CNAs and somatic SVs in the COLO829 truth set. The circos plot shows copy number gains (green) and losses (red) and somatic SVs. Each copy number change is expected to be flanked by an SV event. Two complex breakage-fusion-bridge events are present in COLO829. (C) The first one occurs in chromosome 3 (blue), with templated insertions from chromosomes 6 (pink), 10 (green), and 12 (red) (see also Video S1 for an animation of the proposed mechanism shaping this event). (D) The second one occurs in chromosome 15, with templated insertions from chromosomes 6 (pink) and 20 (green). Breakpoints are indicated by vertical lines with arrowheads showing breakpoint orientations. Dashed lines indicate junctions between two breakpoints. Break junctions are labelled with the truth set SV ID number (Table S3).
Figure 4
Recall and precision of somatic SV calling as a function of tumor purity and sequencing depth. Different tumor purities (0%, 10%, 20%, 25%, 50%, 75%, and 100%) were simulated by mixing data from COLO829 and COLO829BL for the ILL, ONT, and PB datasets. (A) Somatic SV calling was performed independently for each purity subset, and recall (left) and precision (right) were calculated against the COLO829 somatic SV truth set. Lines represent the median of independent triplicate measurements. (B) For each tumor purity subset in the ILL dataset, different sequencing depths (1×, 5×, 10×, 30×, 50×, and 98×) were sampled. Somatic SV calling was performed independently for each sequencing depth and tumor purity subset, and recall (left) and precision (right) were calculated against the COLO829 somatic SV truth set.
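
The benchmarking idea behind Figure 4 can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that purity is approximated by the share of sequencing depth drawn from the tumor sample, and that a call matches a truth-set SV when both breakends lie on the same chromosomes within a position tolerance. The names `mixing_fractions`, `same_sv`, and `recall_precision`, the 100 bp default tolerance, and all depths and coordinates shown are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Breakend:
    chrom: str
    pos: int


@dataclass(frozen=True)
class SV:
    a: Breakend
    b: Breakend


def mixing_fractions(purity, tumor_depth, normal_depth, target_depth):
    """Fractions of tumor and normal reads to sample for a simulated mix.

    Assumption: purity is the share of the target depth taken from the tumor.
    """
    tumor_frac = purity * target_depth / tumor_depth
    normal_frac = (1.0 - purity) * target_depth / normal_depth
    if tumor_frac > 1.0 or normal_frac > 1.0:
        raise ValueError("not enough reads to reach the requested depth")
    return tumor_frac, normal_frac


def same_sv(x, y, tol):
    """True if both breakends agree within tol bp, in either order."""
    def close(p, q):
        return p.chrom == q.chrom and abs(p.pos - q.pos) <= tol

    return (close(x.a, y.a) and close(x.b, y.b)) or (
        close(x.a, y.b) and close(x.b, y.a)
    )


def recall_precision(calls, truth, tol=100):
    """Recall: truth SVs recovered. Precision: calls that hit a truth SV."""
    found = sum(any(same_sv(t, c, tol) for c in calls) for t in truth)
    correct = sum(any(same_sv(c, t, tol) for t in truth) for c in calls)
    recall = found / len(truth) if truth else 0.0
    precision = correct / len(calls) if calls else 0.0
    return recall, precision


# Hypothetical example: two truth SVs, one recovered call and one false call.
truth = [
    SV(Breakend("chr3", 1_000), Breakend("chr6", 5_000)),
    SV(Breakend("chr15", 2_000), Breakend("chr20", 9_000)),
]
calls = [
    SV(Breakend("chr6", 5_020), Breakend("chr3", 990)),   # matches, reversed order
    SV(Breakend("chr1", 100), Breakend("chr1", 50_000)),  # no truth counterpart
]
recall, precision = recall_precision(calls, truth)
tumor_frac, normal_frac = mixing_fractions(0.25, 100.0, 40.0, 40.0)
```

Running the same comparison for each simulated purity and depth subset would give one recall and one precision value per subset, which is the kind of curve plotted in panels A and B. A real comparison would typically also account for SV type and breakend orientation, which this sketch ignores.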

