Genome Res. 2020 Sep;30(9):1258-1273.
doi: 10.1101/gr.260497.119. Epub 2020 Sep 4.

Comprehensive analysis of structural variants in breast cancer genomes using single-molecule sequencing

Sergey Aganezov et al. Genome Res. 2020 Sep.

Abstract

Improved identification of structural variants (SVs) in cancer can lead to more targeted and effective treatment options as well as advance our basic understanding of the disease and its progression. We performed whole-genome sequencing of the SKBR3 breast cancer cell line and patient-derived tumor and normal organoids from two breast cancer patients using Illumina/10x Genomics, Pacific Biosciences (PacBio), and Oxford Nanopore Technologies (ONT) sequencing. We then inferred SVs and large-scale allele-specific copy number variants (CNVs) using an ensemble of methods. Our findings show that long-read sequencing allows for substantially more accurate and sensitive SV detection, with between 90% and 95% of variants supported by each long-read technology also supported by the other. We also report high accuracy for long reads even at relatively low coverage (25×-30×). Furthermore, we integrated SV and CNV data into a unifying karyotype-graph structure to present a more accurate representation of the mutated cancer genomes. We find hundreds of variants within known cancer-related genes detectable only through long-read sequencing. These findings highlight the need for long-read sequencing of cancer genomes for the precise analysis of their genetic instability.

Figures

Figure 1.
Sample collection, sequencing, and alignment pipeline and statistics overview. (A) Biological data sample collection, sequencing, and alignment workflow for the SKBR3 breast cancer cell line and 3D Matrigel-grown organoids for solid breast cancer tumor tissues obtained from two females, patient 51 and patient 48. (B) Yield and alignment coverage statistics for observed samples across WGS experiments on various sequencing platforms. Suffixes T and N next to patient identifiers indicate tumor or matching normal tissue. Alignment values x (y) represent the average read depth x for aligned reads, with (y) representing the average read depth when all unresolved Ns in the reference are also taken into consideration. (C) Length distribution for reads of length 1.5+ kbp from PacBio and ONT sequencing runs for patient 51: (raw-yield) lengths of raw sequenced reads; (raw-aligned) lengths of raw reads that had any alignment inferred for them; (aligned) lengths of aligned parts of sequenced reads.
Figure 2.
Structural variation inference across Illumina/10xG, ONT, and PacBio sequencing platforms for sample 51. (A) Ensemble workflow for SV inference, with multiple methods and technologies used to infer SVs, followed by merging of first the method-specific results and then the technology-specific results, with size and support restrictions applied. (B) SV inference comparison across SVs inferred from platform (x) sequencing experiments, in which “platform” corresponds to sequencing technology, and (x) denotes the average alignment read-depth coverage in the tumor sample. A method-specific breakdown is provided for every sequencing technology. SVs detected in the normal sample are in parentheses. (C) Size distribution for SVs in sample 51T, with SVs inferred exclusively from long reads (ONT, PacBio, or both), inferred exclusively from Illumina/10xG short reads, or supported by both long and short reads.
Figure 3.
Structural variation inference on down-sampled long-read data sets. (A) Workflow for down-sampling full long-read data set and computing concordance between down-sampled and full coverage data sets with distinct minimum fractional x/y read support for an SV to be considered. (B) Precision and recall for SVs inferred on down-sampled ONT and PacBio data for sample 51T. SVs inferred on the full coverage data set at the matching read support threshold are used as the ground truth.
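The precision and recall comparison described in this caption can be illustrated with a minimal Python sketch. It treats the full-coverage call set as ground truth and scores a down-sampled call set against it. The `SV` record, the matching rule (same chromosome and type, positions within `max_dist`), and the 500 bp tolerance are illustrative assumptions, not the matching criteria used in the paper.

```python
from typing import List, NamedTuple, Tuple


class SV(NamedTuple):
    """Hypothetical minimal SV record (not the paper's data model)."""
    chrom: str
    pos: int
    svtype: str


def matches(a: SV, b: SV, max_dist: int = 500) -> bool:
    # Assumed matching rule: same chromosome, same type, nearby position.
    return (a.chrom == b.chrom
            and a.svtype == b.svtype
            and abs(a.pos - b.pos) <= max_dist)


def precision_recall(called: List[SV], truth: List[SV],
                     max_dist: int = 500) -> Tuple[float, float]:
    """Score a down-sampled call set against the full-coverage call set."""
    # Calls that have a counterpart in the ground truth.
    supported_calls = sum(
        1 for c in called if any(matches(c, t, max_dist) for t in truth))
    # Ground-truth SVs that were recovered by at least one call.
    recovered_truth = sum(
        1 for t in truth if any(matches(t, c, max_dist) for c in called))
    precision = supported_calls / len(called) if called else 0.0
    recall = recovered_truth / len(truth) if truth else 0.0
    return precision, recall


# Toy data: full-coverage calls serve as the ground truth.
full_coverage = [SV("chr1", 1000, "DEL"), SV("chr1", 5000, "INS"),
                 SV("chr2", 200, "DUP"), SV("chr3", 700, "DEL")]
down_sampled = [SV("chr1", 1100, "DEL"), SV("chr2", 250, "DUP"),
                SV("chr5", 10, "INV")]

precision, recall = precision_recall(down_sampled, full_coverage)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this toy example two of the three down-sampled calls match the ground truth and two of the four ground-truth SVs are recovered. In practice the comparison would be repeated for each coverage level and read-support threshold.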
Figure 4.
Integration of SVs and CNVs for cancer genomes via karyotype-graph integration. (A) Haplotype constraint groups determined via uninterrupted SVs (uSVs) and long ONT and/or PacBio reads spanning multiple SVs. Distribution over the number of haplotype constraint groups inferred with only uSVs, and various combinations of uSVs and short/long reads in patient 51. (B) Workflow of the reconstruction of haplotype-specific cancer karyotype graphs (RCK) method with allele-specific copy number profiles on large fragments, resolved SV call set, and inferred haplotype constraint groups as inputs. (C) Circos plot of the CNVs and SVs from karyotype graph inferred by RCK for patient 51 with HATCHet segment copy number (CN) input. The top two tracks correspond to fractions x/y of the total length x of either amplified (CN ≥ 1) or deleted (CN = 0) fragments over the y = 5 × 10^6 long windows. Breakend track shows the total number (with 590 being the maximum value shown) of breakends inferred by RCK as being present.
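The windowed x/y fractions shown in the top two Circos tracks can be sketched as follows. This is a minimal illustration, assuming non-overlapping, half-open `(start, end, cn)` segments on a single chromosome. The function name `window_fractions` and the toy segments are hypothetical, and the copy number thresholds are taken from the caption as printed.

```python
from typing import Callable, List, Tuple

Segment = Tuple[int, int, int]  # (start, end, copy number), half-open interval


def window_fractions(segments: List[Segment], chrom_length: int,
                     predicate: Callable[[int], bool],
                     window: int = 5_000_000) -> List[float]:
    """For each window of length y, return x/y, where x is the total length
    of segments in that window whose copy number satisfies `predicate`."""
    fractions = []
    for w_start in range(0, chrom_length, window):
        w_end = min(w_start + window, chrom_length)
        covered = 0
        for start, end, cn in segments:
            if predicate(cn):
                # Length of the overlap between segment and window, if any.
                covered += max(0, min(end, w_end) - max(start, w_start))
        # Divide by the nominal window length y, also for a short last window.
        fractions.append(covered / window)
    return fractions


# Toy 10 Mbp chromosome with three copy number segments.
toy_segments = [(0, 2_500_000, 0),
                (2_500_000, 7_500_000, 2),
                (7_500_000, 10_000_000, 1)]

deleted = window_fractions(toy_segments, 10_000_000, lambda cn: cn == 0)
amplified = window_fractions(toy_segments, 10_000_000, lambda cn: cn >= 1)
print(deleted, amplified)
```

Passing the condition as a predicate keeps the windowing logic independent of how "amplified" and "deleted" are defined, so different thresholds can be tried without changing the function.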
Figure 5.
Structural and copy number variants in COSMIC census genes. (A) Comparison of the number of COSMIC census genes containing SVs, as well as the number of SVs within COSMIC census genes, across the inferred SV call sets in 51T and 51N (in parentheses), SKBR3, and 48T, and SVs reported by RCK as being present in the karyotype graphs reconstructed with either HATCHet or TitanCNA copy number profiles in 51T. (B) Comparison of the number of COSMIC census genes with either allele-specific deletions or amplifications between copy number profiles from HATCHet, RCK + HATCHet, TitanCNA, and RCK + TitanCNA in 51T.
Figure 6.
SVs identified in cancer-related COSMIC census genes in patient 51. All presented SVs are identified with both ONT and PacBio reads. Superscripts indicate the following: (*) SVs within known exons; (+) SVs found as rare in 1KGP samples; and (s) SVs identified by short-read SV inference methods. (A) An insertion in the BRCA1 gene identified in <1% of 1KGP samples. (B) An insertion in the CHEK2 gene. (C) An insertion/duplication, a deletion, and two duplications in the NOTCH1 gene, with the deletion also found with short reads. All four SVs belong to the same haplotype, as indicated by multiple long (both ONT and PacBio) reads spanning all of them at the same time. (D) An insertion and a deletion in the ZNF331 gene, with the latter lying within an exon in the NM_001317121 transcript and genotyped in <1% of 1KGP samples. Both SVs belong to the same haplotype, as indicated by long reads spanning both of them at the same time.
