Genome Res. 2024 Oct 11;34(9):1355-1364.
doi: 10.1101/gr.279123.124.

Reconstructing extrachromosomal DNA structural heterogeneity from long-read sequencing data using Decoil

Mădălina Giurgiu et al. Genome Res.

Abstract

Circular extrachromosomal DNA (ecDNA) is a form of oncogene amplification found across cancer types and associated with poor outcome in patients. ecDNA can be structurally complex and can contain rearranged DNA sequences derived from multiple chromosome locations. As the structure of ecDNA can impact oncogene regulation and may indicate mechanisms of its formation, disentangling it at high resolution from sequencing data is essential. Even though methods have been developed to identify and reconstruct ecDNA in cancer genome sequencing, it remains challenging to resolve complex ecDNA structures, in particular amplicons with shared genomic footprints. We here introduce Decoil, a computational method that combines a breakpoint-graph approach with LASSO regression to reconstruct complex ecDNA and deconvolve co-occurring ecDNA elements with overlapping genomic footprints from long-read nanopore sequencing. Decoil outperforms de novo assembly and alignment-based methods in simulated long-read sequencing data for both simple and complex ecDNAs. Applying Decoil on whole-genome sequencing data uncovered different ecDNA topologies and explored ecDNA structure heterogeneity in neuroblastoma tumors and cell lines, indicating that this method may improve ecDNA structural analyses in cancer.
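The idea of using LASSO regression to decide which candidate circles explain the observed coverage can be illustrated with a toy example. The sketch below is not the authors' implementation: the fragment labels, the candidate set (including a deliberately spurious candidate, ACD), the coverage values, the penalty, and the helper `nonneg_lasso` are all assumptions made for illustration. It fits non-negative circle proportions to per-fragment coverage with an L1 penalty, solved by plain coordinate descent.

```python
# Toy sketch (assumed data, not Decoil's code): estimate how many copies of each
# candidate circle are needed to explain per-fragment read coverage.

def nonneg_lasso(X, y, lam, sweeps=200):
    """Minimize 0.5 * ||y - X w||^2 + lam * sum(w) subject to w >= 0
    using cyclic coordinate descent."""
    n_rows, n_cols = len(X), len(X[0])
    w = [0.0] * n_cols
    for _ in range(sweeps):
        for j in range(n_cols):
            # Correlation of column j with the residual that ignores column j.
            rho = 0.0
            for i in range(n_rows):
                pred_others = sum(X[i][k] * w[k] for k in range(n_cols) if k != j)
                rho += X[i][j] * (y[i] - pred_others)
            norm_sq = sum(X[i][j] ** 2 for i in range(n_rows))
            # Soft-threshold, then clip at zero (proportions cannot be negative).
            w[j] = max(0.0, (rho - lam) / norm_sq)
    return w

fragments = ["A", "B", "C", "D"]
candidates = ["ABC", "BD", "ACD"]  # ACD is a hypothetical spurious candidate

# X[i][j] = number of times fragment i occurs in candidate circle j.
X = [[circle.count(frag) for circle in candidates] for frag in fragments]

# Assumed mean coverage per fragment (A, B, C, D).
y = [30.0, 40.0, 30.0, 10.0]

proportions = nonneg_lasso(X, y, lam=1.0)
for name, value in zip(candidates, proportions):
    print(f"{name}: {value:.2f}")
```

In this toy setting the penalty drives the spurious candidate to exactly zero and slightly shrinks the two real circles. Candidates whose fragment counts are linear combinations of other candidates cannot be separated by coverage alone, so a real method needs additional constraints or read-level evidence.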


Figures

Figure 1.
Decoil algorithm overview and an ecDNA ranking system based on its structural diversity. (A) Schematic of the Decoil algorithm depicting the major steps: (#1) genome fragmentation, (#2) graph encoding, (#3) search simple circles, (#4) circle quantification, (#5) candidate selection, (#6) output, and (#7) visualization. Step #7, visualization, is performed by the Decoil-viz module (see Methods). (B) ecDNA diversity. The x-axis displays the seven ecDNA topologies (e.g., simple circularization, multiregion, multichromosomal) with increasing computational complexity as defined in this paper. The y-axis displays different scenarios of ecDNA composition per sample, that is, singleton (presence of a single ecDNA structure), co-occurrence (presence of different ecDNA species, with nonoverlapping genomic regions), and heterogeneity (presence of different ecDNA species, with overlapping genomic regions). The gradient matrix depicts schematically the ecDNA reconstruction difficulty levels for the different scenarios (y-axis) and topologies (x-axis), which are addressed by the Decoil algorithm. Light gray means low difficulty; black, increased difficulty. (C) Computational challenge formulation. The left panel displays a heterogeneity scenario, in which two different ecDNA elements (ABC, BD) share the genomic footprint (B fragment); the right panel displays a single large structure (ABDBC) containing interspersed-duplication rearrangement (B fragment duplicated on ecDNA). Both scenarios lead to the same SV breakpoint profile. To infer the likely conformation, we perform step #4 in A. Created with BioRender (https://www.biorender.com/).
Figure 2.
Decoil reconstructs complex ecDNA elements with high fidelity from simulated data. (A) Simulation strategy for generating individual ecDNA templates consists of the following steps: (1) choose genomic position, (2) simulate small deletions (DELs), (3) simulate inversion (INV), (4) simulate tandem-duplication (DUP), and (5) generate DNA sequence template (FASTA). The example depicts an ecDNA template harboring 1 × DEL (yellow), 1 × DUP (purple), and 1 × INV (green). (B) Pipeline for generating in silico long reads based on one or more ecDNA templates, at different depths of coverage. (C) The ecDNA topologies. Examples of ecDNA reconstructions performed by Decoil for simulated ecDNA elements, for the seven different topologies. The gray track represents the coverage of the aligned reads. The right column shows the Shasta de novo assembly (x-axis) against the true structure (y-axis). (D) Decoil and Shasta assembly contiguity for simple (i–v) and complex topologies (vi,vii). The y-axis represents the fraction of reconstructions with a specific contiguity (x-axis). The x-axis represents the larger contig normalized by the true structure length. A value of one indicates a good reconstruction; zero indicates a poor reconstruction. Values greater than one refer to reconstructions larger than the true structure. The gray horizontal lines are at the 0.5 and 0.7 fractions.
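The contiguity score described for panel D (largest contig normalized by the true structure length) amounts to a single ratio. The following is a minimal sketch of that definition; the function name `contiguity` and the example lengths are hypothetical and do not come from the paper.

```python
# Minimal sketch of the contiguity score: largest contig / true structure length.
# All lengths below are made-up illustrative values in base pairs.

def contiguity(contig_lengths, true_length):
    """Return the largest contig length divided by the true structure length."""
    if true_length <= 0:
        raise ValueError("true_length must be positive")
    if not contig_lengths:
        return 0.0  # nothing reconstructed
    return max(contig_lengths) / true_length

fragmented = contiguity([120_000, 40_000], 150_000)  # structure split into two contigs
complete = contiguity([150_000], 150_000)            # single full-length contig
oversized = contiguity([310_000], 300_000)           # reconstruction longer than truth

print(fragmented, complete, oversized)
```

A score near one indicates that a single contig spans the simulated structure, whereas a score above one flags a reconstruction that is longer than the true template.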
Figure 3.
Decoil captures the ecDNA structure complexity and heterogeneity in neuroblastoma cell lines. (A) STA-NB-10DM ecDNA reconstruction by Decoil (top), coverage track (middle) of the aligned reads to reference genome GRCh38/hg38, and GENCODE V42 annotation (bottom). The gray highlighted region Chr 2: 17,221,081–17,538,185 (GRCh38/hg38) represents an interspersed duplication. (B) ecDNA elements co-occurrence reconstructed by Decoil in TR14 (top four tracks), the coverage track (middle), and GENCODE V42 annotation (bottom). (C) In silico dilution strategy, in which two samples, S1 (green) and S2 (yellow), are mixed at different ratios to generate mixtures of ecDNAs that overlap in the genomic space. (D) ecDNA breakpoint recall (y-axis) for the in silico cell line mixtures, split by the dilution ratio (x-axis). An ecDNA element harboring MYCN is present in every one of the three cell lines, that is, CHP212, TR14, and STA-NB-10DM, and is composed of 10, 8, and 14 breakpoints, respectively. The other co-occurring ecDNA elements in TR14 are also added to the analysis and have four (ODC1-), two (MDM2-), and six (SMC6-amplicon) breakpoints. (E) ecDNA reconstruction visualization using Decoil-viz for the in silico ecDNA mixtures. (i–iv) The reconstructed ecDNA structures by Decoil in cell line mixtures (green, TR14; yellow, CHP212; and orange, STA-NB-10DM) overlap in the genomic space at the MYCN locus (gray highlight). (v) Coverage track for pure (100%) TR14, CHP212, and STA-NB-10DM cell lines. Misassemblies are depicted in gray.
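Breakpoint recall, as plotted in panel D, is the fraction of expected breakpoints that are recovered in a reconstruction. The sketch below shows one plausible way to compute it. The matching tolerance of 50 bp, the function name `breakpoint_recall`, and all coordinates are assumptions for illustration; the caption does not state how breakpoints were matched.

```python
# Hedged sketch: recall of expected breakpoints in a reconstruction.
# Breakpoints are (chromosome, position) pairs; the coordinates are toy values.

def breakpoint_recall(true_bps, called_bps, tolerance=50):
    """Fraction of true breakpoints with a called breakpoint on the same
    chromosome within `tolerance` bp (the tolerance is an assumed parameter)."""
    if not true_bps:
        return 0.0
    matched = 0
    for chrom, pos in true_bps:
        if any(c == chrom and abs(p - pos) <= tolerance for c, p in called_bps):
            matched += 1
    return matched / len(true_bps)

# Toy amplicon with four expected breakpoints.
expected = [("chr2", 10_000), ("chr2", 25_000), ("chr2", 40_000), ("chr2", 55_000)]
# Toy reconstruction: three breakpoints recovered (two slightly shifted), one missed.
reconstructed = [("chr2", 10_010), ("chr2", 24_980), ("chr2", 40_000), ("chr12", 55_000)]

recall = breakpoint_recall(expected, reconstructed)
print(recall)
```

Computing this value for each dilution ratio would give one point per mixture, which is the kind of summary the panel reports.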
Figure 4.
Decoil recovers structurally complex ecDNA elements in primary cancers. Examples of ecDNA structure reconstruction of Simple SVs (A), Multiregions (B,D), Foldbacks (C), and Duplications/Foldbacks topologies (E) in patient samples. (A–E) The tracks represent the Decoil reconstruction (top), coverage of the aligned Nanopore reads to reference genome GRCh38/hg38 (middle), and GENCODE V42 annotation (bottom). The top three reconstructions were included if labeled as ecDNA and had estimated proportions of 30 or more copies (A–E). E1–E9 are the IDs for each reconstruction (Supplemental Table S4). (F) The topology spectrum of the reconstructed ecDNA structures by Decoil for the five cell lines and nine patient samples. (G) ecDNA reconstruction total size (x-axis) distribution (y-axis) for all data (five cell lines, nine primary tumor samples). (H) ecDNA fragment size distribution split for simple (Simple circularization, Simple SVs, Multiregion, Multichromosomal) or complex (Duplications, Foldbacks) topologies. (I) ecDNA reconstruction total size (x-axis) against estimated proportions (y-axis) computed by Decoil. (H,I) t-Test statistics were applied to test the significance of the ecDNA proportions between simple and complex topologies. All reconstructions labeled as ecDNA and with estimated proportions of 30 or more copies were included in panels F–I. Box plot shows Q1 (25%), Q2 (median), Q3 (75%), and interquartile range IQR = Q3–Q1; whiskers are 1.5 × IQR. The colors in A–F,H correspond to the legend in I.

