Nat Genet. 2024 Nov;56(11):2447-2454.
doi: 10.1038/s41588-024-01949-7. Epub 2024 Oct 14.

Mapping extrachromosomal DNA amplifications during cancer progression


Hoon Kim et al. Nat Genet. 2024 Nov.

Abstract

To understand the role of extrachromosomal DNA (ecDNA) amplifications in cancer progression, we detected and classified focal amplifications in 8,060 newly diagnosed primary cancers, untreated metastases and heavily pretreated tumors. EcDNAs were detected at significantly higher frequency in untreated metastatic and pretreated tumors than in newly diagnosed cancers. Tumors from chemotherapy-pretreated patients showed significantly higher ecDNA frequency than untreated cancers. In particular, tubulin inhibition was associated with increases in ecDNA, suggesting a role for ecDNA in treatment response. In longitudinally matched tumor samples, ecDNAs were more likely to be retained than chromosomal amplifications. EcDNAs shared between time points, and ecDNAs in advanced cancers, were more likely to harbor localized hypermutation events than private ecDNAs and ecDNAs in newly diagnosed tumors, respectively. Relatively high variant allele fractions of localized hypermutations on ecDNA implicate early ecDNA mutagenesis. Our findings nominate ecDNAs as providing tumors with competitive advantages during cancer progression and metastasis.


Conflict of interest statement

R.G.W.V. is a cofounder of, holds equity in and has received research funds from Boundless Bio. H.K. has received research funds from JW Pharmaceutical. J.L. receives compensation as a part-time consultant for Boundless Bio. V.B. is a cofounder, paid consultant and Scientific Advisory Board member, and has an equity interest in Boundless Bio and Abterra. The terms of this arrangement have been reviewed and approved by the University of California, San Diego, in accordance with its conflict-of-interest policies. All other authors declare no competing interests.

Figures

Fig. 1. Sample classification.
a, Schematic dataset overview. b, Overview of sample classification for 1,490 patients in the primary cancer cohort and 2,440 patients in the advanced cancer cohort. Only tumor types with at least 20 patients in each cohort were included. c, Average number of ecDNA and ChrAmp amplicons detected per ecDNA patient and ChrAmp patient, respectively. Tumor lineages represented by at least 20 tumors in both cancer cohorts are included. Numbers in parentheses indicate the number of patients. Points represent mean values, and error bars show 95% CIs. P values were computed using a two-sided Mann–Whitney U test. d, Percentage of ecDNA samples. e, Average number of distinct ecDNA amplicons per sample in the primary and advanced cancer cohorts, showing tumor lineages represented by at least 20 tumors in both cohorts. P values were computed using a one-sided binomial test with the ecDNA-carrying tumor fraction in the primary cancer cohort as the null probability in d, and using a one-sided Mann–Whitney U test in e; differences are not significant unless noted otherwise. f, Number of kataegis events normalized by the number of intervals present on ecDNA or ChrAmp amplicons in the primary and advanced cohorts. Numbers indicate the number of amplicons. Bars represent mean values, and error bars show 95% CIs. P values were computed using a two-sided Mann–Whitney U test. Asterisks indicate level of significance: *1.00 × 10−2 < P ≤ 5.00 × 10−2, **1.00 × 10−3 < P ≤ 1.00 × 10−2, ***1.00 × 10−4 < P ≤ 1.00 × 10−3 and ****P ≤ 1.00 × 10−4.
NS, not significant; GBM, glioblastoma multiforme; SARC, sarcoma; KIRC, kidney renal clear cell carcinoma; PACA, pancreatic cancer; PAEN, pancreatic cancer endocrine neoplasms; BLCA, bladder urothelial carcinoma; LUAD, lung adenocarcinoma; LICA, liver cancer; COADREAD, colorectal cancer; PRAD, prostate adenocarcinoma; HNSC, head and neck squamous cell carcinoma; ESCA, esophageal carcinoma; BRCA, breast invasive carcinoma; STAD, stomach adenocarcinoma; OV, ovarian serous cystadenocarcinoma; UCEC, uterine corpus endometrial carcinoma.
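The one-sided binomial test in panel d asks whether a lineage's ecDNA-positive count in the advanced cohort exceeds what the primary-cohort fraction (used as the null probability) would predict. A minimal standard-library sketch of that test, together with the asterisk scale above, might look like the following; the function names and any counts passed to them are hypothetical, and this is an illustration rather than the authors' code.

```python
from math import comb

def binom_upper_tail_p(k: int, n: int, p0: float) -> float:
    """One-sided exact binomial P(X >= k) for X ~ Binomial(n, p0).

    k:  ecDNA-positive tumors of one lineage in the advanced cohort (hypothetical input)
    n:  all tumors of that lineage in the advanced cohort
    p0: ecDNA-positive fraction of that lineage in the primary cohort (null probability)
    """
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))

def significance_stars(p: float) -> str:
    """Map a P value onto the asterisk scale used in the figure captions."""
    if p <= 1e-4:
        return "****"
    if p <= 1e-3:
        return "***"
    if p <= 1e-2:
        return "**"
    if p <= 5e-2:
        return "*"
    return "NS"
```

For example, `significance_stars(binom_upper_tail_p(12, 40, 0.10))` would label a made-up lineage with 12 of 40 advanced tumors carrying ecDNA against a 10% primary-cohort rate.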
Fig. 2. Clinical associations.
a, Five-year Kaplan–Meier survival curves of patients stratified by amplification category. P values comparing the survival curves were computed with a log-rank test, separately in the primary and advanced cohorts. b, Distribution of the number of distinct ecDNA and ChrAmp amplicons by pretreatment status across primary cancers, untreated advanced cancers and pretreated advanced cancers. Pretreated advanced cancers show a significantly higher number of distinct ecDNAs and ChrAmps per tumor than primary cancers or untreated advanced cancers (two-sided Mann–Whitney U test). The y axis represents the number of distinct ecDNA and ChrAmp amplicons detected per tumor. Numbers indicate patient counts. All tumors with available pretreatment information were included in the analysis. Points represent mean values, and error bars show 95% CIs. c, Distribution of the number of distinct ecDNA and ChrAmp amplicons by the number of pretreatments received across pretreated HMF advanced cancers. The P value was calculated using a two-sided Mann–Kendall trend test. Points represent mean values, and error bars show 95% CIs. Only patients with available clinical information were included. Numbers indicate the number of patients. d, Distribution of the number of distinct ecDNA and ChrAmp amplicons by prebiopsy treatment type in the advanced cancer cohort. The ‘Untreated’ category includes only tumors from the advanced cohort. The number of patients per category is shown at the bottom. Only treatment types used in more than 50 patients are shown. P values were calculated using a two-sided Mann–Whitney U test. Points represent mean values, and error bars show 95% CIs.
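The log-rank test in panel a compares, at every event time, the deaths observed in one group with the deaths expected if both groups shared the same hazard. The sketch below is a plain-Python illustration for two groups (for example, ecDNA versus non-ecDNA patients); the function name and input layout are assumptions, and it does not reproduce the five-year truncation or the software used in the study.

```python
from math import erfc, sqrt

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) with 1 degree of freedom.

    times_*:  follow-up times
    events_*: 1 if the event (death) was observed, 0 if censored
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)                 # at risk, both groups
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)    # at risk, group A
        d = sum(1 for ti, e, _ in data if ti == t and e)           # deaths at t, both groups
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # hypergeometric variance of the group-A death count at t
            variance += n_a * (n - n_a) * d * (n - d) / (n * n * (n - 1))
    if variance == 0:
        return 0.0, 1.0
    chi_sq = (observed_a - expected_a) ** 2 / variance
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi_sq, erfc(sqrt(chi_sq / 2))
```

Identical inputs for the two groups give a statistic of 0 and a P value of 1, which is a quick sanity check when adapting the sketch.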
Fig. 3. Longitudinal amplicon analysis.
a, Sankey plot showing amplicon classification over time. Only amplicon pairs with statistically significant similarity were included (n = 91). Colors reflect amplicon classification, and numbers indicate the number of amplicons retained between the two time points over all amplicons from the first tumor in the corresponding amplicon category. b, Fraction of ecDNA and ChrAmp amplicon pairs retained between the first and second tumors. Numbers in parentheses indicate the number of first-tumor amplicons also detected in the second tumor over the number of all first-tumor amplicons. The P value was calculated using a chi-square test for tumors 1 and 2. OR, odds ratio.
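Panel b reduces to a 2 × 2 table (ecDNA versus ChrAmp, retained versus not retained) summarized by a chi-square test and an odds ratio. A minimal sketch follows; the function name is hypothetical, and the Pearson statistic here omits the Yates continuity correction, which is an assumption because the caption does not say which variant was used.

```python
from math import erfc, sqrt

def retention_chi_square(ec_kept, ec_lost, chr_kept, chr_lost):
    """Pearson chi-square (no continuity correction), P value and odds ratio.

    Rows: ecDNA vs ChrAmp first-tumor amplicons.
    Columns: retained in the second tumor vs not retained.
    """
    table = [[ec_kept, ec_lost], [chr_kept, chr_lost]]
    total = ec_kept + ec_lost + chr_kept + chr_lost
    row_sums = [sum(row) for row in table]
    col_sums = [ec_kept + chr_kept, ec_lost + chr_lost]
    chi_sq = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi_sq += (table[i][j] - expected) ** 2 / expected
    p_value = erfc(sqrt(chi_sq / 2))  # chi-square survival function, 1 df
    denom = ec_lost * chr_kept
    odds_ratio = (ec_kept * chr_lost) / denom if denom else float("inf")
    return chi_sq, p_value, odds_ratio
```

An odds ratio above 1 means ecDNA amplicons had higher odds of being retained than ChrAmp amplicons in the supplied counts.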
Fig. 4. Clustered mutation events by amplicon category.
a, The fraction and the number of ecDNA and ChrAmp amplicons with overlapping clustered mutation events in the T1 tumor. P values were computed using a binomial test (two-sided) with the fraction in the private category as a null probability for ecDNA and ChrAmp, respectively. b, The fraction and number of ecDNA and ChrAmp amplicons with overlapping clustered mutation events in the T2 tumor. P values were computed using a binomial test (two-sided) with the fraction in the private category as a null probability for ecDNA and ChrAmp, respectively.
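The two-sided binomial test in Fig. 4 uses the fraction of private amplicons with clustered mutations as the null probability for the corresponding shared amplicons. One common way to define the two-sided exact P value is to sum the probabilities of all outcomes no more likely than the observed count; the sketch below uses that definition as an assumption, with a hypothetical function name.

```python
from math import comb

def binom_two_sided_p(k: int, n: int, p0: float) -> float:
    """Two-sided exact binomial test under Binomial(n, p0).

    Sums the probabilities of every outcome that is no more likely than the
    observed count k (the "minimum likelihood" definition of two-sidedness).
    """
    pmf = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= threshold))
```

Here `k` would be the number of shared amplicons overlapping clustered mutations, `n` all shared amplicons in that class, and `p0` the corresponding private-amplicon fraction.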
Fig. 5. Variant allele fraction by mutational category.
a,b, Comparison of (a) VAFs and (b) CCFs of different mutational categories detected on longitudinally shared or private ecDNA amplicons. Boxplots represent minimum (0th percentile), maximum (100th percentile), first and third quartiles and median with outliers excluded. P values were calculated using a two-sided Mann–Whitney U test. VAFs, variant allele fractions; CCFs, cancer cell fractions.
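Most panels in these figures use the two-sided Mann–Whitney U test, which compares two groups (here, VAFs on shared versus private ecDNA) by ranks rather than by raw values. The sketch below uses the large-sample normal approximation with tie and continuity corrections; this is an assumption, since exact small-sample P values can differ, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def midranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i : j + 1]:
            ranks[idx] = (i + j) / 2 + 1  # positions i..j share ranks i+1..j+1
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Two-sided Mann–Whitney U test (normal approximation).

    Returns (U statistic for x, p_value).
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u1, 1.0
    z = (abs(u1 - n1 * n2 / 2) - 0.5) / sqrt(var)  # 0.5 = continuity correction
    p = 2 * (1 - NormalDist().cdf(max(z, 0.0)))
    return u1, min(1.0, p)
```

Because the test uses ranks, it is insensitive to the skewed VAF and CCF distributions that the box plots summarize.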
Extended Data Fig. 1. Overview of sample selection criteria.
a, Comparison of extrachromosomal DNA (ecDNA) count by cohort and average sequence coverage. P-values are derived from a two-sided Mann–Whitney U test. Tissues are matched across the Cancer Genome Atlas (TCGA), the Pan-Cancer Analysis of Whole Genomes (PCAWG) and the Hartwig Medical Foundation (HMF; at least 20 samples in each cohort). Numbers on the bars indicate the number of samples. Boxplots represent minimum (0th percentile), maximum (100th percentile), 1st and 3rd quartiles and median with outliers not shown. b, Comparison of ecDNA count by cohort and tumor purity bin for samples with coverage of at least 10×. P-values are derived from a two-sided Mann–Whitney U test. TCGA includes all samples above the coverage cutoff. Tissues were matched only between PCAWG and HMF (at least 20 samples in both) because the TCGA sample size after coverage filtering was too small. Numbers on the bars indicate the number of samples. Boxplots represent minimum (0th percentile), maximum (100th percentile), 1st and 3rd quartiles and median with outliers not shown. c, Cohort and sample selection overview for the single-time-point analysis. d, Cohort and sample selection overview for the multi-time-point analysis. Abbreviations are defined as follows: AA, AmpliconArchitect tool; ICGC, International Cancer Genome Consortium; AML, acute myeloid leukemia; SKCM, skin cutaneous melanoma; T1, first time point tumor; T2, second time point tumor; GLASS, the Glioma Longitudinal Analysis Consortium.
Extended Data Fig. 2. Additional data to sample and amplicon classification.
a, Overview of sample classification for the 2,071 primary and 3,170 advanced patients whose tumor sequencing data pass the purity and coverage cutoffs, including all tumor types. Numbers in parentheses indicate the number of tumor samples. b, Resampling with replacement was repeated 1,000 times, keeping the sample count per tumor type identical between the primary and advanced cancer cohorts in each resampled dataset, to compare classification distributions. The advanced cancer cohort shows a significant increase in the number of samples classified as ecDNA and ChrAmp, respectively, independent of tumor-type distribution. Empirical cumulative distribution functions (ECDFs) of sample classification percentage across the 1,000 resampled datasets are shown. D represents the Kolmogorov–Smirnov statistic. c,d, Percentage of ChrAmp samples (c) and average number of distinct ChrAmp amplicons per sample (d) in the primary and advanced cancer cohorts, showing tumor lineages represented by at least 20 tumors in both cohorts. P-values were computed using a one-sided binomial test with the ChrAmp-carrying tumor fraction in the primary cancer cohort as the null probability in c, and using a one-sided Mann–Whitney U test in d. Differences are not significant unless noted otherwise. Asterisks indicate level of significance: *1.00e−02 < p ≤ 5.00e−02; **1.00e−03 < p ≤ 1.00e−02; ***1.00e−04 < p ≤ 1.00e−03; ****p ≤ 1.00e−04. e, Distribution of primary and advanced sample classification stratified by tumor lineages, each of which includes at least 20 tumors. Numbers in parentheses indicate the number of ecDNA samples and the total number of samples of that lineage.
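Panel b combines tumor-type-matched resampling with a two-sample Kolmogorov–Smirnov statistic D computed on the resulting ECDFs. The sketch below illustrates both steps under stated assumptions: each cohort is a hypothetical list of `(tumor_type, sample_class)` pairs, only tumor types present in both cohorts are used, and the matched per-type count is taken as the smaller of the two cohorts' counts (the caption does not specify the matching rule).

```python
import random
from bisect import bisect_right
from collections import defaultdict

def ks_statistic(a, b):
    """Two-sample Kolmogorov–Smirnov D: largest vertical gap between the ECDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for v in sorted(set(a) | set(b)):
        d = max(d, abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b)))
    return d

def matched_resample_percentages(primary, advanced, label="ecDNA", n_iter=1000, seed=0):
    """Percentage of samples classified as `label` in each tumor-type-matched resample."""
    rng = random.Random(seed)
    groups = {"primary": defaultdict(list), "advanced": defaultdict(list)}
    for name, cohort in (("primary", primary), ("advanced", advanced)):
        for tumor_type, sample_class in cohort:
            groups[name][tumor_type].append(sample_class)
    shared = sorted(set(groups["primary"]) & set(groups["advanced"]))
    if not shared:
        raise ValueError("no tumor type is present in both cohorts")
    # Assumed matching rule: draw min(primary count, advanced count) per tumor type.
    draws = {t: min(len(groups["primary"][t]), len(groups["advanced"][t])) for t in shared}
    percentages = {"primary": [], "advanced": []}
    for _ in range(n_iter):
        for name in percentages:
            drawn = []
            for t in shared:
                drawn.extend(rng.choices(groups[name][t], k=draws[t]))  # with replacement
            percentages[name].append(100 * drawn.count(label) / len(drawn))
    return percentages["primary"], percentages["advanced"]
```

Calling `ks_statistic` on the two returned lists gives a D value analogous to the one reported in the panel; because tumor-type composition is identical in every resample, a large D cannot be explained by lineage mix alone.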
Extended Data Fig. 3. Amplicon properties by amplicon class and oncogene presence.
a, Box plot showing amplicon complexity. b, Box plot showing amplicon DNA copy number. c, Box plot showing amplicon size. Numbers indicate the number of amplicons. P-values were computed using a two-sided Mann–Whitney U test. Boxplots represent minimum (0th percentile), maximum (100th percentile), 1st and 3rd quartiles and median. For Extended Data Fig. 3b, outliers are not plotted. d–g, Comparison of ecDNA amplicon count per patient between the primary and advanced cohorts when further grouping patients according to measures of genomic instability: (d) binned ploidy, (e) whole-genome duplication status, (f) microsatellite instability status and (g) homologous recombination (HR) status. Numbers indicate the number of patients. P-values were computed using a two-sided Mann–Whitney U test. Boxplots represent minimum (0th percentile), maximum (100th percentile), 1st and 3rd quartiles and median with outliers not shown. MSS, microsatellite stable; MSI, microsatellite instable.
Extended Data Fig. 4. Comparison of ecDNA patient fractions when further grouping patients according to measures of genomic instability.
a–d, Binned ploidy (a), whole-genome duplication status (b), microsatellite instability status (c) and homologous recombination status (d). Numbers in parentheses represent the number of patients carrying ecDNA over all patients in the category. P values were calculated using a two-sided binomial test with the fraction of ecDNA-carrying tumors in the corresponding primary cohort category as the null probability. e, Number of kataegis events normalized by the number of intervals present on ecDNA or ChrAmp amplicons in the primary cancer and advanced cancer cohorts. Plots show log(x + 1)-transformed values on the y axis. P values were calculated using a two-sided Mann–Whitney U test. f, Same as e, but for breast cancer samples only. Boxplots represent minimum (0th percentile), maximum (100th percentile), 1st and 3rd quartiles and median.
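Panel e plots kataegis events divided by the number of intervals on each amplicon, after a log(x + 1) transform so that amplicons with zero events stay at 0. A minimal sketch of those two steps follows; the function names and the `(events, intervals)` input pairs are hypothetical, and the natural logarithm is an assumption because the caption does not state the base.

```python
from math import log1p

def kataegis_per_interval(n_events: int, n_intervals: int) -> float:
    """Kataegis events on an amplicon divided by the amplicon's interval count."""
    if n_intervals <= 0:
        raise ValueError("an amplicon must span at least one interval")
    return n_events / n_intervals

def log_plus_one(values):
    """log(x + 1) transform for plotting (natural log assumed)."""
    return [log1p(v) for v in values]

def plotted_kataegis(amplicons):
    """amplicons: iterable of (n_events, n_intervals) pairs, one per amplicon."""
    return log_plus_one(kataegis_per_interval(e, i) for e, i in amplicons)
```

Normalizing by interval count keeps complex, multi-interval amplicons from appearing more mutated simply because they contain more segments.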
Extended Data Fig. 5. Additional data to clinical associations.
a, Multivariate Cox proportional hazards model incorporating primary tumor location, sex, age, whole-genome doubling status, microsatellite instability (MSI) status, homologous recombination (HR) status and tumor stage in the primary and advanced cancer cohorts, showing that extrachromosomal DNA amplification had the highest hazard ratio. Error bars represent the 95% confidence intervals of the hazard ratios. Asterisks indicate level of significance: *1.00 × 10−2 < p ≤ 5.00 × 10−2; **1.00 × 10−3 < p ≤ 1.00 × 10−2; ***1.00 × 10−4 < p ≤ 1.00 × 10−3. b, Distribution of the primary, advanced untreated and advanced treated cohorts into ecDNA/ChrAmp/NoAmp categories. All tumors with available pretreatment information were included in the analysis. The y axis represents category fractions. Numbers indicate patient counts. P-values were computed using a two-sided binomial test, with the ecDNA-carrying tumor fraction in the primary cancer cohort as the null probability when comparing primary vs advanced untreated/treated, and the fraction in the advanced untreated cohort as the null probability when comparing advanced untreated vs advanced treated. c, Resampling with replacement was repeated 1,000 times, keeping the sample count per tumor type identical between the primary cancer, advanced cancer untreated and advanced cancer treated cohorts in each resampled dataset, to compare classification distributions. Empirical cumulative distributions of sample classification percentage across the 1,000 resampled datasets are shown. D represents the Kolmogorov–Smirnov statistic (two-sided).
Extended Data Fig. 6. Effects of pretreatments on distributions of sample and amplicon classifications.
a, Distribution of ecDNA/ChrAmp/NoAmp tumors by the number of pretreatments a patient received. Numbers in parentheses indicate tumors with ecDNA/all tumors. The P value was calculated using a two-sided Mann–Kendall trend test. b, Distribution of the number of distinct ecDNA amplicons by pretreatment count (advanced cancers only). The P value was calculated using a two-sided Mann–Kendall trend test. Points represent mean values, and error bars show a 95% confidence interval. Only patients with available clinical information were included. Numbers indicate the number of patients. c, Distribution of ecDNA/ChrAmp/NoAmp tumors by consolidated pretreatment category. Numbers in parentheses indicate tumors with ecDNA/all tumors. Only treatment types used in more than 50 patients are shown. P values were calculated using a two-sided binomial test with the ecDNA-carrying tumor fraction in the untreated group as the null probability. d, Odds that tumors treated with targeted inhibitors contain the target oncogene on an ecDNA, compared with tumors treated with targeted inhibitors that lack the amplified target, relative to the background distribution calculated from untreated primary tumors. e, Number of ecDNA or ChrAmp amplicons by pretreatment mechanism. Only treatments used in ≥10 patients were included. Samples were categorized solely on whether they received chemotherapy of a specific mechanism, without considering other treatments, including radiation. Points represent the mean, and error bars indicate the standard error of the mean. Numbers at the bottom of the figure are sample sizes. P-values were calculated with a two-sided Mann–Whitney U test. f, Sample classification (ecDNA, ChrAmp, NoAmp) in the advanced cohort by pretreatment chemotherapy mechanism. Only treatments used in ≥10 patients were included. Samples were categorized solely on whether they received chemotherapy of a specific mechanism, without considering other treatments, including radiation. As a result, samples might have received multiple types of treatment. The p-value was calculated using a two-sided binomial test, with untreated samples serving as the reference for each chemotherapy mechanism. n.s., not significant.
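The Mann–Kendall trend test in panels a and b asks whether values tend to rise (or fall) along an ordered axis, here the number of pretreatments. The sketch below implements the classic test with tie correction and a continuity-corrected normal approximation; applying it to a sequence ordered by pretreatment count (for example, per-group mean amplicon counts) is an assumption about how the test was set up, and the function name is hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist

def mann_kendall(values):
    """Two-sided Mann–Kendall trend test on an ordered sequence.

    Returns (S, p_value); S > 0 suggests an increasing trend, S < 0 a decreasing one.
    """
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of each later-minus-earlier pair
    # Variance of S, corrected for groups of tied values.
    ties = Counter(values).values()
    var_s = (n * (n - 1) * (2 * n + 5) - sum(t * (t - 1) * (2 * t + 5) for t in ties)) / 18
    if var_s == 0:
        return s, 1.0
    if s > 0:
        z = (s - 1) / sqrt(var_s)
    elif s < 0:
        z = (s + 1) / sqrt(var_s)
    else:
        z = 0.0
    return s, 2 * (1 - NormalDist().cdf(abs(z)))
```

Unlike a correlation coefficient, S only counts concordant and discordant pairs, so it detects monotonic trends without assuming linearity.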
Extended Data Fig. 7. Longitudinal analysis of sample classification.
a, Sankey plot showing sample classification based on amplicon status over time. Color reflects amplicon-based sample classification, and numbers indicate the number of samples. b–f, Amplicon structure of five amplicons classified as ecDNA at tumor 1 (T1) and as ChrAmp at tumor 2 (T2). All amplicon pairs showed a significant similarity score between T1 and T2, with T1 classified as ecDNA and T2 classified as ChrAmp. BFB, breakage fusion bridge.
Extended Data Fig. 8. Genomic characteristics of longitudinally retained amplicons.
a–c, Complexity (a), DNA copy number (b) and amplicon size (c). P-values were computed using a two-sided Wilcoxon paired test. T1 and T2 represent a patient’s first and second time point tumors, respectively. Boxplots represent minimum (0th percentile), maximum (100th percentile), 1st and 3rd quartiles and median. n.s., not significant. d, The number of kataegis events is significantly higher in ecDNA amplicons than in ChrAmp amplicons at both time points. Numbers in parentheses indicate numbers of ecDNA or ChrAmp amplicons. Error bars represent the standard error (95% confidence interval) of the mean. P values were calculated using a two-sided Mann–Whitney U test.
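Panels a–c compare the same amplicon at T1 and T2, so a paired test is used. The sketch below is a Wilcoxon signed-rank test with the normal approximation: zero differences are dropped and tied absolute differences share average ranks. The normal approximation (rather than an exact distribution) and the function name are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def wilcoxon_signed_rank(t1, t2):
    """Paired two-sided Wilcoxon signed-rank test; returns (W_plus, p_value).

    t1, t2: matched measurements, e.g. amplicon copy number at T1 and at T2.
    """
    diffs = [b - a for a, b in zip(t1, t2) if b != a]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in order[i : j + 1]:
            ranks[k] = (i + j) / 2 + 1  # average rank for tied |differences|
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / sqrt(var)
    return w_plus, 2 * (1 - NormalDist().cdf(abs(z)))
```

Pairing removes between-patient variation, so the test is sensitive to consistent within-amplicon changes from T1 to T2 even when amplicons differ widely from one another.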
Extended Data Fig. 9. Additional data to longitudinal amplicon analysis.
a, The fraction of ecDNA and ChrAmp amplicons with overlapping clustered mutations in the 1st tumor. Clustered mutations were further classified into ‘shared clustered mutations’ when two or more mutations in the clustered mutation event were retained in the 2nd tumor, ‘private clustered mutations’ when the clustered mutation event was detected in the 2nd tumor, and ‘no clustered mutations’ when no T1 clustered mutations were recovered in the T2 amplicon. b, The fraction of ecDNA and ChrAmp amplicons with overlapping clustered mutations in the 2nd tumor. Clustered mutations were further classified into ‘shared clustered mutations’ when two or more mutations in the clustered mutation event were retained in the 1st tumor, ‘private clustered mutations’ when the clustered mutation event was detected in the 1st tumor and ‘no clustered mutations’ when no T2 clustered mutations were recovered in the T1 amplicon. For a and b, statistical significance was assessed with chi-squared test for retained vs all others. c, The fraction of ecDNA and ChrAmp amplicons with overlapping clustered mutations in the 1st tumor. Numbers in parentheses indicate numbers of 1st tumor amplicon overlapping clustered mutations. d, The fraction of ecDNA and ChrAmp amplicons with overlapping clustered mutations in the 2nd tumor. Numbers in parentheses indicate numbers of 2nd tumor amplicon overlapping clustered mutations. P-values were computed using a chi-square test. n.s., not significant.
Extended Data Fig. 10. Additional data to variant allele fraction by mutational category.
a,b, Comparison of (a) variant allele fractions and (b) cancer cell fractions of different mutational categories detected on longitudinally retained (shared) or disappeared/acquired (private) ChrAmp amplicons. Boxplots represent minimum (0th percentile), maximum (100th percentile), 1st and 3rd quartiles and median with outliers excluded. P values were calculated using a two-sided Mann–Whitney U test. n.s., not significant.
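The captions do not say how cancer cell fractions were derived from variant allele fractions. One widely used purity and copy-number model is VAF = purity × multiplicity × CCF / (purity × CN_tumor + (1 − purity) × CN_normal); the sketch below simply inverts that model. Its use here, the parameter names and the default normal copy number of 2 are assumptions, and for high-copy ecDNA the multiplicity term is especially uncertain.

```python
def cancer_cell_fraction(vaf, purity, tumor_cn, multiplicity=1, normal_cn=2):
    """Invert VAF = purity * m * CCF / (purity * CN_t + (1 - purity) * CN_n) for CCF.

    vaf:          observed variant allele fraction
    purity:       tumor purity in (0, 1]
    tumor_cn:     total copy number of the locus in tumor cells
    multiplicity: number of mutated copies per tumor cell (assumed known)
    normal_cn:    copy number in normal cells (assumed diploid)
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    total_copies = purity * tumor_cn + (1 - purity) * normal_cn
    return vaf * total_copies / (purity * multiplicity)
```

Estimates above 1 usually indicate that the assumed multiplicity is too low; many pipelines cap CCF at 1 rather than report such values.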
