Nat Commun. 2024 Feb 19;15(1):1515.
doi: 10.1038/s41467-024-45479-6.

Machine learning-based extrachromosomal DNA identification in large-scale cohorts reveals its clinical implications in cancer

Shixiang Wang et al. Nat Commun. 2024.

Abstract

The clinical implications of extrachromosomal DNA (ecDNA) in cancer therapy remain largely elusive. Here, we present a comprehensive analysis of ecDNA amplification spectra and their association with clinical and molecular features in multiple cohorts comprising over 13,000 pan-cancer patients. Using our developed computational framework, GCAP, and validating it with multifaceted approaches, we reveal a consistent pan-cancer pattern of mutual exclusivity between ecDNA amplification and microsatellite instability (MSI). In addition, we establish the role of ecDNA amplification as a risk factor and refine genomic subtypes in a cohort from 1015 colorectal cancer patients. Importantly, our investigation incorporates data from four clinical trials focused on anti-PD-1 immunotherapy, demonstrating the pivotal role of ecDNA amplification as a biomarker for guiding checkpoint blockade immunotherapy in gastrointestinal cancer. This finding represents clinical evidence linking ecDNA amplification to the effectiveness of immunotherapeutic interventions. Overall, our study provides a proof-of-concept of identifying ecDNA amplification from cancer whole-exome sequencing (WES) data, highlighting the potential of ecDNA amplification as a valuable biomarker for facilitating personalized cancer treatment.

Conflict of interest statement

R.H.X., Q.Z., and S.W. declare patent applications for “Gene-level focal amplification modeling and cancer typing for extrachromosomal DNA characterization” (P. R. China application serial number 202211067952.6). All other authors declare no potential competing interests.

Figures

Fig. 1. Framework of whole-exome sequencing data-oriented cancer extrachromosomal DNA amplification identification, evaluation, analysis, and application.
a Schematic diagram of the study. b Features of the final XGBOOST model for ecDNA cargo gene prediction and their importance. XGBOOST modeling with 11 features was repeated 1000 times independently to determine the final hyperparameters. c Performance estimation (auPRC, area under the precision-recall curve; data are presented as mean ± SD) for the final ecDNA cargo gene prediction model during training and evaluation with stratified group k-fold cross-validation (k = 10). The dotted line indicates the iteration selected by early stopping (performance does not improve for the following 10 rounds). Tumor sample size n = 386. d Performance scores (auPRC; auROC, area under the receiver operating characteristic curve; precision; sensitivity; and specificity) for sample-level ecDNA amplification identification. Source data are provided as a Source Data file. XGBOOST, eXtreme Gradient Boosting; total_cn, total copy number; minor_cn, copy number of the minor allele; cna_burden, copy number alteration burden; pLOH, genome percentage with loss of heterozygosity; AScore, aneuploidy score.
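The early-stopping rule described for panel c can be illustrated with a minimal sketch. It assumes a list of per-iteration validation auPRC values and a patience of 10 rounds; the function name `early_stop_iteration` and the example scores are hypothetical, and this is not the GCAP or XGBoost implementation.

```python
def early_stop_iteration(scores, patience=10):
    """Return the 0-based index of the iteration selected by early stopping.

    Training is treated as stopped once `patience` rounds have passed
    without a score strictly higher than the best one seen so far.
    `scores` is a hypothetical list of validation auPRC values.
    """
    best_idx = 0
    for i, score in enumerate(scores):
        if score > scores[best_idx]:
            best_idx = i          # new best round
        elif i - best_idx >= patience:
            break                 # no improvement for `patience` rounds
    return best_idx


# Hypothetical curve: improves for three rounds, then plateaus slightly lower.
example_scores = [0.50, 0.60, 0.70] + [0.69] * 12
selected = early_stop_iteration(example_scores, patience=10)
```

With these example values the selected round is index 2, the last round that improved the score before the 10-round plateau.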
Fig. 2. Evaluation of extrachromosomal DNA amplification identification on cancer cell line genomes.
GCAP validation in two known ecDNA+ cancer cell lines, (a) SNU16 and (b) PC3. The top panels show probe settings and result images of DNA metaphase FISH experiments targeting the genes MYC and FGFR2. The FISH result for FGFR2 in PC3 serves as a naturally negative control. Scale bars, 10 micrometers. The middle panels show the structural variant view of AmpliconArchitect (AA) reconstructions from WGS data of SNU16 and PC3. The bottom panels show Circle-Seq read density (measured as the number of reads overlapping each one-megabase window) in the corresponding chromosomes. c MYC and FGFR2 gene copy number in SNU16 and PC3 by qRT-PCR. d Concordance of copy number estimation by qRT-PCR and WES for six selected genes in SNU16 and PC3. Linear regression lines, point estimates of the two-sided Pearson correlation coefficient test, and their 95% confidence intervals are presented. e Copy number profiles and extrachromosomal DNA segment links of a gastric cancer. For better visualization, only chr6 and chr17, which show ecDNA amplifications, are plotted in the Circos plot. The first and second tracks represent the total copy number of the tumor tissue and patient-derived xenograft model samples. The inner track represents the extrachromosomal DNA segment links. f, g Comparison between AmpliconArchitect (AA) and GCAP for extrachromosomal DNA amplification on WGS data of two cancer cell line batches. Source data are provided as a Source Data file.
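The read-density measure in the bottom panels (number of reads overlapping each one-megabase window) can be sketched as follows. The interval convention (0-based, half-open), the function name `window_read_density`, and the toy reads are assumptions made for illustration; the source does not describe the actual tooling.

```python
def window_read_density(reads, chrom_length, window=1_000_000):
    """Count reads overlapping each fixed-size window on one chromosome.

    reads: iterable of (start, end) pairs, assumed 0-based and half-open.
    A read spanning a window boundary is counted in every window it touches.
    """
    n_windows = (chrom_length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for start, end in reads:
        if end <= start:
            continue  # skip empty or malformed intervals
        first = max(start, 0) // window
        last = min(end - 1, chrom_length - 1) // window
        for w in range(first, last + 1):
            counts[w] += 1
    return counts


# Toy example: a 2.5 Mb chromosome and three short reads.
toy_reads = [(100, 250), (999_950, 1_000_100), (2_400_000, 2_400_150)]
density = window_read_density(toy_reads, chrom_length=2_500_000)
```

Here the second read straddles the boundary between the first and second windows, so it contributes to both counts.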
Fig. 3. Consistent recapture of ecDNA associated survival outcomes and genomic alteration patterns in pan-cancer databases.
a Kaplan-Meier overall survival curve comparison between different TCGA focal amplification subtypes. The exact log-rank test P value is 2.56e-50. Forest plots of multivariable (b) overall survival and (c) progression-free survival Cox regression analysis for focal amplification subtypes with cancer type as a confounding factor in TCGA. The point estimates of the hazard ratio derived from the Cox regression test and their corresponding 95% confidence intervals (error bars) are presented. d Kaplan-Meier overall survival curve comparison between different PCAWG focal amplification subtypes. The exact log-rank test P value is 5.08e-38. e Forest plot of multivariable overall survival Cox regression analysis for focal amplification subtypes with cancer type as a confounding factor in PCAWG. The point estimates of the hazard ratio derived from the Cox regression test and their corresponding 95% confidence intervals (error bars) are presented. Comparison of APOBEC-associated mutations between different (f) TCGA and (g) PCAWG focal amplification subtypes. Comparison of copy number signature CN8 contributions between different (h) TCGA and (i) PCAWG focal amplification subtypes. Comparison of copy number segments between different (j) TCGA and (k) PCAWG focal amplification subtypes. Comparison of tumor purity between different (l) TCGA and (m) PCAWG focal amplification subtypes. The P values of comparisons in f–m were evaluated by two-sided Mann–Whitney test, with multiple comparisons adjusted by the FDR approach. Source data are provided as a Source Data file.
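The caption names an "FDR approach" for multiple-comparison adjustment without specifying the procedure. As an illustration only, the sketch below assumes the commonly used Benjamini-Hochberg step-up method; the function name `bh_adjust` and the input P values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order.

    Assumption: the 'FDR approach' in the caption is Benjamini-Hochberg;
    the source does not state which procedure was used.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values from four pairwise comparisons.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = bh_adjust(raw_p)
```

Each adjusted value is the raw P value scaled by the number of tests over its rank, capped so that a smaller raw P value never receives a larger adjusted value than a bigger one.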
Fig. 4. Genome-wide distribution of focal amplifications and extrachromosomal DNA associated oncogenes.
Genome-wide distribution of amplification peaks by focal amplification class in (a) TCGA and (b) PCAWG. Genomic cytobands with more frequent circular amplification are highlighted by at most three representative oncogenes, with the count of circular amplification occurrences shown in parentheses. Here, to visualize genome-scale signals, we calculated the average signals within 1 Mb windows. When a spot exhibits both circular and noncircular signals, one signal may appear to be shadowed by the other. Consequently, a region within a window can only be colored in either red or blue, or grey (for none), but not simultaneously in both red and blue. To differentiate between circular and noncircular signals in such cases, readers should refer to the two bar plots on the right side of the heatmap. c Copy number of oncogenes versus the fold change in TPM (transcripts per million) upper quartile for all oncogenes in circular and noncircular amplification types. The fold change in TPM upper quartile is computed as the oncogene’s TPM upper quartile + 1 divided by the average of TPM upper quartile + 1 for the same oncogene in all other tumor samples from the same cohort in which the oncogene was not amplified. Linear regression lines, using fold change = m × CN + b, point estimates of the two-sided Pearson correlation coefficient test, and their 95% confidence intervals (in gray) are shown for each focal amplification class. This calculation is the same as previously described. The oncogene list was derived from the Oncogene database (http://ongene.bioinfo-minzhao.org/). The two constructed linear models were compared by ANOVA with an F test. d Gene expression distribution by extrachromosomal DNA amplification status for the top 50 ecDNA-associated oncogenes. Source data are provided as a Source Data file.
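The fold-change definition in panel c can be written out as a short sketch. The function name `tpm_uq_fold_change` and the numeric inputs are hypothetical; only the formula itself (value + 1 divided by the mean of the non-amplified values + 1) comes from the caption.

```python
from statistics import mean


def tpm_uq_fold_change(amplified_value, non_amplified_values):
    """Fold change in TPM upper quartile for one oncogene in one sample.

    amplified_value: TPM upper quartile in the sample carrying the amplification.
    non_amplified_values: TPM upper quartile of the same oncogene in the other
    samples of the same cohort where it is not amplified.
    """
    if not non_amplified_values:
        raise ValueError("need at least one non-amplified sample")
    baseline = mean(v + 1 for v in non_amplified_values)
    return (amplified_value + 1) / baseline


# Hypothetical values: one amplified sample against two non-amplified samples.
fold_change = tpm_uq_fold_change(5, [1, 3])
```

Adding 1 to both numerator and denominator keeps the ratio defined when an oncogene has zero expression in the reference samples.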
Fig. 5. Focal amplification typing on colorectal cancer predicts patient overall survival and yields refined genomic subtypes with distinct mutational processes.
a Kaplan-Meier overall survival curve comparison between different focal amplification subtypes. Log-rank test P value is shown. b Forest plot of multivariable overall survival Cox regression analysis for focal amplification subtypes with reported SYSUCC subtypes and other common clinical variables as confounding factors. The point estimates of the hazard ratio derived from the Cox regression test and their corresponding 95% confidence intervals (error bars) are presented. c Combination table of existing SYSUCC genomic subtypes and focal amplification subtypes. d Forest plot and hazard ratios of univariable overall survival Cox regression analysis for six genomic subtypes with the hypermutation group (HM) as reference. The point estimates of the hazard ratio derived from the Cox regression test and their corresponding 95% confidence intervals (error bars) are presented. e Mutation ratio of TP53 in the newly established SYSUCC genomic subtypes. The P value estimated by Chi-squared test is reported to show the association between TP53 mutation status and SYSUCC genomic subtypes. f Comparison of APOBEC-associated mutations among the newly established SYSUCC genomic subtypes. g Comparison of copy number signature CN8 contributions among the newly established SYSUCC genomic subtypes. The P values for comparisons between the CIN-HR&Circ group and other groups in f, g were evaluated by two-sided Mann–Whitney test, with multiple comparisons adjusted by the FDR approach. HM, HyperMutated; GS, Genome Stable; CIN-LR, Chromosomal INstability with Low survival Risk; CIN-HR, Chromosomal INstability with High survival Risk; CIN-Mild, chromosomal instability with mild risk; CN-Quiet, copy number quiet; Non-Circ, noncircular amplification dominant; CIN-HR|Circ, either chromosomal instability with high risk or circular amplification present; CIN-HR&Circ, both chromosomal instability with high risk and circular amplification present. Source data are provided as a Source Data file.
Fig. 6. Extrachromosomal DNA amplification is prognostic of overall survival in anti-PD1 monotherapy.
Kaplan-Meier overall survival (OS) curve comparisons between patients with ecDNA amplification and patients without ecDNA amplification in (a) the SYSUCC advanced gastric cancer cohort and (b) the SYSUCC nasopharyngeal carcinoma cohort. Patients in both cohorts were treated with the anti-PD-1 drug toripalimab. Log-rank test P value and hazard ratio are shown. c, d Forest plots of multivariable overall survival Cox regression analysis for focal amplification subtypes with known immunotherapy biomarkers and other variables as control factors in (c) the SYSUCC advanced gastric cancer cohort and (d) the SYSUCC nasopharyngeal carcinoma cohort. The point estimates of the hazard ratio derived from the Cox regression test and their corresponding 95% confidence intervals (error bars) are presented. TMB_status, tumor mutation burden status. The median is used as the cutoff for classifying TMB and aneuploidy into high and low groups. Source data are provided as a Source Data file.
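The median-cutoff grouping of TMB and aneuploidy can be sketched as below. The caption does not say how values equal to the median are assigned; this sketch assumes they fall in the low group. The function name `split_by_median` and the sample values are hypothetical.

```python
from statistics import median


def split_by_median(values):
    """Label each value 'high' or 'low' relative to the cohort median.

    Assumption: values equal to the median are labeled 'low'; the source
    does not specify how ties are handled.
    """
    cutoff = median(values)
    return ["high" if v > cutoff else "low" for v in values]


# Hypothetical TMB values for four patients (median is 4).
tmb_groups = split_by_median([1, 5, 3, 9])
```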
