Nature. 2026 Jan;649(8095):152-160.
doi: 10.1038/s41586-025-09764-8. Epub 2025 Nov 19.

Genetic elements promote retention of extrachromosomal DNA in cancer cells

Venkat Sankar et al. Nature. 2026 Jan.

Abstract

Extrachromosomal DNA (ecDNA) is a prevalent and devastating form of oncogene amplification in cancer [1,2]. Circular megabase-sized ecDNAs lack centromeres, stochastically segregate during cell division [3-6] and persist over many generations. It has been more than 40 years since ecDNAs were first observed to hitchhike on mitotic chromosomes into daughter cell nuclei, but the mechanism underlying this process remains unclear [3,7]. Here we identify a family of human genomic elements, termed retention elements, that tether episomes to mitotic chromosomes to increase ecDNA transmission to daughter cells. Using Retain-seq, a genome-scale assay that we developed, we reveal thousands of human retention elements that confer generational persistence to heterologous episomes. Retention elements comprise a select set of CpG-rich gene promoters and act additively. Live-cell imaging and chromosome conformation capture show that retention elements physically interact with mitotic chromosomes at regions that are mitotically bookmarked by transcription factors and chromatin proteins. This activity intermolecularly recapitulates promoter-enhancer interactions. Multiple retention elements are co-amplified with oncogenes on individual ecDNAs in human cancers and shape their sizes and structures. CpG-rich retention elements are focally hypomethylated. Targeted cytosine methylation abrogates retention activity and leads to ecDNA loss, which suggests that methylation-sensitive interactions modulate episomal DNA retention. These results highlight the DNA elements and regulatory logic of mitotic ecDNA retention. Amplifications of retention elements promote the maintenance of oncogenic ecDNA across generations of cancer cells, and reveal the principles of episome immortality intrinsic to the human genome.

Conflict of interest statement

Competing interests: H.Y.C. is an employee and stockholder of Amgen as of 16 December 2024. H.Y.C. is a co-founder of Accent Therapeutics, Boundless Bio, Cartography Biosciences and Orbital Therapeutics, and was an advisor of Arsenal Biosciences, Chroma Medicine, Exai Bio and Vida Ventures until 15 December 2024. P.S.M. is a co-founder and advisor of Boundless Bio. A.G.H. is a founder and shareholder of Econic Biosciences. M.G.J. is a consultant for and holds equity in Vevo Therapeutics. The remaining authors declare no competing interests.

Figures

Fig. 1. Identification of genetic elements that promote episomal DNA retention.
a, Proposed mechanism of mitotic retention of ecDNAs in cancer cells through chromosome hitchhiking. b, Representative image of tethered (bottom arrowhead) and untethered (top arrowhead) ecDNA foci in mitotic PC3 cells (n = 92 daughter cell pairs). Scale bar, 10 µm. c, Representative live-cell images (n = 10 fields of view) showing ecDNA (labelled with TetR-mNeonGreen) colocalization with chromosomes during cancer cell division. Scale bar, 10 µm. d, Fractions of ecDNA with various oncogenes colocalizing with mitotic chromosomes in the following cancer cell lines: GBM39 glioblastoma cells, EGFR ecDNA from chromosome 7; PC3 prostate cancer cells, ecMYC from chromosome 8; SNU16 gastric cancer cells, ecMYC and FGFR2 ecDNA from chromosome 8 and chromosome 10, respectively; COLO320DM colorectal cancer cells, ecMYC. Raw images were obtained from a previous publication of IF–DNA-FISH of anaphase cells. e, Schematic of Retain-seq. f, Retain-seq enrichment of a known EBV sequence that promotes viral retention. EBNA1 ChIP–seq data in EBV-transformed GM12878 cells are shown at the bottom. g, Retain-seq signals at three representative enriched genomic loci. Red tracks represent loci that were significantly enriched in Retain-seq screens in the corresponding cell line, thus marking these loci as retention elements in that line; black tracks indicate that the sequence was not identified as a retention element in the corresponding experiment. h, Principal component analysis of Retain-seq in various cell lines at different time points. i, Individual validation by quantitative PCR (qPCR) of six episomally retained elements (RE-A–RE-F) identified by Retain-seq experiments in the K562 cell line and amplified on COLO320DM (RE-C) and GBM39 (others) ecDNAs. Each line in the plot for a given retention element represents a single replicate. 
The empty vector control is the pUC19 plasmid alone, whereas the random insert control comprises the pUC19 plasmid with random insert sequences from the genome of the human GM12878 cell line. P values were calculated using one-sided t-tests.
Fig. 2. Sequence features of retention elements.
a, Analyses of sequence features of retention elements. b, Input-normalized Retain-seq signals across annotated gene sequences. TTS, transcription termination site. c, Sequence annotations that overlap with retention elements identified in K562 cells. Percentages represent the proportion of retention elements that overlap with a given annotation class. d, ENCODE candidate cis-regulatory elements (cCREs) that overlap with retention elements identified in K562 cells. Fractions represent the proportion of retention elements that overlap with a given cCRE class. e, ENCODE ChIP–seq signals of the indicated histone marks and RNA polymerases II and III in K562 cells that surround retention elements identified in the same cell line. f, CpG density surrounding the combined set of retention elements. g, Number of CpG sites in genomic bins that overlap with retention elements (n = 18,494) compared with those that do not (n = 2,543,727). Box centre, line median; limits, upper and lower quartiles; whiskers, 1.5× the interquartile range. h, Fraction of origins of replication (identified by SNS-seq in K562 cells) that overlap with retention elements identified in K562 cells and random genomic intervals. i, Retention of plasmids that contain one, two or three copies of a retention element (RE-C; red segments in schematic) in COLO320DM cells, analysed by qPCR. Fold changes were computed using plasmid levels at day 14 after transfection, normalizing to levels at day 2 to adjust for different transfection efficiencies across conditions (three biological replicates). j, Left, schematic of transfection of plasmids with a CMV promoter and/or a retention element (RE-C) into COLO320DM cells. Right, retention of plasmids that contain a CMV promoter and/or a retention element in COLO320DM cells, assessed by qPCR (three biological replicates). Data for two different plasmid backbones, pUC19 and pGL4, are shown. 
P values were computed using two-sided Wilcoxon rank-sum tests (g), one-sided hypergeometric tests (h) or one-sided t-tests (i,j). NS, not significant.
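The day-14 over day-2 normalization described for panel i can be illustrated with a minimal sketch. The function name, the example values and the use of a plain per-replicate ratio are assumptions made for illustration; the caption does not give the exact qPCR quantification formula.

```python
from statistics import mean

def retention_fold_change(day2_levels, day14_levels):
    """Divide each replicate's day-14 plasmid level by its own day-2 level.

    Normalizing to day 2 removes differences in transfection efficiency,
    so that conditions can be compared on retention alone.
    """
    if len(day2_levels) != len(day14_levels):
        raise ValueError("day-2 and day-14 replicates must be paired")
    return [d14 / d2 for d2, d14 in zip(day2_levels, day14_levels)]

# Hypothetical relative plasmid levels for three biological replicates.
empty_vector = retention_fold_change([1.0, 0.8, 1.2], [0.01, 0.008, 0.012])
one_copy = retention_fold_change([0.5, 0.4, 0.6], [0.02, 0.016, 0.024])

print(mean(empty_vector), mean(one_copy))
```

In this made-up example the retention-element plasmid was transfected less efficiently (lower day-2 levels) yet shows a higher normalized fold change, which is the comparison the normalization is meant to make possible.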
Fig. 3. Retention elements promote extrachromosomal interactions with chromosomes during mitosis.
a, Schematic of the live-cell imaging experiment. b, Representative live-cell time-lapse images of dividing COLO320DM cells with labelled ecMYC after transfection with a plasmid containing a retention element or an empty vector control. Scale bar, 10 µm. c, Fraction of DNA signals not colocalizing with mitotic chromosomes during anaphase. n = 51 (control), n = 83 (retention element) cells. Box plot parameters are as described in Fig. 2. d, Individual (left) and mean (right) cell trajectories of DNA signal colocalization with chromosomes throughout mitosis. n = 42 (control), n = 45 (retention element) cells. Mean cell trajectories include all time points with >3 cells. Error bars show the s.e.m. Vertical dashed lines indicate anaphase. e, Hi-C interaction maps in asynchronous or mitotically arrested COLO320DM cells. Numbers at bottom right below far right plots indicate maximum count values in corresponding color scales. Density plots show flow cytometry analyses of DNA content. f,g, APA of Hi-C data of asynchronous (f) and mitotically arrested (g) COLO320DM cells. Heatmaps are summed percentile matrices of pairwise interactions between chromosome bookmarked regions and a combined set of ecMYC retention elements with 5-kb resolution. h, Hi-C heatmap of pairwise interactions in mitotically arrested COLO320DM cells between ecMYC retention elements and chromosome bookmarked regions with ENCODE cCRE annotations. i, Mitotically bookmarked regions that overlap with retention elements or matched-size random genomic intervals. j, Cumulative distribution of retention elements that contain binding sites of bookmarking factors, ordered by factor enrichment relative to random genomic intervals. k, ecDNA–chromosome interactions recapitulate enhancer–promoter interactions. Gene expression in interphase cells is activated by an interaction between enhancer (blue) and promoter (red) sequences on the same chromosome. 
We propose that ecDNA retention in mitotic cells is mediated by an analogous intermolecular contact between promoter-like retention elements (red) on ecDNA and enhancer-like, or less commonly, promoter-like bookmarked sites (blue) on the chromosome. P values were calculated using two-sided Wilcoxon rank-sum tests (c), two-sided paired t-tests (d) or two-sided Fisher’s exact tests (i).
Fig. 4. Retention elements enable selection of oncogene-containing ecDNAs in cancer.
a, Mean frequency (>10 independent replicates) of cells with ≥1 ecDNA in simulations. Shaded area, s.e.m. b, Analysis of retention element co-amplification with oncogenes on ecDNA in patient tumours. c, ecDNA amplicons that contain retention elements and/or oncogenes. d, Top, schematic of an ecDNA segment without retention elements co-amplified with a retention element. Bottom, frequency of co-amplification with retention elements in BFB, ecDNA or linear amplicons for genomic segments without retention elements. e, Top to bottom, oncogene sizes on ecDNA, frequency of genomic segments that contain retention elements sorted by size, and total ecDNA amplicon sizes. f, Schematic of experiment to analyse the distribution of retention element numbers among ecDNAs. g, Correlation (Pearson’s R with 95% confidence intervals) between local density of retention elements (Methods) and amplicon size. The plot shows the linear fit using ordinary least squares with 95% confidence intervals. h, Circular microDNAs in five human cell lines that overlap with retention elements or matched-sized random genomic intervals detected using circle-seq. i, Increased WGS coverage of EGFR ecDNA in GBM39 cells and retention element positions. j, 5mC CpG methylation of retention elements (n = 9 segments) compared with matched-sized sequence intervals (n = 1,235 segments) in GBM39 ecDNA. k, 5mC methylation (Me+ or Me−) and density of CpG sites surrounding a retention element on GBM39 ecDNA. l, Site-specific methylation of retention elements by CRISPRoff. m, Frequency of GBM39 cells that contain untethered ecDNA foci 5 days after transfection. n = 60 (nontargeting) and n = 50 (targeting) visual fields. Box plot parameters are as described in Fig. 2. n, Plasmid retention after methylation in COLO320DM cells, as assessed by qPCR (three biological replicates). 
o, Retention elements and oncogenes on ecDNA (left) confer retention and selection, respectively, two processes that shape the evolution of cancer cell lineages (right). P values were calculated using one-sided tests of equal proportions (d), two-sided Fisher’s z-tests (g), two-sided Fisher’s exact tests (h), two-sided Wilcoxon rank-sum tests (j), two-sided Mann–Whitney–Wilcoxon tests (m) or one-sided t-tests (n).
Extended Data Fig. 1. Optimization of Retain-seq library preparation.
(a) Insert size distribution of genomic fragments included in the input mixed episome library. (b) Genome-wide coverage of sequenced reads derived from input episome library. (c) Left: Representative quantitative PCR amplification curves across varying amounts of episome library as PCR input. Right: Log-transformed mean normalized read counts of genomic bins ranked by percentile. Inset is a zoom-in of the higher-percentile genomic bins, in which a 100-fold range of DNA amounts from 0.1 ng – 10 ng of input showed highly comparable representation (despite some library dropout at 0.1 ng of input DNA) while 0.01 ng PCR input showed substantial library dropout and signs of skewing and was used to set the quality threshold for all library preparations. See Methods. (d) Log-transformed mean normalized read counts of genomic bins ranked by percentile. Inset is a zoom-in of the higher-percentile genomic bins showing that increasing PCR cycles during library preparation alters skewing of sequencing reads.
Extended Data Fig. 2. Distribution of Retain-seq reads across the genome and experimental replicates.
(a) Log-transformed mean normalized read counts of genomic bins ranked by percentile. Inset is a zoom-in of higher-percentile genomic bins showing that transfection, represented by the day 2 episome library, results in minimal dropout that does not substantially skew the sequence representation compared to the input episomal library. (b) Loss of genome-wide representation in episomal insert sequences relative to the input library over time in four cell lines assayed with Retain-seq. (c) Correlations between experimental replicates of Retain-seq across time points from different cell lines. (d) Correlation (Pearson’s R; error bands represent 95% confidence intervals) between the numbers of episomally retained elements and the sizes of their chromosomes of origin in experiments performed in various cell lines. (e) Correlation (Pearson’s R; error bands represent 95% confidence intervals) between the numbers of episomally retained elements and the sizes of their chromosomes of origin across all cell lines. (f) Distribution of genomic bin sizes containing retention elements (median 1 kb; s.d. 0.604 kb). (g) Retention of plasmids containing random genomic inserts, the EBV tethering sequence alone, or the entire EBV origin (containing both tethering and replicative sequences) compared to pUC19 in GM12878 cells (three biological replicates). Fold changes were computed using plasmid levels at day 14 post-transfection, normalizing to levels at day 2 to adjust for differential transfection efficiency across conditions. P-values computed by one-sided t-test.
Extended Data Fig. 3. Chromosomal integration events of transfected plasmids containing a retention element are stochastic and occur at near-background levels.
Genome-wide read coverage (non-overlapping 50 kb bins) and detection of chromosomal integration events (events per bin) of transfected plasmids in single-molecule long-read nanopore sequencing from cells transfected with either an empty plasmid vector (pUC19; top) or plasmid containing a retention element (pUC19_RE-C; bottom).
Extended Data Fig. 4. Many, but not all retention elements represent sites of active nascent transcription.
(a) Histograms and heatmaps of COLO320DM GRO-seq signal from biological replicate 1, computed over 50 bp bins within 3 kb of the midpoints of retention elements located within the genomic coordinates of the COLO320DM ecDNA. Retention elements were divided into 3 categories based on overlap with genomic annotations: those that overlap with coding gene promoters, other portions of coding genes, or noncoding regions. X-axis directionality is consistent for both strands. (b) Heatmap of COLO320DM GRO-seq signal from biological replicate 2 within 3 kb of the midpoints of retention elements located within the genomic coordinates of the COLO320DM ecDNA.
Extended Data Fig. 5. Additional sequence features of retention elements.
(a) ENCODE ChIP-seq signals of the indicated proteins in K562 cells surrounding retention elements identified in the same cell line. (b) ENCODE ChIP-seq signals of components of the replication licensing complex in K562 cells surrounding retention elements identified in the same cell line. (c) Motif enrichment (log2 fold change) of transcription factor motifs in retention element intervals identified in COLO320DM, GBM39, and K562 cells relative to random genomic intervals. (d) Episomal retention of plasmids containing 8 overlapping 500-bp tiles of a retention element (RE-C) in COLO320DM cells measured by quantitative PCR (six biological replicates for empty vector and retention element conditions, three for others). P-values computed by one-sided t-test.
Extended Data Fig. 6. Summary of COLO320DM live cell imaging line.
(a) Fraction of MYC ecDNA foci with overlapping TetO foci for each metaphase cell, indicating the percentage of labeled ecDNAs per cell (n = 20 cells). Box plot parameters as in Fig. 2. (b) Frequency of cells containing plasmid foci (either control or retention element plasmids) that colocalize with TetO-labeled ecDNA foci. n = 38 (control) and n = 46 (retention element) cells. P-value determined by one-sided hypergeometric test. (c) Percentages of plasmid foci area (either control or retention element plasmids) that colocalize with TetO-labeled ecDNA foci. n = 10 (control) and n = 12 (retention element) cells; only the subset of cells with plasmid foci that at least partially overlap with ecDNA foci are plotted here. Box plot parameters as in Fig. 2. P-value computed using a two-sample Wilcoxon test.
Extended Data Fig. 7. Chromatin interactions and functional annotations of chromosome bookmarked regions and ecMYC retention elements.
(a-b) Aggregated peak analysis (APA) of Hi-C data of asynchronous (a) and mitotically arrested (b) COLO320DM cells. Heatmaps are summed percentile matrices of pairwise interactions between previously reported chromosome bookmarked regions (Methods) and a combined set of retention elements identified on the MYC ecDNA with 5-kb resolution, in which the chromosome bookmarked regions and/or the ecMYC retention elements are randomized. (c) Chromosome bookmarked regions or ecMYC retention elements with the indicated ENCODE cCRE annotations. (d) Hi-C heatmap of pairwise interactions between the MYC ecDNA retention elements and chromosome bookmarked regions with the indicated ENCODE cCRE annotations in asynchronous cells. Hi-C counts are normalized to number of interactions as well as bin sizes. (e) APA of Hi-C data of asynchronous GBM39 cells. (f) Importance scores (error bars show s.e.m.) indicating the relative contribution of each bookmarking factor to the cumulative distribution of retention elements. Scores represent the mean incremental number of retention elements containing binding sites for each factor over 1000 randomized cumulative distributions of the 20 bookmarking factors shown. Bookmarking factors are displayed in order of ChIP-seq peak enrichment within retention elements relative to random genomic intervals. (g) Fraction of tethered ecDNAs following CRISPR/Cas9 knockouts of selected bookmarking factors in mitotic COLO320DM cells. Box plot parameters as in Fig. 2. n = 55 (SMARCE1 NTC1), n = 42 (SMARCE1 KO1), n = 39 (SMARCE1 KO2), n = 34 (HEY1 NTC2), n = 33 (HEY1 KO1), n = 8 (CHD1 NTC1), n = 36 (CHD1 KO1) cells. (h) Mean immunofluorescence intensity of selected bookmarking factors in cells receiving targeting guide RNAs or non-targeting control (NTC) guides. n = 1874 (SMARCE1 NTC1), n = 2217 (SMARCE1 KO1), n = 1371 (SMARCE1 KO2), n = 1459 (HEY1 NTC2), n = 1976 (HEY1 KO1), n = 316 (CHD1 NTC1), n = 2730 (CHD1 KO1) cells. Box plot parameters as in Fig. 2.
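The importance score in panel f can be read as a permutation average: shuffle the order of the bookmarking factors, build the cumulative set of retention elements covered, and record how many new elements each factor adds. The sketch below follows that reading under stated assumptions; the function name, the toy factor names and the element sets are hypothetical, and the authors' exact procedure is described in their Methods rather than in this caption.

```python
import random

def importance_scores(factor_to_elements, n_permutations=1000, seed=0):
    """Mean number of newly covered elements per factor over random orderings.

    factor_to_elements maps a factor name to the set of retention-element
    identifiers that contain a binding site for that factor.
    """
    rng = random.Random(seed)
    factors = list(factor_to_elements)
    totals = {factor: 0 for factor in factors}
    for _ in range(n_permutations):
        order = factors[:]
        rng.shuffle(order)
        covered = set()
        for factor in order:
            # Elements this factor adds beyond those already covered.
            new_elements = factor_to_elements[factor] - covered
            totals[factor] += len(new_elements)
            covered |= new_elements
    return {factor: totals[factor] / n_permutations for factor in factors}

# Toy input: three hypothetical factors and five hypothetical elements.
toy = {"factorA": {1, 2, 3}, "factorB": {3, 4}, "factorC": {5}}
scores = importance_scores(toy)
print(scores)
```

A factor whose binding sites are not shared with any other factor (here `factorC`) always contributes its full set, whereas factors with overlapping sites split the credit depending on the order in which they are added.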
Extended Data Fig. 8. Evolutionary modeling of ecDNA retention and selection in growing cancer cell populations.
(a) Time-resolved simulated trajectories of ecDNA frequency and mean copy number (95% confidence intervals shaded) across 25 simulated time units with various selection and retention values. (b) Time-resolved simulated trajectories of ecDNA frequency and mean copy number (95% confidence intervals shaded) across 25 simulated time units stratified by the number of initial ecDNA copies present in the parental cell. Trajectories are reported for various levels of retention. Selection is fixed at 0.5.
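To make the interplay of retention and selection concrete, the following toy simulation tracks the fraction of ecDNA-positive cells over generations. It is a minimal sketch and not the authors' model: the division rule, the way `selection` raises the division probability, the population cap and all default values are assumptions chosen for illustration.

```python
import random

def divide(copies, retention, rng):
    """Replicate ecDNA, keep each copy with probability `retention`,
    then assign every retained copy to one of two daughters at random."""
    retained = sum(rng.random() < retention for _ in range(2 * copies))
    to_first = sum(rng.random() < 0.5 for _ in range(retained))
    return to_first, retained - to_first

def simulate(retention, selection, generations=25, start_copies=5,
             max_cells=2000, seed=0):
    """Return the fraction of ecDNA-positive cells after each generation."""
    rng = random.Random(seed)
    cells = [start_copies]  # ecDNA copy number of each cell
    frequencies = []
    for _ in range(generations):
        next_cells = []
        for copies in cells:
            # Assumed selection rule: ecDNA-positive cells divide more often.
            p_divide = min(1.0, 0.5 * (1 + selection)) if copies > 0 else 0.5
            if rng.random() < p_divide:
                next_cells.extend(divide(copies, retention, rng))
            else:
                next_cells.append(copies)
        # Cap the population by random subsampling to bound run time.
        if len(next_cells) > max_cells:
            next_cells = rng.sample(next_cells, max_cells)
        cells = next_cells
        frequencies.append(sum(c > 0 for c in cells) / len(cells))
    return frequencies

trajectory = simulate(retention=0.9, selection=0.5)
print(trajectory[-1])
```

Varying `retention` while holding `selection` fixed gives a rough feel for the kind of comparison shown in the panels, although the numbers produced by this sketch should not be compared with the published trajectories.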
Extended Data Fig. 9. Summary statistics of DNA amplifications identified in WGS data of patient tumor samples.
(a) Patient samples analyzed and classification of amplicons identified. (b) Number of genomic intervals implicated in each amplicon (i.e., degree of genomic rearrangement within an amplicon) across amplicon classes. n = 364 (BFB), n = 759 (ecDNA), and n = 1295 (linear) amplicons. Box plot parameters as in Fig. 2. P-values computed using two-sample Wilcoxon tests. (c) Amplicon widths (in bp) across amplicon classes. n = 364 (BFB), n = 759 (ecDNA), and n = 1295 (linear) amplicons. Box plot parameters as in Fig. 2. P-values computed using two-sample Wilcoxon tests. (d) Frequency of amplicons (left) or amplicon intervals (segments; right) containing at least one retention element across classes. P-values determined by one-sided hypergeometric test. (e) Top 10 oncogenes most frequently amplified as ecDNAs in analyzed patient samples. (f) Frequency of co-amplification of CDK4 (left) or EGFR (right) with neighboring retention elements (within 250 kb of gene midpoint) in observed ecDNA amplicons (below each plot) reconstructed from patient samples relative to corresponding oncogene-containing random genomic intervals drawn from an equivalent size distribution.
Extended Data Fig. 10. Hypomethylated CpG state is essential to retention element function.
(a) 5mC methylation status of individual CpG sites and their density within and surrounding retention elements on the EGFR ecDNA in GBM39 cells as measured in single-molecule long-read nanopore sequencing. (b) Viability of cells expressing CRISPRoff and a targeting guide cargo or non-targeting control over time. Cells were sorted at day 2 post-transfection and tracked until day 12, when no live targeted cells remained. Each line represents an independent biological replicate. (c) Counts of cells expressing CRISPRoff and a targeting guide cargo or non-targeting control guide RNA over time. Cells were sorted at day 2 post-transfection and tracked until day 12, when no live targeted cells remained. Each line represents an independent biological replicate. (d) Abundance of ecDNA following CpG methylation of retention elements by CRISPRoff at 5 days post-transfection compared to cells expressing a non-targeting control guide RNA in WGS coverage. (e) Representative image showing ecDNA foci lost from the nucleus in an interphase GBM39 cell 5 days after transfection with CRISPRoff and a guide cargo targeting retention elements (n = 50 image positions). Scale bar, 10 µm. (f) Abundance of nuclear ecDNA measured by nuclear EGFR DNA FISH signal at 5 days after transfection of CRISPRoff and guide cargo targeting retention elements compared to cells expressing a non-targeting control guide RNA. P-value computed using two-sided two-sample Kolmogorov-Smirnov test. (g) Mean cell trajectories of methylated retention element plasmid (n = 51 cells) or ecMYC DNA signal colocalization with chromosomes throughout mitosis. Mean cell trajectories include all time points with more than 3 cells. Measurements for the control and unmethylated retention element plasmid conditions are reproduced from Fig. 3d. Error bars show s.e.m. P-values determined by two-sided paired t-test of the means.

References

    1. Yan, X., Mischel, P. & Chang, H. Extrachromosomal DNA in cancer. Nat. Rev. Cancer 24, 261–273 (2024).
    2. Ilić, M., Zaalberg, I. C., Raaijmakers, J. A. & Medema, R. H. Life of double minutes: generation, maintenance, and elimination. Chromosoma 131, 107–125 (2022).
    3. Levan, A. & Levan, G. Have double minutes functioning centromeres? Hereditas 88, 81–92 (1978).
    4. Lundberg, G. et al. Binomial mitotic segregation of MYCN-carrying double minutes in neuroblastoma illustrates the role of randomness in oncogene amplification. PLoS ONE 3, e3099 (2008).
    5. Lange, J. T. et al. The evolutionary dynamics of extrachromosomal DNA in human cancers. Nat. Genet. 54, 1527–1533 (2022).
