[Preprint]. 2025 Oct 12:2025.10.10.681495.
doi: 10.1101/2025.10.10.681495.

Genetic elements promote retention of extrachromosomal DNA in cancer cells


Venkat Sankar et al. bioRxiv.


Abstract

Extrachromosomal DNA (ecDNA) is a prevalent and devastating form of oncogene amplification in cancer [1,2]. Circular megabase-sized ecDNAs lack centromeres and segregate stochastically during cell division [3-6] yet persist over many generations. EcDNAs were first observed to hitchhike on mitotic chromosomes into daughter cell nuclei over 40 years ago with unknown mechanism [3,7]. Here we identify a family of human genomic elements, termed retention elements, that tether episomes to mitotic chromosomes to increase ecDNA transmission to daughter cells. We develop Retain-seq, a genome-scale assay that reveals thousands of human retention elements conferring generational persistence to heterologous episomes. Retention elements comprise a select set of CpG-rich gene promoters and act additively. Live-cell imaging and chromatin conformation capture show that retention elements physically interact with mitotic chromosomes at regions which are mitotically bookmarked by transcription factors and chromatin proteins, intermolecularly recapitulating promoter-enhancer interactions. Multiple retention elements are co-amplified with oncogenes on individual ecDNAs in human cancers and shape their sizes and structures. CpG-rich retention elements are focally hypomethylated; targeted cytosine methylation abrogates retention activity and leads to ecDNA loss, suggesting that methylation-sensitive interactions modulate episomal DNA retention. These results highlight the DNA elements and regulatory logic of mitotic ecDNA retention. Amplifications of retention elements promote the maintenance of oncogenic ecDNA across generations of cancer cells, revealing the principles of episome immortality intrinsic to the human genome.


Conflict of interest statement

Competing Interests: H.Y.C. is an employee and stockholder of Amgen as of Dec. 16, 2024. H.Y.C. is a co-founder of Accent Therapeutics, Boundless Bio, Cartography Biosciences, Orbital Therapeutics, and was an advisor of Arsenal Biosciences, Chroma Medicine, Exai Bio and Vida Ventures until Dec. 15, 2024. P.S.M. is a co-founder and advisor of Boundless Bio. A.G.H. is a founder and shareholder of Econic Biosciences. M.G.J. is a consultant for and holds equity in Vevo Therapeutics. The remaining authors declare no competing interests.

Figures

Extended Data Figure 1. Optimization of Retain-seq library preparation.
(a) Insert size distribution of genomic fragments included in the input mixed episome library. (b) Genome-wide coverage of sequenced reads derived from input episome library. (c) Left: Representative quantitative PCR amplification curves across varying amounts of episome library as PCR input. Right: Log-transformed mean normalized read counts of genomic bins ranked by percentile. Inset is a zoom-in of the higher-percentile genomic bins, in which a 100-fold range of DNA amounts from 0.1 ng – 10 ng of input showed highly comparable representation (despite some library dropout at 0.1 ng of input DNA) while 0.01 ng PCR input showed substantial library dropout and signs of skewing and was used to set the quality threshold for all library preparations. See Methods. (d) Log-transformed mean normalized read counts of genomic bins ranked by percentile. Inset is a zoom-in of the higher-percentile genomic bins showing that increasing PCR cycles during library preparation alters skewing of sequencing reads.
Extended Data Figure 2. Distribution of Retain-seq reads across the genome and experimental replicates.
(a) Log-transformed mean normalized read counts of genomic bins ranked by percentile. Inset is a zoom-in of higher-percentile genomic bins showing that transfection, represented by the day 2 episome library, results in minimal dropout that does not substantially skew the sequence representation compared to the input episomal library. (b) Loss of genome-wide representation in episomal insert sequences relative to the input library over time in four cell lines assayed with Retain-seq. (c) Correlations between experimental replicates of Retain-seq across time points from different cell lines. (d) Correlation (Pearson’s R; error bands represent 95% confidence intervals) between the numbers of episomally retained elements and the sizes of their chromosomes of origin in experiments performed in various cell lines. (e) Correlation (Pearson’s R; error bands represent 95% confidence intervals) between the numbers of episomally retained elements and the sizes of their chromosomes of origin across all cell lines. (f) Distribution of genomic bin sizes containing retention elements (median 1 kb; s.d. 0.604 kb). (g) Retention of plasmids containing random genomic inserts, the EBV tethering sequence alone, or the entire EBV origin (containing both tethering and replicative sequences) compared to pUC19 in GM12878 cells (three biological replicates). Fold changes were computed using plasmid levels at day 14 post-transfection, normalizing to levels at day 2 to adjust for differential transfection efficiency across conditions. P-values computed by one-sided t-test.
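The day 2 normalization described in panel (g) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names, the example values, and the choice to express the result relative to the pUC19 control are assumptions made here.

```python
def retained_fraction(level_day2, level_day14):
    """Plasmid level at day 14 relative to day 2 for one condition."""
    if level_day2 <= 0:
        raise ValueError("day 2 level must be positive")
    return level_day14 / level_day2


def fold_change_vs_control(sample, control):
    """Ratio of retained fractions (assumed comparison: sample vs. control).

    Each argument is a (day2_level, day14_level) pair of relative plasmid
    quantities, for example from quantitative PCR.
    """
    return retained_fraction(*sample) / retained_fraction(*control)


# Hypothetical relative plasmid levels, for illustration only.
empty_vector = (16.0, 2.0)   # pUC19 control: (day 2, day 14)
with_element = (8.0, 4.0)    # plasmid carrying a retention element
print(fold_change_vs_control(with_element, empty_vector))  # prints 4.0
```

Dividing by the day 2 level first means that a condition that simply transfected better does not appear to be retained better.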
Extended Data Figure 3. Chromosomal integration events of transfected plasmids containing a retention element are stochastic and occur at near-background levels.
Genome-wide read coverage (non-overlapping 50 kb bins) and detection of chromosomal integration events (events per bin) of transfected plasmids in single-molecule long-read nanopore sequencing from cells transfected with either an empty plasmid vector (pUC19; top) or plasmid containing a retention element (pUC19_RE-C; bottom).
Extended Data Figure 4. Many, but not all, retention elements represent sites of active nascent transcription.
(a) Histograms and heatmaps of COLO320DM GRO-seq signal from biological replicate 1, computed over 50 bp bins within 3 kb of the midpoints of retention elements located within the genomic coordinates of the COLO320DM ecDNA. Retention elements were divided into 3 categories based on overlap with genomic annotations: those that overlap with coding gene promoters, other portions of coding genes, or noncoding regions. X-axis directionality is consistent for both strands. (b) Heatmap of COLO320DM GRO-seq signal from biological replicate 2 within 3 kb of the midpoints of retention elements located within the genomic coordinates of the COLO320DM ecDNA.
Extended Data Figure 5. Additional sequence features of retention elements.
(a) ENCODE ChIP-seq signals of the indicated proteins in K562 cells surrounding retention elements identified in the same cell line. (b) ENCODE ChIP-seq signals of components of the replication licensing complex in K562 cells surrounding retention elements identified in the same cell line. (c) Motif enrichment (log2 fold change) of transcription factor motifs in retention element intervals identified in COLO320DM, GBM39, and K562 cells relative to random genomic intervals. (d) Episomal retention of plasmids containing 8 overlapping 500-bp tiles of a retention element (RE-C) in COLO320DM cells measured by quantitative PCR (six biological replicates for empty vector and retention element conditions, three for others). P-values computed by one-sided t-test.
Extended Data Figure 6. Summary of COLO320DM live cell imaging line.
(a) Fraction of MYC ecDNA foci with overlapping TetO foci for each metaphase cell, indicating the percentage of labeled ecDNAs per cell (n = 20 cells). Box plot parameters as in Fig. 2. (b) Frequency of cells containing plasmid foci (either control or retention element plasmids) that colocalize with TetO-labeled ecDNA foci. n = 38 (control) and n = 46 (retention element) cells. P-value determined by one-sided hypergeometric test. (c) Percentages of plasmid foci area (either control or retention element plasmids) that colocalize with TetO-labeled ecDNA foci. n = 10 (control) and n = 12 (retention element) cells; only the subset of cells with plasmid foci that at least partially overlap with ecDNA foci are plotted here. Box plot parameters as in Fig. 2. P-value computed using a two-sample Wilcoxon test.
Extended Data Figure 7. Chromatin interactions and functional annotations of chromosome bookmarked regions and ecMYC retention elements.
(a-b) Aggregated peak analysis (APA) of Hi-C data of asynchronous (a) and mitotically arrested (b) COLO320DM cells. Heatmaps are summed percentile matrices of pairwise interactions between previously reported chromosome bookmarked regions (Methods) and a combined set of retention elements identified on the MYC ecDNA with 5-kb resolution, in which the chromosome bookmarked regions and/or the ecMYC retention elements are randomized. (c) Chromosome bookmarked regions or ecMYC retention elements with the indicated ENCODE cCRE annotations. (d) Hi-C heatmap of pairwise interactions between the MYC ecDNA retention elements and chromosome bookmarked regions with the indicated ENCODE cCRE annotations in asynchronous cells. Hi-C counts are normalized to number of interactions as well as bin sizes. (e) APA of Hi-C data of asynchronous GBM39 cells. (f) Importance scores (error bars show s.e.m.) indicating the relative contribution of each bookmarking factor to the cumulative distribution of retention elements. Scores represent the mean incremental number of retention elements containing binding sites for each factor over 1000 randomized cumulative distributions of the 20 bookmarking factors shown. Bookmarking factors are displayed in order of ChIP-seq peak enrichment within retention elements relative to random genomic intervals. (g) Fraction of tethered ecDNAs following CRISPR/Cas9 knockouts of selected bookmarking factors in mitotic COLO320DM cells. Box plot parameters as in Fig. 2. n = 55 (SMARCE1 NTC1), n = 42 (SMARCE1 KO1), n = 39 (SMARCE1 KO2), n = 34 (HEY1 NTC2), n = 33 (HEY1 KO1), n = 8 (CHD1 NTC1), n = 36 (CHD1 KO1) cells. (h) Mean immunofluorescence intensity of selected bookmarking factors in cells receiving targeting guide RNAs or non-targeting control (NTC) guides. n = 1874 (SMARCE1 NTC1), n = 2217 (SMARCE1 KO1), n = 1371 (SMARCE1 KO2), n = 1459 (HEY1 NTC2), n = 1976 (HEY1 KO1), n = 316 (CHD1 NTC1), n = 2730 (CHD1 KO1) cells. Box plot parameters as in Fig. 2.
Extended Data Figure 8. Evolutionary modeling of ecDNA retention and selection in growing cancer cell populations.
(a) Time-resolved simulated trajectories of ecDNA frequency and mean copy number (95% confidence intervals shaded) across 25 simulated time units with various selection and retention values. (b) Time-resolved simulated trajectories of ecDNA frequency and mean copy number (95% confidence intervals shaded) across 25 simulated time units stratified by the number of initial ecDNA copies present in the parental cell. Trajectories are reported for various levels of retention. Selection is fixed at 0.5.
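A toy simulation in the spirit of this legend is sketched below. The legend does not specify the model, so everything here is an assumption for illustration: each ecDNA copy is duplicated at division and inherited with probability `retention`, retained copies are split at random between the two daughters, and `selection` raises the division probability of ecDNA-carrying cells. The parameter names and defaults are hypothetical.

```python
import random


def frequency(cells):
    """Fraction of cells carrying at least one ecDNA copy."""
    return sum(1 for copies in cells if copies > 0) / len(cells)


def segregate(copies, retention, rng):
    """Duplicate ecDNA copies, then split the retained ones between daughters."""
    daughters = [0, 0]
    for _ in range(2 * copies):            # every copy is duplicated first
        if rng.random() < retention:       # copy is retained and inherited
            daughters[rng.randrange(2)] += 1
        # otherwise the copy is lost from both daughters
    return daughters


def simulate(retention, selection, steps=25, initial_cells=100,
             initial_copies=1, base_rate=0.5, max_cells=2000, seed=0):
    """Return the ecDNA-positive cell frequency at each time step."""
    rng = random.Random(seed)
    cells = [initial_copies] * initial_cells
    trajectory = [frequency(cells)]
    for _ in range(steps):
        next_cells = []
        for copies in cells:
            # Assumed form of selection: a division-rate bonus for carriers.
            p_div = base_rate * (1 + selection) if copies > 0 else base_rate
            if rng.random() < min(1.0, p_div):
                next_cells.extend(segregate(copies, retention, rng))
            else:
                next_cells.append(copies)
        if len(next_cells) > max_cells:    # keep the population bounded
            next_cells = rng.sample(next_cells, max_cells)
        cells = next_cells
        trajectory.append(frequency(cells))
    return trajectory


for r in (0.5, 0.9, 1.0):
    print(r, round(simulate(retention=r, selection=0.5)[-1], 3))
```

Running the loop over several `retention` values with `selection` fixed at 0.5 mirrors the layout of panel (b), although the numbers produced by this toy model are not comparable to the published trajectories.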
Extended Data Figure 9. Summary statistics of DNA amplifications identified in WGS data of patient tumor samples.
(a) Patient samples analyzed and classification of amplicons identified. (b) Number of genomic intervals implicated in each amplicon (i.e., degree of genomic rearrangement within an amplicon) across amplicon classes. n = 364 (BFB), n = 759 (ecDNA), and n = 1295 (linear) amplicons. Box plot parameters as in Fig. 2. P-values computed using two-sample Wilcoxon tests. (c) Amplicon widths (in bp) across amplicon classes. n = 364 (BFB), n = 759 (ecDNA), and n = 1295 (linear) amplicons. Box plot parameters as in Fig. 2. P-values computed using two-sample Wilcoxon tests. (d) Frequency of amplicons (left) or amplicon intervals (segments; right) containing at least one retention element across classes. P-values determined by one-sided hypergeometric test. (e) Top 10 oncogenes most frequently amplified as ecDNAs in analyzed patient samples. (f) Frequency of co-amplification of CDK4 (left) or EGFR (right) with neighboring retention elements (within 250 kb of gene midpoint) in observed ecDNA amplicons (below each plot) reconstructed from patient samples relative to corresponding oncogene-containing random genomic intervals drawn from an equivalent size distribution.
Extended Data Figure 10. Hypomethylated CpG state is essential to retention element function.
(a) 5mC methylation status of individual CpG sites and their density within and surrounding retention elements on the EGFR ecDNA in GBM39 cells as measured in single-molecule long-read nanopore sequencing. (b) Viability of cells expressing CRISPRoff and a targeting guide cargo or non-targeting control over time. Cells were sorted at day 2 post-transfection and tracked until day 12, when no live targeted cells remained. Each line represents an independent biological replicate. (c) Counts of cells expressing CRISPRoff and a targeting guide cargo or non-targeting control guide RNA over time. Cells were sorted at day 2 post-transfection and tracked until day 12, when no live targeted cells remained. Each line represents an independent biological replicate. (d) Abundance of ecDNA following CpG methylation of retention elements by CRISPRoff at 5 days post-transfection compared to cells expressing a non-targeting control guide RNA in WGS coverage. (e) Representative image showing ecDNA foci lost from the nucleus in an interphase GBM39 cell 5 days after transfection with CRISPRoff and a guide cargo targeting retention elements (n = 50 image positions). Scale bar, 10 μm. (f) Abundance of nuclear ecDNA measured by nuclear EGFR DNA FISH signal at 5 days after transfection of CRISPRoff and guide cargo targeting retention elements compared to cells expressing a non-targeting control guide RNA. P-value computed using two-sided two-sample Kolmogorov-Smirnov test. (g) Mean cell trajectories of methylated retention element plasmid (n = 51 cells) or ecMYC DNA signal colocalization with chromosomes throughout mitosis. Mean cell trajectories include all time points with more than 3 cells. Measurements for the control and unmethylated retention element plasmid conditions are reproduced from Figure 3d. Error bars show s.e.m. P-values determined by two-sided paired t-test of the means.
Figure 1. Identification of genetic elements that promote episomal DNA retention.
(a) Hypothesis of mitotic retention of ecDNAs in cancer cells via chromosome hitchhiking. (b) Representative image of tethered (bottom arrow) and untethered (top arrow) ecDNA foci in mitotic PC3 cells (n = 92 daughter cell pairs). Scale bar, 10 μm. (c) Representative live-cell images (n = 10 fields of view) showing ecDNA (labeled with TetR-mNeonGreen) colocalization with chromosomes during cancer cell division. Scale bar, 10 μm. (d) Fractions of ecDNAs with various oncogenes colocalizing with mitotic chromosomes in cancer cell lines (glioblastoma GBM39, EGFR ecDNA from chromosome 7; prostate cancer PC3, MYC ecDNA from chromosome 8; gastric cancer SNU16, MYC and FGFR2 ecDNAs from chromosome 8 and chromosome 10, respectively; colorectal cancer COLO320DM, MYC ecDNA (ecMYC); raw images obtained from a previous publication) in IF-DNA-FISH of anaphase cells. (e) Schematic diagram of Retain-seq. (f) Retain-seq enrichment of a known EBV sequence that promotes viral retention, with EBNA-1 ChIP-seq in the EBV-transformed GM12878 cells below. (g) Retain-seq signal at three representative enriched genomic loci. Red tracks represent loci that were significantly enriched in Retain-seq screens in the corresponding cell line, thus marking these loci as retention elements in that line; black tracks indicate that the sequence was not identified as a retention element in the corresponding experiment. (h) Principal component analysis of Retain-seq in various cell lines at different time points. (i) Individual validation by quantitative PCR of six episomally retained elements identified by Retain-seq experiments in the K562 cell line and amplified on the COLO320DM (RE-C) and GBM39 (others) ecDNAs. Each line in the plot for a given retention element represents a single replicate. The empty vector control is the pUC19 plasmid alone, while the random inserts control comprises the pUC19 plasmid with random insert sequences from the genome of the human GM12878 cell line. 
P-values determined by one-sided t-test.
Figure 2. Sequence features of retention elements.
(a) Analyses of sequence features of retention elements. (b) Input-normalized Retain-seq signal across annotated gene sequences. TSS, transcription start site; TTS, transcription termination site. (c) Sequence annotations overlapping with retention elements identified in K562 cells. Percentages represent the proportion of retention elements overlapping with a given annotation class. (d) ENCODE candidate cis-Regulatory Elements (cCREs) overlapping with retention elements identified in K562 cells. Fractions represent the proportion of retention elements overlapping with a given cCRE class. (e) ENCODE ChIP-seq signals of the indicated histone marks and RNA polymerase II and III in K562 cells surrounding retention elements identified in the same cell line. (f) CpG density surrounding the combined set of retention elements. (g) Number of CpG sites in genomic bins overlapping with retention elements (n = 18494) compared to those that do not (n = 2543727). Box center line median; limits, upper and lower quartiles; whiskers, 1.5× interquartile range. P-value computed by two-sided Wilcoxon rank-sums test. (h) Fraction of origins of replication (identified by SNS-seq in K562 cells) overlapping with retention elements identified in K562 cells and random genomic intervals. P-value determined by one-sided hypergeometric test. (i) Retention of plasmids containing one, two or three copies of a retention element (RE-C; red segments in schematic) in COLO320DM cells by quantitative PCR. Fold changes were computed using plasmid levels at day 14 post-transfection, normalizing to levels at day 2 to adjust for differential transfection efficiency across conditions (three biological replicates). P-values computed using one-sided t-test. (j) Left: transfection of plasmids with a CMV promoter and/or a retention element (RE-C) into COLO320DM cells. 
Right: retention of plasmids containing a CMV promoter and/or a retention element in COLO320DM cells by quantitative PCR (three biological replicates). Data for two different plasmid backbones, pUC19 and pGL4, are shown. P-values computed using one-sided t-test.
Figure 3. Retention elements promote extrachromosomal interactions with chromosomes during mitosis.
(a) Live-cell imaging experiment schematic. (b) Representative live-cell time-lapse images of dividing COLO320DM cells with labeled ecMYC following transfection with plasmid containing a retention element or empty vector control. Scale bar, 10 μm. (c) Fraction of DNA signal not colocalizing with mitotic chromosomes during anaphase. n = 51 (control), n = 83 cells (retention element). Box plot parameters as in Fig. 2. P-values by two-sided Wilcoxon rank-sums test. (d) Individual (left) and mean (right) cell trajectories of DNA signal colocalization with chromosomes throughout mitosis. n = 42 (control), n = 45 (retention element) cells. Mean cell trajectories include all time points with > 3 cells. Error bars show s.e.m. P-values by two-sided paired t-test. (e) Hi-C interaction maps in asynchronous or mitotically arrested COLO320DM cells. Density plots show flow cytometric analysis of DNA content. (f,g) Aggregated peak analysis (APA) of Hi-C data of asynchronous (f) and mitotically arrested (g) COLO320DM cells. Heatmaps are summed percentile matrices of pairwise interactions between chromosome bookmarked regions and a combined set of ecMYC retention elements with 5-kb resolution. (h) Hi-C heatmap of pairwise interactions in mitotically arrested COLO320DM cells between ecMYC retention elements and chromosome bookmarked regions with ENCODE cCRE annotations. (i) Mitotically bookmarked regions overlapping with retention elements or matched-size random genomic intervals. P-values by two-sided Fisher’s Exact Test. (j) Cumulative distribution of retention elements containing binding sites of bookmarking factors, ordered by factor enrichment relative to random genomic intervals. (k) ecDNA-chromosome interactions recapitulate enhancer-promoter interactions. 
While gene expression in interphase cells is activated by an interaction between enhancer (blue) and promoter (red) sequences on the same chromosome, we hypothesize that ecDNA retention in mitotic cells is mediated by an analogous intermolecular contact between promoter-like retention elements (red) on ecDNA and enhancer-like, or less commonly, promoter-like bookmarked sites (blue) on the chromosome.
Figure 4. Retention elements enable selection of oncogene-carrying ecDNAs in cancer.
(a) Mean frequency (over 10 independent replicates) of cells carrying ≥1 ecDNA in simulations. Shaded area, s.e.m. (b) Analysis of retention element co-amplification with oncogenes on ecDNA in patient tumors. (c) ecDNA amplicons containing retention elements and/or oncogenes. (d) Top: an ecDNA segment lacking retention elements co-amplified with a retention element. Bottom: frequency of co-amplification with retention elements within BFB, ecDNA, or linear amplicons for genomic segments lacking retention elements. One-sided test of equal proportions. (e) Top to bottom: oncogene sizes on ecDNA; frequency of genomic segments containing retention elements sorted by size; total ecDNA amplicon sizes. (f) Distribution of retention element numbers among ecDNAs. (g) Correlation (Pearson’s R; 95% confidence intervals) between local density of retention elements (Methods) and amplicon size. P-values by two-sided Fisher’s z-test. Plot: Linear fit (OLS) with 95% confidence intervals. (h) Circular microDNAs in five human cell lines overlapping with retention elements or matched-size random genomic intervals. Two-sided Fisher’s Exact Test. (i) Elevated WGS coverage of EGFR ecDNA in GBM39 cells and retention element positions. (j) 5mC CpG methylation of retention elements (n = 9 segments) compared to matched-size sequence intervals (n = 1235 segments) within the GBM39 ecDNA. Two-sided Wilcoxon rank-sums test. (k) 5mC methylation and density of CpG sites surrounding a retention element on the GBM39 ecDNA. (l) Site-specific methylation of retention elements by CRISPRoff. (m) Frequency of GBM39 cells containing untethered ecDNA foci 5 days after transfection. n = 60 (non-targeting) and n = 50 (targeting) visual fields. Box plot parameters as in Fig. 2. Two-sided Mann-Whitney-Wilcoxon test. (n) Plasmid retention after methylation in COLO320DM cells by quantitative PCR (three biological replicates). One-sided t-test. 
(o) Retention elements and oncogenes on ecDNA (left) confer retention and selection, respectively, two processes shaping the evolution of cancer cell lineages (right).

References

    1. Yan X., Mischel P. & Chang H. Extrachromosomal DNA in cancer. Nat Rev Cancer 24, 261–273 (2024).
    2. Ilić M., Zaalberg I. C., Raaijmakers J. A. & Medema R. H. Life of double minutes: generation, maintenance, and elimination. Chromosoma 131, 107–125 (2022).
    3. Levan A. & Levan G. Have double minutes functioning centromeres? Hereditas 88, 81–92 (1978).
    4. Lundberg G. et al. Binomial Mitotic Segregation of MYCN-Carrying Double Minutes in Neuroblastoma Illustrates the Role of Randomness in Oncogene Amplification. PLOS ONE 3, e3099 (2008).
    5. Lange J. T. et al. The evolutionary dynamics of extrachromosomal DNA in human cancers. Nat Genet 1–7 (2022) doi: 10.1038/s41588-022-01177-x.
