Nat Commun. 2024 Feb 27;15(1):1768.
doi: 10.1038/s41467-024-45972-y.

scCircle-seq unveils the diversity and complexity of extrachromosomal circular DNAs in single cells

Jinxin Phaedo Chen et al. Nat Commun. 2024.

Abstract

Extrachromosomal circular DNAs (eccDNAs) have emerged as important intra-cellular mobile genetic elements that affect gene copy number and exert in trans regulatory roles within the cell nucleus. Here, we describe scCircle-seq, a method for profiling eccDNAs and unraveling their diversity and complexity in single cells. We implement and validate scCircle-seq in normal and cancer cell lines, demonstrating that most eccDNAs vary largely between cells and are stochastically inherited during cell division, although their genomic landscape is cell type-specific and can be used to accurately cluster cells of the same origin. eccDNAs are preferentially produced from chromatin regions enriched in H3K9me3 and H3K27me3 histone marks and are induced during replication stress conditions. Concomitant sequencing of eccDNAs and RNA from the same cell uncovers the absence of correlation between eccDNA copy number and gene expression levels, except for a few oncogenes, including MYC, contained within a large eccDNA in colorectal cancer cells. Lastly, we apply scCircle-seq to one prostate cancer and two breast cancer specimens, revealing cancer-specific eccDNA landscapes and a higher propensity of eccDNAs to form in amplified genomic regions. scCircle-seq is a scalable tool that can be used to dissect the complexity of eccDNAs across different cell and tissue types, and further expands the potential of eccDNAs for cancer diagnostics.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. scCircle-seq implementation and validation.
a scCircle-seq workflow. Single nuclei are sorted into 96-well plates or mouth-pipetted into PCR tubes, after which nuclei are lysed and single-stranded DNA breaks (nicks) are repaired using Taq DNA ligase and Bst DNA polymerase. Next, linear genomic DNA (gDNA) is digested with Exonuclease V and the remaining circular DNA is amplified by rolling circle amplification (RCA) using random primers and ϕ29 DNA Polymerase. The amplified DNA is then subjected to library preparation using the Illumina Nextera kit (see Methods). b Scheme of the computational pipeline used to identify circle-producing regions (CPRs). Blue bars, paired-end sequencing reads. Magenta arches, examples of chimeric reads (discordant and split read pairs) defining a chimeric junction (yellow arch) connecting the extremities of a CPR (gray bar). c Distributions of the number of CPRs identified by scCircle-seq in five cell lines. n, number of single cells analyzed. Boxplots extend from the 25th to the 75th percentile, horizontal bars represent the median, and whiskers extend from –1.5 × IQR to +1.5 × IQR from the closest quartile, where IQR is the inter-quartile range. Black dots, outliers. In each boxplot, the minimum and maximum are defined, respectively, by the lowermost and uppermost outlier dot or extremity of the corresponding whisker. d Same as in (c) but for the percentage of the genome covered by CPRs. e Probability density distributions of the length of CPRs identified in the five cell lines in (c) and (d). kb, kilobase. f Integrative Genomics Viewer (IGV) tracks showing the coverage (dark blue) of the indicated genomic region on chromosome (chr) 6 by Circle-Seq (top track) and scCircle-seq in four different Colo320DM cells. Gray bars, CPRs. g Examples of IGV tracks for simple and complex CPRs identified by scCircle-seq on two different chromosomes in Colo320DM cells. For each cell, the upper track indicates the coverage of all reads while the lower track shows the coverage of circle-supporting reads. In (f) and (g), the numbers in square brackets represent the intensity range of the track. n, number of chimeric junction-supporting reads for the indicated CPRs. Source data are provided as a Source Data file.
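The boxplot convention described in panel (c) can be illustrated with a minimal Python sketch. It is not the authors' plotting code: the function name `boxplot_stats`, the choice of the "inclusive" quartile method, and the sample values are assumptions made only for illustration.

```python
import statistics


def boxplot_stats(values):
    """Summarize values using the 1.5 x IQR whisker convention.

    Assumption: quartiles use the 'inclusive' interpolation method;
    the source does not say which quartile definition was used.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data points inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    # Points beyond the fences are drawn individually as outliers.
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical per-cell CPR counts, invented for this example only.
stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats)
```

Under this convention, the overall minimum and maximum shown in a plot are either the whisker ends or, when outliers exist, the most extreme outlier dots.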
Fig. 2. Genomic distribution of eccDNAs identified by scCircle-seq.
a Scheme explaining how we computed the autocorrelation of the scCircle-seq signal as a function of genomic distance. In general, autocorrelation refers to the correlation between a signal and a distance- or time-delayed version of itself. Here, autocorrelation refers to the correlation between the scCircle-seq signal f(x) (i.e., the genome coverage of sequencing reads coming from eccDNAs) and the same signal shifted by a genomic distance d, f(x+d). Autocorrelation of the scCircle-seq signal as a function of genomic distance for a single HeLa cell (b) and for the corresponding pseudo-bulk scCircle-seq dataset from 24 HeLa cells (c). Shuffle, autocorrelation after random permutation of the genomic coordinates of scCircle-seq reads. kb, kilobase. d Classification of eccDNAs based on their frequency and genome coverage across multiple single cells. e Correlation between the frequency and coverage (scatterplots) and relative abundance (pie charts) of the four different types of eccDNAs displayed in (d), in each of the five cell types profiled by scCircle-seq. Colors are the same as in (d). Each dot in the scatterplots represents a circle-producing region (CPR). Marginal distributions are shown on the top and right side of each scatterplot. f Number of eccDNA reads along the gene body of protein-coding genes in HeLa cells. n, number of genes. g Number of eccDNA reads inside CPRs overlapping with chromatin immunoprecipitation and sequencing (ChIP-seq) peaks for the indicated histone marks in HeLa cells. n, number of ChIP-seq peaks. h Probability density distribution of the frequency of CPRs overlapping with enhancers versus all other CPRs in Colo320DM cells. n, total number of CPRs. i Log-transformed mean transcripts per kilobase million (TPM) versus the Pearson’s correlation coefficient (PCC) calculated between the normalized gene expression and normalized number of eccDNA reads for the same gene. Each dot represents a gene. Genes on HFHU eccDNAs are labeled and colored in darker green. The red dashed vertical line serves to visually highlight the minority of genes for which the correlation is relatively strong (PCC > 0.6). n, number of genes. Source data are provided as a Source Data file.
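The autocorrelation defined in panel (a) can be sketched as the Pearson correlation between a binned coverage signal and a copy of itself shifted by a fixed number of bins. This is a minimal illustration, not the authors' pipeline: the function names, the use of fixed-size bins, and the toy signal are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a flat signal
    return cov / sqrt(var_x * var_y)


def autocorrelation(signal, lag_bins):
    """Correlate f(x) with f(x + d), where d is lag_bins genomic bins.

    Assumption: `signal` holds read coverage in consecutive fixed-size bins.
    """
    if lag_bins < 0 or lag_bins > len(signal) - 2:
        raise ValueError("lag must leave at least two overlapping bins")
    # Drop the non-overlapping ends so both views have the same length.
    return pearson(signal[: len(signal) - lag_bins], signal[lag_bins:])


# Toy periodic coverage signal, invented for illustration only.
coverage = [0, 1, 0, 1, 0, 1, 0, 1]
print(autocorrelation(coverage, 2))  # shift equal to the period
print(autocorrelation(coverage, 1))  # shift of half a period
```

A shuffled control like the one in the caption could be approximated by permuting the bins (for example with `random.shuffle` on a copy of the signal) before calling `autocorrelation`, although the authors permute read coordinates rather than bins.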
Fig. 3. Cell specificity and dynamics of eccDNAs.
a Scheme of the computational approach used for topic modeling of scCircle-seq data using cisTopic. For a detailed description of the approach, see Methods. b Uniform Manifold Approximation and Projection (UMAP) representation of scCircle-seq data after topic modeling performed as described in (a). Cells are colored by cell type. c Heatmap representation of topic contribution for each single cell. Cells and topics are clustered hierarchically. Cells are colored by cell type as in (b). d Genome-wide contribution to each of the 14 topics identified by topic modeling of scCircle-seq data. Cell type-specific topics are colored by cell type as in (b). Gray tracks indicate topics that are not specific to one of the five cell lines processed by scCircle-seq. Heatmap representation of the enrichment in various genomic features (e) and histone marks (f) for each of the 14 topics identified by topic modeling of scCircle-seq data. Rows and columns are clustered hierarchically. bp base-pair, kb kilobase. g UMAP representation of scCircle-seq data from HeLa cells treated or not with methotrexate (MTX). Each dot represents a single cell. h Distribution of the length of circle-producing regions (CPRs) identified in the same HeLa cells shown in (g). Heatmap representations of the Euclidean distance between pairs of daughter cells for HFHU (i) and LFLU (j) eccDNAs. Cell pairs are indicated by the colorbars on the right and bottom of each heatmap. Source data are provided as a Source Data file.
Fig. 4. scCircle-seq detects eccDNAs in patient-derived tumor samples.
a Distributions of the number of circle-producing regions (CPRs) in nuclei extracted from three tumor samples. PRAD, prostate adenocarcinoma. LumB, Luminal B-like breast cancer. TNBC, triple-negative breast cancer. n, number of cells analyzed. b As (a) but for the genome coverage of the CPRs identified. c Uniform Manifold Approximation and Projection (UMAP) representation of all cells analyzed, after topic modeling of the corresponding eccDNAs. Each dot represents one cell. d Volcano plot showing differentially expressed topics. –Log(p-adj), negative logarithm of the adjusted P value calculated using the Benjamini–Hochberg method (two-sided, pair-wise). Log2(fold change), base-2 logarithm of the fold change. e, f UMAP representation of LumB and TNBC cells, after topic modeling of the corresponding eccDNAs. Left, Cells colored by sample type. Right, Cells colored by clusters identified by unsupervised clustering. g As (d) but comparing Cluster-1 and Cluster-2 from (f). h As (e) but with nuclei color-coded based on enrichment of the corresponding eccDNAs inside genomic regions amplified in breast cancer samples in The Cancer Genome Atlas (TCGA). i Distributions of the enrichment inside genomic regions amplified in TCGA breast cancers, for the eccDNAs identified in LumB and TNBC cells belonging to UMAP Cluster-1 and Cluster-2 in (f). n, number of single cells analyzed. P, t-test, two-tailed. j, k Same as in (h) and (i), respectively, but for genomic regions amplified in the TNBC sample based on single-cell DNA sequencing using Acoustic Cell Tagmentation (ACT). l Distributions of the normalized number of eccDNAs per 100 kilobases (kb) inside genomic regions with different copy numbers determined by ACT. P, t-test, two-sided. Genomic regions are grouped based on the corresponding copy number. 
In all the boxplots, boxes extend from the 25th to the 75th percentile, horizontal bars represent the median, and whiskers extend from –1.5 × IQR to +1.5 × IQR from the closest quartile, where IQR is the inter-quartile range. Black dots, outliers. Minimum and maximum are defined, respectively, by the lowermost and uppermost outlier dot or extremity of the corresponding whisker. Source data are provided as a Source Data file.
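The Benjamini–Hochberg adjustment named in panel (d) can be sketched in a few lines of Python. This is a generic implementation of the standard procedure, not the authors' code; the function name and the example P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    # Indices of the P values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-topic P values, invented for illustration only.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Each raw P value is scaled by m / rank, and the running minimum keeps the adjusted values non-decreasing in the order of the raw values, so a smaller raw P value never receives a larger adjusted one.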
