NAR Cancer. 2024 Aug 1;6(3):zcae035.
doi: 10.1093/narcan/zcae035. eCollection 2024 Sep.

CytoCellDB: a comprehensive resource for exploring extrachromosomal DNA in cancer cell lines

Jacob Fessler et al. NAR Cancer.

Abstract

Recently, the cancer community has gained a heightened awareness of the roles of extrachromosomal DNA (ecDNA) in cancer proliferation, drug resistance and epigenetic remodeling. However, a hindrance to studying ecDNA is the lack of available cancer model systems that express ecDNA. Increasing our awareness of which model systems express ecDNA will advance our understanding of fundamental ecDNA biology and unlock a wealth of potential targeting strategies for ecDNA-driven cancers. To bridge this gap, we created CytoCellDB, a resource that provides karyotype annotations for cell lines within the Cancer Dependency Map (DepMap) and the Cancer Cell Line Encyclopedia (CCLE). We identify 139 cell lines that express ecDNA, a 200% increase from what is currently known. We expanded the total number of cancer cell lines with ecDNA annotations to 577, which is a 400% increase, covering 31% of cell lines in CCLE/DepMap. We experimentally validate several cell lines that we predict express ecDNA or homogeneous staining regions (HSRs). We demonstrate that CytoCellDB can be used to characterize aneuploidy alongside other molecular phenotypes (gene essentialities, drug sensitivities and gene expression). We anticipate that CytoCellDB will advance cytogenomics research as well as provide insights into strategies for developing therapeutics that overcome ecDNA-driven drug resistance.
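The percentage increases quoted in the abstract can be checked with a short calculation. This sketch assumes the baseline ("what is currently known") is the earlier set of 112 annotated cell lines, 46 of them ecDNA-positive, described in the Figure 1 caption; the abstract itself does not name the baseline, so treat that pairing as an assumption. The function name is illustrative.

```python
def percent_increase(old, new):
    """Relative increase from old to new, expressed in percent."""
    return (new - old) / old * 100.0

# Assumed baseline from the Figure 1 caption: 112 annotated lines, 46 ecDNA+.
ecdna_positive_increase = percent_increase(46, 139)   # ecDNA+ cell lines
annotated_increase = percent_increase(112, 577)       # all annotated cell lines

print(f"ecDNA+ lines: {ecdna_positive_increase:.0f}% increase")
print(f"annotated lines: {annotated_increase:.0f}% increase")
```

Under that assumption the results come out near 202% and 415%, consistent with the rounded figures of 200% and 400%.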

Figures

Graphical Abstract
Figure 1.
Increasing the scope of ecDNA knowledge in common cancer cell lines. (A) The biogenesis mechanism for ecDNA is not well understood, but ecDNA are focal amplifications of genomic regions that can multimerize to co-localize on the same ecDNA amplicon. Depending on the cell line, this can lead to many diverse species. (B) Microscopy methods, such as scanning electron microscopy (SEM) and DNA fluorescence in situ hybridization (FISH), are gold standard approaches for visualizing ecDNA. DNA FISH uses fluorescent probes that bind to specific DNA sequences and so indicate which regions of the genome are amplified on ecDNA. Multiple species can be observed depending on which fluorescent probe is detected. (C) Co-localization of genes from different chromosomes on the same ecDNA amplicon may lead to novel biological functionalities. A breakpoint junction is where two parts of different chromosomes come together, potentially producing fusion DNA or leading to changes in regulation and transcription through enhancer remodeling. (D) Asymmetric division of ecDNA molecules into daughter cells during replication and division. Multiple rounds of replication lead to a heterogeneous mixture of cells with various ecDNA counts and species. (E) Recent efforts to probe ecDNA status in cell line populations provide data on 112 cell lines, of which 46 are labeled as ‘true positive’ and 66 are labeled as ‘true negative.’ These cell lines mostly consist of small cell lung cancer (54%). (F) CytoCellDB represents the largest repository for ecDNA classification, providing queryable data for cell lines, as well as ecDNA and homogeneous staining region (HSR) annotations. It also provides computational predictions using several available computational frameworks. Karyotype records were classified as high, medium or low confidence. (G) CytoCellDB aligns with the Cancer Cell Line Encyclopedia (CCLE) and Dependency Map (DepMap) to enable a multi-omic view of ecDNA biology. A donut plot represents all the available multi-omic data for cell lines with ecDNA annotations.
Figure 2.
Discovery and analysis of karyotype data for hundreds of cancer cell lines. (A) We developed a pipeline to mine and organize unstructured karyotype data for hundreds of cancer cell lines. Some records were found in previously published literature while others were taken from cell line vendor websites. All records were manually curated to extract key features. In total, these features helped to identify 139 ecDNA+ cell lines and 438 ecDNA– cell lines, which together represent 30% of the cell lines in CCLE/DepMap. (B) Polyploidy characteristics of ecDNA+ cell lines indicate that half have ploidies greater than 3N. (C) Karyotype features were compared between ecDNA+ and ecDNA– cells to identify the most significant differentiating properties of ecDNA+ cell lines. Lower p-values indicate more significant associations with a karyotype feature in ecDNA+ cell lines. (D) Chromothripsis was predicted in ecDNA+ and ecDNA– cell lines, indicating that genomic rearrangements for many ecDNA+ cell lines are not characteristic of chromothripsis. (E) Alignment of chromosomes involved in chromothripsis predictions (high confidence versus low confidence) with chromosomes observed to have abnormalities (e.g. chromosome gains or losses, translocations or derivative and marker chromosomes) in karyotyping. Matched refers to the percentage of chromosomes predicted to be involved in chromothripsis that were also seen to have aberrations in karyotype data.
Figure 3.
EcDNA+ cells have distinct CNV patterns, pathways and drug responses. (A) Multi-omic pipeline that integrates DepMap CRISPR data with DepMap/CCLE transcriptomics data and CTRP/GDSC drug sensitivity profiles to explore genes, vulnerabilities and drug responses that differ between ecDNA+ and ecDNA– cell lines. (B) In ecDNA+ cell lines, certain chromosomes are more commonly amplified (chr 17, 8, 1 and 2) and, depending on the chromosome, may have a wide range of genes amplified (chr 1) or may have specific regions of the chromosome commonly amplified (chr 2). (C) Two commonly amplified chromosomal regions in ecDNA+/HSR+ cell lines are chr 8 (bottom) and chr 17 (top). Plotted are heatmaps of CNV and gene expression in these amplified regions, showing patterns of expression across cell lines. (D) Comparing differentially expressed genes and genes with significantly different gene dependency scores in ecDNA+ versus ecDNA– cell lines. (E) Comparing differentially expressed genes and genes involved in drug response, we identified apoptosis (BCL2-family) genes that have significantly higher amplification in ecDNA+ cells. (F) Comparing gene dependencies and genes involved in drug response, we identified three genes for which ecDNA+ cells have lower dependency scores and increased drug sensitivity. Lower p-values indicate more significant associations between drug sensitivity and ecDNA presence. For example, a BCL2-family gene (BCL2) has significantly lower dependency scores in ecDNA+ cells compared to ecDNA– cells and is a mediator for apoptosis-activating drugs, such as sz4ta2 and bam7. For both of these drugs, ecDNA+ cell lines exhibit significantly more sensitive responses compared to ecDNA– cell lines.
Figure 4.
Comparison of computational predictions of ecDNA to karyotype annotations. (A) Confusion matrices for two software programs that predict ecDNA were generated after assessing whether predictions aligned with karyotype-based ecDNA annotations. True positives (TP) refer to cell lines that are predicted to express ecDNA and are confirmed by karyotype, whereas true negatives (TN) are cell lines that are not predicted to express ecDNA and whose karyotypes do not report ecDNA amplification. False positives (FP) refer to cell lines that are computationally predicted to express ecDNA but in which ecDNA amplification was not observed experimentally. False negatives (FN) are cell lines that are computationally predicted to not express ecDNA but in which ecDNA amplification was seen experimentally. (B) Confusion matrix for AmpliconArchitect, in which accuracies are reported for 448 runs, which used default parameters, and 328 predictions from Amplicon Repository. (C) Confusion matrix for CircleHunter, in which accuracies are reported from 233 runs, which used default parameters. (D) A summary of accuracies, precision scores, F1 scores and Matthews correlation coefficient (MCC) metrics for AmpliconArchitect and CircleHunter.
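For readers unfamiliar with the metrics named in panel (D), the following is a minimal sketch of how accuracy, precision, F1 and MCC follow from the four confusion-matrix counts defined in panel (A). These are the standard textbook definitions, not code from the paper; the function name and the example counts are hypothetical and are not the counts reported for AmpliconArchitect or CircleHunter.

```python
import math

def confusion_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}

# Hypothetical counts, chosen only to illustrate the calculation.
metrics = confusion_metrics(tp=40, tn=50, fp=5, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

MCC is useful alongside accuracy here because the ecDNA+ and ecDNA– classes are unbalanced, and MCC uses all four cells of the matrix.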
Figure 5.
Machine learning of karyotype data predicts ecDNA and HSRs. (A) A machine learning-based approach that takes in karyotype data from CytoCellDB and the Mitelman database. Feature extraction identified 237 unique and independent features that were used to predict the expression of either ecDNA or HSRs. The data was split so that 80% was used for training and 20% was used for testing. The final output was an analytic assessment of how well karyotype features predicted ecDNA versus HSRs. (B) ROC curves of the candidate models indicated that both random forest and gradient boosting models performed reasonably well for ecDNA+ prediction, with AUC = 0.92. (C) Karyotype features with the highest importance during ecDNA+ prediction. (D) ROC curves of the candidate models indicated that both random forest and gradient boosting models performed reasonably well for ecDNA/HSR+ prediction, with AUC = 0.88. (E) Karyotype features with the highest importance during HSR/ecDNA+ prediction.
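The evaluation steps in panels (A), (B) and (D) can be sketched in plain Python. This covers only the 80/20 split and the ROC AUC calculation; it does not reproduce the random forest or gradient boosting models, the 237 features, or the authors' pipeline. The function names, the seed and the toy labels and scores are hypothetical.

```python
import random

def split_80_20(records, test_fraction=0.2, seed=0):
    """Shuffle records and return (train, test) with a 20% test share."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: 1 = ecDNA+, 0 = ecDNA-; scores stand in for model outputs.
train, test = split_80_20(range(100))
print(len(train), len(test))
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the reported values of 0.92 and 0.88.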
Figure 6.
Multi-omics analysis finds key functional relationships in ecDNA cell lines. (A) Bulk RNA-seq data and copy number variation data (derived from WGS) were taken from CCLE/DepMap and integrated. Pairwise analysis across genes and cell lines was performed and clustering analytics identified key patterns in cell lines expressing ecDNA. (B) Clustering across all cell lines for a given gene (left) versus clustering across all genes for a given cell line (right). Clusters were assigned groups based on quantile-based thresholding. (C) Similar pairwise patterns in RNA versus CNV are seen for specific genes, such as ERBB2, MYC and FGFR2. Cell lines in cluster I (blue) have a high likelihood of expressing ecDNA or HSRs. This means that high copy number counts (amplification via ecDNA or HSR) are coupled with high expression of the same genes, indicating that a functional relationship exists. (D) Several cell lines with unknown ecDNA annotations were validated experimentally with DNA FISH. In two out of the three cases, genes were amplified on ecDNA and in one case on HSR. (E) A parameter QCNV was derived to functionally differentiate genes amplified and expressed from ecDNA versus from HSR. (F) Computing QCNV based on copy number data indicates that there is a difference in the number of genes amplified and the frequency at which they are amplified in the population that separates ecDNA+ cells from ecDNA– cells. (G) Building an integrative predictive model for ecDNA amplification considered functional clustering and predictions from AmpliconArchitect and CircleHunter. A confusion matrix reports the accuracy and precision metrics.
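The quantile-based thresholding mentioned in panel (B) could look roughly like the sketch below for a single gene: cell lines whose copy number and expression both sit at or above a chosen quantile are flagged as the high-CNV, high-expression group (the analogue of cluster I). The caption does not give the actual quantiles, inputs or grouping rules, so the 0.8 cutoff, the function names and the toy values are all assumptions.

```python
def quantile(values, q):
    """Quantile with linear interpolation between sorted values (0 <= q <= 1)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def high_cnv_high_expression(cnv, expr, q=0.8):
    """Cell lines at or above the q-quantile for both CNV and expression.

    cnv and expr map cell line name -> value for one gene; only names
    present in both mappings are considered.
    """
    names = [n for n in cnv if n in expr]
    cnv_cut = quantile([cnv[n] for n in names], q)
    expr_cut = quantile([expr[n] for n in names], q)
    return sorted(n for n in names
                  if cnv[n] >= cnv_cut and expr[n] >= expr_cut)

# Toy values for one gene across five hypothetical cell lines.
cnv = {"A": 2, "B": 2, "C": 3, "D": 2, "E": 40}
expr = {"A": 1.0, "B": 2.0, "C": 1.5, "D": 1.0, "E": 9.0}
print(high_cnv_high_expression(cnv, expr))
```

Requiring both values to be high mirrors the caption's reading of cluster I, where amplification is coupled with high expression of the same gene.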
