Bioinformatics. 2025 Mar 4;41(3):btaf072.
doi: 10.1093/bioinformatics/btaf072.

Single-cell copy number calling and event history reconstruction

Jack Kuipers et al. Bioinformatics.

Abstract

Motivation: Copy number alterations are driving forces of tumour development and the emergence of intra-tumour heterogeneity. A comprehensive picture of these genomic aberrations is therefore essential for the development of personalised and precise cancer diagnostics and therapies. Single-cell sequencing offers the highest resolution for copy number profiling down to the level of individual cells. Recent high-throughput protocols allow for the processing of hundreds of cells through shallow whole-genome DNA sequencing. The resulting low read-depth data poses substantial statistical and computational challenges to the identification of copy number alterations.

Results: We developed SCICoNE, a statistical model and MCMC algorithm tailored to single-cell copy number profiling from shallow whole-genome DNA sequencing data. SCICoNE reconstructs the history of copy number events in the tumour and uses these evolutionary relationships to identify the copy number profiles of the individual cells. We show the accuracy of this approach in evaluations on simulated data and demonstrate its practicability in applications to two breast cancer samples from different sequencing protocols.

Availability and implementation: SCICoNE is available at https://github.com/cbg-ethz/SCICoNE.

Figures

Figure 1.
Overview of CNA calling and tree inference with SCICoNE. (a) From single-cell DNA sequencing we obtain noisy read count data reflecting the underlying copy number profiles. (b) As a first step, we detect breakpoints to partition the genome into segments (each comprising several bins) that may experience CNAs. In this case, four breakpoints define the five segments S1, …, S5. (c) From the data, we then infer the evolutionary history of copy number changes, which accumulate at the nodes of an event tree (plus signifies an amplification, minus a deletion). (d) By attaching the single cells to the event tree, we obtain their CNAs: tracing the path from the root gives the copy numbers of each clone, that is, each group of cells with the same profile. For example, Clone 4 has experienced two CN events after the diploid root, namely Event 1, a gain in a region spanning segments S1 and S2, and Event 5, a further gain in S1, such that the copy number profile of Clone 4 is (4,3,2,2,2) across the five segments.
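The root-to-node tracing described in this caption can be illustrated with a minimal Python sketch. The dictionary encoding of the event tree, the node identifiers, and the function name `copy_number_profile` are hypothetical choices made for illustration and are not taken from the SCICoNE code base; only the diploid root, Event 1, and Event 5 from the caption are encoded.

```python
from typing import Dict, List, Optional, Tuple

N_SEGMENTS = 5  # segments S1..S5 from the caption, indexed 0..4 here

# Hypothetical encoding: node id -> (parent id, {segment index: change in copy number}).
EventTree = Dict[int, Tuple[Optional[int], Dict[int, int]]]

EVENT_TREE: EventTree = {
    0: (None, {}),           # diploid root, no events
    1: (0, {0: +1, 1: +1}),  # Event 1: gain spanning S1 and S2
    5: (1, {0: +1}),         # Event 5: further gain in S1
}


def copy_number_profile(node: Optional[int], tree: EventTree,
                        n_segments: int = N_SEGMENTS,
                        ploidy: int = 2) -> List[int]:
    """Sum the events on the path between `node` and the root.

    The walk goes from the node up to the root; since the changes are
    added, the direction of the walk does not affect the result.
    """
    profile = [ploidy] * n_segments
    while node is not None:
        parent, event = tree[node]
        for segment, change in event.items():
            profile[segment] += change
        node = parent
    return profile


print(copy_number_profile(5, EVENT_TREE))  # [4, 3, 2, 2, 2]
```

Cells attached to the node carrying Event 5 share the profile (4,3,2,2,2), matching Clone 4 in the caption. The sketch does not check that copy numbers stay non-negative, which a fuller model would need to handle.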
Figure 2.
Inferred tree for 260 cells from a breast xenograft (Zahn et al. 2017). Inside the nodes of the CNA tree we highlight the total number of amplification or deletion events (in parentheses), the genes which are affected [amongst all genes from the COSMIC Cancer Gene Census (Sondka et al. 2018), with the 33 associated with breast cancer highlighted], including how much they are amplified or deleted, and the number of cells that best attach to each node. The CNAs are not displayed at the grey (leaf) nodes, where only the number of cells attached is indicated. The numbers of cells attached to the leaf and internal nodes sum to the 260 cells in total. Example profiles of cells attaching to the two nodes with coloured (thick) borders (one from the largest subclone and one from the other main lineage) are displayed in Fig. 3c.
Figure 3.
Inferred copy number profiles for 260 cells from a breast xenograft (Zahn et al. 2017). (a) Normalized counts per bin, ordered according to the tree in Fig. 2. (b) Copy number profiles estimated jointly with the CNA tree of Fig. 2. (c) Two examples of raw count data (black dots) and inferred copy number profiles (coloured lines) of the two cells indicated by arrows in the heatmaps.
Figure 4.
Inferred copy number profiles for 2053 cells from a breast cancer (10x Genomics). (a) Inferred tree on the clustered data, with the genes which are affected [among the 33 associated with breast cancer in the COSMIC Cancer Gene Census (Sondka et al. 2018)] displayed at each node. (b) Counts per cell for the different clusters. (c) Normalized counts per bin. (d) Copy number profiles estimated jointly with the CNA tree.
Figure 5.
Comparison of copy number calling for simulated data. For uniform random trees with 20 nodes, we attached 400 cells and simulated overdispersed read data according to each cell’s copy number profile over 10 000 bins. The total number of reads per cell was 20k, 40k, and 80k, for an average read depth of 2X, 4X, and 8X per bin (colour intensity, left to right). The maximal number of segments affected by copy number changes was 40 or 80 (panels). The root mean squared difference Δ between the true simulated copy number profiles and the corresponding inferred profiles over all bins and cells is summarized in each box plot (generated with ggplot2 default settings) for a neutral diploid profile, for profiles inferred by hierarchical and PhenoGraph clustering as well as SCICoNE on PhenoGraph clustered data, followed by CONET (Markowska et al. 2022), HMMcopy (Lai et al. 2016), Ginkgo (Garvin et al. 2015), SCOPE (Wang et al. 2020), NestedBD (Liu et al. 2024), and SCICoNE on the full single-cell sequencing data. The comparison with a logarithmic transform of Δ is displayed in Supplementary Fig. S5.
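As a rough illustration of the error measure in this caption, the sketch below computes a root mean squared difference pooled over all bins and cells. Pooling every (cell, bin) entry into a single mean is an assumption based on the caption's wording, and the function name `rms_difference` and the toy profiles are hypothetical, not data from the paper.

```python
import math
from typing import List, Sequence


def rms_difference(true_profiles: Sequence[Sequence[float]],
                   inferred_profiles: Sequence[Sequence[float]]) -> float:
    """Root mean squared difference over all cells (rows) and bins (columns)."""
    if len(true_profiles) != len(inferred_profiles):
        raise ValueError("both inputs need the same number of cells")
    squared_sum = 0.0
    n_entries = 0
    for true_row, inferred_row in zip(true_profiles, inferred_profiles):
        if len(true_row) != len(inferred_row):
            raise ValueError("each cell needs the same number of bins in both inputs")
        for t, i in zip(true_row, inferred_row):
            squared_sum += (t - i) ** 2
            n_entries += 1
    if n_entries == 0:
        raise ValueError("profiles must not be empty")
    return math.sqrt(squared_sum / n_entries)


# Toy data (hypothetical): 2 cells x 4 bins.
true_cn: List[List[int]] = [[2, 2, 3, 3], [2, 1, 1, 2]]
inferred_cn: List[List[int]] = [[2, 2, 3, 2], [2, 1, 1, 2]]
diploid_cn: List[List[int]] = [[2] * 4 for _ in true_cn]  # neutral diploid baseline

delta_inferred = rms_difference(true_cn, inferred_cn)  # one wrong entry out of 8
delta_diploid = rms_difference(true_cn, diploid_cn)    # four wrong entries out of 8
print(round(delta_inferred, 4), round(delta_diploid, 4))  # 0.3536 0.7071
```

The neutral diploid baseline scores any method that calls no alterations at all, so a caller is only informative if its Δ falls below that value.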
Figure 6.
Comparison of copy number tree reconstruction for simulated data. For the simulated data of Fig. 5 and Supplementary Fig. S6, we compute the tree distance τ (Section 2.10) between the true tree and the tree inferred by CONET (Markowska et al. 2022), MEDALT (Wang et al. 2021) run on the output of Ginkgo (Garvin et al. 2015) and SCOPE (Wang et al. 2020), NestedBD (Liu et al. 2024), as well as SCICoNE. Since the tree distances cover several orders of magnitude, we use a logarithmic axis.

References

    1. 10x Genomics. https://www.10xgenomics.com/products/single-cell-cnv (25 February 2025, date last accessed).
    2. Allison KH, Sledge GW. Heterogeneity and cancer. Oncology (Williston Park) 2014;28:772–8.
    3. Baslan T, Kendall J, Rodgers L et al. Genome-wide copy number analysis of single cells. Nat Protoc 2012;7:1024–41.
    4. Bielski CM, Zehir A, Penson AV et al. Genome doubling shapes the evolution and prognosis of advanced cancers. Nat Genet 2018;50:1189–95.
    5. Bouckaert R, Heled J, Kühnert D et al. BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS Comput Biol 2014;10:e1003537.
