BMC Genomics. 2015;16 Suppl 11(Suppl 11):S7.
doi: 10.1186/1471-2164-16-S11-S7. Epub 2015 Nov 10.

Reference-free inference of tumor phylogenies from single-cell sequencing data

Ayshwarya Subramanian et al. BMC Genomics. 2015.

Erratum in

Abstract

Background: Effective management and treatment of cancer continue to be complicated by the rapid evolution and resulting heterogeneity of tumors. Phylogenetic study of cell populations in single tumors provides a way to delineate intra-tumoral heterogeneity and identify robust features of evolutionary processes. The introduction of single-cell sequencing has shown great promise for advancing single-tumor phylogenetics; however, the volume and high noise in these data present challenges for inference, especially with regard to chromosome abnormalities that typically dominate tumor evolution. Here, we investigate a strategy to use such data to track differences in tumor cell genomic content during progression.

Results: We propose a reference-free approach to mining single-cell genome sequence reads to allow predictive classification of tumors into heterogeneous cell types and reconstruct models of their evolution. The approach extracts k-mer counts from single-cell tumor genomic DNA sequences, and uses differences in normalized k-mer frequencies as a proxy for overall evolutionary distance between distinct cells. The approach computationally simplifies deriving phylogenetic markers, which normally relies on first aligning sequence reads to a reference genome and then processing the data to extract meaningful progression markers for constructing phylogenetic trees. The approach also provides a way to bypass some of the challenges that massive genome rearrangement typical of tumor genomes presents for reference-based methods. We illustrate the method on a publicly available breast tumor single-cell sequencing dataset.
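The core idea in the Results, counting k-mers per cell, normalizing them to frequencies, and comparing cells by the difference in those frequencies, can be sketched in a few lines. This is a minimal illustration only: the function names are hypothetical, the toy reads are invented, and the L1 (Manhattan) distance is an assumption, since the abstract does not state which distance the authors used.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every length-k substring made only of A/C/G/T across a cell's reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers with ambiguous bases such as N
                counts[kmer] += 1
    return counts


def normalize(counts):
    """Convert raw counts to relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def spectrum_distance(freq_a, freq_b):
    """L1 distance between two normalized k-mer spectra (assumed metric)."""
    keys = set(freq_a) | set(freq_b)
    return sum(abs(freq_a.get(x, 0.0) - freq_b.get(x, 0.0)) for x in keys)


# Toy reads standing in for two sequenced cells (invented data).
cell_a = ["ACGTACGT", "CGTACG"]
cell_b = ["ACGTTTTT", "TTTTAC"]
freq_a = normalize(kmer_counts(cell_a, k=3))
freq_b = normalize(kmer_counts(cell_b, k=3))
print(spectrum_distance(freq_a, freq_b))
```

No reference genome or alignment step appears anywhere in this sketch, which is the point of the reference-free approach. Real data would use k = 20 or 25, as in the figures, and a dedicated k-mer counter rather than a Python loop.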

Conclusions: We have demonstrated a computational approach for learning tumor progression from single-cell sequencing data using k-mer counts. k-mer features classify tumor cells by stage of progression with high accuracy. Phylogenies built from these k-mer spectrum distance matrices yield splits that are statistically significant when tested for their ability to partition cells at different stages of cancer.

Figures

Figure 1
Workflow for inferring phylogenies from single-cell genome sequencing data based on the k-mer approach. The major steps are k-mer counting, normalization, computation of distance matrices and phylogeny building.
Figure 2
Histogram of k-mer relative abundances. Both 20- and 25-mer relative abundance densities appear log-Laplacian. These data include only the 20- and 25-mers found in all tumor cells. (a) Histogram of 20-mer relative abundances on a log10 scale. (b) Histogram of 25-mer relative abundances on a log10 scale.
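The filtering described in this caption, keeping only k-mers present in every cell and viewing their relative abundances on a log10 scale, can be sketched as follows. The helper names and the toy counts are hypothetical, and "relative abundance" is assumed here to mean a k-mer's count divided by the total count over the shared k-mers; the caption does not give the exact normalization.

```python
import math
from collections import Counter


def shared_kmers(count_tables):
    """Return the k-mers that have a nonzero count in every table."""
    tables = list(count_tables)
    if not tables:
        return set()
    shared = {kmer for kmer, n in tables[0].items() if n > 0}
    for table in tables[1:]:
        shared &= {kmer for kmer, n in table.items() if n > 0}
    return shared


def log10_relative_abundance(counts, kmers):
    """log10 of each k-mer's share of the total count over `kmers` (assumed definition)."""
    total = sum(counts[kmer] for kmer in kmers)
    return {kmer: math.log10(counts[kmer] / total) for kmer in kmers}


# Toy per-cell count tables (invented values, 3-mers for brevity).
cells = {
    "P1": Counter({"ACG": 8, "CGT": 2, "GTA": 5}),
    "M1": Counter({"ACG": 1, "CGT": 9}),
}
common = shared_kmers(cells.values())
profiles = {name: log10_relative_abundance(c, common) for name, c in cells.items()}
```

A histogram of the values in `profiles` would correspond to the kind of plot the caption describes.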
Figure 3
t-SNE ordination of single tumor cells. (a) Projection of tumor cells in the space of 20-mers present in all cells. The 20-mers separate the cells into 3 loose groups, the rightmost of which is dominated by primary cells and the others by metastatic cells. The cluster in the middle contains primary and metastatic cells from patient T16 in equal proportion, suggesting a state of transition. The cluster on the upper right is mostly metastatic with some primary cells. This suggests two distinct stages of advancing tumor cells based on k-mer composition. (b) Projection of cells in the space of only those 20-mers that remain after the differential abundance test for k-mer selection. Cells group into 2 clusters, one composed entirely of primary cells and the other a mix of primary and metastatic cells.
Figure 4
20-mer bootstrap consensus neighbor-joining tree built from T16 primary (prefix P) and metastatic data (prefix M). Distinct groupings of cells are labeled as clusters.
Figure 5
20-mer bootstrap consensus neighbor-joining tree built from T10 primary breast tumor cells (prefix C), T16 primary (prefix P) and metastatic data (prefix M). Distinct groupings of cells are labeled as clusters.
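The trees in Figures 4 and 5 are neighbor-joining trees built from a distance matrix. As a rough illustration of that step, the sketch below implements the standard neighbor-joining topology search in plain Python. It is not the authors' implementation: the function name, the `frozenset`-keyed distance dictionary, and the five-taxon toy matrix are all assumptions, branch lengths are omitted, and the bootstrap consensus step is not shown.

```python
from itertools import combinations


def neighbor_joining(labels, dist):
    """Return an unrooted neighbor-joining topology as nested tuples.

    labels: list of leaf names.
    dist:   dict mapping frozenset({a, b}) to the distance between a and b.
    """
    nodes = list(labels)
    d = dict(dist)
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums of the current distance matrix.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        # Join the pair that minimizes the Q criterion.
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        new = (a, b)  # the internal node is represented by the pair it joins
        d_ab = d[frozenset((a, b))]
        for c in nodes:
            if c not in (a, b):
                d[frozenset((new, c))] = 0.5 * (
                    d[frozenset((a, c))] + d[frozenset((b, c))] - d_ab
                )
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    return tuple(nodes)


# Toy symmetric distance matrix over five hypothetical cells.
labels = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
dist = {
    frozenset((labels[i], labels[j])): matrix[i][j]
    for i in range(len(labels))
    for j in range(i + 1, len(labels))
}
tree = neighbor_joining(labels, dist)
print(tree)
```

In practice the input would be the pairwise k-mer spectrum distances between cells, and a bootstrap consensus would be obtained by resampling k-mers, rebuilding the tree each time, and keeping the splits that recur.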
