Review
Cold Spring Harb Perspect Med. 2017 Aug 1;7(8):a026625.
doi: 10.1101/cshperspect.a026625.

Principles of Reconstructing the Subclonal Architecture of Cancers

Stefan C Dentro et al. Cold Spring Harb Perspect Med.

Abstract

Most cancers evolve from a single founder cell through a series of clonal expansions that are driven by somatic mutations. These clonal expansions can lead to several coexisting subclones sharing subsets of mutations. Analysis of massively parallel sequencing data can infer a tumor's subclonal composition through the identification of populations of cells with shared mutations. We describe the principles that underlie subclonal reconstruction through single nucleotide variants (SNVs) or copy number alterations (CNAs) from bulk or single-cell sequencing. These principles include estimating the fraction of tumor cells for SNVs and CNAs, performing clustering of SNVs from single- and multisample cases, and single-cell sequencing. The application of subclonal reconstruction methods is providing key insights into tumor evolution, identifying subclonal driver mutations, patterns of parallel evolution and differences in mutational signatures between cellular populations, and characterizing the mechanisms of therapy resistance, spread, and metastasis.

Figures

Figure 1.
General overview of subclonal reconstruction. (A) During cancer evolution, a tumor acquires driver mutations (marked with a plus sign) that can initiate clonal expansions. (B) Over time, a number of these clonal expansions can occur, resulting in the increase of subpopulations of cells harboring distinct sets of mutations. Tumor samples typically consist of a mixture of tumor cells with mutations (solid lines) and normal cells without mutations (dashed lines). (C) Some mutations are carried by all tumor cells (marked with a square), whereas others are present in a subset of cells (triangle and circle). Using allele frequencies of mutations obtained from sequencing data and accounting for copy number aberrations, an estimate of the fraction of tumor cells carrying each mutation can be obtained. A set of mutations can then be used as a marker for a population of cells, allowing estimation of the fraction of tumor cells of the corresponding subclone. Clustering algorithms can be applied to obtain the cancer cell fractions (CCFs) of each subclone. (D and E) The relationship between subclones can be visualized as a tree. (D) Some methods perform this clustering in fraction of tumor cells space, and (E) others in the space of fraction of all cells.
Figure 2.
Copy number alterations affect variant allele frequencies. Allele frequencies of single nucleotide variants (SNVs) must be transformed to cancer cell fractions (CCFs), accounting for copy number changes, before they can be clustered to identify subclonal populations. This illustration shows four SNVs in different (sub)clonal populations and in regions with different copy number states. SNVs 1 and 2 are clonal and subclonal, respectively, and appear in a nonaberrated copy number state. SNV 3 coincides with a subclonal deletion, with the SNV falling on the retained allele (i.e., the other allele is subclonally deleted). SNV 4 occurred before a gain and is therefore carried by two chromosome copies. Even though SNVs 1, 3, and 4 are clonal, their allele frequencies differ because of copy number alterations (CNAs).
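The transformation described in this caption can be illustrated with a minimal Python sketch. It assumes the commonly used relationship in which the expected allele frequency equals the mutated copies contributed by tumor cells divided by all copies at the locus in the sample. The function names, the parameter names (purity, multiplicity, tumour_cn), and the example values (purity 0.8, a deletion present in half of the tumor cells) are illustrative assumptions, not values taken from the figure.

```python
def expected_vaf(ccf, multiplicity, tumour_cn, purity, normal_cn=2.0):
    """Expected variant allele frequency of an SNV.

    ccf          -- fraction of tumor cells carrying the SNV
    multiplicity -- number of chromosome copies carrying the SNV per mutated cell
    tumour_cn    -- average total copy number of the locus in tumor cells
    purity       -- fraction of cells in the sample that are tumor cells
    """
    mutated_copies = purity * multiplicity * ccf
    all_copies = purity * tumour_cn + (1.0 - purity) * normal_cn
    return mutated_copies / all_copies


def vaf_to_ccf(vaf, multiplicity, tumour_cn, purity, normal_cn=2.0):
    """Invert expected_vaf to recover the cancer cell fraction."""
    all_copies = purity * tumour_cn + (1.0 - purity) * normal_cn
    return vaf * all_copies / (purity * multiplicity)


PURITY = 0.8  # assumed for illustration

# Clonal SNV in a diploid region (like SNV 1): one mutated copy out of two.
vaf_diploid = expected_vaf(1.0, 1, 2.0, PURITY)

# Clonal SNV on the retained allele of a deletion present in half of the
# tumor cells (like SNV 3): average tumor copy number is 2 - 0.5 = 1.5.
vaf_deletion = expected_vaf(1.0, 1, 1.5, PURITY)

# Clonal SNV that preceded a gain (like SNV 4): two of three copies mutated.
vaf_gain = expected_vaf(1.0, 2, 3.0, PURITY)

for vaf, mult, cn in [(vaf_diploid, 1, 2.0), (vaf_deletion, 1, 1.5), (vaf_gain, 2, 3.0)]:
    print(round(vaf, 3), round(vaf_to_ccf(vaf, mult, cn, PURITY), 3))
```

All three clonal SNVs have different allele frequencies but map back to the same CCF of 1.0 once copy number, multiplicity, and purity are accounted for.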
Figure 3.
Stick-breaking schematic. The stick-breaking property of the Dirichlet process (DP) is used to estimate the number of mutation clusters in the data. For each mutation, a stick of arbitrary length is broken into randomly sized bits that represent a cluster. At point A, breaks have been introduced, corresponding to clusters c1-c4. B shows the stick after introducing break 5, whereas C shows the completed stick-breaking procedure. The size of each broken part represents the weight associated with a cluster and influences the mutation assignments, in which a high weight makes it more likely that a mutation is assigned to that cluster. These weights are updated after probabilities for each cluster have been obtained for each mutation, eventually converging on a solution.
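The stick-breaking construction can be sketched in a few lines of Python. This is a minimal illustration of the standard truncated construction, in which each break removes a Beta(1, alpha)-distributed fraction of the remaining stick. The concentration parameter alpha, the truncation at a fixed number of breaks, and the function name are assumptions made for the example; the sketch covers only the drawing of cluster weights, not the iterative reassignment of mutations described in the caption.

```python
import random


def stick_breaking_weights(alpha, n_breaks, rng):
    """Draw cluster weights from a truncated stick-breaking process.

    Each break takes a Beta(1, alpha) fraction of what is left of a unit
    stick. The final entry is the unbroken remainder, so the weights sum to 1.
    """
    weights = []
    remaining = 1.0
    for _ in range(n_breaks):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    weights.append(remaining)
    return weights


# Seeded generator so the draw is reproducible.
weights = stick_breaking_weights(alpha=1.0, n_breaks=5, rng=random.Random(0))
for index, weight in enumerate(weights, start=1):
    print(f"c{index}: {weight:.3f}")
```

A small alpha tends to concentrate most of the weight in the first few pieces (few clusters), whereas a large alpha spreads the weight over many small pieces.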
Figure 4.
Coverage, purity, and alterations affect the B-allele frequency of SNPs. B-allele frequencies (BAFs) of germline heterozygous SNPs can be used to identify copy number aberrations. Panels A–F show that the BAF is noisy, and that it becomes increasingly difficult to separate the bands as the purity or coverage goes down and when the aberration is subclonal. To reduce the noise, SNPs can be phased to determine which allele is the B-allele. By combining the SNPs over longer stretches of DNA, it becomes possible to detect subclonal aberrations.
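Why the BAF bands move closer together at lower purity or for subclonal aberrations can be seen from the expected (noise-free) BAF. The Python sketch below assumes a simple mixture of aberrant tumor cells and cells with one copy of each allele (unaltered tumor cells and normal cells). The function name, its parameters, and the example values are illustrative assumptions, and sequencing noise from finite coverage is not modeled.

```python
def expected_baf(n_a, n_b, purity, aberrant_fraction=1.0):
    """Expected B-allele frequency of a germline heterozygous SNP.

    n_a, n_b          -- copies of the A and B allele in aberrant tumor cells
    purity            -- fraction of cells in the sample that are tumor cells
    aberrant_fraction -- fraction of tumor cells carrying the aberration
    """
    aberrant = purity * aberrant_fraction  # cells with the aberration
    unaltered = 1.0 - aberrant             # cells with one A and one B copy
    b_copies = aberrant * n_b + unaltered * 1.0
    all_copies = aberrant * (n_a + n_b) + unaltered * 2.0
    return b_copies / all_copies


# Loss of the A allele (n_a=0, n_b=1) under three assumed scenarios.
clonal_high_purity = expected_baf(0, 1, purity=0.8)
clonal_low_purity = expected_baf(0, 1, purity=0.3)
subclonal = expected_baf(0, 1, purity=0.8, aberrant_fraction=0.5)

print(round(clonal_high_purity, 3))
print(round(clonal_low_purity, 3))
print(round(subclonal, 3))
```

In each scenario the second band sits at one minus the printed value, so the closer the value is to 0.5, the harder the two bands are to separate from noise.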
Figure 5.
Principles of SNV phasing. (A) A pair of single nucleotide variants (SNVs) that are close enough to be covered by a single read pair and that have occurred in the same lineage appear as read pairs that contain both variant alleles. (B) If the SNVs have originated in different lineages they appear in read pairs that contain the variant allele of one SNV and the wild type allele of the other. The variant allele frequency (VAF) of a subclonal SNV that has been subclonally deleted (as shown by the striped reads that represent the deleted copies in C) is “shifted” (E). An SNV that has occurred on the retained allele (i.e., the other allele is subclonally deleted, D) will not be shifted (F).
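The read-pair evidence in panels A and B can be tallied with a short Python sketch. It assumes the read pairs spanning both SNV positions have already been extracted and reduced to a pair of booleans (whether the variant allele was seen at the first and at the second SNV). That input representation and the function name are hypothetical, and the sketch only counts the evidence rather than deciding on a lineage relationship.

```python
from collections import Counter


def tally_read_pairs(read_pairs):
    """Count read pairs by which variant alleles they carry.

    read_pairs -- iterable of (first_is_variant, second_is_variant) booleans,
                  one tuple per read pair covering both SNV positions.
    """
    counts = Counter()
    for first_is_variant, second_is_variant in read_pairs:
        if first_is_variant and second_is_variant:
            counts["both_variant"] += 1    # pattern described for panel A
        elif first_is_variant or second_is_variant:
            counts["one_variant"] += 1     # pattern described for panel B
        else:
            counts["both_wild_type"] += 1
    return counts


# Made-up observations for illustration only.
observed = [(True, True), (True, True), (False, False), (True, False)]
counts = tally_read_pairs(observed)
print(dict(counts))
```

In practice, a few discordant read pairs can arise from sequencing errors, so the counts would normally be compared against an error model before drawing conclusions.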
