Nucleic Acids Res. 2013 Sep;41(17):e165.
doi: 10.1093/nar/gkt641. Epub 2013 Jul 27.

TrAp: a tree approach for fingerprinting subclonal tumor composition

Francesco Strino et al. Nucleic Acids Res. 2013 Sep.

Abstract

Revealing the clonal composition of a single tumor is essential for identifying cell subpopulations with metastatic potential in primary tumors or with resistance to therapies in metastatic tumors. Sequencing technologies provide only an overview of the aggregate of numerous cells. Computational approaches to de-mix a collective signal composed of the aberrations of a mixed cell population of a tumor sample into its individual components are not available. We propose an evolutionary framework for deconvolving data from a single genome-wide experiment to infer the composition, abundance and evolutionary paths of the underlying cell subpopulations of a tumor. We have developed an algorithm (TrAp) for solving this mixture problem. In silico analyses show that TrAp correctly deconvolves mixed subpopulations when the number of subpopulations and the measurement errors are moderate. We demonstrate the applicability of the method using tumor karyotypes and somatic hypermutation data sets. We applied TrAp to an Exome-Seq experiment on a renal cell carcinoma tumor sample and compared the mutational profile of the inferred subpopulations to the mutational profiles of single cells of the same tumor. Finally, we deconvolve sequencing data from eight acute myeloid leukemia patients and three distinct metastases of one melanoma patient to exhibit the evolutionary relationships of their subpopulations.

Figures

Figure 1.
A schema of deconvolution of the mixed signal of four subclones. In this example, the aggregate signal frequency vector y on the left side of the matrix-vector equation represents the frequency of five aberrations in the aggregate sample. To allow the heterogeneous mixture of subclones to include normal cells, we introduce a dummy aberration that is present in every cell. The frequency of the dummy aberration y1 is equal to one. The frequencies of the actual five aberrations A2, A3, A4, A5 and A6 encoded in the remaining elements of the vector y are given by formula image, formula image, formula image, formula image and formula image, respectively. In this example, the optimal TrAp solution is unique and has four populated subclones: C2 with aberrations formula image, C4 with aberrations formula image, C5 with aberrations formula image and C6 with aberrations formula image. The optimal solution is shown both as an evolutionary tree (left) and in matrix form according to Equation (1) (right), where the tree topology is encoded in the binary matrix and the relative composition of the subclones is represented in the column vector.
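The matrix-vector relation described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the TrAp algorithm itself: it assumes that each subclone Ci is characterized by one newest aberration Ai, it borrows the tree topology spelled out in the Figure 3 caption, and the subclone frequencies in `x_true` are made-up values (the actual values are not recoverable from this page). The names `parent`, `aggregate_frequencies` and `deconvolve_given_tree` are hypothetical.

```python
# Assumed topology (from the Figure 3 caption): C1 is the root (normal cells),
# C2 and C6 descend from C1, C3 from C2, C4 from C3, C5 from C4.
parent = {1: None, 2: 1, 3: 2, 4: 3, 5: 4, 6: 1}


def ancestors_and_self(clone, parent):
    """Return the clone followed by all of its ancestors up to the root."""
    chain = []
    while clone is not None:
        chain.append(clone)
        clone = parent[clone]
    return chain


def aggregate_frequencies(parent, x):
    """Forward model: y[i] is the fraction of cells carrying aberration Ai.

    A cell of subclone Cj carries Ai whenever Ci is Cj itself or an ancestor of Cj,
    which is what the binary matrix in the caption encodes.
    """
    y = {i: 0.0 for i in parent}
    for clone, fraction in x.items():
        for carrier in ancestors_and_self(clone, parent):
            y[carrier] += fraction
    return y


def deconvolve_given_tree(parent, y):
    """Inverse for one fixed tree: x[i] = y[i] minus y of the direct descendants."""
    x = dict(y)
    for child, par in parent.items():
        if par is not None:
            x[par] -= y[child]
    return x


# Hypothetical composition: only C2, C4, C5 and C6 are populated.
x_true = {1: 0.0, 2: 0.3, 3: 0.0, 4: 0.2, 5: 0.1, 6: 0.4}
y = aggregate_frequencies(parent, x_true)
x_back = deconvolve_given_tree(parent, y)
```

Because the dummy aberration sits at the root, `y[1]` sums every subclone fraction and equals one. The sketch inverts the model for a single known tree; the harder task the paper addresses is choosing among candidate trees when the topology is unknown.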
Figure 2.
Identification of first-generation trees. In this example, the aggregate signal frequency vector formula image is consistent with three first-generation trees formula image, formula image formula image and formula image. Each first-generation tree is visualized as a matrix equation formula image according to Equation (2) (left) and as a partial evolutionary tree (right). In the bottom row, the partial tree PT1 given by the union of the partial trees T1 and T3 is shown. Question marks indicate values that are unknown as they are not specified by the first-generation tree or by the partial tree.
Figure 3.
Illustration of the usage of first-generation trees and partial trees for deriving the TrAp solution of a mixture of four subclones. In this example, five aberrations were measured from an aggregate sample and their frequencies were formula image, formula image, formula image, formula image and formula image, respectively. The dummy measurement formula image was also added to generate the aggregate signal frequency vector formula image. In the first step, TrAp identifies all first-generation trees, namely formula image and formula image. In the second step, TrAp generates the possible partial trees, namely formula image, formula image, formula image and formula image, and consequently selects only formula image, as it is the only partial tree that contains a maximum number of first-generation trees. In the third step, TrAp generates evolutionary trees starting from the partial tree formula image. To complete the evolutionary tree starting from PT4, the subclone C1 is positioned as the root of the tree. Because C1 is part of the first-generation tree T1, the subclones C2 and C6 are automatically added as direct descendants of C1. Next, C3 is added as a direct descendant of C2. Because C3 is part of the first-generation tree T2, C4 is automatically added as a direct descendant of C3. Finally, C5 is added as a direct descendant of C4, generating the optimal TrAp solution to the subclonal deconvolution problem. We remark that the optimal solution generated by the TrAp algorithm is equal to the left solution of Supplementary Figure S1 and to solution formula image in Supplementary Figure S2.
Figure 4.
Deconvolution of simulated data. In each table the index of a column represents the number of populated subclones and the index of a row represents the number of mutations. We generated 1000 simulations for each pair of row and column indices (pixel) in these tables. We performed this analysis using different levels of noise (error) drawn from a uniform distribution formula image. The heatmaps (tables) show the percentage of trees in each cell for which the true solution has the minimum number of subclones (left panel), is a TrAp solution (middle panel) and is the only TrAp solution (right panel) if the best solution is unique.
Figure 5.
Deconvolution of random mixtures of three subclones. The boxes represent different subclones, each denoted by the list of its aberrations. The aberration profiles of two subclones identified by cytogenetics in a melanoma biopsy (left) and the aberration profiles of three subclones identified in an adenocarcinoma biopsy (right) have been mixed in silico using random coefficients. In both cases, the mixtures were successfully deconvolved. Aberrations are grouped within the boxes according to the order of occurrence. The reconstructed evolutionary trees suggest intermediate (white boxes), probably rare, subclones that were not reported in the cytogenetic data.
Figure 6.
Deconvolution of a random mixture of eight sequences from SHM data. Eight sequences from the Ig locus of eight cells extracted from the same germinal center were mixed with the random coefficients given by formula image. Since sequences five and eight are identical, they are grouped in a single clone whose relative frequency is formula image. In total, 20 mutated nucleotides were found in the data, and two different mutations were identified at position 170. Mutations are shown using the notation ‘formula image’, e.g., the notation formula image indicates that the nucleotide at position 170 was mutated from Adenine to Guanine. The notation formula image indicates that the nucleotide at position 170 was mutated twice, first from Adenine to Guanine and then from Guanine to Cytosine. In this example, all seven subclones were correctly deconvolved by the TrAp algorithm, the frequency of the subclones was correctly estimated and the solution was unique.
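The grouping step mentioned in this caption, where identical sequences are merged into one clone whose frequency is the sum of their mixing coefficients, can be sketched as follows. The sequences and coefficients are toy values invented for illustration, the normalization of the coefficients to relative frequencies is an assumption, and `group_identical` is a hypothetical helper name.

```python
from collections import defaultdict


def group_identical(sequences, coefficients):
    """Merge identical sequences into clones and sum their relative frequencies.

    Coefficients are normalized by their total (an assumption of this sketch),
    so the returned frequencies add up to one.
    """
    total = sum(coefficients)
    frequency = defaultdict(float)
    for seq, coef in zip(sequences, coefficients):
        frequency[seq] += coef / total
    return dict(frequency)


# Toy data (not from the paper): the first and third sequences are identical.
seqs = ["ACGT", "ACGA", "ACGT", "TCGT"]
coefs = [1, 2, 3, 2]
clones = group_identical(seqs, coefs)
```

Here four mixed sequences collapse to three clones, mirroring how the eight sequences in the caption yield seven subclones once the two identical ones are merged.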
Figure 7.
Evolutionary trees inferred from three metastases of a melanoma patient. Each subclone in these trees is represented by a box with a list of mutations that includes only its new mutations (ancestral mutations can be read off by tracing back the mutation lists of all of its ancestors). Mutations are labeled according to the gene affected and the amino acid change caused by the mutation (e.g. the label DCC.L1099H indicates a mutation in the DCC gene that causes a mutation from a Leucine to a Histidine at position 1099 in the DCC protein). Highly expressed genes from this patient are indicated in bold. Mutations in the left branch of TM4 are more abundant than in TM1 and TM3. 44% (19%) of the subclones of TM3 (TM4) have mutations in DSC3, DSG1 and IMPACT. The TM3 subclone has an additional mutation in DCC.
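The two reading conventions in this caption, the `GENE.<ref><position><alt>` label format and the tracing back of ancestral mutations, can be illustrated with a short sketch. Only the label `DCC.L1099H` comes from the caption; the toy tree, the other labels and the function names `parse_label` and `full_mutations` are hypothetical, and the regular expression assumes single-letter amino acid codes.

```python
import re

# Assumed label shape: gene name, a dot, reference residue, position, new residue.
LABEL = re.compile(r"^(?P<gene>[A-Za-z0-9_-]+)\.(?P<ref>[A-Z])(?P<pos>\d+)(?P<alt>[A-Z])$")


def parse_label(label):
    """Split a label such as 'DCC.L1099H' into (gene, ref, position, alt)."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    return match["gene"], match["ref"], int(match["pos"]), match["alt"]


def full_mutations(clone, parent, new_mutations):
    """List all mutations of a subclone, oldest first, by walking up to the root."""
    chain = []
    while clone is not None:
        chain.append(clone)
        clone = parent[clone]
    result = []
    for ancestor in reversed(chain):
        result.extend(new_mutations[ancestor])
    return result


# Toy tree (hypothetical): root -> a -> b, each box listing only its new mutations.
parent = {"root": None, "a": "root", "b": "a"}
new_mutations = {"root": [], "a": ["GENE_A.A10V"], "b": ["DCC.L1099H"]}

parsed = parse_label("DCC.L1099H")
inherited = full_mutations("b", parent, new_mutations)
```

In this toy tree, subclone `b` lists only `DCC.L1099H` in its own box but carries `GENE_A.A10V` as well, inherited from `a`.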
