Nucleic Acids Res. 2013 Sep;41(17):e165.

doi: 10.1093/nar/gkt641. Epub 2013 Jul 27.

TrAp: a tree approach for fingerprinting subclonal tumor composition

Francesco Strino¹, Fabio Parisi, Mariann Micsinai, Yuval Kluger

Affiliation

¹ Department of Pathology, Yale University School of Medicine, New Haven, CT 06520, USA, NYU Center for Health Informatics and Bioinformatics, New York University Langone Medical Center, 227 East 30th Street, New York, NY 10016, USA and Yale Cancer Center, New Haven, CT 06520, USA.

PMID: 23892400
PMCID: PMC3783191
DOI: 10.1093/nar/gkt641

Francesco Strino et al. Nucleic Acids Res. 2013 Sep.

Abstract

Revealing the clonal composition of a single tumor is essential for identifying cell subpopulations with metastatic potential in primary tumors or with resistance to therapies in metastatic tumors. Sequencing technologies provide only an overview of the aggregate of numerous cells. Computational approaches for de-mixing the collective signal of a tumor sample, composed of the aberrations of a mixed cell population, into its individual components are not available. We propose an evolutionary framework for deconvolving data from a single genome-wide experiment to infer the composition, abundance and evolutionary paths of the underlying cell subpopulations of a tumor. We have developed an algorithm (TrAp) for solving this mixture problem. In silico analyses show that TrAp correctly deconvolves mixed subpopulations when the number of subpopulations and the measurement errors are moderate. We demonstrate the applicability of the method using tumor karyotypes and somatic hypermutation data sets. We applied TrAp to an Exome-Seq experiment on a renal cell carcinoma tumor sample and compared the mutational profiles of the inferred subpopulations to the mutational profiles of single cells from the same tumor. Finally, we deconvolve sequencing data from eight acute myeloid leukemia patients and three distinct metastases of one melanoma patient to exhibit the evolutionary relationships of their subpopulations.

Figures

**Figure 1.**
A schema of the deconvolution of the mixed signal of four subclones. In this example, the aggregate signal frequency vector y on the left side of the matrix-vector equation represents the frequencies of five aberrations in the aggregate sample. To allow the heterogeneous mixture of subclones to include normal cells, we introduce a dummy aberration that is present in every cell. The frequency of the dummy aberration, y₁, is equal to one. The frequencies of the five actual aberrations A₂, A₃, A₄, A₅ and A₆ are encoded in the remaining elements of the vector y (values shown in the figure). In this example, the optimal TrAp solution is unique and has four populated subclones, C₂, C₄, C₅ and C₆ (their aberration sets are listed in the figure). The optimal solution is shown both as an evolutionary tree (left) and in matrix form according to Equation (1) (right), where the tree topology is encoded in the binary matrix and the relative composition of the subclones is represented in the column vector.
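The matrix form described in this caption can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that Equation (1) has the form y = M·x, with M a binary matrix in which entry (i, j) is 1 when subclone C_j carries aberration A_i, and that subclone C_j is the clone in which A_j first arose. The tree topology is borrowed from the Figure 3 caption, and the subclone fractions in `x` are made up for illustration; the function names are hypothetical.

```python
# Hypothetical tree (topology taken from the Figure 3 caption):
# parent[j] is the parent of subclone C_j, None for the root C_1.
parent = {1: None, 2: 1, 3: 2, 4: 3, 5: 4, 6: 1}

def ancestors_or_self(j, parent):
    """Return the set containing j and every ancestor of j."""
    out = set()
    while j is not None:
        out.add(j)
        j = parent[j]
    return out

def tree_matrix(parent):
    """Binary matrix M with M[i][j] = 1 if subclone C_j carries aberration A_i.

    Assumption: C_j carries A_i exactly when C_i is C_j itself or one of its ancestors.
    """
    ids = sorted(parent)
    return [[1 if i in ancestors_or_self(j, parent) else 0 for j in ids] for i in ids]

def aggregate_frequencies(parent, x):
    """Compute y = M x for subclone fractions x (dict: subclone id -> fraction)."""
    ids = sorted(parent)
    M = tree_matrix(parent)
    return [sum(M[r][c] * x[ids[c]] for c in range(len(ids))) for r in range(len(ids))]

# Made-up fractions that sum to one; C_1 and C_3 are unpopulated.
x = {1: 0.0, 2: 0.25, 3: 0.0, 4: 0.125, 5: 0.125, 6: 0.5}
y = aggregate_frequencies(parent, x)
print(y)  # first entry is the dummy aberration, which is present in every cell
```

The first row of M is all ones, which is why the dummy aberration always has frequency one when the fractions sum to one.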

**Figure 2.**
Identification of first-generation trees. In this example, the aggregate signal frequency vector is consistent with three first-generation trees. Each first-generation tree is visualized as a matrix equation according to Equation (2) (left) and as a partial evolutionary tree (right). The bottom row shows the partial tree PT₁, given by the union of the partial trees T₁ and T₃. Question marks indicate values that are unknown because they are not specified by the first-generation tree or by the partial tree.

**Figure 3.**
Illustration of the use of first-generation trees and partial trees for deriving the TrAp solution of a mixture of four subclones. In this example, five aberrations were measured from an aggregate sample (their frequencies are shown in the figure). The dummy measurement was also added to generate the aggregate signal frequency vector. In the first step, TrAp identifies all first-generation trees, namely T₁ and T₂. In the second step, TrAp generates the four possible partial trees and selects only PT₄, as it is the only partial tree that contains a maximum number of first-generation trees. In the third step, TrAp generates evolutionary trees starting from the partial tree PT₄. To complete the evolutionary tree starting from PT₄, the subclone C₁ is positioned as the root of the tree. Because C₁ is part of the first-generation tree T₁, the subclones C₂ and C₆ are automatically added as direct descendants of C₁. Next, C₃ is added as a direct descendant of C₂. Because C₃ is part of the first-generation tree T₂, C₄ is automatically added as a direct descendant of C₃. Finally, C₅ is added as a direct descendant of C₄, generating the optimal TrAp solution to the subclonal deconvolution problem. We remark that the optimal solution generated by the TrAp algorithm is equal to the left solution of Supplementary Figure S1 and to one of the solutions in Supplementary Figure S2.
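Once a candidate tree is fixed, the subclone fractions follow directly from the aggregate frequencies. The sketch below is an illustration under stated assumptions rather than the TrAp algorithm itself: it assumes each aberration frequency equals the total fraction of the subtree rooted at the clone where the aberration arose, so the fraction of a subclone is its own frequency minus the frequencies of its direct descendants. The tree is the one described in this caption; the frequencies in `y` and the helper names are hypothetical.

```python
# Tree from the caption: C1 is the root, C2 and C6 descend from C1,
# C3 from C2, C4 from C3 and C5 from C4.
parent = {1: None, 2: 1, 3: 2, 4: 3, 5: 4, 6: 1}

def children_of(parent):
    """Invert the parent map into a map from each node to its direct descendants."""
    kids = {j: [] for j in parent}
    for j, p in parent.items():
        if p is not None:
            kids[p].append(j)
    return kids

def subclone_fractions(parent, y, tol=1e-9):
    """Return x with x_j = y_j - sum of y over direct descendants of j.

    Returns None when some fraction would be negative, i.e. the tree is
    inconsistent with the measured frequencies.
    """
    kids = children_of(parent)
    x = {}
    for j in parent:
        xj = y[j] - sum(y[c] for c in kids[j])
        if xj < -tol:
            return None
        x[j] = max(xj, 0.0)
    return x

# Made-up aggregate frequencies; y[1] is the dummy aberration.
y = {1: 1.0, 2: 0.5, 3: 0.25, 4: 0.25, 5: 0.125, 6: 0.5}
x = subclone_fractions(parent, y)
populated = sorted(j for j, v in x.items() if v > 0)
print(x, populated)

# A linear chain C1 -> C2 -> ... -> C6 is rejected for the same frequencies,
# because C6 (0.5) cannot descend from C5 (0.125).
chain = {1: None, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5}
print(subclone_fractions(chain, y))
```

With these illustrative numbers, C₁ and C₃ come out unpopulated, which matches the caption's description of a mixture with four populated subclones.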

**Figure 4.**
Deconvolution of simulated data. In each table, the column index represents the number of populated subclones and the row index represents the number of mutations. We generated 1000 simulations for each pair of row and column indices (pixel) in these tables. We performed this analysis using different levels of noise (error) drawn from a uniform distribution. The heatmaps (tables) show the percentage of trees in each cell for which the true solution has the minimum number of subclones (left panel), is a TrAp solution (middle panel) and is the only TrAp solution (right panel) if the best solution is unique.
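One way such a simulation could be set up is sketched below. The caption does not specify how trees and fractions were sampled or the bounds of the uniform noise, so everything here is an assumption made for illustration: the random-attachment tree, the normalized random fractions, the noise interval [-eps, eps], and the names `random_tree` and `simulate` are all hypothetical.

```python
import random

def random_tree(n, rng):
    """Random rooted tree on nodes 1..n; node j > 1 attaches to a random earlier node."""
    parent = {1: None}
    for j in range(2, n + 1):
        parent[j] = rng.randint(1, j - 1)
    return parent

def simulate(n, k, eps, rng):
    """Simulate one noisy aggregate signal with n nodes and k populated subclones.

    Assumptions: node 1 is the dummy root, fractions are normalized uniform
    draws, and noise is uniform on [-eps, eps], clipped to [0, 1].
    """
    parent = random_tree(n, rng)
    populated = rng.sample(range(1, n + 1), k)
    weights = [rng.random() for _ in populated]
    total = sum(weights)
    x = {j: 0.0 for j in parent}
    for j, w in zip(populated, weights):
        x[j] = w / total
    # Frequency of node j = its own fraction plus the frequencies of its descendants.
    # Parents always have smaller indices, so a reverse sweep accumulates subtrees.
    y = dict(x)
    for j in range(n, 1, -1):
        y[parent[j]] += y[j]
    noisy = {j: min(1.0, max(0.0, y[j] + rng.uniform(-eps, eps))) for j in y}
    noisy[1] = 1.0  # the dummy aberration is present in every cell by construction
    return parent, x, y, noisy

rng = random.Random(0)
parent, x, y, noisy = simulate(n=8, k=3, eps=0.01, rng=rng)
print(parent)
print(noisy)
```

Repeating `simulate` many times per (n, k) pair and scoring each deconvolution against the known `parent` and `x` would give one cell of a table like those in the figure.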

**Figure 5.**
Deconvolution of random mixtures of three subclones. The boxes represent different subclones, each denoted by the list of its aberrations. The aberration profiles of two subclones identified by cytogenetics in a melanoma biopsy (left) and the aberration profiles of three subclones identified in an adenocarcinoma biopsy (right) have been mixed *in silico* using random coefficients. In both cases, the mixtures were successfully deconvolved. Aberrations are grouped within the boxes according to the order of occurrence. The reconstructed evolutionary trees suggest intermediate (white boxes), probably rare, subclones that were not reported in the cytogenetic data.

**Figure 6.**
Deconvolution of a random mixture of eight sequences from SHM data. Eight sequences from the Ig locus of eight cells extracted from the same germinal center were mixed with random coefficients. Since sequences five and eight are identical, they are grouped in a single clone whose relative frequency is the sum of their two coefficients. In total, 20 mutated nucleotides were found in the data, and two different mutations were identified at position 170. Each mutation label gives the position and the nucleotide change; for example, one label indicates that the nucleotide at position 170 was mutated from adenine to guanine, and another indicates that the nucleotide at position 170 was mutated twice, first from adenine to guanine and then from guanine to cytosine. In this example, all seven subclones were correctly deconvolved by the TrAp algorithm, the frequencies of the subclones were correctly estimated and the solution was unique.

**Figure 7.**
Evolutionary trees inferred from three metastases of a melanoma patient. Each subclone in these trees is represented by a box with a list of mutations that includes only its new mutations (ancestral mutations can be read off by tracing back the mutation lists of all of its ancestors). Mutations are labeled according to the gene affected and the amino acid change caused by the mutation (e.g. the label DCC.L1099H indicates a mutation in the *DCC* gene that causes a mutation from a Leucine to a Histidine at position 1099 in the DCC protein). Highly expressed genes from this patient are indicated in bold. Mutations in the left branch of TM4 are more abundant than in TM1 and TM3. 44% (19%) of the subclones of TM3 (TM4) have mutations in DSC3, DSG1 and IMPACT. The TM3 subclone has an additional mutation in DCC.
