Bioinformatics. 2013 Jul 1;29(13):i189-98.
doi: 10.1093/bioinformatics/btt205.

Phylogenetic analysis of multiprobe fluorescence in situ hybridization data from tumor cell populations

Salim Akhter Chowdhury et al. Bioinformatics. 2013.

Abstract

Motivation: Development and progression of solid tumors can be attributed to a process of mutations, which typically includes changes in the number of copies of genes or genomic regions. Although comparisons of cells within single tumors show extensive heterogeneity, recurring features of their evolutionary process may be discerned by comparing multiple regions or cells of a tumor. A useful source of data for studying likely progression of individual tumors is fluorescence in situ hybridization (FISH), which allows one to count copy numbers of several genes in hundreds of single cells. Novel algorithms for interpreting such data phylogenetically are needed, however, to reconstruct likely evolutionary trajectories from states of single cells and facilitate analysis of tumor evolution.

Results: In this article, we develop phylogenetic methods to infer likely models of tumor progression using FISH copy number data and apply them to a study of FISH data from two cancer types. Statistical analyses of topological characteristics of the tree-based model provide insights into likely tumor progression pathways consistent with the prior literature. Furthermore, tree statistics from the resulting phylogenies can be used as features for prediction methods. This results in improved accuracy, relative to unstructured gene copy number data, at predicting tumor state and future metastasis.

Availability: Source code for software that does FISH tree building (FISHtrees) and the data on cervical and breast cancer examined here are available at ftp://ftp.ncbi.nlm.nih.gov/pub/FISHtrees.

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1. Phylogenetic trees showing progression of (A) primary and (B) metastasis stage cervical cancer in patient 1. The trees are built from single-cell copy number data using the ploidyless heuristic approach implemented in FISHtrees. Each node in the trees represents a copy number profile of the four gene probes LAMP3, PROX1, PRKAA1 and CCND1, respectively. Nodes with solid borders represent cells present in the collected sample, while nodes with dotted borders represent inferred Steiner nodes. Green and red edges model gene gain and gene loss, respectively. The weight value on each edge connecting two nodes x and y is the rectilinear distance between the states of x and y. The weight on each node describes the fraction of cells in the sample with the particular copy number profile modeled by that node; Steiner nodes are assigned weight 0.
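The edge and node weights described in the Fig. 1 caption can be illustrated with a minimal sketch. This is not the FISHtrees implementation; the function names and the example profiles below are hypothetical, and the code only assumes that a cell's state is a tuple of per-probe copy numbers.

```python
from collections import Counter


def rectilinear_distance(x, y):
    """L1 (rectilinear) distance between two copy number profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must cover the same probes")
    return sum(abs(a - b) for a, b in zip(x, y))


def node_weights(cells):
    """Fraction of sampled cells carrying each distinct copy number profile.

    Profiles that appear in a tree only as inferred Steiner nodes are absent
    from the sample, so they would receive weight 0.
    """
    counts = Counter(cells)
    total = len(cells)
    return {profile: n / total for profile, n in counts.items()}


# Hypothetical sample: four cells, each a profile over
# (LAMP3, PROX1, PRKAA1, CCND1). These values are invented for illustration.
sample_cells = [(2, 2, 2, 2), (2, 2, 2, 2), (3, 2, 2, 2), (3, 2, 1, 2)]

weights = node_weights(sample_cells)
edge_weight = rectilinear_distance((2, 2, 2, 2), (3, 2, 1, 2))
```

Here the edge weight counts one gain of LAMP3 plus one loss of PRKAA1, so a single edge can bundle several unit copy number changes.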
Fig. 2. P-values from χ² tests comparing the number of descendants in the (A) eight children of the root in the primary tumor tree versus the metastasis tree in the same cervical cancer (CC) patient, and (B) 16 children of the root in the ductal carcinoma in situ (DCIS) tree versus the invasive ductal carcinoma (IDC) tree in the same breast cancer (BC) patient. The total number of (C) CC and (D) BC patients for which each bin for gain of oncogenes or loss of tumor suppressor genes shows significance in individual 2 × 2 χ² tests.
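A single 2 × 2 χ² test of the kind mentioned in the Fig. 2 caption can be sketched as follows. The layout of the table is an assumption made for illustration (cells inside versus outside one bin, in one tree versus the other), the counts are invented, and no continuity correction is applied; the caption does not specify these details.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test for the 2 x 2 table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom and no
    continuity correction.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a nonzero total")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-squared distribution with 1 d.o.f.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: rows are two trees (e.g. primary and metastasis),
# columns are cells inside and outside the bin being tested.
stat, p_value = chi2_2x2(10, 20, 20, 10)
```

Because one such test is run per bin and per patient, a real analysis would likely also need to consider multiple testing, which this sketch does not address.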
Fig. 3. Increase and decrease in copy number counts of (A and B) LAMP3, PROX1, PRKAA1 and CCND1 across 16 CC patients and (C and D) COX-2, DBC2, MYC, CCND1, CDH1, p53, HER-2 and ZNF217 across 13 BC patients. Copy number count is calculated using (A and C) the average of cell count data and (B and D) net tree edge changes. The units on the x-axis differ in the two adjacent subfigures due to the different types of data used.
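The two kinds of summary contrasted in the Fig. 3 caption can be illustrated with a rough sketch. The exact definitions used in the paper are not given in the caption, so this assumes that the cell-based summary is a per-gene mean over cells and that the tree-based summary adds up per-gene gains and losses over tree edges; all names and data are hypothetical.

```python
def mean_copy_numbers(cells):
    """Per-gene average copy number over all sampled cells."""
    n_cells = len(cells)
    n_genes = len(cells[0])
    return [sum(cell[i] for cell in cells) / n_cells for i in range(n_genes)]


def edge_changes(edges, n_genes):
    """Per-gene total gains and losses summed over (parent, child) edges."""
    gains = [0] * n_genes
    losses = [0] * n_genes
    for parent, child in edges:
        for i in range(n_genes):
            delta = child[i] - parent[i]
            if delta > 0:
                gains[i] += delta
            elif delta < 0:
                losses[i] -= delta
    return gains, losses


# Hypothetical two-gene example: a diploid root with three descendants.
tree_edges = [
    ((2, 2), (3, 2)),  # gain of gene 0
    ((3, 2), (3, 1)),  # loss of gene 1
    ((2, 2), (4, 2)),  # gain of two copies of gene 0
]
cells = [(2, 2), (3, 2), (3, 1), (4, 2)]

averages = mean_copy_numbers(cells)
gains, losses = edge_changes(tree_edges, n_genes=2)
```

The two summaries are in different units (mean copies per cell versus counts of change events on edges), which is consistent with the caption's remark that the x-axes of adjacent subfigures differ.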
Fig. 4. Accuracy of tree-based versus cell-based features in classification tasks using an SVM classifier. Each chart shows accuracy of three tree-based and four cell-based feature sets on the three defined prediction tasks.
Fig. 5. Distribution of cells across different levels of tumor progression trees, counted for primary and metastatic trees separately.
Fig. 6. (A) Classification performance for particular subsets of features that show best prediction accuracy among all possible subsets on CC and BC datasets. (B) Sets of gene probes that show best classification accuracy.
