BMC Bioinformatics. 2010 Jan 20;11:42.

doi: 10.1186/1471-2105-11-42.

Applying unmixing to gene expression data for tumor phylogeny inference

Russell Schwartz¹, Stanley E Shackney

PMID: 20089185
PMCID: PMC2823708
DOI: 10.1186/1471-2105-11-42

Russell Schwartz et al. BMC Bioinformatics. 2010.

Affiliation

¹ Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA USA. russells@andrew.cmu.edu

Abstract

Background: While in principle a seemingly infinite variety of combinations of mutations could result in tumor development, in practice it appears that most human cancers fall into a relatively small number of "sub-types," each characterized by a roughly equivalent sequence of mutations by which it progresses in different patients. There is currently great interest in identifying the common sub-types and applying them to the development of diagnostics or therapeutics. Phylogenetic methods have shown great promise for inferring common patterns of tumor progression, but suffer from limits of the technologies available for assaying differences between and within tumors. One approach to tumor phylogenetics uses differences between single cells within tumors, gaining valuable information about intra-tumor heterogeneity but allowing only a few markers per cell. An alternative approach uses tissue-wide measures of whole tumors to provide a detailed picture of averaged tumor state but at the cost of losing information about intra-tumor heterogeneity.

Results: The present work applies "unmixing" methods, which separate complex data sets into combinations of simpler components, to attempt to gain advantages of both tissue-wide and single-cell approaches to cancer phylogenetics. We develop an unmixing method to infer recurring cell states from microarray measurements of tumor populations and use the inferred mixtures of states in individual tumors to identify possible evolutionary relationships among tumor cells. Validation on simulated data shows the method can accurately separate small numbers of cell states and infer phylogenetic relationships among them. Application to a lung cancer dataset shows that the method can identify cell states corresponding to common lung tumor types and suggest possible evolutionary relationships among them that show good correspondence with our current understanding of lung tumor development.

Conclusions: Unmixing methods provide a way to make use of both intra-tumor heterogeneity and large probe sets for tumor phylogeny inference, establishing a new avenue towards the construction of detailed, accurate portraits of common tumor sub-types and the mechanisms by which they develop. These reconstructions are likely to have future value in discovering and diagnosing novel cancer sub-types and in identifying targets for therapeutic development.

Figures

**Figure 1**
**Illustration of the geometric mixture model used in the present work**. The image shows a hypothetical set of three mixture components (C₁, C₂, and C₃) and two mixed samples (M₁ and M₂) produced from different mixtures of those components. The triangular simplex enclosed by the mixture components is shown with dashed lines. To the right are the matrices M, C, and F corresponding to the example data points.
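The geometric picture in Figure 1 can be illustrated with a minimal sketch in which each mixed sample is a convex combination of the mixture components, so that it falls inside the simplex they enclose. The matrix orientation used here (M = F · C, with one row per sample and per component), the function name `mix`, and all numeric values are assumptions made for illustration; they are not taken from the paper.

```python
# Minimal sketch of a geometric mixture model (illustrative values only).
# Assumption: M = F * C, where each row of C is a component, each row of F
# holds one sample's mixture fractions, and each row of M is a mixed sample.

def mix(fractions, components):
    """Return mixed samples as convex combinations of the component rows."""
    dims = len(components[0])
    mixed = []
    for row in fractions:
        # Mixture fractions must be non-negative and sum to 1 so that every
        # sample lies inside the simplex spanned by the components.
        if any(f < 0 for f in row) or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of F must be non-negative and sum to 1")
        mixed.append(
            [sum(f * comp[d] for f, comp in zip(row, components)) for d in range(dims)]
        )
    return mixed

# Three hypothetical components C1, C2, C3 in a two-dimensional expression space.
C = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
# Hypothetical mixture fractions for two samples M1 and M2.
F = [[0.5, 0.25, 0.25], [0.0, 0.5, 0.5]]

M = mix(F, C)
print(M)
```

Unmixing is the inverse problem: given only M, estimate both C and F. The sketch covers the forward direction only.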

**Figure 2**
**Examples of mixture components inferred from simulated data sets**. Green circles show the true mixture components, red points the simulated data points that serve as the input to the algorithms, and blue X's the inferred mixture components. (a) A uniform mixture of three independent components with no noise. Each data point is a mixture of all three components. Inferred mixture fractions for the three components, averaged over all points, are (0.295 0.367 0.339). (b) A tree-embedded mixture of three components with noise equal to signal. Each data point is a mixture of a root component (top, labeled 1) and one of two leaf components (bottom, labeled 2 and 3). The inset shows the phylogenetic tree in which the labeled components are embedded. Inferred mixture fractions averaged over points in the two branches of the simplex are (0.410 0.567 0.025) and (0.410 0.020 0.535). (c) A tree-embedded mixture of five components with 10% noise. Each data point contains a portion of the root component (bottom, labeled 1), a subset contains portions of one of two internal components (far left, labeled 2, and far right, labeled 4), and subsets of these contain portions of one of two leaf components (center left, labeled 3, and center right, labeled 5). The inset shows the phylogenetic tree in which the labeled components are embedded. Inferred mixture fractions averaged over points in the two branches of the simplex are (0.356 0.462 0.141 0.006 0.005) and (0.387 0.072 0.008 0.187 0.378).

**Figure 3**
**Accuracy of methods in inferring simulated mixture components and assigning mixture fractions to data points**. (a) Root mean square error in inferred mixture components as a function of noise level for uniform mixtures of k = 3 to k = 7 mixture components. (b) Root mean square error in fractional assignments of components to data points as a function of noise level for uniform mixtures of k = 3 to k = 7 mixture components. (c) Root mean square error in inferred mixture components as a function of noise level for tree-embedded mixtures of k = 3 to k = 7 mixture components. (d) Root mean square error in fractional assignments of components to data points as a function of noise level for tree-embedded mixtures of k = 3 to k = 7 mixture components.
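As a rough illustration of the error measure plotted in Figure 3, the sketch below computes a root mean square error between true and inferred matrices (either components or mixture fractions). It assumes the inferred rows have already been matched to the true rows and that the error is averaged over all matrix entries; the caption does not specify either detail, and the function name `rmse` and the sample values are hypothetical.

```python
import math

def rmse(true_rows, inferred_rows):
    """Root mean square error over all entries of two equally shaped matrices.

    Assumption: row i of inferred_rows corresponds to row i of true_rows.
    """
    squared_errors = [
        (t - i) ** 2
        for true_row, inferred_row in zip(true_rows, inferred_rows)
        for t, i in zip(true_row, inferred_row)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical true and inferred components in a two-dimensional space.
true_components = [[0.0, 0.0], [3.0, 4.0]]
inferred_components = [[0.0, 0.0], [0.0, 0.0]]

error = rmse(true_components, inferred_components)
print(error)
```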

**Figure 4**
**Accuracy of tree inference on simulated tree-embedded data**. The plot shows the fraction of true tree edges accurately inferred for k = 3 to k = 7 components as functions of noise levels.
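The quantity in Figure 4, the fraction of true tree edges that are recovered, can be sketched as a set comparison. This sketch treats edges as undirected pairs of component labels, which is an assumption; the name `edge_recovery` and the inferred tree are hypothetical. The true tree in the example follows the five-component layout described for Figure 2(c).

```python
def edge_recovery(true_edges, inferred_edges):
    """Fraction of true tree edges that also appear in the inferred tree.

    Assumption: edges are undirected, so (1, 2) and (2, 1) are the same edge.
    """
    true_set = {frozenset(edge) for edge in true_edges}
    inferred_set = {frozenset(edge) for edge in inferred_edges}
    return len(true_set & inferred_set) / len(true_set)

# Tree with root 1, internal components 2 and 4, and leaves 3 and 5.
true_tree = [(1, 2), (2, 3), (1, 4), (4, 5)]
# Hypothetical inferred tree that misplaces component 4 below component 5.
inferred_tree = [(2, 1), (2, 3), (1, 5), (5, 4)]

fraction = edge_recovery(true_tree, inferred_tree)
print(fraction)
```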

**Figure 5**
**Visualization of four-component unmixing results from the lung cancer data of Jones et al.** [33]. (a) All components and tumor samples. Tumor samples appear as red points and components as blue X's labeled by numbers. (b-d) Three views of the same data with distinct clinical subtypes highlighted. Components appear as blue X's labeled by numbers. Tumors are marked as follows: normal lung tissue (black point), large cell carcinoma (blue star), carcinoid (cyan asterisk), adenocarcinoma (yellow circle), large cell neuroendocrine (green diamond), small cell primary tumors (red upward-pointing triangles), small cell cell lines (magenta downward-pointing triangles). The two primary combined small cell/adenocarcinoma samples were omitted from (b-d).

**Figure 6**
**Phylogenies inferred on components derived from Jones et al.** [33]. Each phylogeny shows nodes labeled with component numbers. We further manually added labels reflecting approximately which tumor types are most specifically labeled by a given component based on Tables 1 and 2: NOR (normal cells); LCC/AD (large cell carcinoma and adenocarcinoma); SCC (small cell); CA (carcinoid); CMB (combined small cell/adenocarcinoma) and NOR/SCC (normal and small cell). Edges with over 50% confidence are shown as solid lines while those between 10% and 50% confidence are shown as dashed lines. Edges with confidence below 10% are omitted. Edges are labeled by confidences rounded to the nearest percent. (a) Phylogeny derived from four mixture components. (b) Phylogeny derived from six mixture components.
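The edge display rule described for Figure 6 can be written as a small helper. This is only a sketch of the stated thresholds: the handling of confidences exactly equal to 10% or 50% is an assumption (both are drawn as dashed here), and the function names and edge confidences are hypothetical rather than values from the paper.

```python
def edge_style(confidence_percent):
    """Map an edge confidence (in percent) to a line style, or None if omitted."""
    if confidence_percent > 50:
        return "solid"
    if confidence_percent >= 10:  # boundary handling is an assumption
        return "dashed"
    return None

def render_edges(confidences):
    """Return (edge, style, label) tuples for edges that would be drawn."""
    drawn = []
    for edge, confidence in confidences.items():
        style = edge_style(confidence)
        if style is not None:
            # Labels are confidences rounded to the nearest percent.
            drawn.append((edge, style, f"{round(confidence)}%"))
    return drawn

# Hypothetical confidences for three candidate edges between components.
example = {(1, 2): 93.2, (2, 3): 34.4, (1, 4): 4.0}

drawn_edges = render_edges(example)
print(drawn_edges)
```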

Cited by

Reconstructing tumor clonal lineage trees incorporating single-nucleotide variants, copy number alterations and structural variations.
Fu X, Lei H, Tao Y, Schwartz R. Bioinformatics. 2022 Jun 24;38(Suppl 1):i125-i133. doi: 10.1093/bioinformatics/btac253. PMID: 35758777. Free PMC article.
Semi-deconvolution of bulk and single-cell RNA-seq data with application to metastatic progression in breast cancer.
Lei H, Guo XA, Tao Y, Ding K, Fu X, Oesterreich S, Lee AV, Schwartz R. Bioinformatics. 2022 Jun 24;38(Suppl 1):i386-i394. doi: 10.1093/bioinformatics/btac262. PMID: 35758822. Free PMC article.
CTen: a web-based platform for identifying enriched cell types from heterogeneous microarray data.
Shoemaker JE, Lopes TJ, Ghosh S, Matsuoka Y, Kawaoka Y, Kitano H. BMC Genomics. 2012 Sep 6;13:460. doi: 10.1186/1471-2164-13-460. PMID: 22953731. Free PMC article.
Archetypal analysis of diverse Pseudomonas aeruginosa transcriptomes reveals adaptation in cystic fibrosis airways.
Thøgersen JC, Mørup M, Damkiær S, Molin S, Jelsbak L. BMC Bioinformatics. 2013 Sep 23;14:279. doi: 10.1186/1471-2105-14-279. PMID: 24059747. Free PMC article.
Neural Network Deconvolution Method for Resolving Pathway-Level Progression of Tumor Clonal Expression Programs With Application to Breast Cancer Brain Metastases.
Tao Y, Lei H, Lee AV, Ma J, Schwartz R. Front Physiol. 2020 Sep 4;11:1055. doi: 10.3389/fphys.2020.01055. eCollection 2020. PMID: 33013452. Free PMC article.

References

1. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999;286:531–537. doi: 10.1126/science.286.5439.531.
2. Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, Pollack JR, Ross DT, Johnsen H, Akslen LA, Fluge O, Pergamenschikov A, Williams C, Zhu SX, Lønning PE, Børresen-Dale AL, Brown PO, Botstein D. Molecular portraits of human breast tumors. Nature. 2000;406:747–752. doi: 10.1038/35021093.
3. Sorlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, Hastie T, Eisen MB, van de Rijn M, Jeffrey SS, Thorsen T, Quist H, Matese JC, Brown PO, Botstein D, Eystein Lønning P, Børresen-Dale AL. Gene expression profiles of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci USA. 2001;98:10869–10874. doi: 10.1073/pnas.191367098.
4. Sorlie T, Tibshirani R, Parker J, Hastie T, Marron JS, Nobel A, Deng S, Johnsen H, Pesich R, Geisler S, Demeter J, Perou CM, Lønning PE, Brown PO, Børresen-Dale AL, Botstein D. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci USA. 2003;100:8418–8423. doi: 10.1073/pnas.0932692100.
5. Pegram MD, Konecny G, Slamon DJ. The molecular and cellular biology of HER2/neu gene amplification/overexpression and the clinical development of herceptin (trastuzumab) therapy for breast cancer. Cancer Treat Res. 2000;103:57–75.
