BMC Bioinformatics. 2010 Jan 20;11:42.
doi: 10.1186/1471-2105-11-42.

Applying unmixing to gene expression data for tumor phylogeny inference

Russell Schwartz et al. BMC Bioinformatics.

Abstract

Background: While in principle a seemingly infinite variety of combinations of mutations could result in tumor development, in practice it appears that most human cancers fall into a relatively small number of "sub-types," each characterized by a roughly equivalent sequence of mutations by which it progresses in different patients. There is currently great interest in identifying the common sub-types and applying them to the development of diagnostics or therapeutics. Phylogenetic methods have shown great promise for inferring common patterns of tumor progression, but they suffer from the limits of the technologies available for assaying differences between and within tumors. One approach to tumor phylogenetics uses differences between single cells within tumors, gaining valuable information about intra-tumor heterogeneity but allowing only a few markers per cell. An alternative approach uses tissue-wide measures of whole tumors to provide a detailed picture of averaged tumor state, but at the cost of losing information about intra-tumor heterogeneity.

Results: The present work applies "unmixing" methods, which separate complex data sets into combinations of simpler components, to attempt to gain advantages of both tissue-wide and single-cell approaches to cancer phylogenetics. We develop an unmixing method to infer recurring cell states from microarray measurements of tumor populations and use the inferred mixtures of states in individual tumors to identify possible evolutionary relationships among tumor cells. Validation on simulated data shows the method can accurately separate small numbers of cell states and infer phylogenetic relationships among them. Application to a lung cancer dataset shows that the method can identify cell states corresponding to common lung tumor types and suggest possible evolutionary relationships among them that show good correspondence with our current understanding of lung tumor development.

Conclusions: Unmixing methods provide a way to make use of both intra-tumor heterogeneity and large probe sets for tumor phylogeny inference, establishing a new avenue towards the construction of detailed, accurate portraits of common tumor sub-types and the mechanisms by which they develop. These reconstructions are likely to have future value in discovering and diagnosing novel cancer sub-types and in identifying targets for therapeutic development.

Figures

Figure 1
Illustration of the geometric mixture model used in the present work. The image shows a hypothetical set of three mixture components (C1, C2, and C3) and two mixed samples (M1 and M2) produced from different mixtures of those components. The triangular simplex enclosed by the mixture components is shown with dashed lines. To the right are the matrices M, C, and F corresponding to the example data points.
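The geometric picture in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coordinates in C and the fractions in F are hypothetical values (not the numbers from the figure), samples are stored as rows so that M = F C, and the barycentric-coordinate recovery shown here works only for three components in two dimensions. It is not the unmixing method of the paper, which must also infer C.

```python
# Hypothetical 2-D coordinates for three components C1, C2, C3 (rows of C).
C = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
# Assumed mixture fractions for two samples M1, M2 (rows of F);
# each row is non-negative and sums to 1, so samples lie inside the simplex.
F = [[0.5, 0.25, 0.25], [0.25, 0.25, 0.5]]


def mix(F, C):
    """Return M = F C: each mixed sample is a convex combination of components."""
    dims = len(C[0])
    return [[sum(f * c[j] for f, c in zip(row, C)) for j in range(dims)]
            for row in F]


def barycentric(p, C):
    """Recover the mixture fractions of a 2-D point p relative to triangle C."""
    (x1, y1), (x2, y2), (x3, y3) = C
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    f1 = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    f2 = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    return [f1, f2, 1.0 - f1 - f2]


M = mix(F, C)                              # mixed samples as rows
recovered = [barycentric(p, C) for p in M]  # fractions given known components
```

When the components are known, the fractions of each sample follow directly from its position in the simplex; the harder problem addressed by unmixing is estimating the components themselves from the mixed samples alone.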
Figure 2
Examples of mixture components inferred from simulated data sets. Green circles show the true mixture components, red points the simulated data points that serve as the input to the algorithms, and blue X's the inferred mixture components. (a) A uniform mixture of three independent components with no noise. Each data point is a mixture of all three components. Inferred mixture fractions for the three components, averaged over all points, are (0.295 0.367 0.339). (b) A tree-embedded mixture of three components with noise equal to signal. Each data point is a mixture of a root component (top, labeled 1) and one of two leaf components (bottom, labeled 2 and 3). The inset shows the phylogenetic tree in which the labeled components are embedded. Inferred mixture fractions averaged over points in the two branches of the simplex are (0.410 0.567 0.025) and (0.410 0.020 0.535). (c) A tree-embedded mixture of five components with 10% noise. Each data point contains a portion of the root component (bottom, labeled 1), a subset of points contains portions of one of two internal components (far left, labeled 2, and far right, labeled 4), and subsets of these contain portions of one of two leaf components (center left, labeled 3, and center right, labeled 5). The inset shows the phylogenetic tree in which the labeled components are embedded. Inferred mixture fractions averaged over points in the two branches of the simplex are (0.356 0.462 0.141 0.006 0.005) and (0.387 0.072 0.008 0.187 0.378).
Figure 3
Accuracy of methods in inferring simulated mixture components and assigning mixture fractions to data points. (a) Root mean square error in inferred mixture components as a function of noise level for uniform mixtures of k = 3 to k = 7 mixture components. (b) Root mean square error in fractional assignments of components to data points as a function of noise level for uniform mixtures of k = 3 to k = 7 mixture components. (c) Root mean square error in inferred mixture components as a function of noise level for tree-embedded mixtures of k = 3 to k = 7 mixture components. (d) Root mean square error in fractional assignments of components to data points as a function of noise level for tree-embedded mixtures of k = 3 to k = 7 mixture components.
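As a rough illustration of the error measure named in this caption, the sketch below computes a root mean square error between true and inferred components. The abstract does not say how inferred components are paired with true ones, so taking the minimum over all orderings is an assumption made here, and the example matrices are hypothetical.

```python
from itertools import permutations
from math import sqrt


def rmse(a, b):
    """Root mean square error over all entries of two equally shaped matrices."""
    sq = [(x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sqrt(sum(sq) / len(sq))


def component_rmse(true_c, inferred_c):
    """Smallest RMSE over all orderings of the inferred components.

    Assumption: inferred components carry no labels, so we score the
    best one-to-one matching. Exhaustive search is fine for k <= 7.
    """
    return min(rmse(true_c, list(order)) for order in permutations(inferred_c))


# Hypothetical example: same three components, shuffled, one coordinate off by 1.
true_c = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
inferred_c = [[0.0, 5.0], [0.0, 0.0], [4.0, 0.0]]
error = component_rmse(true_c, inferred_c)
```

The same `rmse` helper applies unchanged to matrices of mixture fractions once the component ordering has been fixed.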
Figure 4
Accuracy of tree inference on simulated tree-embedded data. The plot shows the fraction of true tree edges accurately inferred for k = 3 to k = 7 components as functions of noise levels.
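The edge-accuracy measure in this caption can be sketched as follows. Treating edges as undirected pairs of component labels is an assumption made for illustration; the true tree below follows the five-component description in Figure 2(c) (root 1, internal components 2 and 4, leaves 3 and 5), while the inferred tree is a hypothetical example.

```python
def edge_accuracy(true_edges, inferred_edges):
    """Fraction of true tree edges that also appear in the inferred tree.

    Edges are compared as unordered pairs, so (1, 2) matches (2, 1).
    """
    true_set = {frozenset(e) for e in true_edges}
    inferred_set = {frozenset(e) for e in inferred_edges}
    return len(true_set & inferred_set) / len(true_set)


# True tree: root 1 -> internal 2 -> leaf 3, and root 1 -> internal 4 -> leaf 5.
true_edges = [(1, 2), (2, 3), (1, 4), (4, 5)]
# Hypothetical inferred tree that attaches leaf 5 to the wrong parent.
inferred_edges = [(2, 1), (2, 3), (1, 4), (3, 5)]
accuracy = edge_accuracy(true_edges, inferred_edges)
```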
Figure 5
Visualization of four-component unmixing results from the lung cancer data of Jones et al. [33]. (a) All components and tumor samples. Tumor samples appear as red points and components as blue X's labeled by numbers. (b-d) Three views of the same data with distinct clinical subtypes highlighted. Components appear as blue X's labeled by numbers. Tumors are marked as follows: normal lung tissue (black point), large cell carcinoma (blue star), carcinoid (cyan asterisk), adenocarcinoma (yellow circle), large cell neuroendocrine (green diamond), small cell primary tumors (red upward-pointing triangles), small cell cell lines (magenta downward-pointing triangles). The two primary combined small cell/adenocarcinoma samples were omitted from (b-d).
Figure 6
Phylogenies inferred on components derived from Jones et al. [33]. Each phylogeny shows nodes labeled with component numbers. We further manually added labels reflecting approximately which tumor types are most specifically labeled by a given component based on Tables 1 and 2: NOR (normal cells); LCC/AD (large cell carcinoma and adenocarcinoma); SCC (small cell); CA (carcinoid); CMB (combined small cell/adenocarcinoma) and NOR/SCC (normal and small cell). Edges with over 50% confidence are shown as solid lines while those between 10% and 50% confidence are shown as dashed lines. Edges with confidence below 10% are omitted. Edges are labeled by confidences rounded to the nearest percent. (a) Phylogeny derived from four mixture components. (b) Phylogeny derived from six mixture components.

References

    1. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999;286:531–537. doi: 10.1126/science.286.5439.531.
    2. Perou CM, Sorlie T, Eisen MB, Rijn M van de, Jeffrey SS, Rees CA, Pollack JR, Ross DT, Johnsen H, Akslen LA, Fluge O, Pergamenschikov A, Williams C, Zhu SX, Lønning PE, Børresen-Dale AL, Brown PO, Botstein D. Molecular portraits of human breast tumors. Nature. 2000;406:747–752. doi: 10.1038/35021093.
    3. Sorlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, Hastie T, Eisen MB, Rijn M van de, Jeffrey SS, Thorsen T, Quist H, Matese JC, Brown PO, Botstein D, Eystein Lønning P, Børresen-Dale AL. Gene expression profiles of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci USA. 2001;98:10869–10874. doi: 10.1073/pnas.191367098.
    4. Sorlie T, Tibshirani R, Parker J, Hastie T, Marron JS, Nobel A, Deng S, Johnsen H, Pesich R, Geisler S, Demeter J, Perou CM, Lønning PE, Brown PO, Børresen-Dale AL, Botstein D. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci USA. 2003;100:8418–8423. doi: 10.1073/pnas.0932692100.
    5. Pegram MD, Konecny G, Slamon DJ. The molecular and cellular biology of HER2/neu gene amplification/overexpression and the clinical development of herceptin (trastuzumab) therapy for breast cancer. Cancer Treat Res. 2000;103:57–75.
