Bioinformatics. 2013 Aug 1;29(15):1865-71.
doi: 10.1093/bioinformatics/btt301. Epub 2013 May 27.

DeMix: deconvolution for mixed cancer transcriptomes using raw measured data

Jaeil Ahn et al. Bioinformatics. 2013.

Abstract

Motivation: Tissue samples of tumor cells mixed with stromal cells cause underdetection of gene expression signatures associated with cancer prognosis or response to treatment. In silico dissection of mixed cell samples is essential for analyzing expression data generated in cancer studies. Currently, a systematic approach is lacking to address three challenges in computational deconvolution: (i) violation of linear addition of expression levels from multiple tissues when log-transformed microarray data are used; (ii) estimation of both tumor proportion and tumor-specific expression, when neither is known a priori; and (iii) estimation of expression profiles for individual patients.
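Challenge (i) can be illustrated with a small numeric sketch. The values below are made up for illustration and are not taken from the paper; the names `normal`, `tumor`, and `pi` (the assumed proportion of non-tumor tissue) are hypothetical. The sketch only shows that a mixture which is linear on the raw scale is no longer linear after a log2 transform; it is not the DeMix model itself.

```python
import math

def mix_raw(normal, tumor, pi):
    """Linear mixture on the raw scale; pi is the assumed non-tumor proportion."""
    return pi * normal + (1.0 - pi) * tumor

def mix_log2(normal, tumor, pi):
    """Apply the same linear weights to log2-transformed values (the naive approach)."""
    return pi * math.log2(normal) + (1.0 - pi) * math.log2(tumor)

# Illustrative raw expression values for one gene (assumed, not from the paper).
normal, tumor, pi = 100.0, 1600.0, 0.5

raw_mixed = mix_raw(normal, tumor, pi)       # what the array would measure
true_log = math.log2(raw_mixed)              # log2 of the actual mixture
naive_log = mix_log2(normal, tumor, pi)      # mixture of the log2 values

# The naive log-scale mixture equals the log of the geometric mean,
# which never exceeds the log of the arithmetic mean (Jensen's inequality).
gap = true_log - naive_log
print(f"raw mixture: {raw_mixed}, log2 gap: {gap:.3f}")
```

The gap grows with the fold difference between the two tissues, so the genes that differ most between tumor and stroma are the ones most distorted when linear mixing is assumed on log-transformed data.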

Results: We have developed a statistical method for deconvolving mixed cancer transcriptomes, DeMix, which addresses the aforementioned issues in array-based expression data. We demonstrate the performance of our model in synthetic and real, publicly available, datasets. DeMix can be applied to ongoing biomarker-based clinical studies and to the vast expression datasets previously generated from mixed tumor and stromal cell samples.

Availability: All code is written in C and integrated into an R function, which is available at http://odin.mdacc.tmc.edu/~wwang7/DeMix.html.

Contact: wwang7@mdanderson.org

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1.
Simulation results for data scenario 1. Shown are estimates of biases, MSEs and IRMSEs based on DeMix and LM, for mixed samples at assigned B-type tissue proportions varying from 0.1 to 0.9. For IRMSE, we also present results from the NM model
Fig. 2.
Scatter plots of transcript abundance in mixed-sample expressions minus brain expressions (Y–N) versus pure liver expressions minus pure brain expressions (T–N) at two different mixing rates for (a) log-transformed data; (b) raw measured data
Fig. 3.
Estimation of proportions of hidden tissues from four available data sources. MAQC1: MAQC site 1, MAQC3: MAQC site 3, AFFY: Affymetrix, and GSE19830. (a) Estimated tissue proportions versus true proportions; black represents the DeMix estimates; gray represents LM estimates. (b) Estimated 95% confidence intervals of the estimated π's; solid lines correspond to true π's
Fig. 4.
Estimation of gene expression values of hidden tissues from GSE19830. (a) Heat map of expression values from selected genes across samples. The proportions of liver tissue are shown at the bottom. A total of 1323 genes were randomly selected. The samples, left to right, are 3 pure brain samples (observed), 12 liver-brain mixed samples (observed), 12 deconvolved liver samples (unobserved; estimated) and 3 pure liver samples (unobserved; used for comparison). (b) Scatter plots comparing deconvolved mean liver tissue expression levels with observed mean pure liver expression levels at four different mixture proportions of brain tissues
Fig. 5.
Detection of DE genes. Total number of identified differentially expressed genes at varying P-value cut-offs (Bonferroni-corrected), using different models. The third curve from the top corresponds to comparison between the three pure brain and the three pure liver tissue samples

