Bioinformatics. 2018 May 15;34(10):1642-1649.
doi: 10.1093/bioinformatics/bty011.

Tumor purity quantification by clonal DNA methylation signatures

Matteo Benelli et al. Bioinformatics.

Abstract

Motivation: Controlling for tumor purity in molecular analyses is essential to allow for reliable genomic aberration calls, for inter-sample comparison and to monitor heterogeneity of cancer cell populations. In genome-wide screening studies, the assessment of tumor purity is typically performed by means of computational methods that exploit somatic copy number aberrations.

Results: We present a strategy, called Purity Assessment from clonal MEthylation Sites (PAMES), which uses the methylation level of a few dozen highly clonal, tumor-type-specific CpG sites to estimate the purity of tumor samples, without the need for a matched benign control. We trained and validated our method on more than 6000 samples from different datasets. Purity estimates by PAMES were highly concordant with those of other state-of-the-art tools, and its evaluation on a cancer cell line dataset highlights its reliability in estimating tumor admixtures. We extended PAMES to the analysis of CpG islands instead of the more platform-specific CpG sites and demonstrated its accuracy in a set of advanced tumors profiled by high-throughput DNA methylation sequencing. These analyses show that PAMES is a valuable tool to assess the purity of tumor samples in the settings of clinical research and diagnostics.

Availability and implementation: https://github.com/cgplab/PAMES.

Contact: matteo.benelli@uslcentro.toscana.it or f.demichelis@unitn.it.

Supplementary information: Supplementary data are available at Bioinformatics online.
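
The strategy summarized in the abstract (select tumor-specific, highly clonal CpG sites by comparing tumor and normal β values, then read purity from the β values of those sites) can be illustrated with a minimal Python sketch. This is not the PAMES package or its API: the function names, the thresholds (`auc_cut`, `diff_cut`), the use of a mean tumor-minus-normal β difference, the median as the aggregation step, and the toy β values are all assumptions made for illustration.

```python
from statistics import mean, median


def auc(tumor, normal):
    """AUC as the fraction of (tumor, normal) pairs in which the tumor beta is higher (ties count 0.5)."""
    wins = sum(1.0 if t > n else 0.5 if t == n else 0.0 for t in tumor for n in normal)
    return wins / (len(tumor) * len(normal))


def select_informative_sites(tumor_betas, normal_betas, auc_cut=0.9, diff_cut=0.5):
    """Label sites as 'hyper' or 'hypo' methylated in tumor; thresholds are hypothetical."""
    selected = {}
    for site, t in tumor_betas.items():
        n = normal_betas[site]
        a = auc(t, n)
        diff = mean(t) - mean(n)  # simplified beta difference (assumption)
        if a >= auc_cut and diff >= diff_cut:
            selected[site] = "hyper"
        elif a <= 1 - auc_cut and diff <= -diff_cut:
            selected[site] = "hypo"
    return selected


def estimate_purity(sample_betas, selected):
    """Median of beta (hyper sites) and 1 - beta (hypo sites); the median is an assumption."""
    values = [
        sample_betas[site] if direction == "hyper" else 1.0 - sample_betas[site]
        for site, direction in selected.items()
    ]
    return median(values)


# Toy beta values, invented for this example (site -> one value per sample).
tumor = {"cg_a": [0.9, 0.8, 0.85], "cg_b": [0.1, 0.2, 0.15], "cg_c": [0.5, 0.4, 0.6]}
normal = {"cg_a": [0.05, 0.1, 0.08], "cg_b": [0.9, 0.95, 0.85], "cg_c": [0.5, 0.45, 0.55]}

selected = select_informative_sites(tumor, normal)
purity = estimate_purity({"cg_a": 0.7, "cg_b": 0.4, "cg_c": 0.5}, selected)
print(selected, round(purity, 2))
```

In this toy input, `cg_c` has overlapping tumor and normal distributions and is therefore not selected, so only the two discriminating sites contribute to the estimate.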

Figures

Fig. 1.
Schematic of PAMES workflow. PAMES identifies a set of tumor-specific, highly clonal CpG sites or islands (informative sites) through the differential analysis of DNA methylation levels in tumor versus normal samples (using β difference and the AUC). The β values of the selected sites are considered as optimal estimators of the admixture of tumor cells in each sample (tumor purity).
Fig. 2.
DNA methylation is shared within and between cancer types. (A) Scatter plot of AUC values versus β-differences for the TCGA BLCA dataset. β-differences are computed as differences between β values in tumor samples and averaged (mean) β values in normal samples. The local regression analysis with LOESS is reported (red line). Pearson’s correlation coefficient is significant (P < 0.05). (B) Box plots report the distributions of the per sample fraction of supporting events (PFSE) for both hyper- (red) and hypo- (green) methylation in each cancer type. (C) Bar plots show the frequencies of the most recurrent (n = 2) genomic alterations for both single nucleotide variants (SNVs, green) and somatic copy number alterations (SCNAs, purple). For each tumor type, altered genes or genomic regions are reported above the corresponding bar; top to bottom terms correspond to left to right bars. (D) Heat map and Ward’s hierarchical clustering, using Euclidean distance, of the Pearson’s correlation coefficients of AUC scores of the differential methylation sites among the 14 tumor types.
Fig. 3.
Purity estimates in TCGA datasets. (A) Correlation coefficient R and RMSD of purity estimates computed with different numbers of informative and random sites. Lines represent local regression (LOESS). (B) Functional enrichment analysis of the top informative sites (N = 140) from all tumor types, using the 15-state ENCODE model of the functional genome. (C) Heat map and Ward’s hierarchical clustering, using Euclidean distance, of the Pearson’s correlation coefficients of purity estimates of the seven state-of-the-art methods in BLCA. The row annotation refers to the different strategies used to estimate purity. (D) Correlation of PAMES and InfiniumPurify purity estimates on the TCGA tumor samples. R and RMSD refer to the Pearson’s correlation coefficient and root mean square deviation averaged (mean) across the 14 tumor types.
Fig. 4.
Accuracy evaluation of the purity estimates. (A) Box plots of PAMES and InfiniumPurify purity estimates on the cancer cell line dataset from Iorio et al., 2016. (B) Scatter plot of PAMES and InfiniumPurify purity estimates on the cancer cell line dataset from Iorio et al., 2016. (C) AUC values of PAMES (blue) and InfiniumPurify (red) across the 14 tumor types, using cancer cell lines as positive events and normal samples from TCGA and benign cell lines as negative events.
Fig. 5.
Platform-independent version of PAMES. (A) Density plot of sample purities for all cancer types estimated using β values from informative sites and β values obtained through CpG island transformation. (B) Plot of the purity estimates from PAMES (y-axis) versus CLONET (x-axis) on the eRRBS data of metastatic prostate cancer from Beltran et al., 2016.
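
The Fig. 3 caption summarizes agreement between purity estimates with Pearson’s correlation coefficient (R) and the root mean square deviation (RMSD). As a hedged illustration of those two measures, the following Python sketch computes both for two lists of per-sample purity estimates. The helper names and the example values are hypothetical and are not taken from the paper or from either tool.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmsd(x, y):
    """Root mean square deviation between paired estimates."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Invented purity estimates for four samples from two hypothetical methods.
method_a = [0.35, 0.60, 0.82, 0.90]
method_b = [0.40, 0.55, 0.80, 0.95]

r = pearson_r(method_a, method_b)
deviation = rmsd(method_a, method_b)
print(round(r, 3), round(deviation, 3))
```

R captures whether two methods rank and scale samples similarly, while RMSD captures the absolute disagreement, so two methods can correlate strongly and still differ by a constant offset.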

References

    1. Aran D. et al. (2015) Systematic pan-cancer analysis of tumour purity. Nat. Commun., 6, 8971.
    2. Beltran H. et al. (2016) Divergent clonal evolution of castration-resistant neuroendocrine prostate cancer. Nat. Med., 22, 298–305.
    3. Board R.E. et al. (2008) DNA methylation in circulating tumour DNA as a biomarker for cancer. Biomarker Insights, 2, 307–319.
    4. Carter S.L. et al. (2012) Absolute quantification of somatic DNA alterations in human cancer. Nat. Biotechnol., 30, 413–421.
    5. Cerami E. et al. (2012) The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov., 2, 401–404.