BMC Biol. 2020 Nov 24;18(1):178.
doi: 10.1186/s12915-020-00910-4.

Deconvolution of cellular subsets in human tissue based on targeted DNA methylation analysis at individual CpG sites

Marco Schmidt et al. BMC Biol. 2020.

Abstract

Background: The complex composition of different cell types within a tissue can be estimated by deconvolution of bulk gene expression profiles or with various single-cell sequencing approaches. Alternatively, DNA methylation (DNAm) profiles have been used to establish an atlas for multiple human tissues and cell types. DNAm is particularly suitable for deconvolution of cell types because each CG dinucleotide (CpG site) has only two states per DNA strand, methylated or non-methylated, and these epigenetic modifications are very consistent during cellular differentiation. So far, deconvolution of DNAm profiles has relied on complex signatures of many CpGs that are often measured by genome-wide analysis with Illumina BeadChip microarrays. In this study, we investigated whether the characterization of cell types in tissue is also feasible with individual cell type-specific CpG sites, which can be addressed by targeted analysis, such as pyrosequencing.

Results: We compiled and curated 579 Illumina 450K BeadChip DNAm profiles of 14 different non-malignant human cell types. A training and validation strategy was applied to identify and test cell type-specific CpGs. We initially focused on estimating the relative amount of fibroblasts using two CpGs that were either hypermethylated or hypomethylated in fibroblasts. The combination of these two DNAm levels into a "FibroScore" correlated with the state of fibrosis and was associated with overall survival in various types of cancer. Furthermore, we identified hypomethylated CpGs for leukocytes, endothelial cells, epithelial cells, hepatocytes, glia, neurons, fibroblasts, and induced pluripotent stem cells. The accuracy of this eight-CpG signature was tested in additional BeadChip datasets of defined cell mixtures, and the results were comparable to previously published signatures based on several thousand CpGs. Finally, we established and validated pyrosequencing assays for the relevant CpGs that can be utilized for classification and deconvolution of cell types.

Conclusion: This proof-of-concept study demonstrates that DNAm analysis at individual CpGs reflects the composition of cell mixtures and different tissues. Targeted analysis of these genomic regions facilitates robust methods for application in basic research and clinical settings.

Keywords: Cancer; Cell types; CpG; DNA methylation; Deconvolution; Epigenetic; Fibrosis; Human; NNLS; Pyrosequencing.

Conflict of interest statement

W.W. is a cofounder of Cygenia GmbH, which can provide services for the analysis of epigenetic signatures (www.cygenia.com). Apart from that, the authors declare that they have no competing interests.

Figures

Fig. 1
Selection of cell type-specific CpGs for fibroblasts. (a) Multidimensional scaling (MDS) plot of the training data set (n = 409) demonstrates that samples cluster by cell type across different studies. All CpGs shared between the 450K and EPIC BeadChip were considered (except those on the X and Y chromosomes). (b) Differential mean DNAm levels of fibroblasts/MSCs versus all other cell types were plotted against the sum of variances within both groups. The CpGs selected for the FibroScore are indicated. (c) DNAm levels (β values) of the two selected CpGs of the FibroScore in the training set. Numbers correspond to classification accuracy in percent. (d) DNAm levels of the two selected CpGs and the FibroScore for the validation set. Only muscle stem cells, which might closely resemble MSCs, were classified with fibroblasts/MSCs. Numbers correspond to classification accuracy in percent. (e) DNAm levels of the two selected CpGs and the FibroScore as determined by pyrosequencing in samples of different cell types. Almost all cell preparations (with the exception of the HaCaT cell line) were classified correctly. (f) The FibroScore is significantly higher in lung fibrosis versus healthy control tissue (GSE63704; 450K data) [62]. ***p < 0.001. (g) The FibroScore is significantly higher in liver cirrhosis versus healthy control tissue (GSE60753; 450K data) [29]. *p < 0.05
Fig. 2
The FibroScore is associated with overall survival in several types of cancer. Hazard ratios from Cox proportional hazards models for datasets from The Cancer Genome Atlas (TCGA). Depicted are six types of cancer for which there is a significant difference in overall survival between patients with high and low FibroScore. Unless specified otherwise, models take into account sex, age, tumor stage, and the FibroScore stratified by the median (450K BeadChip data). Where some of these parameters were not available, the missing cofactors are indicated next to the reference: s = sex, a = age, and n = stage
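The median stratification mentioned in this caption can be sketched as follows. This is only an illustration: the abstract does not state how the two DNAm levels are combined, so the `fibro_score` function below assumes a simple difference between the hypermethylated and the hypomethylated CpG, and all β values are made up rather than taken from the study. The Cox model itself is not reproduced here.

```python
from statistics import median


def fibro_score(beta_hyper, beta_hypo):
    """Hypothetical score: hypermethylated minus hypomethylated beta value.

    ASSUMPTION: the source only says the two DNAm levels are "combined";
    the difference used here is a placeholder, not the published formula.
    """
    return beta_hyper - beta_hypo


def stratify_by_median(scores):
    """Label each sample 'high' or 'low' relative to the cohort median."""
    cutoff = median(scores)
    return ["high" if score > cutoff else "low" for score in scores]


# Illustrative (beta_hyper, beta_hypo) pairs, one per sample; not study data.
samples = [(0.80, 0.20), (0.55, 0.45), (0.30, 0.70), (0.65, 0.25)]
scores = [fibro_score(hyper, hypo) for hyper, hypo in samples]
groups = stratify_by_median(scores)
```

The resulting `groups` list could then serve as a binary covariate, next to sex, age, and tumor stage, in whichever survival package is used.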
Fig. 3
Cell type-specific CpG sites are preferentially hypomethylated. (a) Selection of cell type-specific CpGs for leukocytes, endothelial cells, epithelial cells, fibroblasts/MSCs, glia, hepatocytes, neurons, and iPSCs. The difference of mean β values of each cell type versus all other cell types was plotted against the sum of variances within both groups. CpGs for subsequent deconvolution are highlighted. (b) DNAm levels of the eight selected CpGs in the training, validation, and pyrosequencing datasets. The vast majority of samples revealed the expected cell type-specific hypomethylation, although pyrosequencing of liver cell lines (Hep3B and HuH-7) did not reveal the hypomethylation at cg27197524 that is expected for primary cells. Glia and neuron samples were not available for pyrosequencing. Numbers correspond to classification accuracy in percent
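The two quantities plotted in panel (a) can be computed per CpG as in the sketch below. The statistics follow the caption (difference of mean β values, sum of within-group variances). The ranking rule in `pick_hypomethylated`, the CpG identifiers, and the β values are illustrative assumptions, not the selection criterion or data of the study.

```python
from statistics import mean, variance


def cpg_stats(target_betas, other_betas):
    """Return (difference of means, sum of within-group variances)."""
    delta_mean = mean(target_betas) - mean(other_betas)
    variance_sum = variance(target_betas) + variance(other_betas)
    return delta_mean, variance_sum


def pick_hypomethylated(candidates):
    """Pick the CpG with the most negative mean difference, penalized by variance.

    ASSUMPTION: this ranking key is hypothetical; the source does not state
    how the plotted statistics were turned into a final choice.
    """
    def rank_key(cpg_id):
        delta_mean, variance_sum = cpg_stats(*candidates[cpg_id])
        return delta_mean + variance_sum

    return min(candidates, key=rank_key)


# Hypothetical CpG ids mapped to (betas in target cell type, betas in all others).
candidates = {
    "cpg_a": ([0.10, 0.12, 0.08], [0.90, 0.88, 0.92, 0.91]),
    "cpg_b": ([0.40, 0.60, 0.20], [0.70, 0.50, 0.90, 0.60]),
    "cpg_c": ([0.85, 0.90, 0.88], [0.87, 0.90, 0.86, 0.89]),
}
best = pick_hypomethylated(candidates)
best_delta, best_variance_sum = cpg_stats(*candidates[best])
```

A strongly negative mean difference together with a small variance sum marks a CpG that is consistently hypomethylated in the target cell type only, which is the kind of site the caption describes as highlighted.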
Fig. 4
Deconvolution of cell mixtures based on individual cell type-specific CpGs. (a) Heatmap of mean β values of the reference matrix (450K data of the training set), which is used for deconvolution. (b) Deconvolution of in vitro neuron-glia DNA mixes from dataset GSE41826 [68]. The cell fractions predicted by our NNLS-based deconvolution with eight CpGs are depicted. (c) Deconvolution of eight different in vitro DNA mixes from dataset GSE122126 [7]. The real composition of DNA fractions is plotted next to the predictions by the signatures of Moss et al. (estimates for leukocyte subsets, epithelial cells, and others were combined). The estimates with our NNLS model closely resembled the DNA mixtures of different cell types. Data for DNA mix 4 lacked one of the eight CpGs, and this mix was therefore excluded. (d) Deconvolution of in vitro DNA mixes measured with pyrosequencing. Five different mixes of five different cell types in different proportions were measured at the eight different sites. Shown are mixed versus estimated cellular fractions with our NNLS-based deconvolution
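A non-negative least squares (NNLS) deconvolution of this kind can be sketched as below. The sketch solves the NNLS problem with plain projected gradient descent to stay dependency-free; this is a swapped-in solver, not necessarily what the authors used, and a library routine such as `scipy.optimize.nnls` would be the usual choice in practice. The three-by-three reference matrix, the mixture, and the final normalization to a sum of one are illustrative assumptions.

```python
def nnls_projected_gradient(reference, observed, iterations=2000):
    """Minimize ||reference @ x - observed||^2 subject to x >= 0.

    reference: rows are CpGs, columns are cell types (mean beta values).
    observed:  beta values measured in the mixed sample, one per CpG.
    """
    n_rows, n_cols = len(reference), len(reference[0])
    # 1 / squared Frobenius norm is a safe step size (it bounds the
    # Lipschitz constant of the gradient from above).
    step = 1.0 / sum(value * value for row in reference for value in row)
    x = [0.0] * n_cols
    for _ in range(iterations):
        residual = [
            sum(reference[i][j] * x[j] for j in range(n_cols)) - observed[i]
            for i in range(n_rows)
        ]
        gradient = [
            sum(reference[i][j] * residual[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[j] - step * gradient[j]) for j in range(n_cols)]
    return x


# Illustrative reference: each CpG is hypomethylated (0.1) in exactly one
# cell type and methylated (0.9) in the others. Not values from the study.
reference = [
    [0.1, 0.9, 0.9],
    [0.9, 0.1, 0.9],
    [0.9, 0.9, 0.1],
]
true_fractions = [0.5, 0.3, 0.2]
observed = [
    sum(reference[i][j] * true_fractions[j] for j in range(3)) for i in range(3)
]

estimate = nnls_projected_gradient(reference, observed)
total = sum(estimate)
fractions = [value / total for value in estimate]  # ASSUMPTION: rescale to sum 1
```

With one hypomethylated CpG per cell type the reference matrix is well conditioned, so the estimate recovers the mixed fractions closely on this noise-free toy input; real measurements would add noise and cell types missing from the reference.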

References

    1. Regev A, Teichmann SA, Lander ES, Amit I, Benoist C, Birney E, et al. The human cell atlas. Elife. 2017;6:e27041.
    2. Avila Cobos F, Vandesompele J, Mestdagh P, De Preter K. Computational deconvolution of transcriptomics data from mixed cell populations. Bioinformatics. 2018;34(11):1969–1979. doi: 10.1093/bioinformatics/bty019.
    3. Kuhn A, Kumar A, Beilina A, Dillman A, Cookson MR, Singleton AB. Cell population-specific expression analysis of human cerebellum. BMC Genomics. 2012;13:610. doi: 10.1186/1471-2164-13-610.
    4. Roy AL, Conroy RS. Toward mapping the human body at a cellular resolution. Mol Biol Cell. 2018;29(15):1779–1785. doi: 10.1091/mbc.E18-04-0260.
    5. Houseman EA, Accomando WP, Koestler DC, Christensen BC, Marsit CJ, Nelson HH, et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics. 2012;13:86. doi: 10.1186/1471-2105-13-86.
