Bioinformatics. 2014 May 15;30(10):1431-9.
doi: 10.1093/bioinformatics/btu029. Epub 2014 Jan 21.

Reference-free cell mixture adjustments in analysis of DNA methylation data

Eugene Andres Houseman et al. Bioinformatics. 2014.

Abstract

Motivation: Recently there has been increasing interest in the effects of cell mixture on the measurement of DNA methylation, specifically the extent to which small perturbations in cell mixture proportions can register as changes in DNA methylation. A recently published set of statistical methods exploits this association to infer changes in cell mixture proportions, and these methods are presently being applied to adjust for cell mixture effect in the context of epigenome-wide association studies. However, these adjustments require the existence of reference datasets, which may be laborious or expensive to collect. For some tissues such as placenta, saliva, adipose or tumor tissue, the relevant underlying cell types may not be known.
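To make the motivating point concrete, the following minimal sketch treats the methylation measured in a bulk sample at one CpG as the proportion-weighted average of cell-type-specific methylation levels. The two cell types, their methylation values and the mixture proportions are hypothetical numbers chosen for illustration; they do not come from the paper.

```python
def bulk_methylation(proportions, cell_type_methylation):
    """Proportion-weighted average methylation at a single CpG."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("cell type proportions must sum to 1")
    return sum(p * m for p, m in zip(proportions, cell_type_methylation))


# Hypothetical CpG: nearly unmethylated in cell type A, highly methylated in B.
methylation_by_type = [0.10, 0.90]

# Same cell-type-specific methylation, slightly different mixtures.
baseline = bulk_methylation([0.60, 0.40], methylation_by_type)
shifted = bulk_methylation([0.55, 0.45], methylation_by_type)
difference = shifted - baseline

print(f"baseline={baseline:.3f} shifted={shifted:.3f} difference={difference:.3f}")
```

In this toy setting, a five-percentage-point shift in mixture proportions moves the bulk measurement by about 0.04 even though no cell type changed its own methylation state, which is the kind of confounding the adjustment methods are meant to address.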

Results: We propose a method for conducting epigenome-wide association study analyses when a reference dataset is unavailable, including a bootstrap method for estimating standard errors. We demonstrate via a simulation study and several real data analyses that our proposed method can perform as well as or better than methods that make explicit use of reference datasets. In particular, it may adjust for detailed cell type differences that may be unavailable even in existing reference datasets.
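As a generic illustration of bootstrap standard errors, the sketch below resamples subjects with replacement and recomputes a single-CpG regression slope each time. The abstract does not spell out the resampling scheme used by the proposed method, so plain case resampling is an assumption here, and the function names (`ols_slope`, `bootstrap_slope_se`) and the simulated data are hypothetical. This is not the RefFreeEWAS implementation.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope of y on x for a single predictor."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def bootstrap_slope_se(x, y, n_boot=500, seed=0):
    """Standard deviation of the slope across case-resampled datasets."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xb = [x[i] for i in idx]
        if len(set(xb)) < 2:
            # Resample has no variation in x, so the slope is undefined.
            continue
        yb = [y[i] for i in idx]
        slopes.append(ols_slope(xb, yb))
    return statistics.stdev(slopes)


# Simulated data (hypothetical): 40 subjects, binary exposure,
# true methylation difference of 0.05 at one CpG plus Gaussian noise.
data_rng = random.Random(1)
x = [i % 2 for i in range(40)]
y = [0.5 + 0.05 * xi + data_rng.gauss(0, 0.02) for xi in x]

slope = ols_slope(x, y)
se = bootstrap_slope_se(x, y)
print(f"slope={slope:.4f} bootstrap SE={se:.4f}")
```

In an epigenome-wide setting the same resampled subject indices would typically be reused across all CpGs within each bootstrap replicate, so that correlation between CpGs is preserved.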

Availability and implementation: Software is available in the R package RefFreeEWAS. Data for three of four examples were obtained from Gene Expression Omnibus (GEO), accession numbers GSE37008, GSE42861 and GSE30601, while reference data were obtained from GEO accession number GSE39981.

Contact: andres.houseman@oregonstate.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Simulation 1: estimated effect by true effect. Comparison of slope estimates: true direct effect (formula image) versus its estimate (formula image), true direct effect versus the SVA-adjusted estimate and true direct effect (formula image) versus the unadjusted effect (formula image). Squares indicate DMRs. Red indicates non-null CpGs. Black squares represent non-null DMRs
Fig. 2.
Simulation 1: bootstrap standard error by simulation standard deviation for formula image. To increase legibility of the plot, SE estimates for two CpGs producing extreme bias have been moved to the left, as indicated
Fig. 3.
Arthritis dataset: volcano plots. Volcano plots for formula image unadjusted for leukocyte composition, adjusted using the reference-based method that adjusts for six estimated cell type proportions and adjusted using the proposed reference-free method with d = 20. Red indicates 387 leukocyte DMRs (overlap between 450K array and 500 CpGs published by Houseman et al., 2012)
Fig. 4.
Arthritis dataset: reference-based versus reference-free. Comparison of reference-free coefficient estimates formula image with the corresponding reference-based estimates
Fig. 5.
Placenta dataset: volcano plots. Volcano plots for formula image unadjusted for leukocyte composition and adjusted using the proposed reference-free method with d = 12
Fig. 6.
Placenta dataset: comparison of reference-free adjustment with unadjusted and SVA-adjusted. Comparison of reference-free coefficient estimates formula image with the corresponding unadjusted and SVA-adjusted estimates
