Biometrics. 2021 Jun;77(2):424-438.
doi: 10.1111/biom.13307. Epub 2020 Jun 5.

A novel statistical method for modeling covariate effects in bisulfite sequencing derived measures of DNA methylation

Kaiqiong Zhao et al. Biometrics. 2021 Jun.

Abstract

Identifying disease-associated changes in DNA methylation can help us gain a better understanding of disease etiology. Bisulfite sequencing allows the generation of high-throughput methylation profiles at single-base resolution of DNA. However, optimally modeling and analyzing these sparse and discrete sequencing data is still very challenging due to variable read depth, missing data patterns, long-range correlations, data errors, and confounding from cell type mixtures. We propose a regression-based hierarchical model that allows covariate effects to vary smoothly along genomic positions and we have built a specialized EM algorithm, which explicitly allows for experimental errors and cell type mixtures, to make inference about smooth covariate effects in the model. Simulations show that the proposed method provides accurate estimates of covariate effects and captures the major underlying methylation patterns with excellent power. We also apply our method to analyze data from rheumatoid arthritis patients and controls. The method has been implemented in R package SOMNiBUS.

Keywords: EM algorithm; differentially methylated region; generalized additive model; next-generation sequencing; penalized regression splines.
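
As a rough illustration of the kind of model the abstract describes, the sketch below simulates read counts at one CpG site. The methylation probability depends on a covariate through effects that vary smoothly with genomic position, and the observed reads are then corrupted by experimental error. This is not the SOMNiBUS implementation. The functions `beta0` and `beta1`, the error rates `p0` and `p1`, and all numeric values are hypothetical choices made for the example only.

```python
import math
import random


def expit(x):
    """Inverse logit: maps a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def beta0(t):
    """Hypothetical smooth intercept along scaled genomic position t in [0, 1]."""
    return -1.0 + math.sin(2.0 * math.pi * t)


def beta1(t):
    """Hypothetical smooth covariate effect along position t."""
    return 0.8 * math.cos(math.pi * t)


def methylation_prob(t, z):
    """True methylation probability: logit pi(t) = beta0(t) + beta1(t) * z."""
    return expit(beta0(t) + beta1(t) * z)


def observed_prob(pi, p0, p1):
    """Probability that a read is *observed* as methylated.

    p0: assumed chance an unmethylated read is reported as methylated.
    p1: assumed chance a methylated read is reported as methylated.
    """
    return p0 * (1.0 - pi) + p1 * pi


def simulate_site(t, z, depth, p0, p1, rng):
    """Draw (true, observed) methylated read counts at one site."""
    pi = methylation_prob(t, z)
    # Latent true state of each of the `depth` reads.
    true_meth = sum(rng.random() < pi for _ in range(depth))
    observed = 0
    for i in range(depth):
        truly_methylated = i < true_meth
        # Each read is reported as methylated with a state-dependent rate.
        observed += rng.random() < (p1 if truly_methylated else p0)
    return true_meth, observed


if __name__ == "__main__":
    rng = random.Random(1)
    for z in (0, 1):
        print(z, simulate_site(t=0.3, z=z, depth=30, p0=0.01, p1=0.95, rng=rng))
```

In this sketch the true count is a latent variable and only the observed count would be available to an analyst, which is the sort of incomplete-data setting an EM algorithm is designed for.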

Figures

FIGURE 1
(A) Estimates (solid lines) and 95% pointwise confidence intervals (dashed lines) of the intercept and of the smooth effects of RA and cell type (T cells versus monocytes) on methylation levels. (B) Predicted methylation levels on the logit scale (left) and proportion scale (right) for the four groups of samples with different disease and cell type status. The region‐based P‐values for the effects of RA status and T cell type are 1.11E-16 and 6.37E-218, respectively.
FIGURE 2
The 14 simulation settings of methylation parameters π(t) in Scenario 2. Methylation parameters for samples with Z=1 (dotted‐dashed black curve) are fixed across settings, whereas the methylation parameters for samples from group Z=0 (solid gray lines) vary across simulations corresponding to different degrees of closeness between methylation patterns in the two groups
FIGURE 3
Estimates of smooth covariate effects (gray) over the 100 simulations in Scenario 1, using SOMNiBUS. The black curves are the true functional parameters used to generate the data. Data with sample size N=40 were generated with error
FIGURE 4
Coverage probability of confidence intervals over 1000 simulations under different sample sizes (N=40,100,150,400). Data were generated with error, under simulation Scenario 1
FIGURE 5
Quantile‐Quantile (Q‐Q) plots of the region‐based P‐values for the null covariate Z3, obtained from the six methods, over 1000 simulations. Data were generated without error with a range of sample sizes (N=40,100,150,400), under simulation Scenario 1. Here, the expected P‐values are the uniform quantiles (1/1001, 2/1001, ..., 1000/1001).
FIGURE 6
Powers to detect DMRs using the six methods for the 14 simulation settings in Scenario 2 under different levels of maximum deviation between π0(t) and π1(t), calculated over 100 simulations. (Sample size N=100).
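
The expected P‐values in the Figure 5 caption follow the pattern i/(n+1) for i = 1, ..., n with n = 1000. A minimal sketch of how such Q‐Q plot coordinates could be computed is given below. The function names are hypothetical, and the -log10 scale is an assumption made here for illustration, since the caption does not state which axis scale was used.

```python
import math


def expected_pvalues(n):
    """Uniform quantiles i/(n+1), i = 1..n; for n=1000 this gives 1/1001, ..., 1000/1001."""
    return [i / (n + 1) for i in range(1, n + 1)]


def qq_points(pvalues):
    """Pair sorted observed P-values with their expected uniform quantiles.

    Returns (expected, observed) pairs on the -log10 scale (an assumed
    plotting convention). All P-values must be strictly positive.
    """
    observed = sorted(pvalues)
    expected = expected_pvalues(len(observed))
    return [(-math.log10(e), -math.log10(o)) for e, o in zip(expected, observed)]


if __name__ == "__main__":
    quantiles = expected_pvalues(1000)
    print(quantiles[0], quantiles[-1])
```

If a method is well calibrated under the null, the points returned by `qq_points` should lie close to the diagonal.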
