BMC Genomics. 2018 Jun 7;19(1):444.
doi: 10.1186/s12864-018-4641-x.

METHimpute: imputation-guided construction of complete methylomes from WGBS data



Aaron Taudt et al. BMC Genomics. 2018.

Abstract

Background: Whole-genome bisulfite sequencing (WGBS) has become the standard method for interrogating plant methylomes at base resolution. However, deep WGBS measurements remain cost-prohibitive for large, complex genomes and for population-level studies. As a result, most published plant methylomes are sequenced far below saturation, with a large proportion of cytosines having either missing data or insufficient coverage.

Results: Here we present METHimpute, a Hidden Markov Model (HMM)-based imputation algorithm for the analysis of WGBS data. Unlike existing methods, METHimpute enables the construction of complete methylomes by inferring the methylation status and level of every cytosine in the genome, regardless of coverage. Applied to maize, rice and Arabidopsis, the algorithm infers cytosine-resolution methylomes from data with as little as 6X coverage with high accuracy relative to 60X data, making it a cost-effective solution for large-scale studies.

Conclusions: METHimpute provides methylation status calls and levels for all cytosines in the genome regardless of coverage, thus yielding complete methylomes even with low-coverage WGBS datasets. The method has been extensively tested in plants, but should also be applicable to other species. An implementation is available on Bioconductor.

Keywords: Hidden Markov Model; Imputation; Methylation; Whole-genome bisulfite sequencing.


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Coverage distributions. a-c Percentage of cytosines with X coverage (strand-specific). d-f Percentage of cytosines with missing data (red) and "uninformative" coverage (green), defined as fewer than three reads.
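The coverage classes used in these captions (no reads, one or two reads, three or more reads) can be tallied with a few lines of code. The following is a minimal sketch over a hypothetical list of strand-specific read counts per cytosine; the function names are illustrative and are not part of METHimpute.

```python
from collections import Counter

CLASSES = ("missing", "uninformative", "informative")


def coverage_class(coverage: int) -> str:
    """Assign one cytosine to a coverage class (thresholds from Fig. 1)."""
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    if coverage == 0:
        return "missing"        # no reads at all: status can only be imputed
    if coverage < 3:
        return "uninformative"  # fewer than three reads
    return "informative"        # three or more reads


def coverage_summary(coverages: list[int]) -> dict[str, float]:
    """Percentage of cytosines falling into each coverage class."""
    if not coverages:
        return {name: 0.0 for name in CLASSES}
    counts = Counter(coverage_class(c) for c in coverages)
    n = len(coverages)
    return {name: 100.0 * counts.get(name, 0) / n for name in CLASSES}


# Hypothetical read counts for ten cytosines on one strand.
example = [0, 0, 1, 2, 3, 5, 0, 7, 1, 12]
print(coverage_summary(example))
```

With real data the input would be the per-cytosine coverage from an aligned WGBS sample; the sketch only illustrates the thresholds.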
Fig. 2
Conceptual overview of METHimpute. a Cytosines in the sequenced genome are assumed to be either unmethylated or methylated. b Bisulfite sequencing and alignment yield a methylation level for each cytosine, i.e. the number of reads showing methylation divided by the total number of reads. c Emission densities for each state are obtained with a binomial test with state-specific parameters. Note that "imputed" cytosines, i.e. cytosines without any reads, are treated identically to all other cytosines. However, because the emission densities of all states equal 1 for imputed cytosines, their methylation status calls are driven purely by neighboring cytosines. d Model fitting yields posterior probabilities for the methylation status calls. e Inferred methylation status calls and methylation levels.
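To make panels c and d concrete, here is a minimal sketch of a two-state HMM with binomial emissions and forward-backward posteriors. It is not METHimpute's implementation: the state-specific methylation probabilities (`p_state`), the single fixed transition probability (`stay`) and the uniform initial distribution are illustrative assumptions, and the parameters are fixed rather than fitted as in panel d. It does reproduce the property described in panel c: a cytosine with zero reads has emission 1 in both states, so its posterior is determined entirely by its neighbors.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(k methylated reads out of n) under a binomial; equals 1.0 when n == 0."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def posterior_methylated(meth: list[int], total: list[int],
                         p_state=(0.05, 0.9), stay: float = 0.9) -> list[float]:
    """Posterior P(methylated) per cytosine for a hypothetical 2-state HMM.

    State 0 = unmethylated, state 1 = methylated. p_state and stay are
    assumed values, not parameters estimated from data.
    """
    n_sites = len(total)
    trans = [[stay, 1.0 - stay], [1.0 - stay, stay]]
    emis = [[binom_pmf(m, t, p) for p in p_state] for m, t in zip(meth, total)]

    # Forward pass, normalized at every site to avoid numerical underflow.
    alpha: list[list[float]] = []
    for i in range(n_sites):
        if i == 0:
            a = [0.5 * emis[0][s] for s in range(2)]  # uniform start (assumption)
        else:
            a = [sum(alpha[i - 1][r] * trans[r][s] for r in range(2)) * emis[i][s]
                 for s in range(2)]
        z = sum(a)
        alpha.append([x / z for x in a])

    # Backward pass, also normalized per site (the scale cancels below).
    beta = [[1.0, 1.0] for _ in range(n_sites)]
    for i in range(n_sites - 2, -1, -1):
        b = [sum(trans[s][r] * emis[i + 1][r] * beta[i + 1][r] for r in range(2))
             for s in range(2)]
        z = sum(b)
        beta[i] = [x / z for x in b]

    # Posterior of the methylated state at each cytosine.
    posteriors = []
    for i in range(n_sites):
        g = [alpha[i][s] * beta[i][s] for s in range(2)]
        posteriors.append(g[1] / sum(g))
    return posteriors


# Hypothetical data: the middle cytosine has no reads at all.
print(posterior_methylated([9, 10, 0, 8, 10], [10, 10, 0, 10, 10]))
```

In this example the uncovered middle cytosine receives a posterior close to 1 because both flanking regions are strongly methylated; with unmethylated flanks it would be close to 0. A fuller sketch would estimate `p_state` and the transition probabilities from the data, for instance by expectation-maximization.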
Fig. 3
Maximum posterior distributions for imputed cytosines (coverage = 0), uninformative cytosines (coverage = 1 or 2) and informative cytosines (coverage ≥3) in Arabidopsis (a), rice (b) and maize (c), shown for each context. Each panel plots the density (y-axis) of the maximum posterior probability (x-axis). The maximum posterior probability, i.e. the confidence in the methylation status call, is generally lower for sites with less coverage.
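The quantity plotted here can be summarized per coverage class as in the sketch below. It assumes a two-state model, so the maximum posterior of a cytosine is max(P(methylated), 1 − P(methylated)); for brevity it reports the mean per class rather than the full density, and the posteriors in the usage example are hypothetical.

```python
from statistics import mean


def max_posterior(p_methylated: float) -> float:
    """Confidence of a two-state call: the larger of the two state posteriors."""
    return max(p_methylated, 1.0 - p_methylated)


def mean_confidence_by_class(posteriors: list[float],
                             coverages: list[int]) -> dict[str, float]:
    """Mean maximum posterior for imputed, uninformative and informative sites."""
    groups: dict[str, list[float]] = {"imputed": [], "uninformative": [], "informative": []}
    for p, c in zip(posteriors, coverages):
        if c == 0:
            key = "imputed"
        elif c < 3:
            key = "uninformative"
        else:
            key = "informative"
        groups[key].append(max_posterior(p))
    # Classes with no cytosines are left out rather than reported as 0.
    return {key: mean(values) for key, values in groups.items() if values}


# Hypothetical posteriors P(methylated) and coverages for five cytosines.
print(mean_confidence_by_class([0.6, 0.3, 0.95, 0.02, 0.999], [0, 1, 2, 5, 12]))
```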
Fig. 4
Enrichment profiles for genes (left panels) and transposable elements or repeats (right panels) for Arabidopsis (a, b), rice (c, d) and maize (e, f), for each context. Sub-panels show the enrichment profiles for imputed (coverage = 0), uninformative (coverage = 1 or 2) and informative cytosines (coverage ≥3). See the "Methods" section for the definition of the recalibrated methylation level.
Fig. 5
Saturation analysis. a-c F1-score of METHimpute and of the binomial test, each evaluated against the full sample. The F1-score is the harmonic mean of precision and recall. d-f Proportion of imputed cytosines. g-i Proportion of the genome in each state. The x-axis shows the average strand-specific coverage per cytosine.
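The F1-score in panels a-c can be computed as in the sketch below. It assumes binary methylation status calls, treats "methylated" as the positive class and uses the calls from the full sample as the reference; the paper's exact evaluation procedure may differ, and the call vectors in the usage example are hypothetical.

```python
def f1_score(predicted: list[bool], reference: list[bool]) -> float:
    """Harmonic mean of precision and recall, with True meaning 'methylated'."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    tp = sum(p and r for p, r in zip(predicted, reference))        # called and truly methylated
    fp = sum(p and not r for p, r in zip(predicted, reference))    # called but unmethylated in reference
    fn = sum((not p) and r for p, r in zip(predicted, reference))  # missed methylated sites
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical calls from a down-sampled run versus the full-sample reference.
reference = [True, True, False, False, True, False]
predicted = [True, False, False, True, True, False]
print(f1_score(predicted, reference))
```

Here two true positives, one false positive and one false negative give precision = recall = 2/3, so the F1-score is also 2/3.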
