Genome Biol. 2012 Oct 3;13(10):R83.
doi: 10.1186/gb-2012-13-10-r83.

BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions

Kasper D Hansen et al. Genome Biol. 2012.

Abstract

DNA methylation is an important epigenetic modification involved in gene regulation, which can now be measured using whole-genome bisulfite sequencing. However, cost, complexity of the data, and lack of comprehensive analytical tools are major challenges that keep this technology from becoming widely applied. Here we present BSmooth, an alignment, quality control and analysis pipeline that provides accurate and precise results even with low coverage data, appropriately handling biological replicates. BSmooth is open source software, and can be downloaded from http://rafalab.jhsph.edu/bsmooth.

Figures

Figure 1
The need for biological replicates. We show smoothed methylation profiles for three normal samples (blue) and matched cancers (red) from the Hansen data [1]. Also shown is the smoothed methylation profile for an IMR90 cell-line (black) from the Lister data [3]. Had we only analyzed normal-cancer pair 3 (thick lines), there would appear to be a methylation difference between cancer and normal in this genomic region. When all three cancer-normal pairs are considered, however, this region does not appear to be a cancer-specific differentially methylated region.
Figure 2
Quality control plots. (a) M-bias plot for the Hansen data, a WGBS experiment on cancer samples. Each sample was sequenced on two flowcells. We show the methylation proportion across each possible read position. This plot shows limited evidence of methylation bias across the read positions. Vertical lines indicate cutoffs used for M-bias filtering. (b) M-bias plots for the Lister data, a WGBS experiment in a fibroblast cell line. These data were aligned using iterative trimming and each read length is depicted separately (different colors). The plot shows methylation bias toward the end of reads for all read lengths. (c) M-bias plot for the Hansen-capture data, a capture bisulfite sequencing experiment on cancer samples. The plot shows methylation bias at the start of the reads.
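An M-bias plot like panels (a) to (c) tabulates, for each position along the read, the proportion of CpG calls at that position that are methylated; in the absence of bias the curve is roughly flat. The following is a minimal sketch, assuming each read has already been reduced to a list of `(position_in_read, is_methylated)` CpG calls. This input format and the names `m_bias` and `filter_calls` are hypothetical illustrations, not BSmooth's interface. The cutoffs passed to `filter_calls` play the role of the vertical lines in panel (a) and here are simply supplied by the user.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical input format: one CpG call = (0-based position in read, methylated?)
Call = Tuple[int, bool]


def m_bias(reads: Iterable[List[Call]]) -> Dict[int, float]:
    """Return the proportion of methylated CpG calls at each read position."""
    methylated: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for read in reads:
        for pos, is_meth in read:
            total[pos] += 1
            if is_meth:
                methylated[pos] += 1
    # Proportion methylated among all calls observed at each position, in read order.
    return {pos: methylated[pos] / total[pos] for pos in sorted(total)}


def filter_calls(read: List[Call], first: int, last: int) -> List[Call]:
    """Keep only calls with first <= position <= last (user-chosen M-bias cutoffs)."""
    return [(pos, is_meth) for pos, is_meth in read if first <= pos <= last]
```

For data aligned with iterative trimming, as in panel (b), one would presumably compute a separate profile for each read length before choosing cutoffs.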
Figure 3
The advantages of smoothing. (a) Points represent single-CpG methylation estimates plotted against their genomic location. Large points are based on coverage greater than 20×. The orange circle denotes the location for which we are estimating the methylation profile. The blue points are those receiving positive weight in the local likelihood estimation. The orange line is obtained from the fitted parabola. The black line is the methylation profile resulting from repeating the procedure for each location. (b) The curve represents the kernel used in the weighted regression and the points are the actual weights, which are also influenced by coverage. (c) Points are as in (a) for the 25× coverage Lister data. The pink line is obtained by applying BSmooth to the full data. The black line is the estimate from BSmooth based on a 5× subset of the Lister data. (d) Points are as in (a) but for the Hansen-capture data (average 35× coverage), averaged across three replicates. The black line is the BSmooth estimate obtained from the 4× Hansen data, averaged across three replicates.
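Panel (a) describes local likelihood smoothing: CpGs near a target location receive kernel weights, a parabola is fitted, and its value at the target is the smoothed estimate. The following is a minimal Python sketch, assuming a binomial model for methylated read counts with a quadratic on the logit scale, a tricube kernel, and a fixed bandwidth `h` in base pairs. These modelling choices and the names `smooth_at`, `tricube`, `solve3`, and `sigmoid` are our own illustration, not BSmooth's code, and BSmooth's actual window and weighting rules may differ. Coverage enters through the binomial likelihood, so well-covered CpGs get larger effective weights, loosely mirroring panel (b).

```python
import math
from typing import List, Sequence


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def tricube(u: float) -> float:
    """Tricube kernel: positive for |u| < 1, zero otherwise."""
    a = abs(u)
    return (1.0 - a ** 3) ** 3 if a < 1.0 else 0.0


def solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def smooth_at(x0: float, pos: Sequence[float], meth: Sequence[int],
              cov: Sequence[int], h: float, iters: int = 50) -> float:
    """Local likelihood estimate of the methylation level at location x0.

    Maximizes the kernel-weighted binomial log-likelihood of
    logit(p) = b0 + b1*u + b2*u**2, with u = (pos - x0) / h, by Newton-Raphson
    and returns p(x0) = sigmoid(b0).
    """
    pts = []
    for x, m, n in zip(pos, meth, cov):
        u = (x - x0) / h          # rescale distance so the kernel support is (-1, 1)
        k = tricube(u)
        if k > 0.0 and n > 0:     # CpGs receiving positive weight
            pts.append((u, k, m, n))
    if len({u for u, _, _, _ in pts}) < 3:
        raise ValueError("need at least 3 distinct covered CpGs in the window")
    # Start from the kernel-weighted pooled proportion, with a small pseudo-count.
    mk = sum(k * m for _, k, m, _ in pts)
    nk = sum(k * n for _, k, _, n in pts)
    p0 = (mk + 0.5) / (nk + 1.0)
    beta = [math.log(p0 / (1.0 - p0)), 0.0, 0.0]
    for _ in range(iters):
        info = [[0.0] * 3 for _ in range(3)]  # weighted Fisher information
        score = [0.0] * 3                     # gradient of weighted log-likelihood
        for u, k, m, n in pts:
            basis = (1.0, u, u * u)
            p = sigmoid(sum(b * z for b, z in zip(beta, basis)))
            w = k * n * p * (1.0 - p)         # coverage n enters the weight here
            r = k * (m - n * p)
            for i in range(3):
                score[i] += basis[i] * r
                for j in range(3):
                    info[i][j] += w * basis[i] * basis[j]
        step = solve3(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return sigmoid(beta[0])  # fitted methylation level at u = 0, i.e. at x0
```

Calling `smooth_at` over a grid of locations traces out a curve analogous to the black line in panel (a). The sketch has no safeguard for windows in which every read is methylated (or every read unmethylated); there the logit fit diverges and the linear solve can fail.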
Figure 4
Evaluation of the differentially methylated region (DMR) finder. (a) Specificity plotted against sensitivity for the BSmooth DMR finder (black) and a method based on Fisher's exact test (orange) applied to the Hansen data. The gold-standard definition is based on mean differences. Details are explained in the text. (b) As (a), but using a gold-standard definition accounting for biological variation. (c) Comparison based on the association between gene expression and methylation changes in the Tung data. For DMR lists of varying sizes (x-axis), the log2 odds ratios of finding a DMR within 5 kb of the transcription start site of a differentially expressed gene (FDR ≤5%) compared with genes not differentially expressed (FDR ≥25%) are shown. FP, false positive; TP, true positive.
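Panel (a) compares BSmooth with a method based on Fisher's exact test. As an illustration of that kind of baseline, the sketch below applies a two-sided Fisher's exact test to a single CpG, assuming methylated and unmethylated read counts are pooled within each group. How the comparison method turned per-CpG tests into DMRs is not reproduced here. Because pooling treats all reads as independent, such a test does not account for biological variation between replicates (compare Figure 1). The helper `log2_odds_ratio` computes the kind of 2×2 summary plotted in panel (c). Both function names are hypothetical, and only the Python standard library is used.

```python
import math


def hypergeom_pmf(a: int, row1: int, col1: int, total: int) -> float:
    """P(top-left cell = a) for a 2x2 table with fixed margins."""
    return math.comb(col1, a) * math.comb(total - col1, row1 - a) / math.comb(total, row1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows: groups (e.g. cancer, normal); columns: methylated, unmethylated
    read counts at one CpG, pooled over replicates (an assumption).
    Sums the probabilities of all tables with the observed margins that are
    no more likely than the observed table.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    p_obs = hypergeom_pmf(a, row1, col1, total)
    lo, hi = max(0, row1 - (total - col1)), min(row1, col1)
    probs = (hypergeom_pmf(x, row1, col1, total) for x in range(lo, hi + 1))
    # Small relative tolerance so ties with p_obs are not lost to rounding.
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-7)))


def log2_odds_ratio(a: float, b: float, c: float, d: float, pseudo: float = 0.0) -> float:
    """log2 odds ratio of [[a, b], [c, d]], e.g. rows = DE / non-DE genes,
    columns = with / without a DMR near the TSS. Zero cells need pseudo > 0."""
    return math.log2(((a + pseudo) * (d + pseudo)) / ((b + pseudo) * (c + pseudo)))
```

For example, `fisher_exact_two_sided(3, 1, 1, 3)` sums the probabilities of the tables with a top-left cell of 0, 1, 3, or 4, giving 34/70 ≈ 0.486.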

References

1. Hansen KD, Timp W, Corrada Bravo H, Sabunciyan S, Langmead B, McDonald OG, Wen B, Wu H, Liu Y, Diep D, Briem E, Zhang K, Irizarry RA, Feinberg AP. Generalized loss of stability of epigenetic domains across cancer types. Nat Genet. 2011;43:768–775. doi: 10.1038/ng.865.
2. Laird PW. Principles and challenges of genome-wide DNA methylation analysis. Nat Rev Genet. 2010;11:191.
3. Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon GC, Tonti-Filippini J, Nery JR, Lee L, Ye Z, Ngo QM, Edsall L, Antosiewicz-Bourget J, Stewart R, Ruotti V, Millar AH, Thomson JA, Ren B, Ecker JR. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature. 2009;462:315–322. doi: 10.1038/nature08514.
4. Jaffe AE, Feinberg AP, Irizarry RA, Leek JT. Significance analysis and statistical dissection of variably methylated regions. Biostatistics. 2012;13:166–178. doi: 10.1093/biostatistics/kxr013.
5. Zeschnigk M, Martin M, Betzl G, Kalbe A, Sirsch C, Buiting K, Gross S, Fritzilas E, Frey B, Rahmann S, Horsthemke B. Massive parallel bisulfite sequencing of CG-rich DNA fragments reveals that methylation of many X-chromosomal CpG islands in female blood DNA is incomplete. Hum Mol Genet. 2009;18:1439–1448. doi: 10.1093/hmg/ddp054.

Publication types