Methods. 2015 Jan 15;72:21-8.
doi: 10.1016/j.ymeth.2014.10.036. Epub 2014 Nov 24.

Probe Lasso: a novel method to rope in differentially methylated regions with 450K DNA methylation data

Lee M Butcher et al. Methods.

Abstract

The speed and resolution at which we can scour the genome for DNA methylation changes have improved immeasurably in the last 10 years, and the advent of the Illumina 450K BeadChip has made epigenome-wide association studies (EWAS) a reality. The resulting datasets are conveniently formatted to allow easy alignment of significant hits to genes and genetic features; however, methods that parse significant hits into discrete differentially methylated regions (DMRs) remain a challenge to implement. In this paper we present details of a novel DMR caller, the Probe Lasso: a flexible window-based approach that gathers neighbouring significant signals to define clear DMR boundaries for subsequent in-depth analysis. The method is implemented in the R package ChAMP (Morris et al., 2014) and returns sets of DMRs according to user-tuned levels of probe filtering (e.g., inclusion of sex chromosomes, polymorphisms) and probe-lasso size distribution. Using a sub-sample of colon cancer and healthy colon samples from TCGA, we show that Probe Lasso shifts DMR calling away from just probe-dense regions and calls DMRs ranging in size from tens of bases to tens of kilobases. Moreover, using TCGA data we show that Probe Lasso leverages more information from the array and highlights a potential role of hypomethylated transcription factor binding motifs not discoverable using a basic, fixed-window approach.

Keywords: DNA methylation; Differentially methylated regions; EWAS; Epigenetics; Illumina 450K BeadChip.

Figures

Fig. 1
Probe spacing on the Illumina 450K BeadChip. (A) Probes are gene-centric, with those near transcription start sites (TSSs) most densely spaced. (B) Probe spacing is sparser the further a probe’s distance from a CpG island (CGI). (C) Combining genetic and epigenetic annotation information reveals a diverse range of probe spacing.
Fig. 2
Schematic figure illustrating the Probe Lasso workflow. After probe spacing distributions have been calculated for each of the 28 genetic/epigenetic features, a quantile is set that is based on a user-specified min/max lasso size and lasso radius. This quantile results in 28 dynamic window sizes (‘probe-lassos’) that are thrown around each significantly-associated probe. If these lassos capture a user-specified number of significant probes, that probe’s lasso boundaries are retained. Overlapping- and neighbouring-lasso boundaries less than a user-specified distance apart are then merged to define DMR boundaries. All probes in the dataset are then binned into the DMRs and their p-values combined for the DMR, weighted by the underlying correlation structure of probe methylation values.
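The lasso-and-merge steps in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ChAMP implementation: the `Probe` class, the `call_dmrs` function, its parameter names and the feature labels are all hypothetical, a single chromosome is assumed, a probe is assumed to count towards its own lasso, and the final correlation-weighted combination of p-values is left out.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    pos: int       # coordinate on a single chromosome
    feature: str   # gene/CGI feature label (illustrative names)
    pval: float    # per-probe association p-value


def call_dmrs(probes, radius_by_feature, p_cut=0.05, min_sig=3, merge_gap=1000):
    """Return a list of (start, end, probe_positions) tuples."""
    sig = [p for p in probes if p.pval < p_cut]

    # 1. Throw a feature-specific lasso around each significant probe and keep
    #    it only if it captures at least `min_sig` significant probes
    #    (assumption: the central probe counts towards that total).
    kept = []
    for p in sig:
        r = radius_by_feature[p.feature]
        lo, hi = max(0, p.pos - r), p.pos + r
        if sum(lo <= q.pos <= hi for q in sig) >= min_sig:
            kept.append((lo, hi))

    # 2. Merge overlapping lassos and neighbours closer than `merge_gap`.
    merged = []
    for lo, hi in sorted(kept):
        if merged and lo - merged[-1][1] < merge_gap:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])

    # 3. Bin every probe, significant or not, into the merged regions.
    return [
        (lo, hi, sorted(p.pos for p in probes if lo <= p.pos <= hi))
        for lo, hi in merged
    ]


probes = [
    Probe(100, "TSS200", 0.001),
    Probe(150, "TSS200", 0.01),
    Probe(180, "TSS200", 0.4),    # not significant, but still binned
    Probe(200, "TSS200", 0.02),
    Probe(5000, "IGR", 0.001),    # isolated hit: its lasso captures too few probes
]
print(call_dmrs(probes, {"TSS200": 100, "IGR": 500}, merge_gap=50))
# prints [(0, 300, [100, 150, 180, 200])]
```

In this toy input the three clustered hits produce overlapping lassos that merge into one region, while the isolated intergenic hit is discarded because its lasso captures only itself.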
Fig. 3
An example quantile distribution of probe spacing for each gene/CGI feature. The black horizontal and vertical dashed lines indicate the quantile (43rd) that results from choosing a maximum lasso size of 2000 bp.
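One way to read the quantile step in this caption is sketched below: find the highest percentile at which no feature's probe spacing exceeds the chosen maximum lasso size, then use each feature's spacing at that percentile as its own lasso size. This is an interpretation of the caption rather than the published procedure; the function names, the nearest-rank percentile definition, the feature labels and the toy spacings are all assumptions made for illustration.

```python
def percentile(sorted_vals, q):
    """Nearest-rank percentile of a sorted list for integer q in 1..100."""
    k = max(1, (q * len(sorted_vals) + 99) // 100)  # integer ceil(q * n / 100)
    return sorted_vals[k - 1]


def lasso_sizes(spacing_by_feature, max_size):
    """Return (quantile, {feature: lasso size}) for a maximum lasso size in bp."""
    ordered = {f: sorted(v) for f, v in spacing_by_feature.items()}
    best_q = None
    for q in range(1, 101):
        # Percentiles never decrease with q, so stop at the first overshoot.
        if max(percentile(v, q) for v in ordered.values()) <= max_size:
            best_q = q
        else:
            break
    if best_q is None:
        raise ValueError("max_size is below every feature's smallest spacing")
    return best_q, {f: percentile(v, best_q) for f, v in ordered.items()}


# Toy spacings in bp for two hypothetical features.
spacing = {"TSS200": [10, 20, 30, 40], "IGR": [500, 1000, 3000, 8000]}
print(lasso_sizes(spacing, max_size=2000))
# prints (50, {'TSS200': 20, 'IGR': 1000})
```

Under this reading the sparsest feature determines the shared quantile, and densely probed features receive proportionally smaller lassos.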
Fig. 4
Enrichment plot illustrating the distribution of genetic/epigenetic features for probes captured using the Probe Lasso algorithm (dark-grey bars), a sliding-fixed window approach (mid-greys) and all MVPs (light-grey). As predicted, the sliding fixed window approach enriched for probes near transcription start sites (TSSs) and CGIs. Conversely, the Probe Lasso enriches for CGI shelves and open sea, which is more in keeping with the genetic/epigenetic features of all MVP probes.
Fig. 5
DMR, probe and sequence sharing between Probe Lasso and window.250 algorithms. Approximately 50% of DMRs are shared between the two methods (A), but the number of probes shared is less (B). When DMR sequences are analysed, we see a drastic reduction in shared information (C), which is due to Probe Lasso DMRs leveraging more information from IGRs, which are typified by lower CpG density. This trend is maintained even when probe-lasso boundaries are controlled for (D).
Fig. 6
Violin plots demonstrating the distribution of DMR sizes using the Probe Lasso and sliding fixed-window approach with different levels of stringency. Generally, the Probe Lasso captures a wider range of DMR sizes while the smallest DMRs captured by a sliding-window based approach are often constrained to the size of a non-overlapping window. Overall, the Probe Lasso accomplishes a similar job to the combined effort of sliding windows of various sizes, without generating an unwieldy number of DMRs.

References

    1. Morris T.J., Butcher L.M., Feber A., Teschendorff A.E., Chakravarthy A.R., Wojdacz T.K., Beck S. Bioinformatics. 2014;30(3):428–430.
    2. Ziller M.J., Gu H., Muller F., Donaghey J., Tsai L.T., Kohlbacher O., De Jager P.L., Rosen E.D., Bennett D.A., Bernstein B.E., Gnirke A., Meissner A. Nature. 2013;500(7463):477–481.
    3. Bock C. Nat. Rev. Genet. 2012;13(10):705–719.
    4. Li Y., Zhu J., Tian G., Li N., Li Q., Ye M., Zheng H., Yu J., Wu H., Sun J., Zhang H., Chen Q., Luo R., Chen M., He Y., Jin X., Zhang Q., Yu C., Zhou G., Sun J., Huang Y., Zheng H., Cao H., Zhou X., Guo S., Hu X., Li X., Kristiansen K., Bolund L., Xu J., Wang W., Yang H., Wang J., Li R., Beck S., Wang J., Zhang X. PLoS Biol. 2010;8(11):e1000533.
    5. Lister R., Pelizzola M., Dowen R.H., Hawkins R.D., Hon G., Tonti-Filippini J., Nery J.R., Lee L., Ye Z., Ngo Q.M., Edsall L., Antosiewicz-Bourget J., Stewart R., Ruotti V., Millar A.H., Thomson J.A., Ren B., Ecker J.R. Nature. 2009;462(7271):315–322.
