Bioinformatics. 2023 Jun 1;39(6):btad386.
doi: 10.1093/bioinformatics/btad386.

Efficiently quantifying DNA methylation for bulk- and single-cell bisulfite data

Jonas Fischer et al. Bioinformatics. 2023.

Abstract

Motivation: DNA CpG methylation (CpGm) has proven to be a crucial epigenetic factor in the mammalian gene regulatory system. Assessment of DNA CpG methylation values via whole-genome bisulfite sequencing (WGBS) is, however, computationally extremely demanding.

Results: We present FAst MEthylation calling (FAME), the first approach to quantify CpGm values directly from bulk or single-cell WGBS reads without intermediate output files. FAME is very fast but as accurate as standard methods, which first produce BS alignment files before computing CpGm values. We present experiments on bulk and single-cell bisulfite datasets showing that FAME significantly speeds up data analysis and helps address the current WGBS analysis bottleneck for large-scale datasets without compromising accuracy.

Availability and implementation: An implementation of FAME is open source and licensed under GPL-3.0 at https://github.com/FischerJo/FAME.

Conflict of interest statement

None declared.

Figures

Figure 1.
Alignment aware mapping. (a) Errors made through traditional reduced-alphabet mapping. Former Cs reduced to T are highlighted; strand origin is indicated by arrows. (b) Principle of asymmetric aware mapping: T can be mapped unidirectionally to C. (c) Example of an asymmetric (bisulfite-aware) Shift-And automaton. Visualization of the query of the letter sequence (i.e. genomic segment) “TCT” for the pattern “AACTT” (i.e. read). The current position in the segment (left side) is indicated by a red arrow.
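The asymmetric matching idea in panels (b) and (c) can be illustrated with a small bit-parallel Shift-And matcher. This is a minimal sketch and not FAME's implementation: it assumes forward-strand reads only, exact matching apart from the bisulfite rule, and the function names `build_masks` and `shift_and_bisulfite` are hypothetical. The only departure from the textbook Shift-And algorithm is that a T in the read is also allowed to match a C in the reference, while a C in the read matches only a reference C.

```python
def build_masks(read):
    """Build one bitmask per reference letter.

    Bit i of masks[x] is set when read[i] may align to reference letter x.
    Assumption for this sketch: a read T may come from a reference T or from
    an unmethylated (converted) reference C; a read C needs a reference C.
    """
    masks = {}
    for i, base in enumerate(read):
        bit = 1 << i
        masks[base] = masks.get(base, 0) | bit
        if base == "T":
            # Asymmetric rule: read T -> reference C is allowed, not the reverse.
            masks["C"] = masks.get("C", 0) | bit
    return masks


def shift_and_bisulfite(reference, read):
    """Return 0-based start positions where the read matches the reference."""
    if not read:
        return []
    masks = build_masks(read)
    accept = 1 << (len(read) - 1)  # bit that marks a full-length match
    state = 0
    hits = []
    for pos, base in enumerate(reference):
        # Extend every partial match by one letter and start a new one.
        state = ((state << 1) | 1) & masks.get(base, 0)
        if state & accept:
            hits.append(pos - len(read) + 1)
    return hits


if __name__ == "__main__":
    print(shift_and_bisulfite("ACTA", "ATTA"))  # read T aligned to reference C: [0]
    print(shift_and_bisulfite("ATTA", "ACTA"))  # read C cannot align to reference T: []
```

Because the state is a single integer, each reference letter costs one shift, one OR, and one AND, which is what makes the automaton attractive for verifying many candidate positions.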
Figure 2.
FAME workflow. General workflow of FAME for index construction (a) and read alignment (b). For a given reference genome (top), a CpG index is constructed for 2 kb genomic segments called Meta CpGs (MCpGs) using a rolling hash function for spaced k-mers. WGBS reads are matched in three phases (bottom): candidate retrieval, verification, and methylation calling. Methylation rates are updated in a separate data structure, directly yielding methylation values without any re-alignment. FAME can process bulk or single-cell datasets.
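The final phase of the workflow, updating methylation counts as soon as a read is placed, can be sketched as follows. This is an illustration under stated assumptions rather than FAME's data structure: reads are assumed to be forward-strand, gap-free, and already verified at a known start position, and the names `update_counts` and `methylation_rates` are hypothetical.

```python
from collections import defaultdict


def update_counts(counts, reference, read, start):
    """Add one aligned read to the per-CpG counters.

    counts maps the reference position of a CpG cytosine to
    [methylated, unmethylated]. A read C at a CpG counts as methylated,
    a read T as unmethylated (converted); other letters are ignored.
    """
    for offset, read_base in enumerate(read):
        pos = start + offset
        if reference[pos:pos + 2] != "CG":
            continue  # not the cytosine of a CpG
        if read_base == "C":
            counts[pos][0] += 1
        elif read_base == "T":
            counts[pos][1] += 1


def methylation_rates(counts):
    """Return methylated / (methylated + unmethylated) for each covered CpG."""
    return {
        pos: meth / (meth + unmeth)
        for pos, (meth, unmeth) in sorted(counts.items())
        if meth + unmeth > 0
    }


if __name__ == "__main__":
    reference = "ACGTCGA"  # CpG cytosines at positions 1 and 4
    counts = defaultdict(lambda: [0, 0])
    update_counts(counts, reference, "ACGTTGA", 0)
    update_counts(counts, reference, "ATGTTGA", 0)
    print(methylation_rates(counts))  # {1: 0.5, 4: 0.0}
```

Keeping only two counters per CpG is what lets a pipeline of this kind skip alignment files entirely: once a read has contributed to the counters it no longer needs to be stored.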
Figure 3.
Results for different index parameters. Depicted are the results of a grid search for varying q (color), and filter threshold t (x-axis). (a) Root Mean Squared Error (RMSE, smaller is better) compared between actual and predicted methylation rates on the synthetic grid dataset and (b) runtime for the same.
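For reference, the RMSE used as the accuracy measure here is the square root of the mean squared difference between actual and predicted methylation rates. A minimal sketch, assuming both inputs are equal-length sequences of rates over the same CpGs (the function name `rmse` is illustrative and not taken from the paper):

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences of rates."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared_error / len(actual))


if __name__ == "__main__":
    # Two CpGs, each predicted 0.5 away from the true rate.
    print(rmse([0.0, 1.0], [0.5, 0.5]))  # 0.5
```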
Figure 4.
Comparison on bisulfite sequencing data. Visualization of results for simulated bulk WGBS reads (a) and real WGBS data (b) of LNCaP cell lines (Pidsley et al. 2016). We compare runtime (x-axis) against error of the predicted methylation values as RMSE (y-axis). The size of each point indicates the number of unmapped CpGs. For the real-world data, EPIC arrays serve as baseline methylation calls against which we compare the methods. (c) Runtimes in hours on a log scale for scBS-seq data of 192 cells (Linker et al. 2019).
