PLoS Genet. 2025 May 22;21(5):e1011667.
doi: 10.1371/journal.pgen.1011667. eCollection 2025 May.

Cost-effective solutions for high-throughput enzymatic DNA methylation sequencing

Amy Longtin et al. PLoS Genet. 2025.

Abstract

Characterizing DNA methylation patterns is important for addressing key questions in evolutionary biology, development, geroscience, and medical genomics. While costs are decreasing, whole-genome DNA methylation profiling remains prohibitively expensive for most population-scale studies, creating a need for cost-effective, reduced representation approaches (i.e., assays that rely on microarrays, enzyme digests, or sequence capture to target a subset of the genome). Most common whole-genome and reduced representation techniques rely on bisulfite conversion, which can damage DNA, resulting in DNA loss and sequencing biases. Enzymatic methyl sequencing (EM-seq) was recently proposed to overcome these issues, but thorough benchmarking of EM-seq combined with cost-effective, reduced representation strategies is currently lacking. To address this gap, we optimized the Targeted Methylation Sequencing protocol (TMS), which profiles ~4 million CpG sites, for miniaturization, flexibility, and multispecies use. First, we tested modifications to increase throughput and reduce cost, including increasing multiplexing, decreasing DNA input, and using enzymatic rather than mechanical fragmentation to prepare DNA. Second, we compared our optimized TMS protocol to commonly used techniques, specifically the Infinium MethylationEPIC BeadChip (n = 55 paired samples) and whole-genome bisulfite sequencing (n = 6 paired samples). In both cases, we found strong agreement between technologies (R² = 0.97 and 0.99, respectively). Third, we tested the optimized TMS protocol in three non-human primate species (rhesus macaques, geladas, and capuchins). We captured a high percentage (mean = 77.1%) of targeted CpG sites and produced methylation level estimates that agreed with those generated from reduced representation bisulfite sequencing (R² = 0.98). Finally, we confirmed that estimates of 1) epigenetic age and 2) tissue-specific DNA methylation patterns are strongly recapitulated using data generated from TMS versus other technologies. Altogether, our optimized TMS protocol will enable cost-effective, population-scale studies of genome-wide DNA methylation levels across human and non-human primate species.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Experimental design and study populations.
[A] To optimize the TMS protocol, we used samples from three human and three NHP populations: the Tsimane of Bolivia, a Vanderbilt University Medical Center cohort, the Orang Asli of Malaysia, rhesus macaques from Cayo Santiago in Puerto Rico, tufted capuchins from captive sites throughout the United States, and gelada monkeys from Ethiopia. Created using BioRender. [B] The TMS protocol begins with DNA fragmentation and adapter ligation. Next, two enzymes, TET2 and APOBEC, are used to oxidize and deaminate the DNA. TET2 recognizes methyl groups attached to cytosines and converts them to Ca/g. APOBEC follows TET2 and converts the unmethylated cytosines to uracils. Following PCR amplification (which converts uracils to thymines), hybrid capture is used to enrich for targeted regions of the genome. Samples are then assayed via high-throughput sequencing. Created using Microsoft PowerPoint. [C] Overview of experiments and analyses. The samples used for each set of experiments are noted by a population-specific icon. Icons from BioRender, OpenClipArt, and Microsoft PowerPoint.
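
The conversion logic described in panel B can be illustrated with a minimal sketch. This is not the authors' pipeline; the function names `enzymatic_convert` and `methylation_level` are hypothetical, and the sketch assumes a single forward-strand read with known methylated positions. It only shows why a retained C is read as methylated and a C-to-T change as unmethylated.

```python
def enzymatic_convert(seq, methylated_positions):
    """Return the sequence expected after enzymatic conversion and PCR.

    Assumption for illustration: methylated cytosines are protected and
    stay C, while unmethylated cytosines are deaminated to uracil and
    are then read as T after PCR amplification.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")  # unmethylated C -> U -> T
        else:
            converted.append(base)  # protected C and all other bases
    return "".join(converted)


def methylation_level(n_c, n_t):
    """Fraction of reads at one CpG site that still show a C."""
    total = n_c + n_t
    if total == 0:
        raise ValueError("site has no coverage")
    return n_c / total


# Hypothetical read in which only the cytosine at index 1 is methylated.
converted = enzymatic_convert("ACGTCCGA", {1})
level = methylation_level(n_c=3, n_t=1)
```

Per-site methylation levels of this kind (methylated reads divided by total reads) are what the later panels compare across technologies.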
Fig 2. Optimized TMS produces high-quality DNA methylation data across a range of plexing strategies, input amounts, and protocol modifications.
[A] High (>70%) mean mapping efficiency across plexing strategies. Each point represents a sample within a plexing strategy and the y-axis represents the percent of reads uniquely mapped per sample. [B] Mapping efficiency increases as input amount increases. Each point represents a 12-plex pool made with varying DNA input amounts per sample; the y-axis represents the percent of reads uniquely mapped per sample. [C] Distribution of median DNA methylation levels for CpG sites located within different chromHMM genomic annotations; annotations from NIH Roadmap Epigenomics and data from the 96-plex, 200 ng input from experiment 1. [D] The total number of CpG sites falling within different chromHMM genomic annotations (using data from the 96-plex, 200 ng input from experiment 1). [E] Percent of reads that are not within the Twist probe set (i.e., off-target reads) following protocol modifications to annealing temperature and methylation enhancer (ME) volume. For each set of protocol conditions, the x-axis represents the percent of mapped reads that do not overlap with the Twist probe set. [F] Percent of Twist probes that are represented within each dataset following protocol modifications to adjust the annealing temperature and ME volume. For each set of protocol conditions, the x-axis represents the percentage of Twist probes that were represented by at least 1 read.
Fig 3. Optimized TMS recapitulates DNA methylation levels measured with the EPIC array and WGBS.
[A] Correlation in DNA methylation levels for EPIC array versus TMS (R² = 0.97). Each point represents the DNA methylation level of a given CpG averaged across 6 samples measured using the EPIC array (x-axis) and 96-plex, 200 ng input TMS (y-axis). The R² value was generated using linear modeling. Sites were filtered to >5X coverage in >75% of samples within each technology. [B] Histogram of R² values calculated for each individual sample (i.e., comparing per-CpG DNA methylation levels measured on both technologies for a given sample). R² values are provided when all CpG sites common to both technologies are included, as well as when only variably methylated CpG sites are included. [C] Correlation in DNA methylation levels for WGBS versus TMS (R² = 0.9871). Each point represents the DNA methylation level of a given CpG averaged across 6 samples measured using WGBS (x-axis) and 96-plex, 200 ng input TMS (y-axis). The R² value was generated using linear modeling. Sites were filtered to >5X coverage in >75% of samples within each technology. [D] Density plot of the average DNA methylation levels detected for common sites between the three technologies (713,282 sites). Notably, the EPIC array is biased against DNA methylation levels of 100%, as previously observed [51] and explained by the equation used to calculate beta values.
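
The site filter and the R² comparison described in this caption can be sketched as follows. This is a simplified illustration, not the authors' code: `passes_coverage` and `r_squared` are hypothetical names, the inputs are made-up values, and R² is computed as the squared Pearson correlation, which equals the R² of a simple linear model with one predictor.

```python
def passes_coverage(depths, min_depth=5, min_frac=0.75):
    """True if more than min_frac of samples have depth above min_depth.

    Mirrors the caption's filter (>5X coverage in >75% of samples);
    both comparisons are strict, as an assumption about the thresholds.
    """
    if not depths:
        return False
    covered = sum(1 for d in depths if d > min_depth)
    return covered / len(depths) > min_frac


def r_squared(x, y):
    """R-squared of a simple linear regression of y on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with >= 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("R-squared is undefined for a constant vector")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical per-site mean methylation levels from two technologies.
array_levels = [0.05, 0.40, 0.85, 0.95]
tms_levels = [0.04, 0.45, 0.88, 0.99]
agreement = r_squared(array_levels, tms_levels)
```

In practice the filter would be applied per site within each technology before the two sets of site means are compared.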
Fig 4. Optimized TMS performs well in non-human primate species and when compared to RRBS.
[A] Optimized TMS in NHPs results in high mapping efficiencies despite the use of human-specific probes. Here, each species is mapped to its respective reference genome. We hypothesize that low mapping efficiency in certain rhesus macaque samples is due to variation in sample quality. [B] Number of expected and observed CpG sites covered in each NHP genome. Expected sites were derived from mapping the Twist probes to each NHP genome, while observed sites represent those detected with a coverage >5X in >75% of samples. [C] Principal components analysis of TMS-derived DNA methylation levels for rhesus macaque samples spanning six distinct tissues. [D] Similar per-CpG DNA methylation level estimates using RRBS (x-axis) and optimized TMS (y-axis) (R² = 0.97). [E] Density plot of linear model R² values obtained from comparing data generated via optimized TMS and RRBS for the same rhesus macaque samples. R² values are provided when all CpG sites common to both technologies are included, as well as when only variably methylated (methylation > 10% and methylation < 90%) CpG sites are included. [F] Density curves of the average genome-wide DNA methylation level estimates for each NHP species. Curves show the expected bimodal distribution in which many of the CpG sites in the genome are either hypomethylated or hypermethylated.
Fig 5. TMS recapitulates epigenetic age predictions and tissue-dependent effects identified via other technologies.
[A] Pearson’s correlation coefficient comparing epigenetic age predictions for five PC-based epigenetic clocks run on TMS versus EPIC v2 array data from the VUMC cohort (n paired samples = 55). All correlations were significant following multiple hypothesis testing (FDR < 5%). [B] Correlation between standardized effect sizes, estimating liver-specific effects, using RRBS versus TMS data (n paired rhesus macaque samples = 96). To derive effect size estimates, models were run comparing the liver to all other tissues. Each point represents the effect size for a given CpG site common to both datasets. [C] Pearson’s correlation coefficient comparing effect sizes for estimates of tissue-specific effects using TMS versus RRBS data (n paired rhesus macaque samples per tissue = 96). Separate models were run for each tissue, comparing the focal tissue on the x-axis to all other tissues to identify tissue-specific effects. All correlations were significant following multiple hypothesis testing (FDR < 5%). [D] Degree of enrichment (represented as a log2 odds ratio from a Fisher’s Exact test) between CpG sites identified as tissue-specific in TMS versus RRBS data using matched samples. Dashed line represents no enrichment and error bars represent confidence intervals.
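
The enrichment measure in panel D can be sketched from a 2x2 table of CpG sites. This is an illustrative sketch only: the function names and counts are hypothetical, the odds ratio is the plain sample estimate (some statistical packages report a conditional estimate instead), and the p-value shown is one-sided, which may differ from the test settings the authors used.

```python
import math


def log2_odds_ratio(a, b, c, d):
    """log2 of the sample odds ratio for a 2x2 table.

    a: sites tissue-specific in both TMS and RRBS
    b: sites tissue-specific in TMS only
    c: sites tissue-specific in RRBS only
    d: sites tissue-specific in neither
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive in this sketch")
    return math.log2((a * d) / (b * c))


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value, P(overlap >= a) under the null.

    Sums hypergeometric probabilities over all tables with the same
    margins whose top-left cell is at least as large as the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, col1)
    upper = min(row1, col1)
    numer = sum(
        math.comb(row1, k) * math.comb(n - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return numer / denom


# Hypothetical counts of CpG sites, not taken from the paper.
enrichment = log2_odds_ratio(8, 2, 1, 5)
p_value = fisher_exact_greater(8, 2, 1, 5)
```

A log2 odds ratio of 0 corresponds to the dashed "no enrichment" line in the panel; positive values indicate that sites called tissue-specific by one technology are more likely to be called tissue-specific by the other.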

References

    1. Reik W. Stability and flexibility of epigenetic gene regulation in mammalian development. Nature. 2007;447(7143):425–32. doi: 10.1038/nature05918
    2. Duhl DM, Vrieling H, Miller KA, Wolff GL, Barsh GS. Neomorphic agouti mutations in obese yellow mice. Nat Genet. 1994;8(1):59–65. doi: 10.1038/ng0994-59
    3. Morgan HD, Sutherland HG, Martin DI, Whitelaw E. Epigenetic inheritance at the agouti locus in the mouse. Nat Genet. 1999;23(3):314–8. doi: 10.1038/15490
    4. Dolinoy DC, Weidman JR, Waterland RA, Jirtle RL. Maternal genistein alters coat color and protects Avy mouse offspring from obesity by modifying the fetal epigenome. Environ Health Perspect. 2006;114(4):567–72. doi: 10.1289/ehp.8700
    5. Mohn F, Schübeler D. Genetics and epigenetics: stability and plasticity during cellular differentiation. Trends Genet. 2009;25(3):129–36. doi: 10.1016/j.tig.2008.12.005
