F1000Res. 2013 Oct 15;2:217.
doi: 10.12688/f1000research.2-217.v2. eCollection 2013.

MethylExtract: High-Quality methylation maps and SNV calling from whole genome bisulfite sequencing data

Guillermo Barturen et al. F1000Res.

Abstract

Whole genome methylation profiling at single-cytosine resolution is now feasible due to the advent of high-throughput sequencing techniques together with bisulfite treatment of the DNA. To obtain the methylation value of each individual cytosine, the bisulfite-treated sequence reads are first aligned to a reference genome, and the methylation levels are then profiled from the alignments. A huge effort has been made to align the reads quickly and correctly, and many different algorithms and programs have been created for this purpose. The second step, the final inference of the methylation states, is just as crucial and non-trivial, yet it has received much less attention. Important error sources exist, such as sequencing errors, bisulfite failure, clonal reads, and single nucleotide variants. We developed MethylExtract, a user-friendly tool to: i) generate high-quality, whole-genome methylation maps and ii) detect sequence variation within the same sample preparation. The program is implemented as a single script and takes into account all major error sources. MethylExtract detects variation (SNVs - Single Nucleotide Variants) in a similar way to VarScan, a very sensitive method extensively used in SNV and genotype calling based on non-bisulfite-treated reads. The usefulness of MethylExtract is shown by means of extensive benchmarking based on artificial bisulfite-treated reads and a comparison to a recently published method called Bis-SNP. MethylExtract is able to detect SNVs within high-throughput sequencing experiments of bisulfite-treated DNA at the same time as it generates high-quality methylation maps. This simultaneous detection of DNA methylation and sequence variation is crucial for many downstream analyses, for example when deciphering the impact of SNVs on differential methylation. An exclusive feature of MethylExtract, in comparison with existing software, is the possibility to assess the bisulfite failure in a statistical way.
The source code, tutorial and artificial bisulfite datasets are available at http://bioinfo2.ugr.es/MethylExtract/ and http://sourceforge.net/projects/methylextract/, and also permanently accessible from 10.5281/zenodo.7144.

Conflict of interest statement

Competing interests: No competing interests were disclosed.

Figures

Figure 1. SNV detection in bisulfite converted reads.
Sequence variation can be detected for a cytosine position by analyzing the nucleotide frequency at the same position on the complementary strand. Bisulfite conversion does not affect the guanine on the complementary strand, therefore the presence of any other base (H=A,C,T) might indicate the existence of an SNV. The figure illustrates three different situations: (a) a methylated cytosine in a CpG context without sequence variation (all reads that map to the position, independently of the strand, carry a cytosine in the corresponding position), (b) a heterozygous SNV (genotype C/T, SNV detected on the ‘+’ strand) and (c) a homozygous SNV (genotype T/T, SNV detected on the ‘-’ strand). The example in (b) shows a heterozygous SNV; the 6 reads with an A/G mismatch from a total of 11 reads mapping to the position indicate a heterozygous variation. Furthermore, we can conclude that the cytosine allele is methylated (7 reads with C/C matches to the ‘-’ strand). The case illustrated in (c) shows 12 reads with a C/T mismatch (‘+’ strand in blue in the upper part). Without looking at the complementary strand, the inference would be a completely un-methylated cytosine. However, the 11 reads that map to the complementary strand show an A/G mismatch at the corresponding position (we would expect guanines in the case of bisulfite conversion). Note that on bisulfite treated datasets only G/A mapped on the ‘+’ strand and C/T on the ‘-’ strand (referred to the ‘+’ strand) can be used for SNV calling purposes. The figure was generated using the UCSC Genome Browser.
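The strand logic described in this caption can be illustrated with a minimal Python sketch. It is not MethylExtract's implementation: the function names, the input format (a string of bases observed on the complementary strand where a guanine is expected) and the default threshold are assumptions made for illustration, and the VarScan-like significance test mentioned in the abstract is not reproduced here.

```python
from collections import Counter


def non_g_fraction(opposite_strand_bases):
    """Fraction of reads showing H (A, C or T) where a guanine is expected.

    `opposite_strand_bases` is a hypothetical input: the bases observed on
    the strand complementary to a reference cytosine.
    """
    counts = Counter(base.upper() for base in opposite_strand_bases)
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        return 0.0
    return (total - counts["G"]) / total


def putative_snv(opposite_strand_bases, var_fraction=0.1):
    """Flag a position when the non-G fraction reaches `var_fraction`.

    The default of 0.1 is an illustrative assumption, not a documented
    default of the tool.
    """
    return non_g_fraction(opposite_strand_bases) >= var_fraction


# Panel (b) of the figure: 6 of 11 complementary-strand reads carry an A.
print(non_g_fraction("GGGGG" + "AAAAAA"))
```

With the read counts from panel (b), the non-G fraction is 6/11, which is consistent with a heterozygous position; a position where all complementary-strand reads carry a guanine would not be flagged.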
Figure 2. Distribution of C/(C+T) ratios for cytosines within the CpG context in the H1 cell line.
C/(C+T) values for cytosines at non-variant and variant (homo- and heterozygous) positions are shown. The minimum read coverage was set to 10 reads.
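As a rough illustration of the quantity plotted here, the following Python sketch computes the C/(C+T) ratio for one cytosine position and applies a minimum coverage filter. The function name and the choice to define coverage as the sum of C and T reads are assumptions for this example; the tool may count coverage differently.

```python
def methylation_ratio(c_count, t_count, min_coverage=10):
    """Return C/(C+T) for one cytosine, or None below the coverage cutoff.

    `c_count` and `t_count` are the numbers of reads showing C (unconverted,
    read as methylated) and T (converted, read as unmethylated). Coverage is
    taken here as C + T, which is an assumption of this sketch.
    """
    coverage = c_count + t_count
    if coverage < min_coverage:
        return None
    return c_count / coverage


print(methylation_ratio(8, 2))   # 10 reads, passes the cutoff
print(methylation_ratio(3, 2))   # 5 reads, filtered out
```

Returning `None` for low-coverage positions keeps them distinguishable from positions that are covered but fully unmethylated (ratio 0.0).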
Figure 3. CpG methylation profiling comparison across alignment methods.
The results obtained from MethylExtract (correctly profiled methylation values and CpG coverage) using two bisulfite short read aligners, NGSmethPipe and Bismark, are compared. The results are nearly independent of the alignment algorithm used.
Figure 4. MethylExtract SNV calling as a function of the minimum relative nucleotide frequency (‘varFraction’).
The figures show the sensitivity (Sn) and the positive predictive value (PPV) for SNV detection using two different p-value thresholds. The graphs are based on the methylated (top) and un-methylated (bottom) artificial bisulfite datasets at a mean 20× read coverage.
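For reference, sensitivity and positive predictive value follow the standard definitions Sn = TP/(TP+FN) and PPV = TP/(TP+FP). The Python sketch below computes both from a set of called SNV positions and a set of true positions, as one might do with an artificial dataset where the planted variants are known. The function name and the representation of positions as plain integers are assumptions for illustration.

```python
def sensitivity_and_ppv(called, truth):
    """Compute (Sn, PPV) from called and true SNV positions.

    Sn  = TP / (TP + FN): fraction of true SNVs that were called.
    PPV = TP / (TP + FP): fraction of calls that are true SNVs.
    Returns NaN for a metric whose denominator is zero.
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # true positives
    fp = len(called - truth)   # false positives
    fn = len(truth - called)   # false negatives
    sn = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sn, ppv


print(sensitivity_and_ppv({1, 2, 3, 4}, {2, 3, 4, 5, 6}))
```

In this toy example three of the five true positions are recovered and three of the four calls are correct, giving Sn = 0.6 and PPV = 0.75.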
Figure 5. MethylExtract SNV calling and methylation profiling as a function of the base quality.
Both graphs show the positive predictive value (PPV) for SNV calling and the fraction of correctly profiled CpG methylation values (methylation profiling) as a function of the minimum base quality (PHRED score parameter ‘minQ’). The graphs are based on the methylated (top) and un-methylated (bottom) artificial bisulfite datasets at a mean 20× read coverage. The Y-axis represents the SNV PPV, the fraction of correct methylation values and the CpG coverage. All three range from 0 to 1 and are therefore plotted together.
Figure 6. Comparison of SNV calling between MethylExtract and Bis-SNP.
The top graph shows the sensitivity (Sn) and the bottom graph the positive predictive value (PPV) obtained for the methylated and un-methylated artificial bisulfite datasets at four different mean coverages (5×, 15×, 20× and 35×).
Figure 7. Comparison of CpG methylation values between MethylExtract and Bis-SNP.
Both methods are compared in terms of fraction of correctly profiled CpG methylation values (top) and the fraction of recovered CpG positions (bottom).
Supplementary Figure 1. Methylation profiling comparison between MethylExtract and Bis-SNP using relaxed criteria.
Both methods are compared in terms of fraction of correctly profiled CpG methylation values. The upper part of the graph shows the result allowing up to 10% deviation from the real methylation values, while the lower part shows the outcome increasing this range to 20%. The analyses were done for unmethylated and methylated datasets at four different coverages (5×, 15×, 20× and 35×).
