Genome Biol. 2021 Feb 22;22(1):68.
doi: 10.1186/s13059-021-02283-5.

Megabase-scale methylation phasing using nanopore long reads and NanoMethPhase

Vahid Akbari et al. Genome Biol. 2021.

Abstract

The ability of nanopore sequencing to simultaneously detect modified nucleotides while producing long reads makes it ideal for detecting and phasing allele-specific methylation. However, there is currently no complete software for detecting SNPs, phasing haplotypes, and mapping methylation to these from nanopore sequence data. Here, we present NanoMethPhase, a software tool to phase 5-methylcytosine from nanopore sequencing. We also present SNVoter, which can post-process nanopore SNV calls to improve accuracy in low coverage regions. Together, these tools can accurately detect allele-specific methylation genome-wide using nanopore sequence data with low coverage of about ten-fold redundancy.

Keywords: Allele-specific methylation; NanoMethPhase; Nanopore sequencing; Phasing.
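
The idea of phasing methylation summarized in the abstract can be illustrated with a minimal sketch. It is not NanoMethPhase's implementation: the input layout (`bases`, `cpgs`), the function names, and the thresholds `min_snvs` and `min_ratio` are hypothetical choices made only for illustration. Each read votes for a haplotype according to the phased heterozygous SNV alleles it carries, and its CpG calls are then pooled under that haplotype.

```python
from collections import defaultdict


def assign_haplotype(read_bases, phased_snvs, min_snvs=2, min_ratio=0.75):
    """Assign a read to haplotype 1 or 2, or None if the evidence is weak.

    read_bases:  {position: base observed in the read}
    phased_snvs: {position: (haplotype-1 allele, haplotype-2 allele)}
    min_snvs and min_ratio are hypothetical thresholds for this sketch.
    """
    votes = {1: 0, 2: 0}
    for pos, (allele1, allele2) in phased_snvs.items():
        base = read_bases.get(pos)
        if base == allele1:
            votes[1] += 1
        elif base == allele2:
            votes[2] += 1
    total = votes[1] + votes[2]
    if total < min_snvs:
        return None  # too few informative SNVs on this read
    best = 1 if votes[1] >= votes[2] else 2
    return best if votes[best] / total >= min_ratio else None


def haplotype_methylation(reads, phased_snvs):
    """Return {(haplotype, cpg_position): methylation frequency}."""
    counts = defaultdict(lambda: [0, 0])  # [methylated calls, total calls]
    for read in reads:
        hap = assign_haplotype(read["bases"], phased_snvs)
        if hap is None:
            continue  # unphased reads do not contribute
        for cpg_pos, is_methylated in read["cpgs"].items():
            entry = counts[(hap, cpg_pos)]
            entry[0] += int(is_methylated)
            entry[1] += 1
    return {key: meth / total for key, (meth, total) in counts.items()}


phased = {100: ("A", "G"), 200: ("C", "T")}
reads = [
    {"bases": {100: "A", 200: "C"}, "cpgs": {150: True}},
    {"bases": {100: "G", 200: "T"}, "cpgs": {150: False}},
    {"bases": {100: "A"}, "cpgs": {150: False}},  # one SNV only: left unphased
]
result = haplotype_methylation(reads, phased)
```

A large difference in frequency between the two haplotypes at the same CpG is the signal that allele-specific methylation analysis looks for.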

Conflict of interest statement

We declare that there is no conflict of interest associated with this publication.

Figures

Fig. 1
Methylation calling from nanopore data and comparison to gold standard platforms. a UpSet plot of the intersections of CpG sites detected using DeepSignal, Nanopolish, Megalodon, and whole-genome bisulfite sequencing (WGBS) for NA12878. Because the ENCODE WGBS workflow used an index generated from the GRCh38 assembly with alternative contigs removed, only CpG methylation calls on the main chromosomes (1–22 and X) were considered. b, c Pearson correlation matrices of methylation levels from the tools with WGBS (b) and Illumina’s 27k methylation array (c). For comparison to WGBS, only CpGs with at least 5 calls were considered. The number represents the CpGs common to all methods. d Distribution of methylation over CpG islands (CGIs). e Distribution of methylation at transcription start sites (TSS) and transcription end sites (TES). f Scatter plot of methylation levels obtained from Nanopolish and Illumina’s 27k methylation array for the NA19240 sample. The Pearson correlation coefficient is presented as r. g, h Distribution of methylation over the CpGs common to Nanopolish and the 27k array at CpG islands (CGIs) (g) or at transcription start sites (TSS) and transcription end sites (TES) (h). Gold standard methods for CpG methylation detection are indicated by an asterisk
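
The comparison described in panels b and c can be sketched as follows. This is an illustrative outline rather than the authors' analysis code: the input layout (a dictionary mapping each CpG to its methylated and total call counts) and the function names are assumptions, and applying the minimum of 5 calls to both methods is one possible reading of the caption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def compare_methylation(calls_a, calls_b, min_calls=5):
    """Correlate methylation levels at CpGs shared by two methods.

    calls_a, calls_b: {cpg: (methylated_calls, total_calls)}
    Only CpGs with at least min_calls calls in both inputs are used
    (an assumption of this sketch).
    """
    shared = sorted(
        cpg
        for cpg in calls_a.keys() & calls_b.keys()
        if calls_a[cpg][1] >= min_calls and calls_b[cpg][1] >= min_calls
    )
    levels_a = [calls_a[cpg][0] / calls_a[cpg][1] for cpg in shared]
    levels_b = [calls_b[cpg][0] / calls_b[cpg][1] for cpg in shared]
    return len(shared), pearson(levels_a, levels_b)


nanopore = {("chr1", 10): (5, 5), ("chr1", 20): (0, 6),
            ("chr1", 30): (3, 6), ("chr1", 40): (1, 2)}
wgbs = {("chr1", 10): (10, 10), ("chr1", 20): (0, 10),
        ("chr1", 30): (5, 10), ("chr1", 40): (9, 10)}
n_shared, r = compare_methylation(nanopore, wgbs)
```

In this toy input the CpG at position 40 is dropped because one method has only 2 calls there, so the correlation is computed over the remaining three sites.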
Fig. 2
Improvement in Clair SNV calls from nanopore data. a Base quality and mutation frequencies obtained for 1 million randomly selected 5-mers with an SNV at the base 3 position (1:1 ratio of true positives to false positives). b PCA of the reference 5-mer GTACT shows separation of true positives from false positives based on quality scores and mismatch frequencies. c Clair variant calling quality distribution for the NA19240 run 0 sample. d Quality distribution after normalization of Clair’s qualities using the weights given by SNVoter to each SNV. The highlighted region represents the optimal threshold area for filtering out low-quality calls. e Receiver operating characteristic curves for SNV calling using Clair or Clair+SNVoter at different coverage depths. NA19240 run 1, NA19240 run 0, and Colo829BL are processed by SNVoter using the model trained on NA12878 20FCs (24×). NA19240 runs 0&1 and NA19240 runs 1&2 are processed using the model trained on the whole NA12878 dataset (44×)
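
For readers unfamiliar with the curves in panel e, the sketch below shows how a receiver operating characteristic curve is generally built from scored calls with known truth labels. It is a generic illustration, not the evaluation code used for Clair or SNVoter; the function names and the toy scores are assumptions.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points.

    scores: call qualities, higher meaning more confident
    labels: 1 for a true variant, 0 for a false call
    The threshold is swept from the highest score downward; calls that
    share a score are added together.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one true and one false call")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


perfect = roc_points([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 0])
mixed = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

A filter that ranks every true variant above every false call gives an area of 1.0; interleaved rankings lower the area.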
Fig. 3
NanoMethPhase workflow and read phasing. a, b Haplotype block sizes after phasing the high-quality SNVs detected in NA19240 and Colo829BL using WhatsHap. c NanoMethPhase workflow showing inputs, processing steps, and outputs. The outputs can be requested independently as needed. d–f Number of reads that were phased, filtered out, or could not be assigned to any phased SNV (left panel) and their length distribution (right panel; for ease of visualization, reads with length < 50 kb are shown). d Obtained from NA19240 run 1 using nanopore phasing alone, e NA19240 run 1 trio phasing, and f Colo829BL sample. *The NanoMethPhase phasing step ignores duplicated, QC failed, unmapped, and secondary reads. Supplementary reads are also excluded by default but can be included via an optional parameter. The plots represent reads using default parameters
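
The read filter described in the footnote corresponds to standard SAM FLAG bits, and a minimal sketch of such a filter is given here. The bit values come from the SAM specification; the function name and the `include_supplementary` argument are hypothetical and do not reproduce NanoMethPhase's actual option names.

```python
# FLAG bits as defined in the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
QC_FAIL = 0x200
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


def keep_for_phasing(flag, include_supplementary=False):
    """Return True if an alignment with this FLAG passes the filter.

    Unmapped, secondary, QC-failed, and duplicate alignments are always
    dropped. Supplementary alignments are dropped unless
    include_supplementary is True (a hypothetical argument name).
    """
    excluded = UNMAPPED | SECONDARY | QC_FAIL | DUPLICATE
    if not include_supplementary:
        excluded |= SUPPLEMENTARY
    return (flag & excluded) == 0


flags = [0, 16, 4, 256, 512, 1024, 2048, 2064]
kept_default = [f for f in flags if keep_for_phasing(f)]
kept_with_supp = [f for f in flags if keep_for_phasing(f, include_supplementary=True)]
```

Flag 16 (reverse strand) passes because strand is not one of the excluded bits; flags 2048 and 2064 pass only when supplementary alignments are allowed.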
Fig. 4
Methylation levels and phased CpGs at human ICRs. a, b Methylation levels of phased CpGs presented with haplotypes of origin at reported ICRs as heatmaps. a CpGs mapped to known ICRs. b CpGs mapped to novel ICRs from Court et al. and Joshi et al. The heatmap colors represent the mean methylation of each region. The origin bar indicates the known or reported origin from previous studies, and the heatmap column labels represent the haplotype assigned by NanoMethPhase. In trio phasing, Pat stands for paternal and Mat for maternal. c, d Integrative Genomics Viewer screen captures of phased BAM files converted to mock WGBS format for samples NA19240 run 1 and Colo829BL at two well-known ICRs
Fig. 5
Mapping of differentially methylated regions (DMRs) and imprinted genes. a Number of DMRs detected on each chromosome in Colo829BL, NA19240 run 1 nanopore phasing alone, and NA19240 run 1 trio phasing. The numerous DMRs on the X chromosome of the NA19240 cell line are explained by its X chromosome inactivation. b DMRs mapped to the 4 Mb upstream and downstream of known, predicted, conflicting, and provisional imprinted genes from GeneImprint and the catalog of human imprinted gene databases. NA19240 NA stands for NA19240 nanopore phasing alone
