[Preprint]. 2023 Nov 28:2023.11.28.569045.
doi: 10.1101/2023.11.28.569045.

Examining chromatin heterogeneity through PacBio long-read sequencing of M.EcoGII methylated genomes: an m6A detection efficiency and calling bias correcting pipeline


Allison F Dennis et al. bioRxiv.


Abstract

Recent studies have combined DNA methyltransferase footprinting of genomic DNA in nuclei with long-read sequencing, yielding detailed chromatin maps of multi-kilobase stretches of genomic DNA from a single cell. In principle, nucleosome footprints and nucleosome-depleted regions (NDRs) can be identified using M.EcoGII, which methylates adenines in any sequence context and therefore provides a high-resolution map of the accessible regions in each DNA molecule. Here we report PacBio long-read sequence data for budding yeast nuclei treated with M.EcoGII, together with a bioinformatic pipeline that corrects for three key challenges undermining this promising method. First, the PacBio software detects m6A in individual DNA molecules inefficiently, so random gaps of seemingly unmethylated adenines give rise to false footprints. Second, m6A base calling is strongly biased against AT-rich sequence, becoming less efficient as AT content increases. Third, occasional methylation within nucleosomes breaks up their footprints. After correcting for these issues, our pipeline calculates, for every gene, a correlation coefficient-based score indicating the extent of chromatin heterogeneity within the cell population. Although the population average is consistent with that derived using other techniques, we observe a wide range of heterogeneity in nucleosome positions at the single-molecule level, probably reflecting cellular chromatin dynamics.
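To illustrate why inefficient m6A detection produces false footprints, consider a deliberately simplified model (an assumption for illustration, not the authors' analysis): every adenine in an accessible stretch is truly methylated, and each m6A is detected independently with probability p_detect. The chance that a stretch containing n methylated adenines shows no called m6A at all is then (1 − p_detect)^n, and such a stretch looks just like a protected footprint. The function name prob_all_missed and the example values are hypothetical.

```python
def prob_all_missed(n_adenines: int, p_detect: float) -> float:
    """Probability that n truly methylated adenines are all left uncalled.

    Toy model (assumption): each m6A is detected independently with
    probability p_detect, so each one is missed with probability 1 - p_detect.
    """
    if n_adenines < 0:
        raise ValueError("n_adenines must be non-negative")
    if not 0.0 <= p_detect <= 1.0:
        raise ValueError("p_detect must lie in [0, 1]")
    return (1.0 - p_detect) ** n_adenines


if __name__ == "__main__":
    # Hypothetical detection efficiencies and adenine counts, for illustration only.
    for p in (0.1, 0.3, 0.5):
        for n in (10, 20, 40):
            print(f"p_detect={p:.1f} n={n:2d} P(all missed)={prob_all_missed(n, p):.4f}")
```

For example, with p_detect = 0.1 a stretch containing 20 methylated adenines is entirely uncalled with probability 0.9^20 ≈ 0.12, so under this toy model gaps alone are a weak indicator of protection.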


Figures

Figure 1.
Identification of m6A bases in single DNA molecules. (A) Workflow for m6A identification in PacBio long reads. The zero-mode waveguide (ZMW) is a nanophotonic device in which a single DNA molecule is sequenced multiple times, producing repeat reads of both strands (subreads). The PacBio software combines the subreads into a circular consensus sequence (CCS) read for that DNA molecule, with accurate m6A base calls and a quality score for each base. (B) Histograms of m6A fraction per read. Reads were filtered at different average base quality thresholds, and the m6A fraction was calculated for each read. The two gDNA replicates are compared. Note that raising the quality threshold effectively removes reads with low methylation. (C) Distribution of inter-m6A distances. The distance from each m6A to the next m6A was calculated for each read. Counts of inter-m6A distances in each sample were normalised per million nucleotides sequenced. ‘Rep 1’ and ‘Rep 2’ refer to biological replicate experiments.
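The per-read statistics in Figure 1B and 1C can be computed once each read carries its sequence, its called m6A positions and an average base quality. The sketch below is hypothetical: the Read record and its field names are not a PacBio file format, and counting both A and T positions of the forward sequence as adenines (top and bottom strand) is an assumption about how the m6A fraction is defined.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical minimal read record (not a PacBio output format)."""
    seq: str                   # CCS sequence, forward strand
    m6a_positions: list[int]   # 0-based positions of called m6A (A = top strand, T = bottom strand)
    mean_qual: float           # average base quality of the read


def m6a_fraction(read: Read) -> float:
    """Fraction of adenines on either strand (A or T positions) called as m6A."""
    n_adenines = sum(1 for base in read.seq.upper() if base in "AT")
    return len(read.m6a_positions) / n_adenines if n_adenines else 0.0


def filter_by_quality(reads: list[Read], min_qual: float) -> list[Read]:
    """Keep reads whose average base quality reaches min_qual (cf. panel B)."""
    return [r for r in reads if r.mean_qual >= min_qual]


def inter_m6a_distances(reads: list[Read]) -> dict[int, float]:
    """Distances between consecutive m6A calls in each read, counted over all
    reads and expressed per million nucleotides sequenced (cf. panel C)."""
    counts: Counter[int] = Counter()
    total_nt = 0
    for read in reads:
        total_nt += len(read.seq)
        pos = sorted(read.m6a_positions)
        counts.update(b - a for a, b in zip(pos, pos[1:]))
    if total_nt == 0:
        return {}
    return {d: c * 1_000_000 / total_nt for d, c in sorted(counts.items())}


if __name__ == "__main__":
    reads = [
        Read("AATTCGATTA", [0, 1, 6], 30.0),
        Read("GGCCATGCAT", [4], 12.0),
    ]
    kept = filter_by_quality(reads, min_qual=20.0)
    print([round(m6a_fraction(r), 3) for r in kept])
    print(inter_m6a_distances(kept))
```

Normalising by nucleotides sequenced, rather than by read count, keeps samples with different sequencing depth and read-length distributions comparable.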
Figure 2.
Pipeline for identifying nucleosome footprints and accessible regions in single DNA molecules. (A) Simplified diagram of the prediction process. The pipeline predicts accessible regions and nucleosome footprints for each read. We developed a statistical test that compares the local m6A fraction in each 25-bp window with the expected m6A fraction, based on the read average and the AT content of the window. Nucleosomal DNA should be hypo-methylated and accessible regions hyper-methylated relative to the read average. (B) Linear relationship between window methylation and average read methylation as a function of window AT content. The average methylation was calculated for each read. The 25-bp windows within each read were grouped by their number of adenines plus thymines, and the average methylation was calculated for each group of windows with the same AT content. Windows from all reads with the same average read methylation (rounded to 1%) were combined and grouped by AT content. The average methylation of each group of windows is plotted against the average read methylation. The grey dashed line corresponds to the ‘no bias’ condition, in which the window average equals the read average. (C) Average m6A fraction in poly(A) runs. All genomic adenines were grouped according to their location in poly(A) runs of 1 to 8 nt, and the average m6A fraction in the gDNA samples was computed for each group. (D) Lengths of predicted nucleosome footprints. The major peaks at 140–150 bp correspond to nucleosome footprints. Minor peaks below 140 bp represent apparently fragmented nucleosome footprints, consistent with occasional methylation within the nucleosome, with sub-nucleosomal particles, or with other complexes (compare with Figure 1C). Counts of nucleosome footprints in each sample were normalised per million nucleotides sequenced.
(E) Snapshot of base-pair-level nucleosome prediction for the YIL126W/STH1 locus in the Integrative Genomics Viewer (IGV) v2.9.2. Called m6A bases are shown as insertions (purple ‘I’), accessible (methylated) regions as matched regions (grey blocks), nucleosomes as substitutions (red blocks), and ambiguous regions as deletions (lines between blocks). (F) Nucleosome occupancy relative to the transcription start site (TSS) for a set of 3,745 non-overlapping genes with a minimum length of 200 bp and a minimum promoter region of 200 bp. At each genome position n, nucleosome protection was calculated as the percentage of reads covering n in which n is called nucleosomal.
Figure 3.
Heterogeneity in nucleosome positioning. (A) Histogram of correlations in nucleosome positioning on individual genes. We selected 3,733 genes separated by at least 300 bp from the next gene upstream and with at least 6 reads overlapping the TSS ± 300 bp region. For each gene, we calculated the correlation coefficient for nucleosome positions across all reads in this ± 300 bp region (decreasing correlation indicates increasing heterogeneity in nucleosome positioning; see Methods). (B) Nucleosome positioning at the YIL126W/STH1 locus, an example of relatively homogeneous positioning. Each block represents one read. Base pairs predicted to lie in accessible regions are shown as cyan blocks, nucleosomes as red blocks, and ambiguous regions as grey blocks. m6A bases are indicated by black vertical lines. (C) Nucleosome positioning at the YPL223C/GRE1 locus, which illustrates heterogeneous positioning. (D) The top 20 genes with homogeneous nucleosome positioning. Each line represents a single read, annotated as above. Reads belonging to different genes are separated by dashed lines. All of these genes have obvious promoter NDRs. (E) The top 20 genes with heterogeneous nucleosome positioning. These genes lack accessible promoters, and nucleosome positions vary widely from molecule to molecule.
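One way to turn the per-read calls of Figure 3 into a single per-gene heterogeneity score is sketched below. It assumes each read is encoded as a 0/1 vector over TSS ± 300 bp (1 = base pair called nucleosomal) and scores a gene by the mean pairwise Pearson correlation between reads; this encoding and the averaging are assumptions for illustration, and the exact procedure is the one given in the paper's Methods. The function names are hypothetical; min_reads = 6 mirrors the read-count filter in panel A.

```python
from itertools import combinations
from math import sqrt
from typing import Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation of two equal-length profiles; None if either is constant."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("profiles must be non-empty and equally long")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a constant profile
    return sxy / sqrt(sxx * syy)


def heterogeneity_score(profiles: Sequence[Sequence[int]],
                        min_reads: int = 6) -> Optional[float]:
    """Mean pairwise correlation of per-read nucleosome profiles for one gene.

    Higher scores mean more homogeneous positioning; lower scores mean more
    heterogeneous positioning. Returns None when there are too few reads or
    no pair has a defined correlation.
    """
    if len(profiles) < min_reads:
        return None
    scores = []
    for a, b in combinations(profiles, 2):
        r = pearson(a, b)
        if r is not None:
            scores.append(r)
    return sum(scores) / len(scores) if scores else None


if __name__ == "__main__":
    # Toy 4-bp profiles: three reads agree, three are shifted.
    mixed = [[1, 1, 0, 0]] * 3 + [[0, 0, 1, 1]] * 3
    print(heterogeneity_score(mixed))
```

Averaging pairwise correlations is one simple choice; correlating each read with the gene's average profile would be an alternative with a similar intent.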

