BMC Genomics. 2015;16 Suppl 2(Suppl 2):S8.
doi: 10.1186/1471-2164-16-S2-S8. Epub 2015 Jan 21.

A linear time algorithm for detecting long genomic regions enriched with a specific combination of epigenetic states

Kazuki Ichikawa et al. BMC Genomics. 2015.

Abstract

Background: Epigenetic modifications are essential for controlling gene expression. Recent studies have shown that not only single epigenetic modifications but also combinations of multiple epigenetic modifications play vital roles in gene regulation. A striking example is the long hypomethylated regions enriched with modified H3K27me3 (called "K27HMD" regions), which are exposed to suppress the expression of key developmental genes relevant to cellular development and differentiation during embryonic stages in vertebrates. It is thus biologically important to develop an effective optimization algorithm for detecting long DNA regions (e.g., >4 kbp in size) that harbor a specific combination of epigenetic modifications (e.g., K27HMD regions). However, to date, optimization algorithms for this purpose have received little attention, and the available methods are still heuristic and ad hoc.

Results: In this paper, we propose a linear time algorithm for calculating a set of non-overlapping regions that maximizes the sum of similarities between the vector of focal epigenetic states and the vectors of raw epigenetic states at DNA positions in the set of regions. The average elapsed time to process the epigenetic data of any human chromosome was less than 2 seconds on an Intel Xeon CPU. To demonstrate the effectiveness of the algorithm, we estimated large K27HMD regions in the medaka and human genomes using our method, ChromHMM, and a heuristic method.

Conclusions: We confirmed the advantages of our method over the two other methods. Our method is flexible enough to handle other types of epigenetic combinations. The program that implements the method is called "CSMinfinder" and is available at: http://mlab.cb.k.u-tokyo.ac.jp/~ichikawa/Segmentation/
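
The optimization problem stated in the Results can be illustrated with a small dynamic program. The sketch below is not the CSMinfinder implementation; it is a generic linear time recurrence written under two assumptions that the abstract does not spell out: each DNA position already has a single numeric similarity score (positive where the raw states resemble the focal states, negative otherwise), and every reported region must span at least `min_len` positions. The names `best_regions`, `scores`, and `min_len` are hypothetical.

```python
from typing import List, Tuple


def best_regions(scores: List[float], min_len: int) -> Tuple[float, List[Tuple[int, int]]]:
    """Pick non-overlapping regions, each at least min_len long, maximizing the score sum.

    Regions are returned as half-open intervals [start, end) over positions.
    Assumption: scores[i] is a precomputed per-position similarity.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    n = len(scores)

    # prefix[i] = sum of scores[0:i], so any window sum costs O(1).
    prefix = [0.0] * (n + 1)
    for i, s in enumerate(scores):
        prefix[i + 1] = prefix[i] + s

    neg_inf = float("-inf")
    best = [0.0] * (n + 1)       # best[i]: optimum using only positions [0, i)
    ends = [neg_inf] * (n + 1)   # ends[i]: optimum when a region ends exactly at i
    start_of = [-1] * (n + 1)    # start of the region that realizes ends[i]
    take = [False] * (n + 1)     # True if best[i] is realized by a region ending at i

    for i in range(1, n + 1):
        if i >= min_len:
            # Option 1: open a new region of exactly min_len positions.
            j = i - min_len
            ends[i] = best[j] + prefix[i] - prefix[j]
            start_of[i] = j
            # Option 2: extend the region that ended at i - 1 by one position.
            if ends[i - 1] != neg_inf and ends[i - 1] + scores[i - 1] > ends[i]:
                ends[i] = ends[i - 1] + scores[i - 1]
                start_of[i] = start_of[i - 1]
        take[i] = ends[i] > best[i - 1]
        best[i] = ends[i] if take[i] else best[i - 1]

    # Trace back from the right end to recover the chosen regions.
    regions = []
    i = n
    while i > 0:
        if take[i]:
            regions.append((start_of[i], i))
            i = start_of[i]
        else:
            i -= 1
    regions.reverse()
    return best[n], regions


total, regions = best_regions([1, 1, -5, 2, 2, 2, -1, -1, 3], min_len=2)
print(total, regions)
```

Each position is visited a constant number of times in the forward pass and at most once in the traceback, so the running time grows linearly with the number of positions.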

Figures

Figure 1
Examples of long K27HMD regions in the medaka genome. K27HMD regions are enclosed in dashed boxes. Each screen capture shows an image in a medaka genome browser that displays tracks of gene structures, CpG methylation levels observed by bisulfite sequencing, and levels of H3K27me3 and H3K4me2 in blastula cells (half-day embryos). A. A K27HMD region of length ~4 kbp with cbx4, and a ~8 kbp region with cbx8. B. A large region of length ~90 kbp with hoxa genes. C. A ~6 kbp region with six2, and a ~14 kbp region with hnf6. D. A ~20 kbp region with zic1 and zic4.
Figure 2
Lengths and average methylation levels of K27HMD regions in the medaka genome. Each dot represents a region identified by CSMinfinder, ChromHMM, or Nakamura's method in the medaka genome. The x-axis shows the length of a K27HMD region and the y-axis shows the average methylation level of the region.
Figure 3
Length distribution of large K27HMD regions in the medaka genome. A-B. Comparison between CSMinfinder (minimum length threshold of 4 kbp), ChromHMM, and Nakamura's method. The x-axis shows the minimum length of K27HMD regions, and the y-axis shows the accumulated number of K27HMD regions longer than or equal to the threshold on the x-axis. Because of space limitations, the histogram is divided into two sub-histograms, A (threshold ≤ 10 kbp) and B (threshold ≥ 11 kbp). C. Here, the minimum length threshold of CSMinfinder is set to 8 kbp.
Figure 4
Examples of large K27HMD regions around developmental genes in the human genome. A. The figure displays several K27HMD regions on human chromosome 11 around pax6, a gene that regulates eye and brain development. CSMinfinder and Nakamura's method detected large K27HMD regions of >4 kbp in size, and their outputs largely overlapped; however, ChromHMM divided these regions into smaller ones. B. These large K27HMD regions on human chromosome 7 were located around a cluster of hox genes that regulate the body plan of the head-tail axis. ChromHMM yielded much smaller K27HMD regions than did the other two methods.
Figure 5
Length distribution of large K27HMD regions in the human genome. Comparison between CSMinfinder (minimum length threshold of 8 kbp), ChromHMM, and Nakamura's method. The x-axis shows the minimum K27HMD region length threshold, and the y-axis shows the accumulated number of K27HMD regions longer than or equal to the threshold on the x-axis.
Figure 6
Average elapsed time for processing human (A) and medaka (B) chromosomes, measured over ten runs of CSMinfinder. The minimum threshold is set to 8 kbp for handling the human genome, and 4 kbp for the medaka genome. Each dot represents a chromosome, the x-axis value shows the size of the chromosome, and the y-axis value is the average elapsed time.
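
The accumulated counts plotted in Figures 3 and 5 (the number of regions whose length is greater than or equal to each threshold) can be computed as in the minimal sketch below. The function name `cumulative_counts` and the region lengths are illustrative assumptions, not data from the paper.

```python
from bisect import bisect_left
from typing import List, Sequence


def cumulative_counts(lengths: Sequence[int], thresholds: Sequence[int]) -> List[int]:
    """For each threshold, count the regions with length >= threshold."""
    ordered = sorted(lengths)
    n = len(ordered)
    # bisect_left gives the number of lengths strictly below the threshold.
    return [n - bisect_left(ordered, t) for t in thresholds]


# Illustrative region lengths in base pairs (made up for this example).
example_lengths = [4000, 8000, 14000, 90000]
print(cumulative_counts(example_lengths, [4000, 8001, 100000]))
```

Sorting once and using binary search per threshold keeps the cost at O(n log n + m log n) for n regions and m thresholds.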
