BMC Bioinformatics. 2010 Sep 9;11:456.
doi: 10.1186/1471-2105-11-456.

Multivariate Hawkes process models of the occurrence of regulatory elements

Lisbeth Carstensen et al. BMC Bioinformatics. 2010.

Abstract

Background: A central question in molecular biology is how transcriptional regulatory elements (TREs) act in combination. Recent high-throughput data provide us with the location of multiple regulatory regions for multiple regulators, and thus with the possibility of analyzing the multivariate distribution of the occurrences of these TREs along the genome.

Results: We present a model of TRE occurrences known as the Hawkes process. We illustrate the use of this model by analyzing two different publicly available data sets. We are able to model, in detail, how the occurrence of one TRE is affected by the occurrences of others, and we can test a range of natural hypotheses about the dependencies among the TRE occurrences. In contrast to earlier efforts, pre-processing steps such as clustering or binning are not needed, and we thus retain information about the dependencies among the TREs that is otherwise lost. For each of the two data sets we provide two results: first, a qualitative description of the dependencies among the occurrences of the TREs, and second, quantitative results on the favored or avoided distances between the different TREs.

Conclusions: The Hawkes process is a novel way of modeling the joint occurrences of multiple TREs along the genome that is capable of providing new insights into dependencies among elements involved in transcriptional regulation. The method is available as an R package from http://www.math.ku.dk/~richard/ppstat/.

Figures

Figure 1
ChIP-chip to point process. Illustration of how data from a ChIP-chip experiment can be viewed as a point process. In each cell, the different TREs are positioned along the double-stranded DNA sequence (top). The abundance of binding sites across cells at a particular position of the sequence results in a signal generated from the ChIP-chip experiment (middle). The midpoint of the interval where the signal is above a specified cutoff is used as a proxy for the actual binding site. The midpoints for each of the TREs considered are viewed as points from a multivariate point process along the line (bottom).
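
The midpoint rule described in this caption can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `peak_midpoints`, the probe positions, the signal values, and the cutoff are all hypothetical, and the authors' actual pre-processing may differ (for example in how probe spacing or ties at the cutoff are handled).

```python
def peak_midpoints(positions, signal, cutoff):
    """Return the midpoint of each maximal run of probes with signal above cutoff.

    positions: probe coordinates along the sequence, in increasing order
    signal:    one signal value per probe (hypothetical ChIP-chip intensities)
    cutoff:    threshold above which a probe counts as bound (an assumption)
    """
    midpoints = []
    start = None  # coordinate where the current above-cutoff run began
    end = None    # last coordinate seen inside the current run
    for pos, value in zip(positions, signal):
        if value > cutoff:
            if start is None:
                start = pos
            end = pos
        elif start is not None:
            # The run has just ended: record its midpoint as the proxy point.
            midpoints.append((start + end) / 2)
            start = None
    if start is not None:
        # The sequence ended while still inside a run.
        midpoints.append((start + end) / 2)
    return midpoints


# Made-up example data: two intervals exceed the cutoff of 1.0.
positions = [0, 100, 200, 300, 400, 500, 600, 700]
signal = [0.1, 2.0, 2.5, 0.3, 0.2, 3.0, 3.1, 2.9]
points = peak_midpoints(positions, signal, cutoff=1.0)
print(points)  # [150.0, 600.0]
```

Running this once per TRE gives one list of points per TRE, which together form the kind of multivariate point pattern the caption refers to.
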
Figure 2
Estimated g-functions, forward direction, mouse embryonic stem cell data. Plots of the g-functions modeling the effect of the occurrence of one TRE (column) on the occurrence of another TRE (row). The effects are estimated in the multivariate model. A value less than one indicates that this inter-TRE distance tends not to occur while a value greater than one indicates an inter-TRE distance that is likely. Point-wise 95% confidence intervals for the functions are also shown. To ease comparisons between effects, all the y-axes have the same scale with a maximum value of 10.
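
One way to read the "less than one / greater than one" interpretation is that each g-function acts as a multiplicative factor on a baseline intensity. The sketch below assumes exactly that form, which this page does not state explicitly: the intensity of a target TRE at position t is a baseline multiplied by g(t - s) for every earlier point s of every TRE (the forward direction). The function `intensity`, the step-shaped g-functions, and all numbers are hypothetical.

```python
def intensity(t, baseline, points_by_tre, g_functions):
    """Assumed multiplicative intensity of one target TRE at position t.

    points_by_tre: dict mapping TRE name -> list of point positions
    g_functions:   dict mapping TRE name -> callable(distance) -> factor,
                   where a factor of 1.0 means "no effect at this distance"
    Only points strictly before t contribute (forward direction).
    """
    value = baseline
    for name, points in points_by_tre.items():
        g = g_functions[name]
        for s in points:
            if s < t:
                value *= g(t - s)
    return value


# Hypothetical g-functions: TRE "A" favors distances up to 100,
# TRE "B" suppresses distances up to 50; both are neutral elsewhere.
g_functions = {
    "A": lambda d: 2.0 if d <= 100 else 1.0,
    "B": lambda d: 0.5 if d <= 50 else 1.0,
}

only_a = intensity(100, 0.01, {"A": [10, 400], "B": []}, g_functions)
a_and_b = intensity(100, 0.01, {"A": [10, 400], "B": [90]}, g_functions)
print(only_a, a_and_b)
```

In this toy setting a nearby point of "A" doubles the baseline, and an additional nearby point of "B" halves it again, so the two effects cancel.
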
Figure 3
Clustering of TREs based on interaction graphs, mouse embryonic stem cell data. Result of a hierarchical clustering procedure based on the Ward method of the graphs for each TRE given in Figure 2. The clustering is based on the integral of the absolute value of the logarithm of the functions in Figure 2.
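
The quantity named in this caption, the integral of the absolute value of the logarithm of a g-function, can be approximated numerically. The sketch below uses the trapezoidal rule on a grid of distances; the function name, the grid, and the g-values are hypothetical, and how the authors combine these integrals into the input for Ward clustering is not specified here.

```python
import math


def abs_log_integral(distances, g_values):
    """Trapezoidal approximation of the integral of |log g(d)| over the grid.

    distances: increasing grid of inter-TRE distances
    g_values:  strictly positive g-function values on that grid
    A g-function that is identically 1 (no effect) integrates to 0.
    """
    total = 0.0
    for i in range(1, len(distances)):
        width = distances[i] - distances[i - 1]
        left = abs(math.log(g_values[i - 1]))
        right = abs(math.log(g_values[i]))
        total += 0.5 * width * (left + right)
    return total


grid = [0, 5, 10]
no_effect = abs_log_integral(grid, [1.0, 1.0, 1.0])
favored = abs_log_integral(grid, [math.e, math.e, math.e])
avoided = abs_log_integral(grid, [1 / math.e] * 3)
print(no_effect, favored, avoided)
```

Because the absolute value of the logarithm is used, a strongly favored distance and an equally strongly avoided distance contribute the same amount, so the measure captures the size of an effect rather than its sign.
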
Figure 4
Tests for local independence, mouse embryonic stem cell data. This figure shows results for the 121 parallel likelihood ratio tests for local independence between all pairs of the 11 TREs in the multivariate model. We show the results for the model estimated in the forward direction (squares, effect of TRE (column) on TRE (row)). The size of the symbol for each test corresponds to the magnitude of the test statistic. After correcting for multiple testing using Holm's procedure, the hypotheses of local independence that are rejected are shown in red, while those that are not rejected are shown in blue.
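
Holm's procedure, used here to correct the parallel tests, is a standard step-down method and can be sketched as follows. The p-values and the significance level of 0.05 are assumptions for illustration; the caption does not state the level used.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure.

    Returns a list of booleans, one per input p-value, that is True where
    the corresponding hypothesis is rejected at family-wise level alpha.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The rank-th smallest p-value is compared with alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


# Made-up p-values for four tests.
decisions = holm_reject([0.001, 0.04, 0.03, 0.012])
print(decisions)  # [True, False, False, True]
```

With 121 tests, as in this figure, the smallest p-value would be compared with alpha / 121, the next with alpha / 120, and so on.
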
Figure 5
Estimated g-functions, forward direction, ENCODE data. Plots of the g-functions modeling the effect of the occurrence of one TRE (column) on the occurrence of another TRE (row). The effects are estimated in the multivariate model adjusting for the histone modifications and allowing different baseline intensities for the ENCODE regions. A value less than one indicates that this inter-TRE distance tends not to occur while a value greater than one indicates an inter-TRE distance that is likely. Point-wise 95% confidence intervals for the functions are also shown.
Figure 6
Tests for local independence, ENCODE data. This figure shows results for the 64 parallel likelihood ratio tests for local independence between all pairs of the 8 TREs in the multivariate model adjusting for histone modifications and different baseline intensities. We show the results for the model estimated in the forward direction (squares, effect of TRE (column) on TRE (row)) as well as in the reverse direction (circles, effect of TRE (row) on TRE (column)). The size of the symbol for each test corresponds to the magnitude of the test statistic. After correcting for multiple testing using Holm's procedure, the hypotheses of local independence that are rejected are shown in red, while those that are not rejected are shown in blue.
Figure 7
Estimated g-functions, reverse direction, ENCODE data. Plots of the g-functions modeling the effect of the occurrence of one TRE (row) on the occurrence of another TRE (column), estimated in the reverse direction. Note that the figure is transposed compared to Figure 5. The effects are estimated in the multivariate model adjusting for the histone modifications and allowing for different baseline intensities for the ENCODE regions. A value less than one indicates that this inter-TRE distance tends not to occur while a value greater than one indicates an inter-TRE distance that is likely. Point-wise 95% confidence intervals for the functions are also shown.
Figure 8
Effect of histone modifications, ENCODE data. Estimates and 95% confidence intervals for the parameters $\gamma_{\mathrm{H4Kac4}}^{k}$ and $\gamma_{\mathrm{H3K27me3}}^{k}$, where k indexes the eight different TREs considered. These factors give the fold-changes of the baseline intensity in the presence of one of the histone modifications for each of the eight TREs.
Figure 9
Tests for local independence, all four time points, ENCODE data. This figure shows the results for the 64 parallel tests for local independence between all pairs of the 8 TREs in the multivariate model, adjusting for all covariates, at the four time points (0, 2, 8, 32 hours post-treatment). The models are estimated in the forward direction with the effect of TRE (column) on TRE (row). The size of the symbol for each test corresponds to the magnitude of the test statistic. After correcting for multiple testing using Holm's procedure, the hypotheses of local independence that are rejected are shown in red, while those that are not rejected are shown in blue.
