Bioinformatics. 2010 Dec 15;26(24):3028-34.
doi: 10.1093/bioinformatics/btq590. Epub 2010 Oct 21.

Discovering homotypic binding events at high spatial resolution

Yuchun Guo et al. Bioinformatics. 2010.

Abstract

Motivation: Clusters of protein-DNA interaction events involving the same transcription factor are known to act as key components of invertebrate and mammalian promoters and enhancers. However, detecting closely spaced homotypic events from ChIP-Seq data is challenging because random variation in the ChIP fragmentation process obscures event locations.

Results: The Genome Positioning System (GPS) can predict protein-DNA interaction events at high spatial resolution from ChIP-Seq data, while retaining the ability to resolve closely spaced events that appear as a single cluster of reads. GPS models observed reads using a complexity-penalized mixture model and efficiently predicts event locations with a segmented EM algorithm. An optional mode permits GPS to align common events across distinct experiments. Compared with other methods, GPS detects more joint events in both synthetic and actual ChIP-Seq data and has superior spatial resolution. In addition, the specificity and sensitivity of GPS are superior to or comparable with those of other methods.
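The complexity-penalized mixture model and EM step described in the Results can be illustrated with a minimal sketch. It assumes one mixture component per candidate base within a single segment, a hypothetical Gaussian stand-in (`gaussian_density`) for the characteristic read distribution, and a sparse prior implemented by subtracting a pseudo-count `alpha` from each component's expected read count and clipping at zero. This negative-Dirichlet-style penalty is an assumption chosen for illustration; GPS's actual prior, read distribution, and segmentation scheme are not reproduced here, and all function names are hypothetical.

```python
import numpy as np


def gaussian_density(offsets, sd=15.0):
    """Hypothetical stand-in for the empirical read distribution: a Gaussian
    over (read position - event position), assuming reads were already shifted
    toward the event centre so that both strands share one profile."""
    return np.exp(-0.5 * (offsets / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def sparse_mixture_em(reads, candidates, read_density, alpha=2.0, n_iter=200, tol=1e-8):
    """EM for a mixture with one component per candidate base, plus a sparsity
    penalty (alpha) that prunes weakly supported components.

    Returns mixing weights pi over `candidates`; nonzero entries are the
    predicted events.
    """
    reads = np.asarray(reads, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    # lik[n, j] = probability of read n if it came from an event at candidate j
    lik = read_density(reads[:, None] - candidates[None, :])
    pi = np.full(len(candidates), 1.0 / len(candidates))
    for _ in range(n_iter):
        # E step: responsibility of each candidate event for each read
        weighted = lik * pi[None, :]
        total = weighted.sum(axis=1, keepdims=True)
        total[total == 0] = np.finfo(float).tiny  # reads no live event explains
        resp = weighted / total
        # M step with sparse prior: subtract alpha from each expected read count
        # and clip at zero, so components explaining fewer than alpha reads vanish
        counts = np.maximum(resp.sum(axis=0) - alpha, 0.0)
        if counts.sum() == 0:  # alpha too large for this segment; keep last pi
            break
        new_pi = counts / counts.sum()
        converged = np.abs(new_pi - pi).max() < tol
        pi = new_pi
        if converged:
            break
    return pi


if __name__ == "__main__":
    # Two synthetic events 60 bp apart whose read clouds overlap
    rng = np.random.default_rng(0)
    reads = np.concatenate([100 + rng.normal(0, 15, 200), 160 + rng.normal(0, 15, 200)])
    candidates = np.arange(0, 261)
    pi = sparse_mixture_em(reads, candidates, gaussian_density)
    print(candidates[pi > 0], np.round(pi[pi > 0], 3))
```

Because a component whose expected count falls below `alpha` is set to exactly zero and cannot revive, `alpha` must stay small relative to the number of reads a single event contributes; if it is too large, every component is pruned on the first iteration and the sketch returns the previous (initial) weights unchanged.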

Availability: http://cgs.csail.mit.edu/gps.

Figures

Fig. 1.
GPS probabilistically models ChIP-Seq read spatial distributions. (a) Protein-DNA interaction events at positions 1 and 2 on the genome result in DNA end sequence reads in the ChIP-Seq protocol. (b) The observed spatial read density (blue: ‘+’ strand, red: ‘−’ strand) from ∼4000 CTCF events aligned with respect to the CTCF motif position at each event. (c) GPS models ChIP-Seq reads as being generated by a mixture of binding events at every genomic base, with each event producing the characteristic spatial read density. (d) A sparse prior on mixture components causes GPS to assign events to as few bases as possible to explain the observed reads (green and orange reads). In GPS, a given read can be explained by more than one event (yellow reads).
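Panels (b) and (c) rest on an empirical, strand-specific read profile measured around aligned events. A minimal sketch of tabulating such a profile follows, assuming read 5' positions and event (for example, motif) positions on a single chromosome and ignoring motif orientation; the function name `empirical_read_density` is hypothetical and this is not the GPS code.

```python
import numpy as np


def empirical_read_density(event_positions, read_positions, read_strands, window=300):
    """Strand-specific read profile around aligned events.

    For every event and every read, record the offset (read - event) if it lies
    within +/- window bp, then normalize the counts separately per strand.
    Motif orientation is ignored (an assumption), and a read near two events
    is counted once per event.
    """
    events = np.asarray(event_positions, dtype=np.int64)
    reads = np.asarray(read_positions, dtype=np.int64)
    strands = np.asarray(read_strands)
    size = 2 * window + 1  # one bin per offset from -window to +window
    profile = {}
    for strand in ("+", "-"):
        r = reads[strands == strand]
        # All pairwise offsets: fine for toy inputs, quadratic in general
        offsets = (r[:, None] - events[None, :]).ravel()
        offsets = offsets[np.abs(offsets) <= window]
        counts = np.bincount(offsets + window, minlength=size).astype(float)
        total = counts.sum()
        profile[strand] = counts / total if total > 0 else counts
    return profile
```

With real data, the '+' and '−' profiles would be expected to peak on opposite sides of the event, as in panel (b); index `window` of each array corresponds to offset 0.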
Fig. 2.
Probabilistic model for GPS event alignment.
Fig. 3.
GPS improves the effective spatial resolution and accuracy in resolving proximal binding events. (a) Fraction of predicted CTCF binding events with a motif within the given distance with event discovery by GPS, SISSRs, MACS, cisGenome, QuEST, FindPeaks, spp-wtd and spp-mtc. Events shown were predicted by all eight methods and had a CTCF motif within 100 bp. (b) Fraction of binary events recovered vs. the distance between the generated synthetic events for GPS, SISSRs, MACS and QuEST. (c) Example of a predicted binary CTCF event that contains coordinately located CTCF motifs. (d) Number of GABP events discovered by GPS, SISSRs, MACS, cisGenome, and QuEST in regions that contain clustered GABP motifs within 500 bp.
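The curve in panel (a) reduces to a nearest-motif distance for each predicted event, followed by the fraction of events at or below each distance threshold. A sketch under stated assumptions: a single chromosome, motif positions supplied by a separate motif scan, and no restriction to events predicted by all eight methods (the caption's filtering step is left to the caller). Function names are hypothetical.

```python
import numpy as np


def nearest_motif_distance(event_positions, motif_positions):
    """Absolute distance (bp) from each predicted event to its closest motif.
    Assumes one chromosome and a non-empty motif list."""
    events = np.asarray(event_positions, dtype=np.int64)
    motifs = np.sort(np.asarray(motif_positions, dtype=np.int64))
    # Binary search gives the first motif at or after each event
    idx = np.searchsorted(motifs, events)
    left = np.abs(events - motifs[np.clip(idx - 1, 0, len(motifs) - 1)])
    right = np.abs(motifs[np.clip(idx, 0, len(motifs) - 1)] - events)
    return np.minimum(left, right)


def fraction_with_motif_within(event_positions, motif_positions, distances):
    """For each threshold d, the fraction of events with a motif within d bp."""
    nearest = nearest_motif_distance(event_positions, motif_positions)
    return np.array([(nearest <= d).mean() for d in distances])
```

A method with better spatial resolution would have this curve rise more steeply at small distances, which is how panel (a) compares the eight methods.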
Fig. 4.
GPS in alignment mode. (a) Histogram of distance between predicted human CTCF events across two conditions shows that GPS in alignment mode aligns proximal events while continuing to discover separated discrete events. (b) Histogram of distance between predicted CTCF events across two conditions when GPS is run independently on each condition.
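The histograms in Fig. 4 can be approximated by taking, for each event in one condition, the signed distance to the nearest event in the other condition and binning those offsets; events that are aligned across conditions would fall in the bin containing zero. A minimal sketch, assuming a single chromosome and hypothetical function names; it only summarizes predictions and does not implement the GPS alignment model.

```python
import numpy as np


def cross_condition_offsets(events_a, events_b):
    """Signed distance from each event in condition A to the nearest event in B
    (single chromosome assumed; exact ties go to the upstream event)."""
    a = np.asarray(events_a, dtype=np.int64)
    b = np.sort(np.asarray(events_b, dtype=np.int64))
    idx = np.searchsorted(b, a)
    upstream = b[np.clip(idx - 1, 0, len(b) - 1)]
    downstream = b[np.clip(idx, 0, len(b) - 1)]
    return np.where(np.abs(a - upstream) <= np.abs(downstream - a),
                    upstream - a, downstream - a)


def offset_histogram(events_a, events_b, max_dist=200, bin_width=10):
    """Histogram of offsets in [-max_dist, max_dist]. Bins are half-open
    [lo, hi) except the last; np.histogram ignores offsets outside the range."""
    offsets = cross_condition_offsets(events_a, events_b)
    edges = np.arange(-max_dist, max_dist + bin_width, bin_width)
    counts, edges = np.histogram(offsets, bins=edges)
    return counts, edges
```

Running this on alignment-mode predictions versus independent per-condition predictions would give the two kinds of histogram contrasted in panels (a) and (b).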
