Entropy (Basel). 2019 Aug 16;21(8):802.
doi: 10.3390/e21080802.

A Clustering Approach for Motif Discovery in ChIP-Seq Dataset

Chun-Xiao Sun et al. Entropy (Basel).

Abstract

Chromatin immunoprecipitation combined with next-generation sequencing (ChIP-Seq) technology has enabled the identification of transcription factor binding sites (TFBSs) on a genome-wide scale. To discover TFBSs effectively and efficiently in the thousand or more DNA sequences that a ChIP-Seq data set contains, we propose a new algorithm named AP-ChIP. First, we set two thresholds based on probabilistic analysis to construct and then filter the cluster subsets. Next, we apply Affinity Propagation (AP) clustering to the candidate cluster subsets to find the potential motifs. Experimental results on simulated data show that the AP-ChIP algorithm predicts TFBSs almost accurately in a reasonable time. The validity of the AP-ChIP algorithm is also tested on a real ChIP-Seq data set.

Keywords: ChIP-Seq; motif discovery; planted motif search; transcription factor binding sites.
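
The abstract does not say how the cluster subsets are constructed or what the two thresholds are. As a rough illustration of the general idea only, the sketch below assumes a Hamming-distance rule of the kind common in planted motif search: for a chosen reference l-mer, it collects from each sequence the l-mers within `max_dist` mismatches. The names `hamming`, `lmers`, `candidate_subset` and `max_dist` are hypothetical and are not taken from AP-ChIP; the AP clustering step is not shown.

```python
from typing import Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("l-mers must have equal length")
    return sum(x != y for x, y in zip(a, b))


def lmers(seq: str, l: int) -> List[Tuple[int, str]]:
    """Return every (start position, l-mer) pair in seq."""
    return [(i, seq[i:i + l]) for i in range(len(seq) - l + 1)]


def candidate_subset(
    reference: str, sequences: List[str], max_dist: int
) -> Dict[int, List[Tuple[int, str]]]:
    """Map sequence index -> l-mers within max_dist mismatches of reference.

    Assumption: a plain Hamming-distance cutoff stands in for the
    probabilistically derived thresholds mentioned in the abstract.
    """
    l = len(reference)
    subset: Dict[int, List[Tuple[int, str]]] = {}
    for idx, seq in enumerate(sequences):
        hits = [(pos, w) for pos, w in lmers(seq, l)
                if hamming(reference, w) <= max_dist]
        if hits:  # sequences with no close l-mer are left out
            subset[idx] = hits
    return subset


# Toy input: two sequences carry a close variant of "ACGT", one does not.
sequences = ["TTACGTAC", "GGACGAAC", "CCCCCCCC"]
subset = candidate_subset("ACGT", sequences, max_dist=1)
print(subset)
```

On this toy input the first two sequences each contribute one l-mer at position 2 and the third contributes none. A subset built this way could then be passed to a clustering step with negative Hamming distance as the similarity, though whether AP-ChIP does so in this form is not stated in the abstract.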

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Prediction accuracy for different values of α.
