A Clustering Approach for Motif Discovery in ChIP-Seq Dataset
- PMID: 33267515
- PMCID: PMC7515331
- DOI: 10.3390/e21080802
Abstract
Chromatin immunoprecipitation combined with next-generation sequencing (ChIP-Seq) has enabled the identification of transcription factor binding sites (TFBSs) on a genome-wide scale. To discover TFBSs effectively and efficiently in the thousand or more DNA sequences that a ChIP-Seq data set contains, we propose a new algorithm named AP-ChIP. First, we set two thresholds, derived from probabilistic analysis, to construct the cluster subsets and then filter them. Next, we apply Affinity Propagation (AP) clustering to the candidate cluster subsets to find potential motifs. Experimental results on simulated data show that AP-ChIP makes nearly accurate TFBS predictions in a reasonable running time. We also test the validity of AP-ChIP on a real ChIP-Seq data set.
Keywords: ChIP-Seq; motif discovery; planted motif search; transcription factor binding sites.
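The AP clustering step named in the abstract can be illustrated with a small, generic sketch. It is not the AP-ChIP implementation: the abstract does not specify the two thresholds or the filtering, so both are omitted. The similarity measure (negative Hamming distance between equal-length l-mers), the median preference, the damping factor, and the helper names `hamming`, `affinity_propagation`, and `cluster_lmers` are all assumptions made for illustration.

```python
from statistics import median


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def affinity_propagation(S, damping=0.5, iterations=200):
    """Plain message-passing AP on a square similarity matrix S.

    S[k][k] holds the preference of point k to become an exemplar.
    Returns the indices of the points chosen as exemplars.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over j != k of [a(i,j) + s(i,j)]
        for i in range(n):
            scores = [A[i][j] + S[i][j] for j in range(n)]
            for k in range(n):
                rival = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # a(i,k) = min(0, r(k,k) + sum over j not in {i,k} of max(0, r(j,k)))
        # a(k,k) = sum over j != k of max(0, r(j,k))
        for k in range(n):
            support = [max(0.0, R[j][k]) for j in range(n)]
            support[k] = R[k][k]  # self-responsibility enters unclipped
            total = sum(support)
            for i in range(n):
                if i == k:
                    new = total - support[k]
                else:
                    new = min(0.0, total - support[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    return [k for k in range(n) if R[k][k] + A[k][k] > 0]


def cluster_lmers(lmers):
    """Group distinct equal-length l-mers around AP exemplars (illustrative)."""
    n = len(lmers)
    # Assumed similarity: negative Hamming distance.
    S = [[-hamming(a, b) for b in lmers] for a in lmers]
    # Assumed preference: median off-diagonal similarity, a common default.
    pref = median(S[i][j] for i in range(n) for j in range(n) if i != j)
    for k in range(n):
        S[k][k] = pref
    exemplars = affinity_propagation(S)
    if not exemplars:  # message passing did not settle on any exemplar
        return {}
    clusters = {lmers[k]: [] for k in exemplars}
    for i, seq in enumerate(lmers):
        # An exemplar represents itself; others join the most similar exemplar.
        best = i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
        clusters[lmers[best]].append(seq)
    return clusters


if __name__ == "__main__":
    # Toy input: two made-up motif instances plus single-base variants.
    toy = ["ACGTACGT", "TCGTACGT", "ACGAACGT", "ACGTACGA",
           "GATCGATC", "CATCGATC", "GATGGATC", "GATCGATG"]
    for exemplar, members in cluster_lmers(toy).items():
        print(exemplar, members)
```

In this toy input each exemplar plays the role of a candidate motif and its members the role of putative binding-site instances. Real ChIP-Seq input would need the candidate subsets to be built and filtered first, as the abstract describes.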
Conflict of interest statement
The authors declare no conflict of interest.