Sci Rep. 2017 Jun 12;7(1):3217.
doi: 10.1038/s41598-017-03554-7.

WSMD: weakly-supervised motif discovery in transcription factor ChIP-seq data

Hongbo Zhang et al. Sci Rep. 2017.

Abstract

Although discriminative motif discovery (DMD) methods are promising for eliciting motifs from high-throughput experimental data, most existing DMD methods, because of computational cost, resort to approximate schemes that greatly restrict the search space, leading to a significant loss of predictive accuracy. In this paper, we propose Weakly-Supervised Motif Discovery (WSMD) to discover motifs from ChIP-seq datasets. In contrast to the learning strategies adopted by previous DMD methods, WSMD allows a "global" optimization of the motif parameters in continuous space, thereby reducing the information loss of the model representation and improving the quality of the resulting motifs. Meanwhile, by exploiting the connection between the DMD framework and existing weakly supervised learning (WSL) techniques, we also present highly scalable learning strategies for the proposed method. Experimental results on both real ChIP-seq datasets and synthetic datasets show that WSMD substantially outperforms previous DMD methods (including DREME, HOMER, XXmotif, motifRG and DECOD) in terms of predictive accuracy, while also achieving competitive computational speed.
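
To make the weakly supervised setting concrete, the following minimal Python sketch shows one common way to frame discriminative motif scoring: each sequence carries only a bound/unbound label, the binding-site position is unknown, and a sequence is scored by its best-matching window. This is an illustration under stated assumptions, not the WSMD algorithm itself. The function names (`make_pwm`, `best_window_score`, `auc`), the toy +1/-1 scoring matrix, and the example sequences are all hypothetical and do not come from the paper.

```python
import math

BASES = "ACGT"


def make_pwm(consensus, match=1.0, mismatch=-1.0):
    """Build a toy scoring matrix (hypothetical): +1 for the consensus base, -1 otherwise."""
    return [{b: (match if b == c else mismatch) for b in BASES} for c in consensus]


def best_window_score(seq, pwm):
    """Score a sequence by its best-matching window.

    Only the sequence-level label is known, so the site position is treated
    as latent and the maximum over all windows is taken.
    """
    width = len(pwm)
    best = -math.inf  # returned unchanged if the sequence is shorter than the motif
    for start in range(len(seq) - width + 1):
        score = sum(pwm[i][seq[start + i]] for i in range(width))
        best = max(best, score)
    return best


def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy data (made up for illustration): positives contain TGAC, negatives do not.
pwm = make_pwm("TGAC")
positives = ["AATGACGG", "CCTGACTT", "GTGACAAA"]
negatives = ["AAAAAAAA", "CCGGCCGG", "TTTTGATT"]

pos_scores = [best_window_score(s, pwm) for s in positives]
neg_scores = [best_window_score(s, pwm) for s in negatives]
print(auc(pos_scores, neg_scores))  # prints 1.0 for this toy data
```

A discriminative method would search for the matrix entries that maximize a separation measure such as this AUC. The sketch only evaluates one fixed matrix and performs no optimization.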

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1
An overview of object detection and discriminative motif discovery. (a) Object detection. (b) Discriminative motif discovery.
Figure 2
3-fold cross-validation test performance on three reference-free evaluation criteria over 134 datasets. The performances of the six methods on the same dataset are plotted on one horizontal bar in different colors. The lines of different colors within a bar mark the performance achieved by the corresponding tools, and the height of each colored box shows the performance improvement of the corresponding tool over the tool that performs more poorly. (a) 3-fold cross-validation test performance on AUC over 134 datasets. (b) 3-fold cross-validation test performance on Fisher’s Exact Test score over 134 datasets. (c) 3-fold cross-validation test performance on Minimal Hyper-Geometric score over 134 datasets.
Figure 3
Performance comparison of different refinement and extension strategies. For each IC value, we show the average performance obtained by each tool over 10 distinct synthetic datasets. (a) Performance comparison of different refinement strategies. (b) Performance comparison of different extension strategies.
Figure 4
Comparison of running time (seconds) for DREME, HOMER, motifRG, DECOD and WSMD.
