Proceedings (IEEE Int Conf Bioinformatics Biomed). 2016 Dec:2016:178-183.
doi: 10.1109/bibm.2016.7822515. Epub 2017 Jan 19.

DeeperBind: Enhancing Prediction of Sequence Specificities of DNA Binding Proteins

Hamid Reza Hassanzadeh et al. Proceedings (IEEE Int Conf Bioinformatics Biomed). 2016 Dec.

Abstract

Transcription factors (TFs) are macromolecules that bind to specific cis-regulatory sub-regions of DNA promoters and initiate transcription. Finding the exact location of these binding sites (also known as motifs) is important in a variety of domains, such as drug design and development. To address this need, several in vivo and in vitro techniques have been developed to characterize and predict the binding specificity of a protein to different DNA loci. The major problem with these techniques is that they are not accurate enough in predicting binding affinity and characterizing the corresponding motifs. As a result, downstream analysis is required to uncover the locations where proteins of interest bind. Here, we propose DeeperBind, a long short-term recurrent convolutional network for predicting protein binding specificities with respect to DNA probes. DeeperBind can model the positional dynamics of probe sequences and hence effectively accounts for the contributions made by individual sub-regions in DNA sequences. Moreover, it can be trained and tested on datasets containing varying-length sequences. We apply our pipeline to datasets derived from protein binding microarrays (PBMs), an in vitro high-throughput technology for quantifying protein-DNA binding preferences, and present promising results. To the best of our knowledge, this is the most accurate pipeline for predicting binding specificities of DNA sequences from data produced by high-throughput technologies, using deep learning for feature generation and positional dynamics modeling.

Figures

Fig. 1:
Recurrent neural networks. (a) Diagram of an LSTM memory cell used in this paper. Small pink circles denote pointwise operations, arrows indicate vector transfer, merged arrows indicate concatenation of vectors, and split arrows indicate vector copying. (b) Concatenation of a few LSTM cells to build a deep LSTM network.
Fig. 2:
Block diagram of DeeperBind. The input sequences are first represented as 2D binary matrices via one-hot coding. A convolutional layer generates the feature map by applying several PWM-like filters followed by rectified linear units. No pooling layer is used. Two stacks of LSTM layers then capture the sequential dependencies of the sub-motifs on probes.
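The front end described in the Fig. 2 caption (one-hot coding, PWM-like convolutional filters, rectified linear units, no pooling) can be illustrated with a small sketch. This is not the authors' implementation: the function names `one_hot` and `conv_relu`, the filter values, and the example probes are all hypothetical, and the recurrent (LSTM) stage is omitted. It is meant only to show why skipping pooling keeps one feature column per sequence position, so the output length follows the probe length.

```python
# Illustrative sketch only: names, filter weights, and probes are made up,
# not taken from the paper.
ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element binary rows (one row per base)."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET]
            for base in seq.upper()]


def conv_relu(encoded, filters):
    """Slide each (width x 4) filter along the sequence and apply ReLU.

    No pooling is applied, so each filter yields len(encoded) - width + 1
    activations, one per window position.
    """
    feature_map = []
    for filt in filters:
        width = len(filt)
        row = []
        for start in range(len(encoded) - width + 1):
            score = sum(filt[i][j] * encoded[start + i][j]
                        for i in range(width)
                        for j in range(4))
            row.append(max(0.0, score))  # rectified linear unit
        feature_map.append(row)
    return feature_map


# Hypothetical PWM-like filter that rewards the 3-mer "TAA" (columns: A, C, G, T).
taa_filter = [
    [-1.0, -1.0, -1.0, 1.0],   # position 1 prefers T
    [1.0, -1.0, -1.0, -1.0],   # position 2 prefers A
    [1.0, -1.0, -1.0, -1.0],   # position 3 prefers A
]

short_map = conv_relu(one_hot("GTAAC"), [taa_filter])
long_map = conv_relu(one_hot("GGTAACTTAAG"), [taa_filter])

print(len(short_map[0]), len(long_map[0]))  # feature-map length tracks probe length
```

Because the positional axis is preserved, each column of the feature map could be fed to a recurrent layer as one time step, which is one plausible reading of how varying-length probes are accommodated.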
Fig. 3:
The predicted rank of the top 100 positive probes (black lines) in array #1 for DeepBind (left) and DeeperBind (right) for each TF: (a) CEH-22 and (b) Oct-1.
Fig. 4:
Scatter plot of predicted and measured intensities: DeepBind vs DeeperBind. a) CEH-22 and b) Oct-1.
Fig. 5:
Receiver Operating Characteristic (ROC) curves on array #2. Samples are marked as positive if their measured intensity exceeds m + 4σ, where m is the median and σ is the median absolute deviation of all normalized probe intensities divided by 0.6745.
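The positive-labeling rule in the Fig. 5 caption can be sketched as follows, assuming that m denotes the median of the normalized probe intensities (the usual companion of a median-absolute-deviation scale estimate). The function names and the toy intensity values are hypothetical and serve only to illustrate the arithmetic.

```python
from statistics import median


def positive_threshold(intensities, k=4.0):
    """Return m + k * sigma, with sigma = MAD / 0.6745.

    Assumption: m is the median of the intensities. Dividing the median
    absolute deviation (MAD) by 0.6745 rescales it to be comparable to a
    standard deviation for normally distributed data.
    """
    m = median(intensities)
    mad = median([abs(x - m) for x in intensities])
    sigma = mad / 0.6745
    return m + k * sigma


def label_positives(intensities, k=4.0):
    """Mark a probe as positive when its intensity exceeds the threshold."""
    threshold = positive_threshold(intensities, k)
    return [x > threshold for x in intensities]


# Toy intensities (made up): five background probes and one bright probe.
toy = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]
threshold = positive_threshold(toy)
labels = label_positives(toy)
print(round(threshold, 3), labels)
```

A median-based threshold is less sensitive to a handful of very bright probes than a mean and standard deviation would be, which is presumably why it suits microarray intensities.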
