BMC Bioinformatics. 2017 Feb 28;18(1):136.
doi: 10.1186/s12859-017-1561-8.

RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach

Xiaoyong Pan et al. BMC Bioinformatics. 2017.

Abstract

Background: RNAs play key roles in cells through their interactions with proteins known as RNA-binding proteins (RBPs), and the binding motifs of these proteins are crucial for understanding the post-transcriptional regulation of RNAs. How RBPs correctly recognize their target RNAs and why they bind specific positions is still far from clear. Machine learning-based algorithms are widely acknowledged to be capable of speeding up the study of these questions. Although many automatic tools have been developed to predict RNA-protein binding sites from the rapidly growing multi-source data (e.g. sequence and structure), the domain-specific features and formats of these data pose significant computational challenges. One current difficulty is that the common knowledge shared across sources lies at a higher abstraction level than the observed data, so directly integrating observed data across domains is inefficient. The other difficulty is how to interpret the prediction results. Existing approaches tend to stop after outputting potential discrete binding sites on the sequences, but how to assemble these sites into meaningful binding motifs is a topic worthy of further investigation.

Results: In view of these challenges, we propose a deep learning-based framework (iDeep) that uses a novel hybrid of a convolutional neural network and a deep belief network to predict RBP interaction sites and motifs on RNAs. The new protocol transforms the original observed data into a high-level abstraction feature space using multiple layers of learning blocks, where the shared representations across different domains are integrated. To validate iDeep, we performed experiments on 31 large-scale CLIP-seq datasets. Our results show that integrating multiple sources of data improves the average AUC by 8% compared to the best single-source-based predictor, and that through cross-domain knowledge integration at an abstraction level, iDeep outperforms the state-of-the-art predictors by 6%. Besides the overall enhanced prediction performance, the convolutional neural network module embedded in iDeep is also able to automatically capture interpretable binding motifs for RBPs. Large-scale experiments demonstrate that these mined binding motifs agree well with experimentally verified results, suggesting that iDeep is a promising approach for real-world applications.

Conclusion: The iDeep framework not only achieves better performance than the state-of-the-art predictors, but also readily captures interpretable binding motifs. iDeep is available at http://www.csbio.sjtu.edu.cn/bioinf/iDeep.

Keywords: CLIP-seq; Convolutional neural network; Deep belief network; Multimodal deep learning; RNA-binding protein.
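
The Results compare predictors by their average AUC over the 31 datasets. As a rough illustration of that kind of evaluation, the sketch below computes a rank-based AUC per dataset and averages it. It is not the authors' evaluation code: the function names roc_auc and mean_auc and the toy labels and scores are assumptions made for this example only.

```python
def roc_auc(labels, scores):
    """Rank-based AUC (Mann-Whitney U statistic); ties get average ranks.

    labels: iterable of 0/1 (1 = binding site), scores: predicted scores.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")

    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def mean_auc(per_dataset):
    """Average AUC over a list of (labels, scores) pairs, one per dataset."""
    aucs = [roc_auc(labels, scores) for labels, scores in per_dataset]
    return sum(aucs) / len(aucs)


# Toy data only; these are not values from the paper.
toy_datasets = [
    ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    ([0, 1], [0.2, 0.9]),
]
print(mean_auc(toy_datasets))
```

Calling mean_auc once per predictor on the same datasets gives the averages that a comparison like the one in the Results would be based on.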

Figures

Fig. 1
Flowchart of the proposed iDeep method for predicting RNA-protein binding sites on RNAs. It first extracts different representations of RNA-protein binding sites within a window of size 101, then uses multimodal deep learning consisting of DBNs and CNNs to integrate these extracted representations and predict RBP interaction sites
Fig. 2
ROC performance. ROC curves for predicting RNA-protein binding sites on the 31 experimental datasets
Fig. 3
Performance of individual modalities. Comparison of iDeep and the individual modalities for predicting RNA-protein binding sites on the 31 experimental datasets
Fig. 4
Correlation between different modalities on the 31 experimental datasets. The Pearson correlation coefficients are calculated using the AUCs from the 31 experiments for the individual modalities
Fig. 5
iDeep captures known motifs in [34] from CISBP-RNA for proteins. We compared our predicted motifs only against the known motifs in study [34]; the motif names are from CISBP-RNA. Proteins with no known motifs are ignored. A dash (-) means that no predicted motif matched at an e-value cut-off of 0.05
Fig. 6
Binding motifs identified by iDeep. a Heatmap of the learned weights of the CNN convolution filters and the matched known motif for each filter. From left to right, they are motifs of proteins TDP-43, IGFBP1-3, and Ago2. b Hierarchical clustering, using cosine distance, of 102 filters for protein TDP-43. c Heatmap of the learned weights of two convolution filters and the corresponding motif logos for protein TDP-43; these are novel motifs detected by iDeep that have not yet been verified
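
Fig. 6b clusters convolution filters by cosine distance. The sketch below shows only the distance computation and the first agglomerative merge (finding the closest pair of filters). It is an illustration under assumptions, not the authors' pipeline: the 4-row layout of each filter (one row per nucleotide), the toy weights, and the helper names flatten, cosine_distance and closest_pair are all hypothetical.

```python
import math


def flatten(filter_weights):
    """Turn a filter given as rows of weights into one flat vector."""
    return [w for row in filter_weights for w in row]


def cosine_distance(u, v):
    """1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def closest_pair(filters):
    """Indices (i, j) of the two filters with the smallest cosine distance.

    This is the first merge an agglomerative clustering would perform.
    """
    vectors = [flatten(f) for f in filters]
    best = None
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            d = cosine_distance(vectors[i], vectors[j])
            if best is None or d < best[0]:
                best = (d, i, j)
    return best[1], best[2]


# Toy filters of width 2 (rows assumed to be A, C, G, U); not learned weights.
toy_filters = [
    [[1, 0], [0, 1], [0, 0], [0, 0]],
    [[2, 0], [0, 2], [0, 0], [0, 0]],  # same pattern as the first, scaled
    [[0, 0], [0, 0], [1, 0], [0, 1]],  # orthogonal to the first two
]
print(closest_pair(toy_filters))
```

Cosine distance ignores the overall scale of a filter, so two filters that weight the same positions and nucleotides with different magnitudes are treated as near-identical.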

References

    1. Ferrè F, Colantoni A, Helmer-Citterich M. Revealing protein-lncRNA interaction. Brief Bioinform. 2015;17:106–16. doi: 10.1093/bib/bbv031.
    2. Bartel DP. MicroRNAs: target recognition and regulatory functions. Cell. 2009;136:215–33. doi: 10.1016/j.cell.2009.01.002.
    3. Ray D, Kazan H, Chan ET, Peña Castillo L, Chaudhry S, Talukder S, et al. Rapid and systematic analysis of the RNA recognition specificities of RNA-binding proteins. Nat Biotechnol. 2009;27:667–70. doi: 10.1038/nbt.1550.
    4. Hafner M, Landthaler M, Burger L, Khorshid M, Hausser J, Berninger P, et al. Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell. 2010;141:129–41. doi: 10.1016/j.cell.2010.03.009.
    5. Stražar M, Žitnik M, Zupan B, Ule J, Curk T. Orthogonal matrix factorization enables integrative analysis of multiple RNA binding proteins. Bioinformatics. 2016;32:1527–35. doi: 10.1093/bioinformatics/btw003.