Sci Rep. 2017 Apr 4;7(1):614.
doi: 10.1038/s41598-017-00795-4.

RPI-Bind: a structure-based method for accurate identification of RNA-protein binding sites

Jiesi Luo et al. Sci Rep. 2017.

Erratum in

Abstract

RNA-protein interactions play crucial roles in multiple biological processes, and they are significantly influenced by the structures and sequences of both the protein and RNA molecules. In this study, we first analyzed RNA-protein interacting complexes and identified sequence and structural properties of the interfaces, which reveal the diverse nature of the binding sites. Based on these observations, we built a three-step prediction model, RPI-Bind, that identifies RNA-protein binding regions using the sequences and structures of both proteins and RNAs. The three steps are 1) the prediction of RNA binding regions on the protein, 2) the prediction of protein binding regions on the RNA, and 3) the prediction of interacting regions on both RNA and protein simultaneously, using the results from steps 1) and 2). Compared with existing methods, most of which employ only sequences, our model significantly improves prediction accuracy at each of the three steps. In particular, our model outperforms catRAPID by >20% at the third step. These results indicate the importance of structure in RNA-protein interactions and suggest that RPI-Bind is a powerful theoretical framework for studying them.
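
The abstract does not say how the third step consumes the outputs of the first two. One plausible reading, shown here only as a sketch, is that the per-residue and per-nucleotide predictions narrow the set of residue-nucleotide pairs that the third model has to score. The function name, the 0.5 threshold, and the example scores below are all hypothetical and are not taken from RPI-Bind.

```python
from itertools import product


def candidate_pairs(protein_scores, rna_scores, threshold=0.5):
    """Return (residue_index, nucleotide_index) pairs in which both sites
    were predicted as binding by the first two steps.

    Assumption: each score is a binding probability in [0, 1] and a site
    counts as binding when its score reaches the threshold.
    """
    binding_residues = [i for i, s in enumerate(protein_scores) if s >= threshold]
    binding_nucleotides = [j for j, s in enumerate(rna_scores) if s >= threshold]
    return list(product(binding_residues, binding_nucleotides))


# Hypothetical step 1 output: RNA-binding scores for three residues.
protein_scores = [0.9, 0.2, 0.7]
# Hypothetical step 2 output: protein-binding scores for two nucleotides.
rna_scores = [0.1, 0.8]

pairs = candidate_pairs(protein_scores, rna_scores)
print(pairs)
```

Under this reading, a pair-level classifier would only need to evaluate the returned pairs rather than every residue-nucleotide combination.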

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1
The step-wise workflow of the RPI-Bind prediction method. The workflow consists of two stages: training the classification models and applying them. Model training includes construction of the training dataset, feature extraction from the sequences and structures in the training dataset, and development of the ‘RPI-Bind’ method, which consists of three models. The developed models were then applied to three problems: 1) the prediction of RNA binding regions on protein, 2) the prediction of protein binding regions on RNA, and 3) the prediction of interacting regions on both RNA and protein simultaneously.
Figure 2
Statistical analysis of protein local conformations (PLCs) and RNA local conformations (RLCs) at and outside the interface for the four protein functional classes. (A) and (C) show the composition percentages of PLCs and RLCs at and outside the interfaces. The corresponding log-odds ratios, which indicate over- and under-representation of PLCs and RLCs at and outside the interfaces, are given in (B) and (D). The mutual interaction propensity matrices between PLCs and RLCs are shown in (E). In (E), the values on the left represent the total number of contacts between PLCs and RLCs, and the boxes on the right represent the corresponding log-odds values. The four classes, shown from top to bottom, are enzymes, structural, regulatory and ‘other’.
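
The caption does not define the log-odds ratio. A common form, assumed here, is the base-2 logarithm of a class's frequency at the interface divided by its frequency outside it, so positive values mean over-representation at the interface and negative values mean under-representation. The conformation names, the counts, and the pseudocount in this sketch are hypothetical.

```python
import math


def log_odds(interface_counts, outside_counts, pseudocount=1.0):
    """Per-class log2(frequency at interface / frequency outside interface).

    Assumption: a pseudocount is added to every class so that a class
    absent from one region does not produce a division by zero.
    """
    classes = sorted(set(interface_counts) | set(outside_counts))
    n_in = sum(interface_counts.get(c, 0) for c in classes) + pseudocount * len(classes)
    n_out = sum(outside_counts.get(c, 0) for c in classes) + pseudocount * len(classes)
    scores = {}
    for c in classes:
        f_in = (interface_counts.get(c, 0) + pseudocount) / n_in
        f_out = (outside_counts.get(c, 0) + pseudocount) / n_out
        scores[c] = math.log2(f_in / f_out)
    return scores


# Hypothetical counts of two local conformation classes.
interface = {"helix": 30, "loop": 70}
outside = {"helix": 60, "loop": 40}

scores = log_odds(interface, outside)
for name, value in scores.items():
    print(f"{name}: {value:+.2f}")
```

With these made-up counts, the loop class comes out positive (enriched at the interface) and the helix class negative (depleted).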
Figure 3
The occurrences of amino acids and nucleotides at and outside the interface for all protein functional classes (enzymes, structural, regulatory, and other). The occurrences of amino acids and nucleotides at and outside the interface are shown in (A) and (C), respectively. In (B) and (D), the log-odds ratios show which amino acids and nucleotides are over- and under-represented at and outside the interfaces. The mutual interaction propensities (log-odds values) between amino acids and nucleotides are given for the four classes in (E–H), respectively. In each panel, the values on the left represent the total number of contacts between residue and nucleotide, and the boxes on the right represent the corresponding log-odds values.
Figure 4
The performance of RNA binding site prediction. (A) Comparison of ROC curves for binding site prediction using different features on our constructed database. (B) Comparison of ROC curves for binding site prediction using different classifiers. (C) Comparison of ROC curves for binding site prediction on an independent dataset. (D) The importance and individual contribution ratio of each feature type.
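
For readers unfamiliar with the ROC comparisons named in this caption, the sketch below shows how an ROC curve and its area under the curve (AUC) can be computed from per-site binding scores. It is a generic standard-library illustration, not the authors' evaluation code, and the labels and scores are invented. It assumes both classes are present in the labels.

```python
def roc_points(labels, scores):
    """Return ROC points (false positive rate, true positive rate),
    sweeping the decision threshold from the highest score downward.

    Assumption: labels are 1 (binding) or 0 (non-binding) and both
    classes occur at least once.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    previous = None
    for i in order:
        # Emit a point only when the score changes, so tied scores
        # are treated as a single threshold.
        if previous is not None and scores[i] != previous:
            points.append((fp / n_neg, tp / n_pos))
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        previous = scores[i]
    points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical ground truth and predicted binding scores for six sites.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(labels, scores)
auc_value = auc(curve)
print(round(auc_value, 4))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of binding from non-binding sites, which is why curves closer to the top-left corner indicate better feature sets or classifiers.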
Figure 5
Examples of predicted RNA-protein interacting complexes. The two examples are tRNA Pseudouridine Synthase B and CCA-Adding Enzyme. Predicted RNA binding sites are shown in red and predicted non-binding sites in gray (left panels). Actual RNA binding sites are shown in red and actual non-binding sites in gray (middle panels). The right panels show prediction performance for individual residues, with true positives (TP) in red, false positives (FP) in blue, false negatives (FN) in orange, and true negatives (TN) in gray. Thus, red + orange residues correspond to the actual binding residues, and red + blue residues correspond to the predicted binding residues. All structure diagrams were generated using PyMol (http://www.pymol.org).
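
The right-panel coloring described in this caption amounts to comparing predicted and actual labels residue by residue. The sketch below reproduces that mapping with the colors stated in the caption; the function name and the example labels are hypothetical.

```python
# Color scheme taken from the Figure 5 caption (right panels).
CATEGORY_COLORS = {"TP": "red", "FP": "blue", "FN": "orange", "TN": "gray"}


def categorize(predicted, actual):
    """Label each residue as TP, FP, FN or TN.

    Assumption: both inputs are equal-length sequences of 1 (binding)
    and 0 (non-binding).
    """
    categories = []
    for p, a in zip(predicted, actual):
        if p and a:
            categories.append("TP")
        elif p and not a:
            categories.append("FP")
        elif a:
            categories.append("FN")
        else:
            categories.append("TN")
    return categories


# Hypothetical labels for five residues.
predicted = [1, 1, 0, 0, 1]
actual = [1, 0, 1, 0, 1]

categories = categorize(predicted, actual)
colors = [CATEGORY_COLORS[c] for c in categories]
print(list(zip(categories, colors)))
```

Residues colored red or orange are the actual binding residues, and those colored red or blue are the predicted ones, matching the caption's description.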
Figure 6
The performance of protein binding site prediction. (A) Comparison of ROC curves for binding site prediction using different features on our constructed database. (B) Comparison of ROC curves for binding site prediction using different classifiers. (C) Comparison of ROC curves for binding site prediction on an independent dataset. (D) The importance and individual contribution ratio of each feature type.
Figure 7
Examples of predicted RNA-protein interacting complexes. Prediction results for four different RNAs are shown from top to bottom: tRNA, 7S.S SRP RNA, a fragment of 23S rRNA, and dsRNA. Predicted protein binding sites are shown in purple and predicted non-binding sites in yellow (left panels). Actual protein binding sites are shown in purple and actual non-binding sites in yellow (middle panels). The right panels show prediction performance for individual nucleotides, with true positives (TP) in purple, false positives (FP) in blue, false negatives (FN) in green, and true negatives (TN) in yellow. Thus, purple + green nucleotides correspond to the actual binding nucleotides, and purple + blue nucleotides correspond to the predicted binding nucleotides. All structure diagrams were generated using PyMol (http://www.pymol.org).
Figure 8
Two examples of protein-RNA complexes (PDB IDs: 1I6U and 3IAB). (A) and (C) Protein and RNA binding site prediction results for 1I6U and 3IAB, respectively. The results are mapped onto the original structures, with different prediction categories represented by different colors. (B) and (D) Comparison of residue-nucleotide contact predictions by our third-step model and the catRAPID method (http://service.tartaglialab.com/page/catrapid_group).
