BMC Bioinformatics. 2016 Jun 7;17(1):231.
doi: 10.1186/s12859-016-1110-x.

Accurate prediction of RNA-binding protein residues with two discriminative structural descriptors

Meijian Sun et al. BMC Bioinformatics.

Abstract

Background: RNA-binding proteins participate in many important biological processes involving RNA-mediated gene regulation, and several computational methods have recently been developed to predict the protein-RNA interactions of RNA-binding proteins. Newly developed discriminative descriptors can improve the accuracy of these methods and provide further meaningful information for researchers.

Results: In this work, we designed two structural features: residue electrostatic surface potential and triplet interface propensity. Statistical and structural analysis of protein-RNA complexes showed that both features were powerful for identifying RNA-binding protein residues. Using these two features together with other structure- and sequence-based features, we constructed a random forest classifier to predict RNA-binding residues. In five-fold cross-validation on the training set RBP195, the area under the receiver operating characteristic curve (AUC) of our method was 0.900. On the test set RBP68, the prediction accuracy (ACC) was 0.868 and the F-score was 0.631.
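
The three measures reported above can be computed from per-residue labels and classifier scores. The following is a minimal Python sketch, not the authors' evaluation code. It assumes binary labels (1 = RNA-binding, 0 = non-binding), takes the F-score to be the harmonic mean of precision and recall, and estimates AUC by pairwise ranking of positive against negative scores. The function names and the toy data are illustrative only.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F-score from binary labels (1 = RNA-binding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Assumption: F-score is the harmonic mean of precision and recall.
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data for illustration only; these are not values from RBP195 or RBP68.
labels = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.3, 0.2, 0.6, 0.8]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 cut-off

metrics = binary_metrics(labels, predicted)
auc = roc_auc(labels, scores)
```

Because non-binding residues usually outnumber binding ones, accuracy alone can look high even when few binding residues are recovered, which is why an F-score is reported alongside it.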

Conclusions: The good prediction performance of our method indicates that the two newly designed descriptors are discriminative for inferring protein residues that interact with RNA. To facilitate the use of our method, a web server called RNAProSite, which implements the proposed method, was constructed and is freely available at http://lilab.ecust.edu.cn/NABind.

Keywords: Protein-RNA interactions; Random forest classifier; Residue electrostatic surface potential; Residue triplet interface propensity; Structural analysis.

Figures

Fig. 1
The distribution of electrostatic surface potentials for positive (RNA-binding) and negative (non-RNA-binding) samples in RBP195. The two distribution curves cross at (0.014, 0.044)
Fig. 2
The distribution of patch types in positive and negative samples. Residues that are in neither the largest positive patch nor the largest negative patch of each chain in RBP195 are labeled as “Other residues”
Fig. 3
The triplet interface propensities for residues in proteins 1QTQ_A (a and b) and 2ZZM_A (c and d). In (a) and (c), the residues are colored from white to blue, representing propensity values from −0.0149 to 3.1023; the darker the blue, the more likely the residue is involved in RNA-protein interactions. In (b) and (d), residues with triplet interface propensities larger than the average propensity value are colored blue, residues interacting with RNA are colored red, and residues that fall into both groups are colored yellow. All RNA molecules are colored orange
Fig. 4
Interactions between RNAs and four types of interface triplets in different proteins. The two interface triplets in (a) are both ERG; the first is from protein 2HVY_D and the second is from 2ZIO_A. The two interface triplets in (b) are both DRV; the first is from 4LGT_A and the second is from 4GOA_A. The two interface triplets in (c) are both HKF; the first is from 3BX2_A and the second is from 3K49_A. The two interface triplets in (d) are both KRR; the first is from 1VQ4_1 and the second is from 2ZIO_A. The main chains of the RNAs are colored orange
Fig. 5
The ROC curve of five-fold cross-validation for our method on RBP195
Fig. 6
The prediction results of RNAProSite on four RBP chains. A residue is colored blue when it is incorrectly predicted as RNA-binding and green when it is correctly predicted as RNA-binding. Residues colored yellow are correctly predicted as RNA-binding residues (RBRs) by RNAProSite but are not predicted by other methods. The RNA is colored orange. The PDB codes of the four RBP chains in (a), (b), (c) and (d) are 4GLT (chain A), 2AZX (chain A), 3QJJ (chain A) and 3ZGZ (chain A), respectively
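
The coloring rule described for panels (b) and (d) of Fig. 3 amounts to comparing two residue sets. The following is a minimal Python sketch of that rule, not the authors' code. It assumes the average is taken over all residues of the chain and that residues in neither set are left uncolored, since the caption states neither. The residue identifiers and propensity values are made up.

```python
def colour_residues(propensity, rna_binding):
    """Assign Fig. 3 (b, d) style colours to residues.

    propensity  -- dict mapping residue id to triplet interface propensity
    rna_binding -- set of residue ids observed to interact with RNA
    """
    # Assumption: the "average propensity value" is the mean over the chain.
    mean_value = sum(propensity.values()) / len(propensity)
    colours = {}
    for residue, value in propensity.items():
        above_average = value > mean_value
        binds_rna = residue in rna_binding
        if above_average and binds_rna:
            colours[residue] = "yellow"  # overlap of the blue and red sets
        elif above_average:
            colours[residue] = "blue"    # high propensity only
        elif binds_rna:
            colours[residue] = "red"     # RNA-interacting only
        else:
            colours[residue] = None      # assumption: left uncoloured
    return colours


# Hypothetical values for illustration; not taken from 1QTQ_A or 2ZZM_A.
example_propensity = {"A10": 0.1, "A11": 2.0, "A12": 1.5, "A13": -0.01}
example_binding = {"A11", "A13"}
example_colours = colour_residues(example_propensity, example_binding)
```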
