BMC Bioinformatics. 2011;12 Suppl 13(Suppl 13):S5.
doi: 10.1186/1471-2105-12-S13-S5. Epub 2011 Nov 30.

Prediction of dinucleotide-specific RNA-binding sites in proteins

Michael Fernandez et al. BMC Bioinformatics. 2011.

Abstract

Background: Regulation of gene expression, protein synthesis, replication and assembly of many viruses involve RNA-protein interactions. Although some successful computational tools have been reported to recognize RNA binding sites in proteins, the problem of specificity remains poorly investigated. After the nucleotide base composition, the dinucleotide is the smallest unit of RNA sequence information and many RNA-binding proteins simply bind to regions enriched in one dinucleotide. Interaction preferences of protein subsequences and dinucleotides can be inferred from protein-RNA complex structures, enabling a training-based prediction approach.

Results: We analyzed basic statistics of amino acid-dinucleotide contacts in protein-RNA complexes and found their pairing preferences could be identified. Using a standard approach to represent protein subsequences by their evolutionary profile, we trained neural networks to predict multiclass target vectors corresponding to 16 possible contacting dinucleotide subsequences. In the cross-validation experiments, the accuracies of the optimum network, measured as areas under the curve (AUC) of the receiver operating characteristic (ROC) graphs, were in the range of 65-80%.
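The target encoding and the AUC measure described in the Results can be illustrated with a minimal sketch. It is not the authors' implementation: the alphabetical ordering of the 16 dinucleotides, the 0/1 multi-label encoding, and the function names `target_vector` and `roc_auc` are assumptions made for illustration. The AUC is computed here as the fraction of positive/negative pairs ranked correctly, which is equivalent to the area under the ROC curve.

```python
from itertools import product

# All 16 RNA dinucleotides in a fixed order (the ordering is an assumption).
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGU", repeat=2)]


def target_vector(contacted):
    """Return a 16-element 0/1 vector marking the dinucleotides a residue contacts."""
    contacted = set(contacted)
    return [1 if d in contacted else 0 for d in DINUCLEOTIDES]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

In this sketch one AUC would be computed per dinucleotide class, by passing the corresponding column of the target vectors as `labels` and the matching network output as `scores`.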

Conclusions: Dinucleotide-specific contact predictions have also been extended to the prediction of interacting protein and RNA fragment pairs, which shows the applicability of this method to predict targets of RNA-binding proteins. A web server predicting the 16-dimensional contact probability matrix directly from a user-defined protein sequence was implemented and made available at: http://tardis.nibio.go.jp/netasa/srcpred.

Figures

Figure 1
Chi-squared values of amino acid-dinucleotide contacts (a negative sign means the observed number was less than the expected value in that contact class). (a) Dinucleotide (x-axis) contact preferences with individual amino acid residues (y-axis) in protein-RNA complexes are displayed (a high positive score means contacts are preferred). (b) Same as (a), but protein-DNA complex preferences are shown instead. (c) A scatter plot of contact preferences in protein-RNA versus protein-DNA complexes.
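A signed chi-squared value of the kind plotted in Figure 1 can be sketched as follows. The caption does not state how the expected counts were derived, so this sketch assumes independence of amino acid and dinucleotide (expected = row total × column total / grand total); the function name `signed_chi_squared` and the input layout are hypothetical.

```python
def signed_chi_squared(counts):
    """Signed chi-squared term for each (amino acid, dinucleotide) contact class.

    counts maps (amino_acid, dinucleotide) -> observed number of contacts.
    Expected counts assume independence of rows and columns (an assumption).
    """
    row_totals, col_totals, total = {}, {}, 0
    for (aa, dn), n in counts.items():
        row_totals[aa] = row_totals.get(aa, 0) + n
        col_totals[dn] = col_totals.get(dn, 0) + n
        total += n
    if total == 0:
        raise ValueError("no contacts observed")

    result = {}
    for (aa, dn), observed in counts.items():
        expected = row_totals[aa] * col_totals[dn] / total
        if expected == 0:
            continue  # class cannot occur; no meaningful value
        value = (observed - expected) ** 2 / expected
        # A negative sign marks classes observed less often than expected.
        result[(aa, dn)] = value if observed >= expected else -value
    return result
```

With toy counts in which arginine contacts GU 30 times and AA 10 times while leucine shows the reverse, every expected count is 20, so the Arg-GU class scores +5.0 and the Arg-AA class scores -5.0.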
Figure 2
Performance of predicting contacts with 16 unique dinucleotides. Area under the ROC curve, specificity and sensitivity at peak F-score are plotted for the cross-validated models in terms of their ability to predict protein-RNA contacts corresponding to each of the possible 16 specific dinucleotides.
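Reporting sensitivity and specificity "at peak F-score" amounts to choosing the decision threshold that maximizes the F-score and reading the other two measures at that threshold. The sketch below assumes the balanced F1 score and scans every observed score as a candidate threshold; neither detail is stated in the caption, and the function name `peak_f_score` is hypothetical.

```python
def peak_f_score(labels, scores):
    """Find the threshold maximizing F1 and report sensitivity/specificity there.

    A residue is predicted as contacting when its score is >= threshold.
    """
    best = None
    for threshold in sorted(set(scores)):
        tp = fp = fn = tn = 0
        for y, s in zip(labels, scores):
            predicted = s >= threshold
            if y == 1 and predicted:
                tp += 1
            elif y == 1:
                fn += 1
            elif predicted:
                fp += 1
            else:
                tn += 1
        precision = tp / (tp + fp) if tp + fp else 0.0
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        denom = precision + sensitivity
        f_score = 2 * precision * sensitivity / denom if denom else 0.0
        if best is None or f_score > best["f_score"]:
            best = {
                "threshold": threshold,
                "f_score": f_score,
                "sensitivity": sensitivity,
                "specificity": specificity,
            }
    return best
```

In a setting like Figure 2, this would be run once per dinucleotide class on the cross-validated predictions.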
Figure 3
Comparison of prediction results of traditional non-specific RNA-binding site prediction approaches [12,13] with the proposed method. The figure shows that incorporating dinucleotide information improves the resolution of the RNA-binding surface and reduces the false positive rate.
Figure 4
Prediction performance for various RNA-binding protein classes. Protein-RNA complexes were grouped by their functional class, and the prediction performance of our models within each category was evaluated.
Figure 5
Performance of the model trained to predict RNA targets of RBPs. Using various RNA sequences, dinucleotide contact prediction scores from proteins were transferred to each position on the RNA sequence, based on dinucleotide composition and the corresponding peak prediction score. The model was evaluated on whether it scores RNA sequences higher against their correct protein partners than against wrong partners.
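One possible reading of the score transfer described for Figure 5 is sketched below. The caption gives no formula, so the details are assumptions: the peak score for a dinucleotide is taken as the maximum predicted contact score over all residues of the protein, each RNA position receives the peak score of the dinucleotide starting there, and the sequence score is the mean over positions. The names `peak_scores` and `score_rna` are hypothetical.

```python
def peak_scores(per_residue):
    """Highest predicted contact score per dinucleotide over all residues.

    per_residue is a list of dicts, one per residue, mapping
    dinucleotide -> predicted contact score.
    """
    peaks = {}
    for residue in per_residue:
        for dn, score in residue.items():
            peaks[dn] = max(peaks.get(dn, float("-inf")), score)
    return peaks


def score_rna(rna, peaks, default=0.0):
    """Transfer peak scores to RNA positions and average them.

    Position i receives the peak score of the dinucleotide rna[i:i+2];
    dinucleotides without a prediction receive `default` (an assumption).
    """
    rna = rna.upper().replace("T", "U")
    positions = [peaks.get(rna[i:i + 2], default) for i in range(len(rna) - 1)]
    mean = sum(positions) / len(positions) if positions else 0.0
    return positions, mean
```

Under this reading, a candidate RNA would be scored against each protein's peak scores, and a correct partner would be expected to score higher than a wrong one.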

References

    1. Hall K. RNA-protein interactions. Curr Opin Struct Biol. 2002;12:283–288. doi: 10.1016/S0959-440X(02)00323-8.
    2. Tian B, Bevilacqua P, Diegelman-Parente A, Mathews M. The double-stranded-RNA-binding motif: Interference and much more. Nat Rev Mol Cell Biol. 2004;5:1013–1023. doi: 10.1038/nrm1528.
    3. Morozova N, Allers J, Myers J, Shamoo Y. Protein-RNA interactions: exploring binding patterns with a three-dimensional superposition analysis of high resolution structures. Bioinformatics. 2006;22:2746–2752. doi: 10.1093/bioinformatics/btl470.
    4. Zheng S, Robertson T, Varani G. A knowledge-based potential function predicts the specificity and relative binding energy of RNA-binding proteins. FEBS J. 2007;274:6378–6391.
    5. Chen Y, Kortemme T, Robertson T, Baker D, Varani G. A new hydrogen-bonding potential for the design of protein-RNA interactions predicts specific contacts and discriminates decoys. Nucl Acids Res. 2004;32:5147–5162. doi: 10.1093/nar/gkh785.
