Comparative Study
RNA. 2006 Aug;12(8):1450-62.
doi: 10.1261/rna.2197306. Epub 2006 Jun 21.

Prediction of RNA binding sites in proteins from amino acid sequence

Michael Terribilini et al. RNA. 2006 Aug.

Abstract

RNA-protein interactions are vitally important in a wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses. We have developed a computational tool for predicting which amino acids of an RNA binding protein participate in RNA-protein interactions, using only the protein sequence as input. RNABindR was developed using machine learning on a validated nonredundant data set of interfaces from known RNA-protein complexes in the Protein Data Bank. It generates a classifier that captures primary sequence signals sufficient for predicting which amino acids in a given protein are located in the RNA-protein interface. In leave-one-out cross-validation experiments, RNABindR identifies interface residues with >85% overall accuracy. It can be calibrated by the user to obtain either high specificity or high sensitivity for interface residues. RNABindR, implementing a Naive Bayes classifier, performs as well as a more complex neural network classifier (to our knowledge, the only previously published sequence-based method for RNA binding site prediction) and offers the advantages of speed, simplicity, and interpretability of results. RNABindR predictions on the human telomerase protein hTERT are in good agreement with experimental data. The availability of computational tools for predicting which residues in an RNA binding protein are likely to contact RNA should facilitate design of experiments to directly test RNA binding function and contribute to our understanding of the diversity, mechanisms, and regulation of RNA-protein complexes in biological systems. (RNABindR is available as a Web tool from http://bindr.gdcb.iastate.edu.)
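
The abstract describes a Naive Bayes classifier that labels each residue from sequence alone and can be tuned toward specificity or sensitivity. The following is a minimal sketch of that general idea, not the RNABindR implementation: the window centered on the target residue, the `-` padding character, the Laplace smoothing, the posterior cutoff `theta`, and all function names are assumptions made for illustration.

```python
import math
from collections import defaultdict

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus '-' padding (assumed)

def windows(seq, size):
    """Yield the window of `size` residues centered on each position."""
    half = size // 2
    padded = "-" * half + seq + "-" * half
    for i in range(len(seq)):
        yield padded[i:i + size]

def train(examples, size):
    """examples: list of (sequence, labels); labels is a string of '1'/'0'."""
    class_counts = {0: 0, 1: 0}
    # counts[c][k][aa] = times residue aa was seen at window offset k in class c
    counts = {c: [defaultdict(int) for _ in range(size)] for c in (0, 1)}
    for seq, labels in examples:
        for win, lab in zip(windows(seq, size), labels):
            c = int(lab)
            class_counts[c] += 1
            for k, aa in enumerate(win):
                counts[c][k][aa] += 1
    return class_counts, counts

def posterior_interface(win, model):
    """Return P(interface | window) under the naive independence assumption."""
    class_counts, counts = model
    total = class_counts[0] + class_counts[1]
    log_score = {}
    for c in (0, 1):
        # Laplace-smoothed log prior plus per-position log likelihoods
        s = math.log((class_counts[c] + 1) / (total + 2))
        for k, aa in enumerate(win):
            s += math.log((counts[c][k][aa] + 1) / (class_counts[c] + len(ALPHABET)))
        log_score[c] = s
    # Normalize in log space to avoid underflow
    m = max(log_score.values())
    e0, e1 = math.exp(log_score[0] - m), math.exp(log_score[1] - m)
    return e1 / (e0 + e1)

def predict(seq, model, size, theta=0.5):
    """Label a residue '1' (interface) when its posterior is at least theta."""
    return "".join(
        "1" if posterior_interface(w, model) >= theta else "0"
        for w in windows(seq, size)
    )
```

In this sketch, lowering `theta` can only add predicted interface residues (favoring sensitivity), and raising it can only remove them (favoring specificity), which is one simple way a user-facing calibration knob could behave.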

Figures

FIGURE 1.
Certain amino acids are highly favored in RNA–protein interfaces. Interface propensities for the indicated amino acids are shown as solid bars; the hatched bars to the left and right of each solid bar are the propensities for the amino acid to occur in the position immediately before or after an interface residue, respectively. The residues are placed in order of increasing hydrophobicity based on the Kyte and Doolittle (1982) hydropathy index.
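
The caption does not state how interface propensity is computed. As an illustration only, the sketch below assumes one common definition: the frequency of an amino acid among interface positions divided by its frequency in the whole data set, so values above 1 indicate enrichment. The function name, the `'1'`/`'0'` label encoding, and the `offset` parameter (used here to look at the position before or after an interface residue) are hypothetical.

```python
from collections import Counter

def interface_propensity(examples, offset=0):
    """examples: iterable of (sequence, labels); labels marks interface
    residues with '1'. offset=0 scores the interface residue itself,
    offset=-1 / +1 the position immediately before / after it."""
    all_counts, near_counts = Counter(), Counter()
    for seq, labels in examples:
        all_counts.update(seq)
        for i, lab in enumerate(labels):
            j = i + offset
            if lab == "1" and 0 <= j < len(seq):
                near_counts[seq[j]] += 1
    n_all = sum(all_counts.values())
    n_near = sum(near_counts.values())
    if n_near == 0:
        raise ValueError("no interface residues in the data")
    # Ratio of frequency near the interface to overall frequency
    return {
        aa: (near_counts[aa] / n_near) / (all_counts[aa] / n_all)
        for aa in all_counts
    }
```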
FIGURE 2.
RNA binding residues tend to occur in clusters within primary sequence. Bars show the log likelihood that a position neighboring an interface residue also contains an interface residue, based on the nonredundant data set of 109 RNA binding proteins. The hatched portion of the bars represents the log likelihood for the entire data set of 109 proteins. The solid portion of the bars represents the log likelihood for the ribosomal protein subset of 55 proteins. Likelihood values >0 mean that the position has higher probability than random of also being an interface residue.
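
The exact formula behind this log likelihood is not given in the caption. The sketch below assumes a log ratio of the conditional probability that a neighbor at distance k is an interface residue to the background interface frequency, which matches the stated reading that values above 0 mean "more likely than random". The natural logarithm, the function name, and the label encoding are assumptions.

```python
import math

def neighbor_log_likelihood(label_strings, max_offset=3):
    """label_strings: list of '1'/'0' strings, one per protein.
    Returns {k: log(P(interface at i±k | interface at i) / P(interface))}."""
    total = sum(len(labels) for labels in label_strings)
    n_iface = sum(labels.count("1") for labels in label_strings)
    if total == 0 or n_iface == 0:
        raise ValueError("need at least one interface residue")
    background = n_iface / total
    result = {}
    for k in range(1, max_offset + 1):
        pairs = hits = 0
        for labels in label_strings:
            for i, lab in enumerate(labels):
                if lab != "1":
                    continue
                # Look k positions to either side of each interface residue
                for j in (i - k, i + k):
                    if 0 <= j < len(labels):
                        pairs += 1
                        hits += labels[j] == "1"
        if hits and pairs:
            result[k] = math.log((hits / pairs) / background)
        else:
            result[k] = float("-inf")  # never observed at this distance
    return result
```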
FIGURE 3.
Receiver operating characteristic (ROC) curve for RNABindR predictions. The ROC curve illustrates how varying the cutoff threshold θ determines the trade-off between sensitivity+ and false positive rate (1 − specificity), where specificity is defined as TN/(FP + TN), so the false positive rate equals FP/(FP + TN). Results shown are for an input window of 25 amino acids.
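
A ROC curve of this kind can be traced by sweeping the threshold over per-residue scores and recording sensitivity, TP/(TP + FN), against false positive rate, FP/(FP + TN). The sketch below assumes each residue has a score in [0, 1] and is called an interface residue when its score is at least θ; the function name and toy inputs are hypothetical.

```python
def roc_points(scores, labels, thetas):
    """Return (theta, false_positive_rate, sensitivity) for each threshold.
    scores: per-residue scores; labels: 1 for interface, 0 otherwise."""
    points = []
    for theta in thetas:
        tp = fp = tn = fn = 0
        for s, y in zip(scores, labels):
            predicted = s >= theta
            if predicted and y:
                tp += 1
            elif predicted:
                fp += 1
            elif y:
                fn += 1
            else:
                tn += 1
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        points.append((theta, fpr, sensitivity))
    return points
```

For example, `roc_points([0.9, 0.8, 0.4, 0.2], [1, 0, 1, 0], [0.0, 0.5, 1.0])` moves from the top-right corner of the ROC plot (everything predicted as interface) to the bottom-left (nothing predicted) as θ increases.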
FIGURE 5.
RNABindR sensitivity and specificity trade-off. Changing the value of the threshold parameter θ causes a trade-off between specificity and sensitivity in predicting RNA binding residues. The example shown here is the double-stranded RNA binding protein from Xenopus, PDB ID 1DI2:A, also shown in Figure 4B. The color scheme in this figure is the same as in Figure 4.
FIGURE 4.
Predictions mapped onto three-dimensional structures of RNA binding proteins. Examples of RNABindR results for four different types of RNA–protein complexes are shown: (A) ribosomal protein L15, PDB 1JJ2:K (Klein et al. 2001); (B) Xenopus dsRNA binding protein, PDB 1DI2:A (Ryter and Schultz 1998); (C) Ebola virus Vp40, PDB 1H2C:A (Gomis-Ruth et al. 2003); (D) tRNA pseudouridine synthase, PDB 1R3E:A (Pan et al. 2003). Predicted RNA binding sites, with predicted interface residues shown in red and predicted noninterface residues in gray (left panels). Actual RNA binding sites, with actual interface residues in red and actual noninterface residues in gray (middle panels). The performance of RNABindR for individual residues, with true positives (TPs) shown in red, false positives (FPs) in blue, false negatives (FNs) in yellow, and true negatives (TNs) in gray (right panels). Thus, in this representation, red + yellow residues correspond to the actual interface (derived from the PDB structure), red + gray residues correspond to correctly predicted residues (both interface and noninterface), and blue + yellow residues correspond to misclassified residues. Results shown were predicted with RNABindR using an input window of 25 amino acids and θ = 0.5. All structure diagrams were generated using PyMol (http://www.pymol.org).
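
The right-panel color scheme described in this caption amounts to a per-residue comparison of predicted and actual labels. A minimal sketch follows; the colors are taken from the caption, while the function name and the `'1'`/`'0'` string encoding are assumptions made for illustration.

```python
# Colors as given in the caption for the right-hand panels
COLORS = {"TP": "red", "FP": "blue", "FN": "yellow", "TN": "gray"}

def classify_residues(predicted, actual):
    """Compare two '1'/'0' label strings residue by residue and return
    a list of (outcome, color) pairs."""
    if len(predicted) != len(actual):
        raise ValueError("label strings must have the same length")
    out = []
    for p, a in zip(predicted, actual):
        if p == "1" and a == "1":
            kind = "TP"   # predicted interface, actual interface
        elif p == "1":
            kind = "FP"   # predicted interface, actual noninterface
        elif a == "1":
            kind = "FN"   # missed interface residue
        else:
            kind = "TN"   # correctly predicted noninterface
        out.append((kind, COLORS[kind]))
    return out
```

Under this mapping, red plus yellow residues recover the actual interface and blue plus yellow residues are the misclassified ones, as the caption states.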
FIGURE 6.
RNABindR predictions on telomerase reverse transcriptase (TERT). Mapped functional domains and conserved motifs of TERT are shown at the top. Shaded boxes on lines labeled “Predictions” show clusters of predicted RNA interface residues. (A) Human telomerase reverse transcriptase (hTERT). Boundaries of two major RNA interaction domains (RIDs) are indicated by open boxes (Moriarty et al. 2005). The amino acid sequence that includes one of the clusters of predicted RNA-interface residues, located in RID2, is shown at the bottom. Two boxed regions, amino acids 481–490 and amino acids 508–517, correspond to deletion mutations that have been shown to decrease hTERT RNA binding activity by 60% and 70%, respectively (Moriarty et al. 2002). Individual interface residues predicted by RNABindR are indicated by + below the sequence. (B) Tetrahymena thermophila telomerase reverse transcriptase (tTERT). The two RNA binding domains are indicated by open boxes. The amino acid sequence of the C-terminal end of the TEN RNA binding domain is shown, with individual interface residues predicted by RNABindR indicated by + below the sequence. Removing residues 1–12 and 182–191 (boxed in the sequence view) abolished RNA binding of the TEN domain construct (Jacobs et al. 2005, 2006). RNABindR predicts a cluster of interface residues in residues 182–191, but no interface residues are predicted in residues 1–12. (N) N terminus, (TEN) telomerase essential N-terminal domain, (GQ, CP, QFP, and T) conserved sequence motifs, (RT) reverse transcriptase domain.
