RNA. 2022 Nov;28(11):1469-1480.
doi: 10.1261/rna.079365.122. Epub 2022 Aug 25.

De novo prediction of RNA-protein interactions with graph neural networks

Viplove Arora et al. RNA. 2022 Nov.

Abstract

RNA-binding proteins (RBPs) are key co- and post-transcriptional regulators of gene expression, playing a crucial role in many biological processes. Experimental methods like CLIP-seq have enabled the identification of transcriptome-wide RNA-protein interactions for select proteins; however, the time- and resource-intensive nature of these technologies calls for the development of computational methods to complement their predictions. Here, we leverage recent, large-scale CLIP-seq experiments to construct a de novo predictor of RNA-protein interactions based on graph neural networks (GNN). We show that the GNN method allows us not only to predict missing links in an RNA-protein network, but also to predict the entire complement of targets of previously unassayed proteins, and even to reconstruct the entire network of RNA-protein interactions in different conditions based on minimal information. Our results demonstrate the potential of modern machine learning methods to extract useful information on post-transcriptional regulation from large data sets.

Keywords: RNA–protein interactions; graph neural networks; graphs; transfer learning.
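
The link-prediction idea in the abstract can be illustrated with a small sketch. The abstract does not specify the architecture, so everything below is an assumption made for illustration: a single graph-convolution layer with symmetric normalization, a dot-product scorer, a five-node toy graph (nodes 0-1 standing in for proteins, nodes 2-4 for RNAs), and random, untrained features and weights. It is not the authors' model.

```python
import math
import random


def normalized_adjacency(n_nodes, edges):
    """Return D^-1/2 (A + I) D^-1/2 as a dense list of lists."""
    a = [[1.0 if i == j else 0.0 for j in range(n_nodes)] for i in range(n_nodes)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0  # undirected protein-RNA edge
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n_nodes)]
            for i in range(n_nodes)]


def matmul(x, y):
    """Plain dense matrix product for small lists of lists."""
    return [[sum(row[k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for row in x]


def gcn_layer(a_hat, h, w):
    """One graph-convolution step: ReLU(A_hat @ H @ W)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, value) for value in row] for row in z]


def link_score(h, u, v):
    """Sigmoid of the dot product of two node embeddings."""
    dot = sum(a * b for a, b in zip(h[u], h[v]))
    return 1.0 / (1.0 + math.exp(-dot))


# Hypothetical toy graph: nodes 0-1 are proteins, nodes 2-4 are RNAs.
rng = random.Random(0)
n_nodes, n_features, n_hidden = 5, 4, 3
known_edges = [(0, 2), (0, 3), (1, 3)]
# Random features and untrained weights; a real model would learn the weights.
features = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_nodes)]
weights = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_features)]

a_hat = normalized_adjacency(n_nodes, known_edges)
embeddings = gcn_layer(a_hat, features, weights)
candidate_score = link_score(embeddings, 1, 4)
print(f"score for candidate edge (protein 1, RNA 4): {candidate_score:.3f}")
```

In a trained setting, the weights would be fitted so that known interactions score high and sampled negative pairs score low; the score for an unobserved pair then serves as the prediction for a missing link.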

Figures

FIGURE 1.
Pictorial representation of the framework presented in this paper: the raw data is transformed to obtain node features and positive and negative interactions, which serve as input for the GNN. The trained model is used for making predictions, as shown using the genome tracks.
FIGURE 2.
Comparing the performance of the two GCN settings while varying the size of the final embedding (left) and the number of layers in the GCN (right). The test set contains 20% of the edges and the hidden embedding size is set to 50. The error bars show the standard deviation over 10 independent trials.
FIGURE 3.
Comparing the performance of various models for de novo prediction in K562 cell line. Each box shows the distribution of mean AUROC (A) or average precision (B) over the entire set of proteins when the model is tested for a single protein in the test set.
FIGURE 4.
The plots show representative genome tracks produced using the eCLIP data annotated by predictions made by our model under four different outcomes: true positives in A, true negatives in B, false positives in D, and false negatives in E. We consider two proteins, BUD13 and DDX24, in the inductive link prediction task for the K562 cell line. Positive predictions are shown in green and negative predictions in red. We also plot the distribution of reads (Fig. 4C,F) for the two proteins under the four outcomes.
FIGURE 5.
ROC curve for transfer learning from K562 to HepG2 cell line for GCN (RNA). Red dot corresponds to the false positive and true positive rates if the edges from the source cell line are directly transferred to the target cell line.
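
The evaluation quantities named in the captions can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: `auroc` is a simple pairwise-ranking estimate of the area under the ROC curve, `transfer_baseline_rates` computes the single (false positive rate, true positive rate) point obtained when edges from a source network are copied unchanged to a target network (the role of the red dot in Figure 5), and all node names and edges are hypothetical toy values.

```python
def auroc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def transfer_baseline_rates(source_edges, target_edges, candidate_pairs):
    """Return (FPR, TPR) when every source edge is predicted as a target edge."""
    source, target = set(source_edges), set(target_edges)
    tp = fp = fn = tn = 0
    for pair in candidate_pairs:
        predicted, actual = pair in source, pair in target
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return fpr, tpr


# Hypothetical toy networks over two proteins and three RNAs.
pairs = [(p, r) for p in ("P1", "P2") for r in ("r1", "r2", "r3")]
source_net = [("P1", "r1"), ("P1", "r2"), ("P2", "r3")]
target_net = [("P1", "r1"), ("P2", "r2"), ("P2", "r3")]

fpr, tpr = transfer_baseline_rates(source_net, target_net, pairs)
print(f"direct transfer: FPR={fpr:.3f}, TPR={tpr:.3f}")
print(f"toy AUROC: {auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]):.2f}")
```

A model whose ROC curve passes above the direct-transfer point achieves a higher true positive rate at the same false positive rate than copying the source edges would.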
