PLoS Comput Biol. 2022 Jul 12;18(7):e1010293.
doi: 10.1371/journal.pcbi.1010293. eCollection 2022 Jul.

RNANetMotif: Identifying sequence-structure RNA network motifs in RNA-protein binding sites


Hongli Ma et al. PLoS Comput Biol.

Abstract

RNA molecules can adopt stable secondary and tertiary structures, which are essential in mediating physical interactions with other partners such as RNA binding proteins (RBPs) and in carrying out their cellular functions. In vitro and in vivo experiments such as RNAcompete and eCLIP have revealed, respectively, the in vitro binding preferences of RBPs for RNA oligomers and their in vivo binding sites in cells. Analysis of these binding data showed that the structural properties of the RNAs in these binding sites are important determinants of the binding events; however, it has been a challenge to incorporate the structure information into an interpretable model. Here we describe a new approach, RNANetMotif, which takes the predicted secondary structures of thousands of RNA sequences bound by an RBP as input and uses a graph theory approach to recognize enriched subgraphs. These enriched subgraphs are in essence shared sequence-structure elements that are important in RBP-RNA binding. To validate our approach, we performed RNA structure modeling via coarse-grained molecular dynamics folding simulations for 4 selected RBPs, and RNA-protein docking for LIN28B. The simulation results, e.g., solvent accessibility and energetics, further support the biological relevance of the discovered network subgraphs.
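
The graph view of a predicted secondary structure can be illustrated with a minimal sketch. It is not the RNANetMotif implementation: it assumes the structure is supplied in dot-bracket notation without pseudoknots, uses a one-hot nucleotide encoding as the vertex feature vector, and stores edge types as integer labels in the adjacency matrix. The names `parse_dot_bracket`, `rna_to_graph`, `BACKBONE`, and `BASE_PAIR` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical edge labels; the paper only distinguishes the two edge types.
BACKBONE = 1   # link between consecutive nucleotides
BASE_PAIR = 2  # link between two paired nucleotides


def parse_dot_bracket(structure: str) -> List[Tuple[int, int]]:
    """Return sorted (i, j) index pairs for matching parentheses."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for idx, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(idx)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            pairs.append((stack.pop(), idx))
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return sorted(pairs)


def rna_to_graph(sequence: str, structure: str):
    """Build (vertex features, adjacency matrix) for one RNA.

    Assumption: features are one-hot over A, C, G, U; any other
    symbol (for example N) gets an all-zero vector.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    n = len(sequence)
    alphabet = "ACGU"
    features = [[1 if base == letter else 0 for letter in alphabet]
                for base in sequence.upper()]
    adjacency = [[0] * n for _ in range(n)]
    # Backbone edges connect each nucleotide to its successor.
    for i in range(n - 1):
        adjacency[i][i + 1] = adjacency[i + 1][i] = BACKBONE
    # Base-pair edges come from the predicted secondary structure.
    for i, j in parse_dot_bracket(structure):
        adjacency[i][j] = adjacency[j][i] = BASE_PAIR
    return features, adjacency


features, adjacency = rna_to_graph("GGGAAACCC", "(((...)))")
```

In this toy hairpin, nucleotides 0 and 8 are joined by a base-pair edge, while nucleotides 3 and 4 in the loop are joined only by a backbone edge. Subgraph enumeration and enrichment testing would operate on graphs of this kind and are not shown here.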

Conflict of interest statement

none

Figures

Fig 1
(A) Workflow of RNANetMotif (First Part). Step 1. Predicting base-pairings of protein-bound RNA sequences and graph representation. The representation includes nucleotide information (vertex feature vector) and base-pairing information (adjacency matrix), here depicted with different colors to distinguish backbone links (grey) from base pairing (red). Step 2. GraphK partition algorithm (see Methods) to obtain final EKS pool. (B) Workflow of RNANetMotif (Second Part). Step 3. Calculate HVDM (Heterogeneous Value Difference Metric) distance matrix and construct similarity network of EKSes. Step 4. Detect significant network modules and then identify intrinsic EKS motifs. Step 5. RNA 3D structure modeling via discrete molecular dynamics folding simulations and protein-RNA docking with simulation for validation.
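
For readers unfamiliar with the HVDM distance named in Step 3, the following is a minimal sketch of the metric in its commonly used general form: nominal attributes are compared through class-conditional value frequencies, numeric attributes through the absolute difference divided by four standard deviations, and the per-attribute distances are combined as a Euclidean norm. How RNANetMotif encodes EKSes as attributes, and which class labels it uses, is not stated in this record; the toy rows, the labels, and the helper name `fit_hvdm` are assumptions made only for illustration.

```python
import math
from collections import Counter, defaultdict
from statistics import pstdev


def fit_hvdm(rows, labels, nominal):
    """Return a distance function d(x, y) fitted on labelled rows.

    rows    -- list of equal-length tuples (mixed nominal/numeric values)
    labels  -- one class label per row (assumed, needed for nominal attributes)
    nominal -- set of attribute indices to treat as nominal
    """
    n_attrs = len(rows[0])
    classes = sorted(set(labels))
    sigma = {}          # attribute index -> population standard deviation
    value_counts = {}   # attribute index -> Counter of values
    class_counts = {}   # attribute index -> {value: Counter of classes}
    for a in range(n_attrs):
        column = [row[a] for row in rows]
        if a in nominal:
            value_counts[a] = Counter(column)
            per_value = defaultdict(Counter)
            for value, label in zip(column, labels):
                per_value[value][label] += 1
            class_counts[a] = dict(per_value)
        else:
            sigma[a] = pstdev(column)

    def cond_prob(a, value, label):
        # P(class | attribute value); unseen values get probability 0
        # (a simplification chosen for this sketch).
        total = value_counts[a].get(value, 0)
        if total == 0:
            return 0.0
        return class_counts[a][value][label] / total

    def distance(x, y):
        total = 0.0
        for a in range(n_attrs):
            if a in nominal:
                d_sq = sum((cond_prob(a, x[a], c) - cond_prob(a, y[a], c)) ** 2
                           for c in classes)
            elif sigma[a] == 0:
                d_sq = 0.0
            else:
                d_sq = (abs(x[a] - y[a]) / (4 * sigma[a])) ** 2
            total += d_sq
        return math.sqrt(total)

    return distance


# Toy data: one nominal attribute (a nucleotide) and one numeric attribute.
rows = [("A", 1.0), ("A", 2.0), ("G", 3.0), ("G", 4.0)]
labels = ["pos", "pos", "neg", "neg"]
hvdm = fit_hvdm(rows, labels, nominal={0})
pairwise = [[hvdm(x, y) for y in rows] for x in rows]
```

The resulting `pairwise` matrix is symmetric with a zero diagonal, which is the shape of input a similarity network would be built from after thresholding or converting distances to similarities.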
Fig 2. Definition and classification of EKSes.
As displayed, there are three categories of EKSes: opposite-direction extensions (right-opened and left-opened), mixed-mode extensions and same-direction extensions. Black edges represent backbone bonds, red edges represent the base-pair interaction between i and j, and green dashed edges represent possible base-pair interactions.
Fig 3. Distribution of gap lengths among base-pairings for individual RBPs.
Fig 4. Distribution of the frequency of top sequence and structure instances in the final EKS pool for 5 RBPs.
Fig 5. Combined sequence and structure logos of significant modules of 16 RBPs.
Left-opened and right-opened EKSes of different sizes are displayed.
Fig 6. Boxplots of average atom counts calculated from DMD modeled 3D structures of discovered RNA network motifs and other regions of the RNA.
Fig 7. Complex simulation ensemble of docking of LIN28B CSD domain and three RNA network motifs.
MD ensembles and snapshots of the protein-RNA interfaces are shown at the top and bottom, respectively. The LIN28B CSD protein is shown in red and the 100 nt RNA is shown in cyan and blue. The identified RNA network motifs are shown in yellow in both the structure and the sequence.
Fig 8. Comparison with other sequence-structure motif predictors on 16 eCLIP datasets.
