Mol Cell. 2018 Jun 7;70(5):854-867.e9.
doi: 10.1016/j.molcel.2018.05.001. Epub 2018 Jun 7.

Sequence, Structure, and Context Preferences of Human RNA Binding Proteins

Daniel Dominguez et al. Mol Cell. 2018.

Abstract

RNA binding proteins (RBPs) orchestrate the production, processing, and function of mRNAs. Here, we present the affinity landscapes of 78 human RBPs using an unbiased assay that determines the sequence, structure, and context preferences of these proteins in vitro by deep sequencing of bound RNAs. These data enable construction of "RNA maps" of RBP activity without requiring crosslinking-based assays. We found an unexpectedly low diversity of RNA motifs, implying frequent convergence of binding specificity toward a relatively small set of RNA motifs, many with low compositional complexity. Offsetting this trend, however, we observed extensive preferences for contextual features distinct from short linear RNA motifs, including spaced "bipartite" motifs, biased flanking nucleotide composition, and bias away from or toward RNA structure. Our results emphasize the importance of contextual features in RNA recognition, which likely enable targeting of distinct subsets of transcripts by different RBPs that recognize the same linear motif.

Keywords: KH domain; Pum domain; RBNS; RNA binding protein; RNA context; RNA recognition motif; RNA secondary structure; alternative splicing; mRNA stability; zinc finger.

Conflict of interest statement

DECLARATION OF INTERESTS

The authors declare no competing interests.

Figures

Figure 1. Overview of the high-throughput RNA Bind-n-Seq assay and computational analysis pipeline
A. Schematic of RBNS assay and pipeline. B. Number of RBPs with one or more of the three most common RBD types assayed. C. Cumulative distribution of amino acid identity between the most similar pairs of RBDs across all RBPs and those assayed by RBNS. D. Pearson r of R values between RBNS assays of the same RBP at different protein concentrations. Inset: correlation of 5mer R values of HNRNPL at 20 nM (most enriched concentration) and 80 nM.
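The R values in panel D can be illustrated with a minimal sketch. It assumes that R for a kmer is its frequency in the pulldown library divided by its frequency in the input library; this caption does not spell out that definition, and the function names and toy reads below are hypothetical.

```python
from collections import Counter


def kmer_frequencies(reads, k):
    """Return the relative frequency of every kmer observed in the reads."""
    counts = Counter()
    for read in reads:
        # Slide a window of width k across the read.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def enrichment(pulldown_reads, input_reads, k):
    """Assumed R value: pulldown frequency / input frequency, per kmer.

    Kmers absent from the input library are skipped to avoid dividing by zero.
    """
    pulldown = kmer_frequencies(pulldown_reads, k)
    background = kmer_frequencies(input_reads, k)
    return {
        kmer: pulldown[kmer] / background[kmer]
        for kmer in pulldown
        if kmer in background
    }


# Toy example with made-up reads (not data from the study).
r_values = enrichment(["AAAA"], ["AAAC"], k=2)
```

Comparing such R values for the same protein at two concentrations (for example with a Pearson correlation) corresponds to the kind of reproducibility check described in panel D.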
Figure 2. RBPs bind a small subset of the sequence space, characterized by low-entropy motifs
A. From left to right: Dendrogram of hierarchical clustering of RBPs by sequence logo similarity and 15 clusters at indicated branch length cutoff (dashed line); protein name; colored circles representing nucleotide content of RBP motif (one circle if motif is >66% one base, two half-circles if motif is >33% two bases); top motif logo for each protein; RBD(s), with expressed region underlined; * indicates a natural isoform lacking a canonical RBD. B. Network map of RBPs with overlapping specificities. Line thickness increases with number of overlapping 6mers as indicated. Node outline indicates RBD type of each protein. C. Number of unique top 6mers among subsamplings of the 78 RBNS experiments versus randomly selected 6mers. D. Edge count between nodes for network maps as shown in B, drawn using groups of 15 6mers with decreasing affinity ranks. E. Entropy of nucleotide composition of RBNS motifs and simulated motifs. P-value determined by Wilcoxon rank-sum test. F. Enrichment of RBNS motifs over simulated motifs among partitions of a 2D simplex of motif nucleotide composition. Significance along margins was determined by bootstrap Z-score (number of asterisks = Z-score).
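The entropy comparison in panel E can be pictured with the following sketch. It assumes Shannon entropy (in bits) over the pooled A/C/G/U composition of a motif's kmers; the exact procedure used in the study may differ, and the function name is hypothetical.

```python
import math
from collections import Counter


def composition_entropy(kmers, alphabet="ACGU"):
    """Shannon entropy (bits) of the pooled nucleotide composition of kmers."""
    counts = Counter(base for kmer in kmers for base in kmer if base in alphabet)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no nucleotides from the alphabet were found")
    entropy = 0.0
    for base in alphabet:
        p = counts[base] / total
        if p > 0:  # terms with p == 0 contribute nothing
            entropy -= p * math.log2(p)
    return entropy


# A single-base motif has the lowest possible entropy; a uniform one the highest.
low = composition_entropy(["UUUUUU"])
high = composition_entropy(["ACGU"])
```

Under this assumed definition, values range from 0 bits (one nucleotide only) to 2 bits (all four nucleotides equally represented), so "low-entropy motifs" are those dominated by one or two bases.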
Figure 3. RBNS-derived motifs are associated with regulation of mRNA splicing and stability in vivo
A. Overlap of RBNS 6mers and 6mers with splicing regulatory activity (P-value determined by hypergeometric test). B. Comparison of splicing regulatory scores of, left: RBNS 6mers and all other 6mers; right: all 6mers binned by their maximum R value Z-score across all RBNS experiments (P-values determined by Wilcoxon rank-sum test). C. Left: Number of alternative exons regulated by RBM25 as determined by RNA-seq after RBM25 KD in HepG2 cells. Right: Proportion of events covered by RBNS 5mers in exonic and flanking intronic regions near alternative exons excluded upon RBM25 KD (red), included upon RBM25 KD (blue), and a control set of exons (black). Positions of significant difference from control exons upon KD determined by Wilcoxon rank-sum test and marked below the x-axis. D. Overlap of RBNS 6mers and 6mers with 3′UTR regulatory activity (hypergeometric test). E. Same as B, but comparison with 3′UTR regulation rather than splicing regulation. F. Pearson r of eCLIP densities across 100 nt windows of 3′ UTRs for all pairs of eCLIP experiments. Pairs of experiments are grouped by category, with all pairs not belonging to “Replicates”, “Paralogs”, or “Similar motifs” (sharing two of top 5 5mers) placed in “Other”. P-values, Wilcoxon rank-sum test, ***P < 5 × 10^−4, N.S.: P > 0.05.
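The hypergeometric overlap tests in panels A and D can be sketched as a one-sided upper-tail probability. The parameterization below (population size, number of regulatory 6mers, number of RBNS 6mers, observed overlap) is an assumption about how such a test is usually set up, and the counts in the example are made up rather than taken from the study.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(overlap >= observed) when `draws` items are sampled without
    replacement from `population` items, of which `successes` are marked."""
    denominator = comb(population, draws)
    largest_overlap = min(successes, draws)
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, largest_overlap + 1)
    )
    return numerator / denominator


# Hypothetical counts: 4096 possible 6mers, 300 with regulatory activity,
# 200 RBNS 6mers, 40 shared between the two sets.
p_value = hypergeom_upper_tail(4096, 300, 200, 40)
```

The exact integer arithmetic from `math.comb` avoids the rounding issues that floating-point factorials would cause at this population size.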
Figure 4. RNA secondary structural preferences of RBPs
A. The log2(pulldown Ppaired/input Ppaired) for the most enriched pulldown library over each position of the top 6mer plus 10 flanking positions on each side; RBPs are grouped by motif clusters in Fig. 2A and from greatest to least mean log2(pulldown Ppaired/input Ppaired) over the top 6mer from top to bottom within each cluster. B. Mean change (log2) in Ppaired over each position of the top 6mer at different concentrations of NUPL2 (top) and PRR3 (bottom) relative to the input library. C. Enrichment of the top 6mer of NUPL2 (top) and PRR3 (bottom) in 5 bins into which all 6mers were assigned based on their average Ppaired. D. Top: Three types of structural contexts considered and the percentage of all 6mers and RBNS 6mers (top 6mer for each of 78 RBPs) found in each context in pulldown reads. Bottom: Log-fold change of the top 6mer’s recalculated R among 6mers restricted to each structural context relative to the original R. E, F: Left: Percentage of each position of the top 6mer found in the four structural elements for RBM22 (E) and ZNF326 (F) in pulldown reads. Structure logo for top 6mer is shown above. Right: Representative structure of the top 6mer pairing with the 5′ sequencing adapter (gray) for 6mers found at the most enriched positions within the random 20mer (RBM22, position 5; ZNF326, position 6). G. Enrichment of the percentage of pulldown vs. input reads containing hairpin loops of various lengths, separated by RBPs that contain (n=13) or do not contain (n=65) at least one KH domain (P < 0.05, Wilcoxon rank-sum test). H. Average Ppaired in random sequence for all 6mers binned by maximum R value Z-score across all RBNS experiments (***P < 0.0005 by Wilcoxon rank-sum test; overall Spearman ρ = −0.18, P < 10^−22).
Figure 5. Many RBPs bind bipartite motifs or prefer specific flanking nucleotide compositions
A, B. Top: Sequence logos of bipartite motifs for DAZAP1 (A) and RBM45 (B). Bottom: Nucleotide composition of the spacer between both motif cores (left) and enrichment as a function of the spacing between cores (right). C. Core spacing preferences of all RBPs. Each row indicates enrichment as a function of the spacing between cores. Enrichments normalized to maximum value in each row (outlined in black). * Indicates non-zero spacing is significantly preferred over the best linear 6mer. RBPs are grouped by motif clusters in Fig. 2. D. Pearson correlation between RBD identity within an RBP and the similarity between the core motifs (only RBDs of the same type were compared). E, F. Flanking nucleotide compositional preferences surrounding the top five 5mers for NOVA1 (E) and FUBP3 (F). Inset: mean enrichments across all positions flanking the motif. G. Flanking compositional preferences of all RBPs. Enrichment or depletion for each nucleotide surrounding the RBP’s top five 5mers. Boxes indicate significant enrichment (log2(enrichment) > 0.1, P < 0.001). H. Enrichment of HNRNPK’s top 10 linear 6mers (right) and top 10 degenerate sequences of length 12 with 6 Cs and 6 Ns (left). I. Filter assay validation of HNRNPK binding to the oligo UUU(CCUCUCUUUUCC)UUU (blue) and the oligo U12 (black) as a negative control. Dot blot of filter assay shown above with fraction of RNA bound quantified below.
Figure 6. RBPs that bind similar motifs often diverge in sequence context preferences
A. Dispersal of specificities between cluster 1 RBPs. X- and y- axes represent preference for secondary structure over the motif (x) or flanking regions (y). Circle color denotes preference for flanking nucleotide composition. Split semicircles indicate preference for a bipartite motif over a linear motif with the distance between semicircles reflecting preferred spacing of cores. B. Pairwise distances (1 – Pearson r) of feature-specific R values for pairs of RBPs within a motif cluster (“intra-cluster”) compared to distances between controls (“reps”). *P < 0.05, **P < 0.005, ***P < 0.0005, Wilcoxon rank-sum test. C. Log2 ratio of Ppaired over U5 occurrences and nucleotides directly upstream and downstream in: RBNS motifs relative to input (top), intronic motifs found in eCLIP peaks relative to motifs in control peaks (middle), intronic motifs near exons with increased inclusion upon RBP KD relative to control introns (bottom). *P < 0.05, **P < 0.005, ***P < 0.0005, Wilcoxon rank-sum test.
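The pairwise distance in panel B (1 – Pearson r) can be written out as a short sketch. The input vectors stand in for feature-specific R values of two RBPs; the function names and example numbers are hypothetical and only illustrate the arithmetic.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)


def pearson_distance(x, y):
    """Distance used in the caption: 1 - Pearson r (0 = identical trend)."""
    return 1.0 - pearson_r(x, y)


# Made-up R value profiles for two RBPs across the same four features.
distance = pearson_distance([1.2, 2.5, 0.8, 3.1], [1.0, 2.9, 0.7, 2.8])
```

With this measure, two RBPs with perfectly correlated preferences have distance 0, and perfectly anti-correlated preferences give distance 2.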
