Nat Biotechnol. 2014 Jun;32(6):562-8.
doi: 10.1038/nbt.2880. Epub 2014 Apr 13.

Quantitative analysis of RNA-protein interactions on a massively parallel array reveals biophysical and evolutionary landscapes

Jason D Buenrostro et al. Nat Biotechnol. 2014 Jun.

Abstract

RNA-protein interactions drive fundamental biological processes and are targets for molecular engineering, yet quantitative and comprehensive understanding of the sequence determinants of affinity remains limited. Here we repurpose a high-throughput sequencing instrument to quantitatively measure binding and dissociation of a fluorescently labeled protein to >10^7 RNA targets generated on a flow cell surface by in situ transcription and intermolecular tethering of RNA to DNA. Studying the MS2 coat protein, we decompose the binding energy contributions from primary and secondary RNA structure, and observe that differences in affinity are often driven by sequence-specific changes in both association and dissociation rates. By analyzing the biophysical constraints and modeling mutational paths describing the molecular evolution of MS2 from low- to high-affinity hairpins, we quantify widespread molecular epistasis and a long-hypothesized, structure-dependent preference for G:U base pairs over C:A intermediates in evolutionary trajectories. Our results suggest that quantitative analysis of RNA on a massively parallel array (RNA-MaP) provides generalizable insight into the biophysical basis and evolutionary consequences of sequence-function relationships.

Figures

Figure 1. A massively parallel RNA array for quantitative, high-throughput biochemistry
(a) Steps for generating RNA tethered to DNA clusters on a high-throughput DNA sequencing flow cell. (b) Structure of the MS2 coat protein homodimer bound to the 19 nt hairpin RNA (PDB ID: 2BU1). (c) Images of fluorescently labeled MS2 bound to RNA clusters at increasing concentrations of protein and at time points following perfusion of unlabeled MS2 competitor. Below, fitted sum of Gaussians used to assign fluorescence to clusters. Scale bars (white) represent 2.5 μm. (d) Fluorescence decay of MS2 dissociating from clusters containing the consensus sequence (-5C) (t1/2=8.39 minutes). (e) Fit binding curves to clusters labeled in panel (c). (f) The probability distribution of binding energies from all clusters with labeled variants; mean Kd = 2.57 nM, 36.8 nM, and 415 nM for the -5C, -5U, and -5A variants, respectively. (g) Correlation between binding energies reported in the literature and measured on the RNA array (squares, Carey et al., circles, Romaniuk et al.). (Dashed line indicates our affinity measurement cutoff.)
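
The quantities in this caption can be related with two textbook formulas. The sketch below is illustrative only: it assumes the standard relation ΔG = kBT·ln(Kd), so that energies come out in units of kBT, and it assumes the decay in panel (d) is a single exponential, so that koff = ln(2)/t1/2. The function and variable names are hypothetical and are not taken from the paper.

```python
import math

# Mean dissociation constants quoted in the Figure 1f caption (nM).
KD_NM = {"-5C": 2.57, "-5U": 36.8, "-5A": 415.0}


def ddg_kbt(kd_variant, kd_reference):
    """Change in binding free energy relative to a reference, in units of kBT.

    Assumes dG = kBT * ln(Kd); a positive value means weaker binding.
    """
    return math.log(kd_variant / kd_reference)


def koff_from_half_life(t_half):
    """First-order dissociation rate constant from a half-life.

    Assumes single-exponential decay; the result is per unit of t_half.
    """
    return math.log(2) / t_half


# Energies relative to the consensus (-5C) hairpin.
ddg = {name: ddg_kbt(kd, KD_NM["-5C"]) for name, kd in KD_NM.items()}

# Half-life of 8.39 minutes from the Figure 1d caption.
koff_per_min = koff_from_half_life(8.39)

for name, value in ddg.items():
    print(f"{name}: ddG = {value:.2f} kBT")
print(f"koff = {koff_per_min:.4f} per minute")
```

Under these assumptions the -5U and -5A variants sit roughly 2.7 and 5.1 kBT above the consensus. The figure captions report −ΔΔG, so the sign is flipped relative to the values computed here.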
Figure 2. A quantitative map of MS2 binding across RNA sequence variants
(a) Distribution of observed RNA variants by number of mutations. (b) Clusters measured per molecular variant as a function of mutation number. A median of ~11 clusters is observed for sequences with ≥4 mutations. Affinities for the consensus sequence come from NC=909,385 clusters. (c) Average −ΔΔG of point mutations per position. The −ΔΔG of alanine substitutions to the MS2 binding surface are shown in parentheses (kBT). Solid and dashed lines represent base and phosphate interactions, respectively. (d) Matrix of ΔΔG for single and double mutants of the consensus sequence. Inset contains the matrix of ΔΔG for single and double mutants of the +1G variant. All energies are calculated relative to the consensus (-5C) sequence (arrow, −ΔΔG=0), and the number of quality-filtered double mutants in each matrix is indicated (M2). (e) Epistasis matrix derived from (d) allows de novo reconstruction of the hairpin structure.
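
The caption does not state how the epistasis matrix in panel (e) is computed. A common definition, assumed here, is the deviation of a double mutant from the sum of its two single mutants. The sketch below uses made-up energies purely to show the arithmetic; none of the numbers are measurements from the paper.

```python
def epistasis(ddg_a, ddg_b, ddg_ab):
    """Deviation of a double mutant from additivity of its single mutants.

    All inputs are ddG values relative to the same reference sequence.
    Zero means the two mutations act independently.
    """
    return ddg_ab - (ddg_a + ddg_b)


# Hypothetical values in kBT, for illustration only.
# Two mutations that each disrupt a base pair but together restore it:
eps_compensating = epistasis(ddg_a=3.0, ddg_b=2.5, ddg_ab=0.5)

# Two mutations at positions that do not interact:
eps_additive = epistasis(ddg_a=1.0, ddg_b=0.5, ddg_ab=1.5)

print(eps_compensating, eps_additive)
```

Under this definition, strongly non-zero entries flag pairs of positions that interact, which is consistent with the caption's statement that the epistasis matrix allows the hairpin's base pairs to be reconstructed.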
Figure 3. Binding affinity is dependent on primary sequence and secondary RNA structure
(a) Fit parameters for linear regression model showing position-specific contributions. Energetic components for all possible base pair combinations are shown below. (b) Predicted binding energies of variants with second (M2) and third mutations (M3) in both single- and double-stranded regions. Primary (i.e. mean energetic contributions of transitions and transversions) (c) and secondary (d) structure contributions to affinity, derived from (a), were mapped onto the hairpin (PDB ID: 1ZDH).
Figure 4. Sequence-specific contributions of association and dissociation rates to binding affinity
(a) Fractional contribution of dissociation rates for 31 single and 289 double mutants with measurable affinities and dissociation rates. Positions at the base of the hairpin are highlighted. (b) Δlog(koff) and (c) Δlog(kon) at the base of the hairpin. M2 = number of quality-filtered double mutants. (d) Distribution of fractional contributions of association (blue, μ=0.57) and dissociation (red, μ=0.43) rates to −ΔΔG for all measured mutants (N=3,029).
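
One way to read the fractional contributions in this caption follows from Kd = koff/kon, which gives ΔΔG = Δln(koff) − Δln(kon) in units of kBT. The sketch below splits ΔΔG into those two terms. This decomposition is an assumption made for illustration, since the caption does not give the paper's exact definition, and the rate constants are hypothetical.

```python
import math


def rate_contributions(kon_ref, koff_ref, kon_mut, koff_mut):
    """Split ddG (in kBT) into association and dissociation parts.

    Assumes Kd = koff / kon, so ddG = dln(koff) - dln(kon).
    Returns ddG and the fraction attributable to each rate.
    """
    d_ln_koff = math.log(koff_mut / koff_ref)
    d_ln_kon = math.log(kon_mut / kon_ref)
    ddg = d_ln_koff - d_ln_kon
    if ddg == 0:
        raise ValueError("ddG is zero; fractional contributions are undefined")
    return {
        "ddG_kBT": ddg,
        "f_dissociation": d_ln_koff / ddg,
        "f_association": -d_ln_kon / ddg,
    }


# Hypothetical mutant: dissociates 4x faster and associates 2x slower
# than the reference (rates in arbitrary but consistent units).
result = rate_contributions(kon_ref=1.0, koff_ref=1.0, kon_mut=0.5, koff_mut=4.0)
print(result)
```

In this example two thirds of the affinity loss comes from faster dissociation and one third from slower association. The two fractions sum to 1 by construction, as do the means reported in panel (d).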
Figure 5. Evolutionary landscapes are highly constrained by biophysical requirements
(a) Tesseracts describe traversal probabilities for the complete set (N=24) of mutational paths between low- and high-affinity variants within 4 mutations. The AUC of the cumulative probability of ranked paths measures evolutionary constraint (EAUC), as modulated by epistasis (ε). (b) Density of cumulative probabilities for the ranked paths of 1,997 measured tesseracts. The fraction of the total path probabilities captured per individual path is shown as a function of path rank in the inset. The cumulative sum of these individual values is integrated to calculate EAUC. (c) Distributions of EAUC scores from observed tesseracts (red), tesseracts with uniform path probabilities (blue) and tesseracts with random affinities (purple) imply a highly structured epistatic landscape. The number of variants significantly constrained (P < 0.01, Benjamini-Hochberg) is indicated for both models. Average evolutionary probability (d) and constraint (e) for paths with changes at each position of the hairpin. (f) Intermediate trajectories for base pair A:U→G:C and U:A→G:C transitions. (g) Probability ratio of evolutionary paths passing through G:U vs. A:C intermediates by base derived from 696 tesseracts with A:U→G:C base pair transformations.
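
The path counting and the EAUC score described in panels (a) and (b) can be sketched as follows. Four mutations can be introduced in 4! = 24 orders, which matches the N=24 paths in the caption. The caption does not specify how path probabilities are obtained or how the area is normalized, so this sketch takes the probabilities as given and assumes the area is the mean of the cumulative sums over ranked paths. The function names are hypothetical.

```python
from itertools import permutations


def mutational_paths(n_mutations=4):
    """All orders in which n mutations can be introduced one at a time."""
    return list(permutations(range(n_mutations)))


def eauc(path_probabilities):
    """Area under the cumulative probability curve of ranked paths.

    Assumed normalization: probabilities are rescaled to sum to 1,
    paths are ranked from most to least probable, and the area is the
    mean of the cumulative sums, so the score lies between 0 and 1.
    """
    total = sum(path_probabilities)
    ranked = sorted((p / total for p in path_probabilities), reverse=True)
    cumulative = []
    running = 0.0
    for p in ranked:
        running += p
        cumulative.append(running)
    return sum(cumulative) / len(cumulative)


paths = mutational_paths(4)

# Baseline: every path equally likely (no constraint).
uniform = eauc([1.0] * len(paths))

# Extreme case: a single path carries all of the probability.
single_path = eauc([1.0] + [0.0] * (len(paths) - 1))

print(len(paths), round(uniform, 4), single_path)
```

With this normalization a uniform tesseract scores 25/48 (about 0.52) and a fully constrained one scores 1, so higher values indicate that fewer paths carry most of the probability.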

References

    1. Keene JD. RNA regulons: coordination of post-transcriptional events. Nature Reviews Genetics. 2007;8:533–543.
    2. Carey J, Cameron V, De Haseth PL, Uhlenbeck OC. Sequence-specific interaction of R17 coat protein with its ribonucleic acid binding site. Biochemistry. 1983;22:2601–2610.
    3. Tsvetanova NG, Klass DM, Salzman J, Brown PO. Proteome-Wide Search Reveals Unexpected RNA-Binding Proteins in Saccharomyces cerevisiae. PLoS ONE. 2010;5:e12671.
    4. Scherrer T, Mittal N, Janga SC, Gerber AP. A Screen for RNA-Binding Proteins in Yeast Indicates Dual Functions for Many Enzymes. PLoS ONE. 2010;5:e15499.
    5. Butter F, Scheibe M, Morl M, Mann M. Unbiased RNA-protein interaction screen by quantitative proteomics. Proceedings of the National Academy of Sciences. 2009;106:10626–10631.
