Bioinformatics. 2012 Dec 1;28(23):3013-20.
doi: 10.1093/bioinformatics/bts569. Epub 2012 Sep 28.

Site identification in high-throughput RNA-protein interaction data

Philip J Uren et al. Bioinformatics. 2012.

Abstract

Motivation: Post-transcriptional and co-transcriptional regulation is a crucial link between genotype and phenotype. The central players are the RNA-binding proteins, and experimental technologies [such as cross-linking with immunoprecipitation- (CLIP-) and RIP-seq] for probing their activities have advanced rapidly over the past decade. Statistically robust, flexible computational methods for binding site identification from high-throughput immunoprecipitation assays, however, are largely lacking.

Results: We introduce a method for site identification that provides four key advantages over previous methods: (i) it can be applied to all variations of CLIP and RIP-seq technologies; (ii) it accurately models the underlying read-count distributions; (iii) it allows external covariates, such as transcript abundance (which we demonstrate is highly correlated with read count), to inform the site identification process; and (iv) it allows direct comparison of site usage across cell types or conditions.

Availability and implementation: We have implemented our method in a software tool called Piranha. Source code and binaries, licensed under the GNU General Public License (version 3), are freely available for download from http://smithlab.usc.edu.

Contact: andrewds@usc.edu

Supplementary information: Supplementary data available at Bioinformatics online.

Figures

Fig. 1.
CLIP read counts are fit well by a zero-truncated negative binomial. (A) The average read count density for all datasets is shown in red (error bars are 95% confidence intervals). The fitted densities for a negative binomial on all of the datasets are shown in blue (note that all densities are shown, rather than an average for each read count). Only read counts <20 are shown. (B) As with (A), but replacing the fitted densities from the negative binomial with those of a zero-truncated negative binomial distribution. (C) Histogram of read counts from an iCLIP experiment for TIA1 (Wang et al., 2010b) showing fitted zero-truncated Poisson, negative binomial and zero-truncated negative binomial distributions. (D) Histogram showing the count of datasets for which 80% of the locations receiving reads have no more reads than the given count; the majority of datasets have >80% of their locations with <5 reads. Four outliers are not shown, with read counts of 79, 93, 88 and 228.
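
To make the distribution named in the Fig. 1 caption concrete, here is a minimal sketch of a zero-truncated negative binomial probability mass function. It is not taken from Piranha: the mean/dispersion parameterization (`mu`, `alpha`) and the function names are assumptions chosen for illustration. The idea is that locations with zero reads are never observed, so the ordinary negative binomial mass is renormalized over counts of 1 and above.

```python
import math

def nb_pmf(y, mu, alpha):
    """Negative binomial pmf with mean mu and dispersion alpha (assumed parameterization)."""
    r = 1.0 / alpha          # size parameter
    p = r / (r + mu)         # success probability
    log_pmf = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
               + r * math.log(p) + y * math.log1p(-p))
    return math.exp(log_pmf)

def ztnb_pmf(y, mu, alpha):
    """Zero-truncated version: condition the negative binomial on y >= 1."""
    if y < 1:
        return 0.0
    return nb_pmf(y, mu, alpha) / (1.0 - nb_pmf(0, mu, alpha))
```

Because the mass at zero is redistributed, every positive count receives a higher probability under the truncated distribution than under the untruncated one with the same parameters.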
Fig. 2.
CLIP- and RIP-seq read counts are correlated with transcript abundance. (A) Distribution of Spearman correlation coefficients for RNA-seq and immunoprecipitation read counts at transcript level over all examined datasets shows frequent strong correlation. (B) Example hexbin plot showing transcript-level correlation between IP read count for HuR (selected at random from the set of highly correlated datasets; data from Mukherjee et al., 2011) and RNA-seq read count in HEK293 cells. Spearman correlation coefficient: 0.67. (C) As in (A), but with 200 nt-wide non-overlapping bins; correlation is reduced in smaller bins, but still present.
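
The Spearman coefficient used in Fig. 2 is the Pearson correlation of the ranks of the two variables. The sketch below shows one way to compute it with the standard library only; the function names are hypothetical, tied values receive their average rank, and any per-transcript counts passed in would be the reader's own data rather than values from the paper.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, assigning tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)
```

Because only ranks matter, the coefficient is unchanged by any monotone transformation of the counts, such as taking logarithms.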
Fig. 3.
(A) Top identified motif and motif occurrence histogram for hTra2 identified from RIP-seq data using ZTNBR with non-specific control (red) and using ZTNB with no control (blue). (B) The top six enriched motifs and their positional occurrence histograms from the HITS-CLIP Ago2/miR-124 data. All motifs match the miR-124 reverse complement. The seed is highlighted in red. (C) Number of target sequences in Ago2/miR-124 with a match to any 7-mer from the reverse-complement miR-124 sequence. One nucleotide mismatch was allowed. Blue box: 164 sites (51.2%) contain a match within 90 nt of the peak centre.
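
The matching criterion in panel (C) can be sketched as follows: take every 7-mer from the reverse complement of the microRNA and ask whether any window of the target lies within one mismatch of it. This is an illustrative sketch, not the authors' code. The function names are hypothetical, sequences are assumed to use the DNA alphabet (ACGT), and the sequence in the usage lines is a made-up placeholder, not miR-124.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA-alphabet sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def kmers(seq, k=7):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def hamming(a, b):
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def has_near_match(target, mirna, k=7, max_mismatch=1):
    """True if any k-mer window of target is within max_mismatch
    substitutions of some k-mer of the miRNA reverse complement."""
    probes = kmers(reverse_complement(mirna), k)
    for i in range(len(target) - k + 1):
        window = target[i:i + k]
        if any(hamming(window, p) <= max_mismatch for p in probes):
            return True
    return False

# Placeholder sequence for illustration only (not a real microRNA).
toy_mirna = "ACGTTGCAAGGC"
toy_target = "AAAACCTAGCAAAAA"   # contains a 7-mer one substitution away from a probe
found = has_near_match(toy_target, toy_mirna)
```

Restricting the search to windows within a fixed distance of the peak centre, as in the caption's 90 nt criterion, would only require slicing `target` before calling `has_near_match`.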

References

    1. Anders G, et al. Dorina: a database of RNA interactions in post-transcriptional regulation. Nucleic Acids Res. 2012;40:D180–D186.
    2. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B Methodological. 1995;57:289–300.
    3. Cameron AC, Trivedi PK. Regression Analysis of Count Data. Cambridge MA, UK: Cambridge University Press; 2008.
    4. Chénard CA, Richard S. New implications for the QUAKING RNA binding protein in human disease. J. Neurosci. Res. 2008;86:233–242.
    5. Chi SW, et al. Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature. 2009;460:479–486.
