Bioinformatics. 2012 Dec 1;28(23):3013-20.

doi: 10.1093/bioinformatics/bts569. Epub 2012 Sep 28.

Site identification in high-throughput RNA-protein interaction data

Philip J Uren¹, Emad Bahrami-Samani, Suzanne C Burns, Mei Qiao, Fedor V Karginov, Emily Hodges, Gregory J Hannon, Jeremy R Sanford, Luiz O F Penalva, Andrew D Smith

PMID: 23024010
PMCID: PMC3509493
DOI: 10.1093/bioinformatics/bts569

Philip J Uren et al. Bioinformatics. 2012.

Affiliation

¹ Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA.

Abstract

Motivation: Post-transcriptional and co-transcriptional regulation is a crucial link between genotype and phenotype. The central players are the RNA-binding proteins, and experimental technologies [such as cross-linking with immunoprecipitation- (CLIP-) and RIP-seq] for probing their activities have advanced rapidly over the past decade. However, statistically robust, flexible computational methods for binding site identification from high-throughput immunoprecipitation assays are largely lacking.

Results: We introduce a method for site identification that provides four key advantages over previous methods: (i) it can be applied to all variations of CLIP and RIP-seq technologies, (ii) it accurately models the underlying read-count distributions, (iii) it allows external covariates, such as transcript abundance (which we demonstrate is highly correlated with read count), to inform the site identification process and (iv) it allows for direct comparison of site usage across cell types or conditions.

Availability and implementation: We have implemented our method in a software tool called Piranha. Source code and binaries, licensed under the GNU General Public License (version 3), are freely available for download from http://smithlab.usc.edu.

Contact: andrewds@usc.edu

Supplementary information: Supplementary data available at Bioinformatics online.

Figures

**Fig. 1.**
CLIP read counts are fit well by a zero-truncated negative binomial. (A) The average read count density for all datasets is shown in red (error bars are 95% confidence intervals). The fitted densities for a negative binomial on all of the datasets are shown in blue (note that all densities are shown, rather than an average for each read count). Only read counts <20 are shown. (B) As with (A), but replacing the fitted densities from the negative binomial with those of a zero-truncated negative binomial distribution. (C) Histogram of read counts from an iCLIP experiment for TIA1 (Wang *et al.*, 2010b) showing fitted zero-truncated Poisson, negative binomial and zero-truncated negative binomial distributions. (D) Histogram showing the count of datasets for which 80% of the locations receiving reads have no more reads than the given count; the majority of datasets have >80% of their locations with <5 reads. Four outliers are not shown, with read counts of 79, 93, 88 and 228.
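To make the distribution named in Fig. 1 concrete, the following is a minimal sketch of a zero-truncated negative binomial, that is, a negative binomial conditioned on observing at least one read. It is not taken from Piranha: the function names, the (size `r`, success probability `p`) parameterisation and the parameter values are assumptions chosen for illustration only.

```python
import math


def nb_pmf(y, r, p):
    """Negative binomial P(Y = y) with size r > 0 and success probability p."""
    log_pmf = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(p) + y * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


def ztnb_pmf(y, r, p):
    """Zero-truncated negative binomial: the NB conditioned on Y >= 1."""
    if y < 1:
        return 0.0
    return nb_pmf(y, r, p) / (1.0 - nb_pmf(0, r, p))


def ztnb_upper_tail(y, r, p):
    """P(Y >= y) under the zero-truncated model, for y >= 1."""
    below = sum(ztnb_pmf(k, r, p) for k in range(1, y))
    return max(0.0, 1.0 - below)


# Illustrative parameters only; they are not fitted to any dataset.
r, p = 0.5, 0.2
tail_at_20 = ztnb_upper_tail(20, r, p)
```

Under a model of this kind, a location with an unusually high read count has a small upper-tail probability, which is one way such a fit could be turned into a significance score.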

**Fig. 2.**
CLIP- and RIP-seq read counts are correlated with transcript abundance. (A) Distribution of Spearman correlation coefficients for RNA-seq and immunoprecipitation read counts at transcript level over all examined datasets shows frequent strong correlation. (B) Example hexbin plot showing transcript-level correlation between IP read count for HuR (selected at random from the set of highly correlated datasets; data from Mukherjee *et al.*, 2011) and RNA-seq read count in HEK293 cells. Spearman correlation coefficient: 0.67. (C) As in (A), but with 200 nt-wide non-overlapping bins; correlation is reduced in smaller bins, but still present.
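The Spearman coefficient reported in Fig. 2 is the Pearson correlation of the ranks of the two count vectors. A minimal standard-library sketch follows; the helper names and the toy per-transcript counts are hypothetical and are not data from the paper.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-transcript read counts, for illustration only.
rna_seq_counts = [10, 200, 35, 0, 80]
ip_counts = [3, 50, 40, 1, 12]
rho = spearman(rna_seq_counts, ip_counts)
```

Because only ranks enter the calculation, the coefficient is insensitive to the heavy right tail typical of read-count data.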

**Fig. 3.**
(A) Top identified motif and motif occurrence histogram for hTra2 identified from RIP-seq data using ZTNBR with non-specific control (red) and using ZTNB with no control (blue). (B) The top six enriched motifs and their positional occurrence histograms from the HITS-CLIP Ago2/miR-124 data. All motifs match the miR-124 reverse-complement. Seed highlighted in red. (C) Number of target sequences in Ago2/miR-124 with a match to any 7-mer from the reverse-complement miR-124 sequence. One nucleotide mismatch was allowed. Blue box: 164 sites (51.2%) contain a match within 90 nt of the peak centre.
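The matching rule described for panel (C), any 7-mer of the reverse-complement sequence with one mismatch allowed, can be sketched as below. This is an illustrative reading of the caption rather than the authors' code; the function names are hypothetical, and the short RNA strings are toy placeholders, not the miR-124 sequence or real target sites.

```python
def reverse_complement(rna):
    """Reverse-complement an RNA string over the alphabet A, C, G, U."""
    comp = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(comp[base] for base in reversed(rna))


def kmers(seq, k=7):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def match_positions(target, probes, k=7, max_mismatch=1):
    """Start positions in target whose k-mer is within max_mismatch of a probe."""
    hits = []
    for i in range(len(target) - k + 1):
        window = target[i:i + k]
        if any(hamming(window, probe) <= max_mismatch for probe in probes):
            hits.append(i)
    return hits


# Toy placeholder sequences, for illustration only.
toy_mirna = "AUGGCUAACG"
probes = kmers(reverse_complement(toy_mirna), k=7)
toy_target = "GGGCGUUAGCAAA"
hits = match_positions(toy_target, probes, k=7, max_mismatch=1)
```

Restricting the reported hits to those within a fixed distance of a peak centre would then be a simple filter on the returned positions.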

Grants and funding

R01 GM085121/GM/NIGMS NIH HHS/United States
