Nucleic Acids Res. 2017 Sep 19;45(16):9260-9271.
doi: 10.1093/nar/gkx646.

CLIP-seq analysis of multi-mapped reads discovers novel functional RNA regulatory sites in the human transcriptome



Zijun Zhang et al. Nucleic Acids Res.

Abstract

Crosslinking or RNA immunoprecipitation followed by sequencing (CLIP-seq or RIP-seq) allows transcriptome-wide discovery of RNA regulatory sites. As CLIP-seq/RIP-seq reads are short, existing computational tools focus on uniquely mapped reads, while reads mapped to multiple loci are discarded. We present CLAM (CLIP-seq Analysis of Multi-mapped reads). CLAM uses an expectation-maximization algorithm to assign multi-mapped reads and calls peaks combining uniquely and multi-mapped reads. To demonstrate the utility of CLAM, we applied it to a wide range of public CLIP-seq/RIP-seq datasets involving numerous splicing factors, microRNAs and m6A RNA methylation. CLAM recovered a large number of novel RNA regulatory sites inaccessible by uniquely mapped reads. The functional significance of these sites was demonstrated by consensus motif patterns and association with alternative splicing (splicing factors), transcript abundance (AGO2) and mRNA half-life (m6A). CLAM provides a useful tool to discover novel protein-RNA interactions and RNA modification sites from CLIP-seq and RIP-seq data, and reveals the significant contribution of repetitive elements to the RNA regulatory landscape of the human transcriptome.
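To make the idea of EM-based assignment concrete, the following Python sketch distributes multi-mapped reads among their candidate loci. It is a generic EM scheme, not CLAM's published implementation: it assumes each locus has an unknown relative abundance and that a read is equally compatible with every locus it aligns to. The function name `em_assign` and the toy reads are hypothetical.

```python
from collections import defaultdict


def em_assign(read_loci, n_iter=100, tol=1e-8):
    """Generic EM for multi-mapped reads (illustrative sketch, not CLAM's code).

    read_loci: list of lists; read_loci[i] holds the loci that read i aligns to
    (a uniquely mapped read has a single locus).
    Returns (abundance, weights): abundance[locus] is the estimated share of
    reads, and weights[i][locus] is the fraction of read i given to that locus.
    """
    loci = sorted({locus for ls in read_loci for locus in ls})
    # Start from uniform abundances across all candidate loci.
    abundance = {locus: 1.0 / len(loci) for locus in loci}
    weights = []
    for _ in range(n_iter):
        # E-step: split each read among its loci in proportion to abundance.
        weights = []
        counts = defaultdict(float)
        for ls in read_loci:
            total = sum(abundance[locus] for locus in ls)
            w = {locus: abundance[locus] / total for locus in ls}
            weights.append(w)
            for locus, frac in w.items():
                counts[locus] += frac
        # M-step: re-estimate abundance as each locus's share of expected reads.
        new = {locus: counts[locus] / len(read_loci) for locus in loci}
        delta = max(abs(new[locus] - abundance[locus]) for locus in loci)
        abundance = new
        if delta < tol:
            break
    return abundance, weights


# Toy input: 3 reads unique to A, 1 unique to B, 2 shared between A and B.
abundance, weights = em_assign([["A"]] * 3 + [["B"]] + [["A", "B"]] * 2)
```

In the toy input the estimates converge to 0.75 for A and 0.25 for B, so each shared read is split 3:1 in favor of A. The E-step uses the current abundances to split ambiguous reads, and the M-step re-estimates abundances from those fractional counts; uniquely mapped reads anchor the estimate.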


Figures

Figure 1.
Motivation and schematic overview of CLAM. (A) In immunoprecipitation-based techniques for analyzing RBP–RNA interactions (CLIP-seq, RIP-seq), RNA associated with the target RBP is fragmented after the RBP–RNA complex is immunoprecipitated with a specific antibody, and is then sequenced at high throughput to generate short reads, typically 35–50 bp long. An appreciable fraction of reads, such as those originating from RBP–RNA interaction sites within repetitive elements, map to multiple genomic regions and are discarded by conventional data analysis pipelines. Shown here is a read mapped to two genomic copies of a repetitive element (orange boxes); the two copies are identical where the read aligns but differ elsewhere (mutations, green vertical lines). (B) CLAM identifies a set of genomic regions sharing multi-mapped reads. It then uses an expectation–maximization (EM) algorithm to rescue multi-mapped reads and assign them to specific genomic regions, followed by a permutation-based procedure for peak calling with gene-specific FDR control. The rescued peaks are then assessed through downstream analyses of RNA regulatory features, including enrichment of consensus motifs and evaluation of RBP-specific regulatory features.
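Permutation-based peak calling with a per-gene FDR can be illustrated with the following hypothetical sketch. The binning scheme, the 50-nt bin size, the 200 permutations, the uniform null and the FDR estimate (expected null bins divided by observed bins at a coverage threshold) are all assumptions for illustration; the caption specifies only that peaks are called by permutation with gene-specific FDR control. Per-read weights allow fractional counts, for example EM-assigned multi-mapped reads.

```python
import random


def bin_counts(positions, weights, gene_len, bin_size):
    """Sum read weights into fixed-width bins along one gene."""
    n_bins = (gene_len + bin_size - 1) // bin_size
    counts = [0.0] * n_bins
    for pos, w in zip(positions, weights):
        counts[pos // bin_size] += w
    return counts


def call_peaks(positions, weights, gene_len, bin_size=50, n_perm=200,
               fdr=0.05, seed=0):
    """Hypothetical per-gene permutation peak caller; returns peak bin indices.

    positions: 0-based read positions within the gene (< gene_len).
    weights: 1.0 for uniquely mapped reads; fractional weights could be used
    for EM-assigned multi-mapped reads (assumption).
    """
    rng = random.Random(seed)
    observed = bin_counts(positions, weights, gene_len, bin_size)
    # Null: the same reads (and weights) dropped uniformly at random in the gene.
    null = []
    for _ in range(n_perm):
        shuffled = [rng.randrange(gene_len) for _ in positions]
        null.append(bin_counts(shuffled, weights, gene_len, bin_size))
    # Scan thresholds from low to high and keep the lowest one whose estimated
    # FDR (mean null bins >= t divided by observed bins >= t) is within `fdr`.
    for t in sorted({c for c in observed if c > 0}):
        n_obs = sum(1 for c in observed if c >= t)
        n_null = sum(1 for counts in null for c in counts if c >= t) / n_perm
        if n_null / n_obs <= fdr:
            return [i for i, c in enumerate(observed) if c >= t]
    return []
```

Because the null is built by reshuffling each gene's own reads within that gene, the coverage threshold adapts to the gene's expression level, which is one plausible reading of "gene-specific" FDR control.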
Figure 2.
Summary statistics of CLAM results on three CLIP-seq/RIP-seq datasets. (A) Percentage of multi-mapped reads (blue) and of multi-mapped reads rescued by CLAM (orange) among all mapped reads in the analyzed datasets. (B) Sensitivity analysis at various FDR thresholds. Most lost peaks can be recovered by combining uniquely and multi-mapped reads at higher (more relaxed) FDR thresholds (bar graphs on the left), whereas a significantly smaller fraction of rescued peaks can be identified using only uniquely mapped reads at higher FDR thresholds (bar graphs on the right). (C) Fraction of rescued and common peaks derived from various types of repetitive elements. Across all three datasets, a significantly higher fraction of rescued peaks than of common peaks is derived from repetitive elements.
Figure 3.
Functional evaluation of CLAM on the hnRNPC CLIP-seq data. (A) Identification of the known consensus hnRNPC motif by de novo motif discovery in rescued and common hnRNPC peaks. (B) Enrichment analysis of hnRNPC-dependent alternative exons for rescued and common hnRNPC peaks. The x-axis represents alternative exons ranked by rMATS ΔPSI values (the difference in exon inclusion levels between control and knockdown). The y-axis is the enrichment score (ES) calculated via the Kolmogorov–Smirnov statistic. Both rescued and common hnRNPC peaks are strongly enriched for hnRNPC-repressed alternative exons. (C) Example of a rescued hnRNPC peak in DDIAS. (D) Example of a rescued hnRNPC peak in SNHG17.
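The enrichment score in panel (B) follows the Kolmogorov–Smirnov running-sum idea used in GSEA. A minimal, unweighted sketch is below; it assumes exons have already been sorted by their inclusion-level difference in the direction of interest and that `hit_ids` holds the exons overlapping a peak. The function name is hypothetical, and a P-value would additionally require a permutation null, which is not shown.

```python
def enrichment_score(ranked_ids, hit_ids):
    """Unweighted KS-style running-sum enrichment score (GSEA-like sketch).

    ranked_ids: exon IDs sorted by the ranking statistic (assumed pre-sorted).
    hit_ids: exons that overlap a peak.
    Returns the running sum's maximum deviation from zero (signed).
    """
    hits = set(hit_ids)
    n = len(ranked_ids)
    n_hit = sum(1 for x in ranked_ids if x in hits)
    if n_hit == 0 or n_hit == n:
        raise ValueError("need at least one hit and one non-hit in the ranking")
    step_up = 1.0 / n_hit          # added at each peak-overlapping exon
    step_down = 1.0 / (n - n_hit)  # subtracted at each other exon
    running, best = 0.0, 0.0
    for x in ranked_ids:
        running += step_up if x in hits else -step_down
        if abs(running) > abs(best):
            best = running
    return best
```

A score near +1 means the peak-overlapping exons cluster at the top of the ranking; a score near −1 means they cluster at the bottom.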
Figure 4.
Functional evaluation of CLAM on the AGO2 CLIP-seq data. For each microRNA, three classes of genes are compiled: genes with common peaks containing microRNA target sites (common, blue), genes with rescued peaks containing microRNA target sites (rescued, red) and background genes without AGO2 CLIP-seq peaks (background, black). The cumulative distribution function of the log2 gene expression fold change is plotted upon (A) inhibition of miR-21 or (B) ectopic expression of miR-107. For both microRNAs, rescued and common target genes show a similarly significant shift in the cumulative distribution function compared with background genes.
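The comparisons in this figure rest on empirical cumulative distribution functions of log2 fold changes. The sketch below computes an empirical CDF and the two-sample Kolmogorov–Smirnov statistic, the largest vertical gap between two such curves. The caption does not say which test produced the significance calls, so using the KS statistic here is an assumption; `scipy.stats.ks_2samp` returns the same statistic together with a P-value. Function names are hypothetical.

```python
from bisect import bisect_right


def ecdf(values):
    """Return F(x) = fraction of values <= x (right-continuous empirical CDF)."""
    xs = sorted(values)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D: the largest gap between the two ECDFs.

    For step-function ECDFs the supremum is reached at one of the data points,
    so it is enough to evaluate both curves there.
    """
    fa, fb = ecdf(a), ecdf(b)
    return max(abs(fa(x) - fb(x)) for x in set(a) | set(b))
```

With `a` holding log2 fold changes of target genes and `b` those of background genes, a large D together with a leftward or rightward displacement of the target curve corresponds to the shifts described for repression or derepression.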
Figure 5.
Functional evaluation of CLAM on the m6A RIP-seq data. (A) Identification of the known consensus m6A motif by de novo motif discovery in rescued and common m6A peaks. The conserved m6A RRACU motif in (B) antisense and (C) sense sequences of major Alu subfamilies. (D) Cumulative distribution function of mRNA half-life in iPSCs. Genes with common m6A peaks and genes with rescued m6A peaks both have significantly shorter mRNA half-lives than background genes without m6A peaks. Topological distribution of (E) rescued and (F) common m6A peaks across the 5′-UTR, CDS and 3′-UTR of protein-coding genes. (G) Example of a rescued Alu-derived m6A peak in the 3′-UTR of NME6.
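Panels (B) and (C) look for the RRACU motif (R = A or G) on both strands of Alu sequences. A small scanner that reports overlapping matches on a sequence and on its reverse complement might look like the following. It is an illustrative sketch with hypothetical function names, and it treats T and U as equivalent so that DNA-alphabet reference sequences can be scanned directly.

```python
import re

# RRACU with R = A or G; [TU] lets DNA (T) and RNA (U) sequences both match.
# The lookahead makes the regex report overlapping matches.
RRACU = re.compile(r"(?=([AG][AG]AC[TU]))")
# Complement table; U is complemented to A like T.
_COMP = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def revcomp(seq):
    """Reverse complement of a DNA/RNA sequence (output uses T, not U)."""
    return seq.translate(_COMP)[::-1]


def find_rracu(seq):
    """0-based start positions of RRACU matches on the strand as given."""
    return [m.start() for m in RRACU.finditer(seq.upper())]
```

No Alu sequences are given here; one would, for example, pass each subfamily's consensus sequence to `find_rracu` once as given and once after `revcomp` to count motif occurrences in the two orientations.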
Figure 6.
CLAM analysis of 17 splicing factors with ENCODE eCLIP data and matching RNA-seq data following splicing factor knockdown in the HepG2 cell line. To visualize nominal P-values on a negative log10 scale, we added a pseudo-count of 1e-3 to all P-values, which caps −log10(P-value) at 3; the same pattern was observed with pseudo-counts of 1e-4 and 1e-5. (A) Negative log10 enrichment P-values of known splicing factor motifs within common (blue) and rescued (red) peaks. The frequency of motif occurrences was compared with that in randomly sampled genomic sequences, and a Student’s t-distribution was fitted to measure the statistical significance of enrichment. (B) Bar plots of negative log10 P-values from the GSEA test for enrichment of splicing factor-dependent alternative exons among common or rescued peaks within the upstream 250 bp intronic region (blue), the exon body (red) and the downstream 250 bp intronic region (orange). For common peaks, the −log10 P-value of enrichment was calculated as the average over 20 random iterations of down-sampling to the number of rescued peaks. If the −log10 P-value of rescued peaks lies within the mean ± standard deviation of that of common peaks, an asterisk is added next to the bar. (C) Enrichment analysis of hnRNPC-dependent exons for common and rescued hnRNPC exon-overlapping peaks in the ENCODE HepG2 data. Both common and rescued hnRNPC peaks are strongly enriched for hnRNPC-repressed exons. (D) Enrichment analysis of U2AF2-dependent exons for common and rescued U2AF2 exon-overlapping peaks. Both common and rescued peaks are strongly enriched for U2AF2-enhanced exons in the ENCODE HepG2 data.
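The two numerical conventions in this caption, the pseudo-count cap and the asterisk rule, can be written out directly. With a pseudo-count of 1e-3, a P-value of 0 maps to −log10(1e-3) = 3, the stated upper limit. The sketch assumes the standard deviation is the sample standard deviation across the down-sampling iterations, which the caption does not specify; the function names are hypothetical.

```python
import math
import statistics


def neg_log10(p, pseudo=1e-3):
    """-log10(p + pseudo); with pseudo = 1e-3 the value is capped at 3 (p = 0)."""
    return -math.log10(p + pseudo)


def gets_asterisk(rescued_p, common_downsampled_ps, pseudo=1e-3):
    """True if the rescued-peak score lies within mean +/- SD of the scores
    from the down-sampled common peaks (one P-value per iteration).

    Scores are averaged on the -log10 scale, as the caption describes; the use
    of the sample standard deviation (statistics.stdev) is an assumption.
    """
    scores = [neg_log10(p, pseudo) for p in common_downsampled_ps]
    mu, sd = statistics.mean(scores), statistics.stdev(scores)
    return mu - sd <= neg_log10(rescued_p, pseudo) <= mu + sd
```

Smaller pseudo-counts such as 1e-4 or 1e-5 raise the cap to about 4 or 5 without changing the ordering of non-zero P-values, which is consistent with the same pattern being observed at those settings.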
