BMC Genomics. 2020 Dec 17;21(1):894.
doi: 10.1186/s12864-020-07297-0.

Improving CLIP-seq data analysis by incorporating transcript information


Michael Uhl et al. BMC Genomics.

Abstract

Background: Current peak callers for identifying RNA-binding protein (RBP) binding sites from CLIP-seq data take genomic read profiles into account, but they ignore the underlying transcript information, that is, information regarding splicing events. So far, no studies have examined this issue in detail.

Results: Here we show that current peak callers are susceptible to false peak calling near exon borders. We quantify the extent of this problem in publicly available datasets and find it to be substantial. By providing a tool called CLIPcontext for automatic extraction of transcript and genomic context sequences, we further demonstrate that context choice affects the performance of RBP binding site prediction tools. Moreover, we show that known motifs of exon-binding RBPs are often enriched in transcript context sites, which should enable the recovery of more authentic binding sites. Finally, we discuss possible strategies for integrating transcript information into future workflows.

Conclusions: Our results demonstrate the importance of incorporating transcript information in CLIP-seq data analysis. Taking advantage of the underlying transcript information should therefore become an integral part of future peak calling and downstream analysis tools.
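
To illustrate the difference between genomic and transcript context mentioned in the Results, the following is a minimal sketch and not the CLIPcontext implementation. It assumes a toy plus-strand genome string, exons given as 0-based half-open intervals, and a site that lies fully within one exon; the function names `genomic_context` and `transcript_context` are hypothetical. For a site at an exon border, extending the site on the genome runs into intronic sequence, whereas extending it on the spliced transcript continues into the adjacent exon.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end), plus strand only


def genomic_context(genome: str, site: Interval, flank: int) -> str:
    """Extend the site by `flank` nt on both sides along the genome."""
    start, end = site
    return genome[max(0, start - flank):min(len(genome), end + flank)]


def transcript_context(genome: str, exons: List[Interval],
                       site: Interval, flank: int) -> str:
    """Extend the site by `flank` nt on both sides along the spliced transcript.

    Assumes `exons` are sorted, non-overlapping, and that the site lies
    fully within a single exon (an assumption of this sketch).
    """
    transcript = "".join(genome[s:e] for s, e in exons)
    offset = 0  # transcript coordinate of the current exon's first nucleotide
    for s, e in exons:
        if s <= site[0] and site[1] <= e:
            t_start = offset + (site[0] - s)
            t_end = offset + (site[1] - s)
            return transcript[max(0, t_start - flank):
                              min(len(transcript), t_end + flank)]
        offset += e - s
    raise ValueError("site does not lie fully within one exon")


# Toy example: two exons (uppercase) separated by an intron (lowercase).
genome = "AAAACCCC" + "gtttttag" + "GGGGTTTT"
exons = [(0, 8), (16, 24)]
site = (5, 8)  # last 3 nt of exon 1, directly at the exon border

print(genomic_context(genome, site, 3))            # AACCCCgtt
print(transcript_context(genome, exons, site, 3))  # AACCCCGGG
```

In the toy example the genomic context picks up intronic bases (`gtt`), while the transcript context picks up the start of the next exon (`GGG`), which is the sequence a protein bound to spliced RNA would actually encounter.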

Keywords: CLIP-seq; Peak calling; RBP binding site prediction; eCLIP.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
IGV snapshot of two genomic regions with mapped YBX3 K562 eCLIP data. 1: read profile (coverage), 2: read alignments, 3: crosslink positions profile, 4: input control profile, 5: gene annotations (thick blue regions are exons, thin blue regions introns), CLIPper / CLIPper IDR: CLIPper replicate 1 and IDR peaks, PEAKachu: PEAKachu peaks, PureCLIP: PureCLIP peaks (nearby crosslink positions merged). For clarity, only gene strand reads from replicate 1 are displayed. a PRDX6 whole gene region (length 11 kb, maximum read coverage 1141). b DDOST gene exons 6 and 7 region (length 563 bp, maximum read coverage 167).
Fig. 2
Exon binding statistics of eCLIP datasets and prediction results for different sequence contexts. a Distribution of exonic site ratios for 223 eCLIP datasets over four percentage ranges. For each range, the percentage (number) of sets with ratios falling into this range is given. b Correlation plot of exonic site ratios for RBPs present in two cell lines (HepG2 and K562). c Site score distributions for all exonic sites and exonic sites that form pairs by being located at adjacent exon borders. log2 fold change values of the sites determined by CLIPper were taken as site scores. Only pair sites with a distance of <10 nt to exon borders were considered. d Average classification accuracies over 6 eCLIP datasets for 3 RBP binding site prediction methods, comparing genome and transcript context
Fig. 3
IGV snapshot of two genomic regions with mapped IGF2BP3 K562 and PUM2 K562 eCLIP data. 1: read profile (coverage), 2: read alignments, 3: crosslink positions profile, 4: input control profile, 5: gene annotations (thick blue regions are exons, thin blue regions introns), IGF2BP3 / PUM2 motif: RBP motifs mapped with CLIPcontext, CLIPper IDR: CLIPper IDR peaks, PEAKachu: PEAKachu peaks, PureCLIP: PureCLIP peaks (nearby crosslink positions merged). For clarity, only gene strand reads from replicate 1 are displayed. a RACK1 gene exons 7 and 8 region (length 911 bp, maximum read coverage 150) with split IGF2BP3 motif hit. b RTRAF gene exons 4 and 5 region (length 1,599 bp, maximum read coverage 58) with split PUM2 motif hit.
