Methods. 2014 Feb;65(3):263-73.
doi: 10.1016/j.ymeth.2013.10.015. Epub 2013 Nov 6.

Hyb: a bioinformatics pipeline for the analysis of CLASH (crosslinking, ligation and sequencing of hybrids) data

Anthony J Travis et al. Methods. 2014 Feb.

Abstract

Associations between proteins and RNA-RNA duplexes are important in post-transcriptional regulation of gene expression. The CLASH (Cross-linking, Ligation and Sequencing of Hybrids) technique captures RNA-RNA interactions by physically joining two RNA molecules associated with a protein complex into a single chimeric RNA molecule. These events are relatively rare and considerable effort is needed to detect a small number of chimeric sequences amongst millions of non-chimeric cDNA reads resulting from a CLASH experiment. We present the "hyb" bioinformatics pipeline, which we developed to analyse high-throughput cDNA sequencing data from CLASH experiments. Although primarily designed for use with AGO CLASH data, hyb can also be used for the detection and annotation of chimeric reads in other high-throughput sequencing datasets. We examined the sensitivity and specificity of chimera detection in a test dataset using the BLAST, BLAST+, BLAT, pBLAT and Bowtie2 read alignment programs. We obtained the most reliable results in the shortest time using a combination of preprocessing with Flexbar and subsequent read-mapping using Bowtie2. The "hyb" software is distributed under the GNU GPL (General Public License) and can be downloaded from https://github.com/gkudla/hyb.

Keywords: Bioinformatics; CLASH; High-throughput sequencing; RNA–RNA interactions.

Figures

Fig. 1
Schematic of CLASH experiment and hyb analysis pipeline.
Fig. 2
Processing of 5′ barcodes. The multiplexing parts of barcodes are used to split input data into appropriate files, whereas the random parts of barcodes are used to monitor PCR amplification artefacts. The sequence identifiers (FASTA headers) of collapsed reads are in the format “K-L_M”, where K is the frequency rank of the sequence in the input file, L is the number of unique random barcodes associated with the sequence, and M is the number of times the sequence has been found in the input file. When 5′ barcodes are absent, the identifiers of collapsed reads are in the format “K_M”.
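The identifier formats described in this caption can be parsed mechanically. The following is a minimal Python sketch, not code from hyb itself: the function name `parse_collapsed_id`, the `CollapsedReadId` type, and the assumptions that K, L and M are plain decimal integers and that a leading ">" may be present are all hypothetical.

```python
import re
from typing import NamedTuple, Optional


class CollapsedReadId(NamedTuple):
    """Fields of a collapsed-read identifier (names are illustrative)."""
    rank: int                       # K: frequency rank of the sequence
    unique_barcodes: Optional[int]  # L: unique random barcodes (None if absent)
    count: int                      # M: times the sequence was seen


# Assumption: K, L and M are decimal integers; an optional ">" prefix is tolerated.
_ID_PATTERN = re.compile(r"^>?(\d+)(?:-(\d+))?_(\d+)$")


def parse_collapsed_id(header: str) -> CollapsedReadId:
    """Parse 'K-L_M' (with 5' barcodes) or 'K_M' (without) into its fields."""
    match = _ID_PATTERN.match(header.strip())
    if match is None:
        raise ValueError(f"not a collapsed-read identifier: {header!r}")
    rank, barcodes, count = match.groups()
    return CollapsedReadId(
        rank=int(rank),
        unique_barcodes=int(barcodes) if barcodes is not None else None,
        count=int(count),
    )
```

Under these assumptions, comparing `count` with `unique_barcodes` for a given read is one way to see how much of its abundance could be due to PCR amplification rather than independent ligation events.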
Fig. 3
Benchmarking of preprocessing and mapping parameters. Numbers of miRNA–mRNA chimeras recovered from the E6 test dataset (from Ref. [11]) as a function of preprocessing and mapping parameters. The following parameters were explored: the choice of adapter trimming program (flexbar or fastx-clipper), the choice of mapping program (blastall, blastn, blat, pblat, bowtie2), the base quality threshold (0, 10, 20, or 30), and linker length threshold (4, or 0 which indicates no linker trimming).
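To get a feel for the size of this parameter space, the values listed in the caption can be enumerated. This Python sketch assumes a full factorial design, which the caption does not confirm; the dictionary keys and the function name `parameter_grid` are illustrative and are not hyb options.

```python
from itertools import product

# Values taken from the caption; the key names used below are illustrative only.
TRIMMERS = ("flexbar", "fastx-clipper")
MAPPERS = ("blastall", "blastn", "blat", "pblat", "bowtie2")
QUALITY_THRESHOLDS = (0, 10, 20, 30)
LINKER_LENGTHS = (4, 0)  # 0 indicates no linker trimming


def parameter_grid():
    """Return every combination of the four benchmarked parameters."""
    return [
        {"trimmer": t, "mapper": m, "quality": q, "linker_length": l}
        for t, m, q, l in product(
            TRIMMERS, MAPPERS, QUALITY_THRESHOLDS, LINKER_LENGTHS
        )
    ]


if __name__ == "__main__":
    # 2 trimmers x 5 mappers x 4 quality thresholds x 2 linker lengths
    print(len(parameter_grid()))
```

If every combination were run, this would amount to 2 × 5 × 4 × 2 = 80 analyses of the test dataset.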
Fig. 4
Effects of preprocessing and mapping parameters on analysis times. Data and parameter choices as in Fig. 3.
Fig. 5
Characteristics of chimeras recovered as a function of the mapping program used. (a) Distribution of folding energies of miRNA–mRNA chimeras identified with blastall, blastn, blat, and bowtie2. (b) Types of RNA–RNA interactions recovered with each mapping program. (c) Numbers of chimeras recovered with different combinations of mapping programs, analysed with VENNY. A total of 12762 interactions are found with all four mapping programs, whereas 21537 interactions are found with at least one of the programs. (d) Fractions of chimeras recovered with one or more, two or more, three or more, and four mapping programs, respectively. Analyses were performed on dataset E4 (Ref. [11]), with the following parameters: trim = 0 filt = 0 min = 4 len = 17.
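The quantities in panels (c) and (d) are set overlaps: for each interaction, count how many mapping programs recovered it, then report the fraction recovered by at least k programs. A minimal Python sketch of that calculation follows; the function name `recovery_fractions` and the toy interaction labels are hypothetical, and this is not the VENNY procedure used by the authors.

```python
from collections import Counter
from typing import Dict, Iterable


def recovery_fractions(calls: Dict[str, Iterable[str]]) -> Dict[int, float]:
    """Fraction of all interactions recovered by at least k programs.

    `calls` maps a program name to the interaction identifiers it reported.
    The denominator is the union over all programs.
    """
    support = Counter()
    for interactions in calls.values():
        support.update(set(interactions))  # each program counts once per interaction
    total = len(support)
    if total == 0:
        return {}
    return {
        k: sum(1 for n in support.values() if n >= k) / total
        for k in range(1, len(calls) + 1)
    }


if __name__ == "__main__":
    # Toy data for illustration only, not results from the paper.
    toy_calls = {
        "blastall": {"a", "b", "c"},
        "blastn": {"a", "b"},
        "blat": {"a", "d"},
        "bowtie2": {"a"},
    }
    print(recovery_fractions(toy_calls))
```

Applied to the counts in the caption, the fraction recovered by all four programs is 12762 / 21537, or roughly 59% of the interactions found by at least one program.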
Fig. 6
Benchmarking of chimera-calling options. Dataset E4 was analysed with default parameter values, except for the indicated options, which were individually changed. The resulting chimera counts and quality parameters are reported. The percent log2 transcript enrichment values used in the bottom panel are from .
Fig. 7
Numbers of chimeric and non-chimeric reads as a function of read length. Dataset E4 was analysed with default parameters.
Fig. 8
Comparison of hyb and tophat fusion. Distribution of folding energies of miRNA–mRNA chimeras recovered with hyb, tophat fusion, and in randomly re-associated miRNA–mRNA pairs from the tophat fusion analysis.

References

    1. Filipowicz W., Bhattacharyya S.N., Sonenberg N. Nat. Rev. Genet. 2008;9:102–114.
    2. Aravin A.A., Hannon G.J., Brennecke J. Science. 2007;318:761–764.
    3. Gong C., Maquat L.E. Nature. 2011;470:284–288.
    4. Aiba H. Curr. Opin. Microbiol. 2007;10:134–139.
    5. Licatalosi D.D., Mele A., Fak J.J., Ule J., Kayikci M., Chi S.W., Clark T.A., Schweitzer A.C., Blume J.E., Wang X., Darnell J.C., Darnell R.B. Nature. 2008;456:464–469.
