Comparative Study
RNA. 2014 Oct;20(10):1519-31.
doi: 10.1261/rna.045088.114. Epub 2014 Aug 20.

IRBIS: a systematic search for conserved complementarity

Dmitri D Pervouchine. RNA. 2014 Oct.

Abstract

IRBIS is a computational pipeline for detecting conserved complementary regions in unaligned orthologous sequences. Unlike other methods, it follows the "first-fold-then-align" principle, in which all possible combinations of complementary k-mers are searched for simultaneous conservation. A novel trimming procedure reduces the size of the search space and improves performance to the point where large-scale analyses of intra- and intermolecular RNA-RNA interactions become possible. In this article, I provide a rigorous description of the method, benchmarking on simulated and real data, and a set of stringent predictions of intramolecular RNA structure in placental mammals, drosophilids, and nematodes. I discuss two particular cases of long-range RNA structures that are likely to have a causal effect on single- and multiple-exon skipping, one in the mammalian gene Dystonin and the other in the insect gene Ca-α1D. In Dystonin, one of the two complementary boxes contains a binding site for the Rbfox protein similar to one recently described in the Enah gene. I also report that snoRNAs and long noncoding RNAs (lncRNAs) have a high capacity for base-pairing with introns of protein-coding genes, suggesting possible involvement of these transcripts in splicing regulation. Finally, I find that conserved sequences that are equally likely to occur on either strand of DNA (e.g., transcription factor binding sites) contribute strongly to the false-discovery rate and, therefore, would confound every such analysis. IRBIS is open-source software available at http://genome.crg.es/~dmitri/irbis/.

Keywords: Ca-α1D; Dystonin; RNA–RNA interaction; alternative splicing; evolutionary conservation; exon skipping; lncRNA; long-range RNA structure; snoRNA.
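
The "first-fold-then-align" idea in the abstract can be illustrated with a minimal Python sketch. It is not the IRBIS implementation: it assumes exact k-mer matches, Watson-Crick pairing only, and no trimming step, and the names `revcomp`, `kmer_index`, `complementary_pairs`, `conserved_pairs`, and `min_species` are hypothetical. For each species it lists every pair of segments that carry a k-mer and its reverse complement, then keeps the pairs seen in at least `min_species` species.

```python
from collections import defaultdict

# Watson-Crick complement table for RNA (assumption: no G-U wobble pairs).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(kmer):
    """Return the reverse complement of an RNA k-mer."""
    return kmer.translate(COMPLEMENT)[::-1]


def kmer_index(segments, k):
    """Map each k-mer to the set of segment ids that contain it."""
    index = defaultdict(set)
    for seg_id, seq in segments.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].add(seg_id)
    return index


def complementary_pairs(segments, k):
    """All (kmer, seg_a, seg_b) with kmer in seg_a and its reverse complement in seg_b."""
    index = kmer_index(segments, k)
    hits = set()
    for kmer, segs_a in index.items():
        # .get avoids inserting new keys into the index while iterating over it.
        segs_b = index.get(revcomp(kmer), ())
        for a in segs_a:
            for b in segs_b:
                hits.add((kmer, a, b))
    return hits


def conserved_pairs(species_segments, k, min_species):
    """Keep complementary pairs found in the same orthologous segments
    of at least min_species species."""
    counts = defaultdict(int)
    for segments in species_segments.values():
        for hit in complementary_pairs(segments, k):
            counts[hit] += 1
    return {hit for hit, n in counts.items() if n >= min_species}


# Toy input (made-up sequences): segment ids 1 and 2 are orthologous across species.
species_segments = {
    "sp1": {1: "AAGGCUCA", 2: "UUGAGCCAA"},
    "sp2": {1: "AAGGCUCA", 2: "UUGAGCCAA"},
    "sp3": {1: "AAGGCUCA", 2: "UUUUUUUUU"},
}
in_two = conserved_pairs(species_segments, k=5, min_species=2)
in_three = conserved_pairs(species_segments, k=5, min_species=3)
```

Because no alignment is computed, the cost is driven by the number of complementary k-mer combinations rather than by sequence length alone, which is why a procedure that shrinks this search space matters for large-scale runs.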

Figures

FIGURE 1.
Orthologous segments s_ij are indexed by segment identifiers j = 1 … n in each of the species i = 1 … m. Gray boxes are complementary to white boxes and, respectively, gray circles to white circles. The positions of boxes and circles within segments do not play any role. All boxes and circles occur in orthologous segments in three species. Note, however, that boxes occur simultaneously in three species, while circles occur simultaneously in only two species.
FIGURE 2.
False-discovery rate (FDR) as a function of length threshold (L) and intersection threshold (t) for intramolecular RNA structure in noncoding segments of mammalian protein-coding genes. Error bars, 95% confidence intervals. Other parameters are as in Table 1.
FIGURE 3.
(Top) Exonic structure of a 9567-nt fragment of the human DST gene (Dystonin, Bullous Pemphigoid Antigen 1, BPAG1) on chr6:56,465,020–56,474,586. Exons 47–52 are spliced as a cluster. Two complementary sequences, box 1 and box 2, are located in introns between exons 46–47 and 52–53, as indicated by gray boxes. (Bottom) Multiple sequence alignment of introns containing boxes 1 and 2. The average sequence conservation rate is <0.5%.
FIGURE 4.
(Top) Exonic structure of Ca-α1D, an L-type voltage-gated calcium channel gene in the fruit fly (fragment chr2L:16,179,790–16,187,319). Exons 20 and 31 can be included or skipped. Although exon 19 is not annotated as a cassette exon, it can also be skipped in a tissue-specific way, as evidenced by ESTs (dashed arc). (Bottom) Two pairs of highly conserved complementary sequences, box 1/box 2 and box 3/box 4, found in introns surrounding these cassette exons are likely to be involved in the regulation of these splicing events.
FIGURE 5.
Small nucleolar RNAs have, on average, more conserved complementary targets in intronic segments of protein-coding genes than in the reverse complements of these segments. Shown is the distribution of differences, D = n(B) − n(B′), where n(B) and n(B′) are the numbers of targets of the same snoRNA on the coding strand and on the opposite strand, respectively. (Inset) The 10 snoRNAs with the largest D.
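
The strand-difference statistic in this caption amounts to simple arithmetic, sketched below in Python. The snoRNA identifiers and target counts are made up for illustration, and the function names `strand_differences` and `top_by_difference` are hypothetical rather than part of IRBIS.

```python
def strand_differences(n_coding, n_opposite):
    """Compute D = n(B) - n(B') for every snoRNA in n_coding.

    n_coding[s]   : number of conserved targets of snoRNA s on the coding strand
    n_opposite[s] : number of targets of s on the opposite strand (0 if absent)
    """
    return {s: n_coding[s] - n_opposite.get(s, 0) for s in n_coding}


def top_by_difference(diffs, n=10):
    """Return the n snoRNAs with the largest D, largest first."""
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)[:n]


# Made-up counts, for illustration only.
n_coding = {"snoA": 12, "snoB": 3, "snoC": 7}
n_opposite = {"snoA": 4, "snoB": 5}

diffs = strand_differences(n_coding, n_opposite)
top_two = top_by_difference(diffs, n=2)
```

A positive D means more targets on the coding strand than on its reverse complement; the opposite strand serves as the control for chance complementarity.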
FIGURE 6.
The predicted target of SNORD116-4 snoRNA (chr15:25,304,685–25,304,779) in the human SBRD1 gene (chr2:45,704,215–45,715,386). Besides C and D boxes, SNORD116-4 contains a conserved sequence box 1 that could potentially mask the acceptor site in SBRD1 and in approximately 280 other mammalian genes.
FIGURE 7.
The first exon (chr1:120,876,263–120,905,153) of the RP11-439A17.4 lncRNA (bottom left) contains box 1, a conserved sequence that is complementary to the box 2 sequence in the HIST3H2BB gene and also to other similar sequences in the 3′ termini of at least 22 mammalian histone genes. Some of the target sequences, as well as the reverse complement of box 1, are recognized as MEF-2A binding sites, suggesting coincidental complementarity of transcriptional regulatory elements that are located on opposite strands of DNA. Annotated (predicted) MEF-2A binding sites are indicated by small gray (white) rectangles.
FIGURE 8.
(A) A diagram exemplifying the helix space. Horizontal and vertical axes correspond to the hash tables H_i and H_i*, respectively; widths and heights of the gray rectangles are n_i(ω) and n_i(ω*), respectively. The total gray area represents the number of (ordered) pairwise combinations of complementary k-mers. (B) Several diagrams as in A need to be parsed simultaneously in order to find conserved complementary k-mers. Pairs equivalent under ≃ are connected by a path.
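
The total gray area in panel A can be illustrated with a short Python sketch. It assumes that n_i(ω) is the number of occurrences of the k-mer ω in the first table, that n_i(ω*) is the number of occurrences of its reverse complement ω* in the second table, and that pairing is Watson-Crick only; the function names are hypothetical. Under these assumptions the area is the sum over ω of n_i(ω) · n_i(ω*).

```python
from collections import Counter

# Watson-Crick complement table for RNA (assumption: no G-U wobble pairs).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(kmer):
    """Return the reverse complement of an RNA k-mer."""
    return kmer.translate(COMPLEMENT)[::-1]


def kmer_counts(seqs, k):
    """Count every k-mer occurrence across a collection of sequences."""
    counts = Counter()
    for seq in seqs:
        for pos in range(len(seq) - k + 1):
            counts[seq[pos:pos + k]] += 1
    return counts


def helix_space_size(left_seqs, right_seqs, k):
    """Number of ordered pairs (occurrence of w on the left,
    occurrence of its reverse complement on the right)."""
    h = kmer_counts(left_seqs, k)        # stands in for the table on the horizontal axis
    h_star = kmer_counts(right_seqs, k)  # stands in for the table on the vertical axis
    # A Counter returns 0 for missing keys, so unmatched k-mers add nothing.
    return sum(n * h_star[revcomp(w)] for w, n in h.items())


# Toy input (made-up sequences): GGCUC occurs twice on the left,
# and its reverse complement GAGCC occurs twice on the right.
area = helix_space_size(["GGCUCGGCUC"], ["GAGCC", "GAGCCA"], k=5)
```

Each rectangle contributes width times height, so frequent k-mers with frequent complements dominate the search space.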
