Genome Res. 2014 Jul;24(7):1169-79.
doi: 10.1101/gr.166819.113. Epub 2014 Apr 7.

LaSSO, a strategy for genome-wide mapping of intronic lariats and branch points using RNA-seq


Danny A Bitton et al. Genome Res. 2014 Jul.

Abstract

Both canonical and alternative splicing of RNAs are governed by intronic sequence elements and produce transient lariat structures fastened by branch points within introns. To map precisely the location of branch points on a genomic scale, we developed LaSSO (Lariat Sequence Site Origin), a data-driven algorithm which utilizes RNA-seq data. Using fission yeast cells lacking the debranching enzyme Dbr1, LaSSO not only accurately identified canonical splicing events, but also pinpointed novel, but rare, exon-skipping events, which may reflect aberrantly spliced transcripts. Compromised intron turnover perturbed gene regulation at multiple levels, including splicing and protein translation. Notably, Dbr1 function was also critical for the expression of mitochondrial genes and for the processing of self-spliced mitochondrial introns. LaSSO showed better sensitivity and accuracy than algorithms used for computational branch-point prediction or for empirical branch-point determination. Even when applied to a human data set acquired in the presence of debranching activity, LaSSO identified both canonical and exon-skipping branch points. LaSSO thus provides an effective approach for defining high-resolution maps of branch-site sequences and intronic elements on a genomic scale. LaSSO should be useful to validate introns and uncover branch-point sequences in any eukaryote, and it could be integrated into RNA-seq pipelines.


Figures

Figure 1.
Scheme of intron splicing and diagnostic sequence reads. (A) Pre-mRNA with diagnostic exon-intron reads (cyan). (B) First transesterification reaction: lariat intermediate with phosphodiester bond between 5′ splice donor (red) and branch-point adenine (A) along with upstream sequence (green). (C) Final splicing reaction: exons are ligated yielding mature mRNA with diagnostic exon-exon junction reads (purple), while the lariat is excised. (D) Intron 3′ tail removal and debranching. (E) Rapid degradation or further processing. (F) In dbr1Δ cells, lariats become stabilized and accumulate, resulting in enhanced intronic sequence reads (orange). The reverse transcriptase also reads through the 2′–5′ linkage (hatched blue arrow). (G) This reverse transcription produces unique lariat reads, where the sequence upstream of the branch point (green) precedes the 5′ segment of the intron (red). The enzyme often mutates the branch-point adenine to any other nucleotide as illustrated. The accumulation of lariat structures would inevitably result in the production of additional intronic reads (orange) that enhance intronic expression level.
Figure 2.
Increased, length-biased intronic expression in dbr1Δ cells. (A) Box plot showing transcript and intronic expression in dbr1Δ and wild-type cells. Intronic expression is significantly higher in dbr1Δ (P < 2.2 × 10^−16, Wilcoxon rank sum test). (B) MA plot showing differentially expressed introns (red) and transcripts (blue) (DESeq; adjusted P < 0.05 and absolute fold change > 2). (X-axis) Mean of normalized counts. (C) Reproducibility of intronic expression between dbr1Δ biological replicates (each dot represents one of 5361 introns; [r] Pearson’s correlation coefficient). Introns were binned according to length as indicated in color legend. (D) Comparison of intronic expression between dbr1Δ and wild type with introns binned as in C. Only one comparison is shown; the other biological replicates produced the same trends. The higher intronic expression in dbr1Δ cells shows a strong length bias (P < 2.2 × 10^−16, Wilcoxon rank sum test).
Figure 3.
Splicing efficiency is decreased in dbr1Δ cells. (A,B) Correlation of splicing efficiency (SE) between biological replicates in dbr1Δ (A) and wild type (B). Each dot represents one of 5361 introns; (r) Pearson’s correlation coefficient. (C) Box plot showing SE in dbr1Δ and wild-type cells (P < 2.2 × 10^−16, Wilcoxon rank sum test). (D) Comparison of SE between dbr1Δ and wild type. (Red) 638 introns showing significant changes in SE (CMH test; Q < 0.05). (E) As in D but with introns binned according to their size as indicated in color legend. (F,G) Overlap between introns with lower SE and introns with higher expression (DESeq), both with and without fold-change cutoff. The indicated P-values for overlaps are based on a hypergeometric test.
Figure 4.
LaSSO (Lariat Sequence Site Origin), an algorithm to build a lariat database along with workflow to identify lariat reads from RNA-seq data. (A) The algorithm pseudocode. LaSSO takes a given intron sequence of length “L” and uses the first “read length − 1” bases of this intron as the 3′-lariat segment (if shorter, the whole sequence is used). To generate the 5′-lariat segments, accounting for all possible combinations of lariat structures, LaSSO iteratively produces all possible segments by selecting each base at a time as the putative branch point. LaSSO works from the 3′ end of the intronic sequence toward the 5′ end, until it reaches the first intronic base. LaSSO takes only the last “read length − 1” bases of the 5′-lariat segment (if shorter, the whole sequence is used again). LaSSO then concatenates the 5′ segment, the branch point, and the 3′ segment of the lariat sequence, yielding a diagnostic lariat signature. To generate all possible exon-skipping lariat sequences for a given transcript, the input sequence and algorithm were slightly altered. Briefly, considering a gene with two introns and three exons, only a single skipping event can occur. Therefore, the input sequence is the upstream intron with the downstream intron attached to its 3′ end. To avoid database redundancy, the algorithm iterates L times, where L only refers to the length of the downstream intron, not the combined introns. Thus, the 5′ segment of the skipping lariat sequence is generated from the downstream intron, while the 3′ segment of the skipping lariat always corresponds to the 5′ end of the upstream intron. For more than two introns, all possible skipping events are considered, i.e., Sn = (I−1) × I/2 (I: number of introns, Sn: number of skipping events). (B) Scheme for all possible lariat signatures accounted for by LaSSO. Intron excision results in diagnostic cDNA products upon reverse transcription, where the sequence upstream of the branch point precedes the 5′ end of the intron (resulting in 5′- and 3′-lariat segments, respectively). (Green) 5′-lariat segment from upstream intron; (red) 3′-lariat segment from upstream intron; (orange) 5′-lariat segment from downstream intron; (blue) 3′-lariat segment from downstream intron. (C) Lariat detection workflow (see main text for details).
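The database-building step described in Figure 4A can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `lariat_signatures` and `count_skipping_events` and the toy intron sequence are hypothetical. The sketch also assumes that the first intronic base is never itself a branch point, since it has no upstream sequence; the caption leaves this detail open.

```python
def lariat_signatures(intron: str, read_length: int) -> list[tuple[int, str]]:
    """Enumerate candidate lariat signatures for one intron.

    Returns (branch_point_index, signature) pairs, where the index is
    0-based within the intron and the signature is
    5'-segment + branch-point base + 3'-segment.
    """
    if read_length < 2:
        raise ValueError("read_length must be at least 2")
    k = read_length - 1
    # 3'-lariat segment: first k bases of the intron (whole intron if shorter).
    three_seg = intron[:k]
    signatures = []
    # Walk from the 3' end toward the 5' end; assumption: stop before index 0.
    for bp in range(len(intron) - 1, 0, -1):
        # 5'-lariat segment: up to k bases immediately upstream of the branch point.
        five_seg = intron[max(0, bp - k):bp]
        signatures.append((bp, five_seg + intron[bp] + three_seg))
    return signatures


def count_skipping_events(n_introns: int) -> int:
    """Number of possible skipping events, Sn = (I - 1) * I / 2."""
    return (n_introns - 1) * n_introns // 2


if __name__ == "__main__":
    toy_intron = "GTAAGTCTAACAG"  # made-up sequence, for illustration only
    for bp, sig in lariat_signatures(toy_intron, read_length=5):
        print(bp, sig)
    print(count_skipping_events(3))
```

Each signature is at most 2 × (read length − 1) + 1 bases long, so a read aligned to it has to span the junction between the branch point and the 5′ end of the intron. For a gene with two introns, `count_skipping_events(2)` gives the single skipping event mentioned in the caption.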
Figure 5.
Characterization of lariat branch points and branch-site sequence. (A) Proportion of lariat reads relative to total number of reads not mapped to genome or transcriptome. Absolute numbers in each sample are indicated, along with P-values (Fisher’s exact test). (B) The base (color-coded as indicated) and position (x-axis) of each branch point identified as a function of read number supporting it (y-axis). The primary branch point is placed at position zero on the x-axis. The numbers of lariats and supporting reads are indicated on top. Only branch points located within 10 bases up- (negative values) or downstream (positive values) from a primary branch point are shown. (C) Consensus branch-site sequences around the primary branch point as probability (left) and bits (right), plotted using WebLogo (Crooks et al. 2004) (default settings except for compositional adjustment, with GC content set to 30%). (Top panels) Using LaSSO based on 1236 introns from our data; (middle panels) using LaSSO based on 930 introns from 2D-Lariat-seq data (Awan et al. 2013) that were supported by ≥3 lariat reads; (bottom panels) using FELINES for the same set of 1236 introns detected in this study. (D) Number of introns of different sizes for which lariat reads were detected by LaSSO when no read-number threshold was applied: 1584 lariats for our data, 1268 lariats for 2D-Lariat-seq data by Awan et al. (2013). Introns were binned according to their size as indicated (5361 introns in total).

References

    1. Anders S, Huber W 2010. Differential expression analysis for sequence count data. Genome Biol 11: R106.
    2. Arenas JE, Abelson JN 1997. Prp43: an RNA helicase-like factor involved in spliceosome disassembly. Proc Natl Acad Sci 94: 11798–11802.
    3. Awan AR, Manfredo A, Pleiss JA 2013. Lariat sequencing in a unicellular yeast identifies regulated alternative splicing of exons that are evolutionarily conserved with humans. Proc Natl Acad Sci 110: 12762–12767.
    4. Boyle EI, Weng S, Gollub J, Jin H, Botstein D, Cherry JM, Sherlock G 2004. GO::TermFinder–open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics 20: 3710–3715.
    5. Cheng Z, Menees TM 2011. RNA splicing and debranching viewed through analysis of RNA lariats. Mol Genet Genomics 286: 395–410.
