Nucleic Acids Res. 2010 Aug;38(14):4570-8.
doi: 10.1093/nar/gkq211. Epub 2010 Apr 5.

Detection of splice junctions from paired-end RNA-seq data by SpliceMap

Kin Fai Au et al. Nucleic Acids Res. 2010 Aug.

Abstract

Alternative splicing is a prevalent post-transcriptional process that is not only important to normal cellular function but is also involved in human diseases. The newly developed second-generation sequencing technique provides high-throughput data (RNA-seq data) for studying alternative splicing events in different types of cells. Here, we present a computational method, SpliceMap, to detect splice junctions from RNA-seq data. This method does not depend on any existing annotation of gene structures and is capable of finding novel splice junctions with high sensitivity and specificity. It can handle long reads (50-100 nt) and can exploit paired-read information to improve mapping accuracy. Several parameters are included in the output to indicate the reliability of each predicted junction and to help filter out false predictions. We applied SpliceMap to analyze 23 million paired 50-nt reads from human brain tissue. The results show that, at this depth of sequencing, RNA-seq can support reliable detection of splice junctions except for those present at very low levels. Compared to current methods, SpliceMap can achieve 12% higher sensitivity without sacrificing specificity.

Figures

Figure 1.
Workflow of standard SpliceMap and outline of junction search based on half-read mapping. (a) SpliceMap consists of four steps: half-read mapping, seeding selection, junction search and paired-end filtering. It outputs a coverage plot and the detected junctions. (b) Each half-read is aligned to the genome and extended to obtain a partial alignment. The remaining part of the read, if at least 10 nt long, is then searched for matches within a neighborhood of 400,000 nt. The GT–AG splicing signal is also used to filter the matches.
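The junction search in panel (b) can be illustrated with a minimal sketch. It is not the SpliceMap implementation: it assumes exact matching on the forward strand only, anchors only the first half of the read, and uses a hypothetical function name (`find_junctions`) and 0-based coordinates. The 10-nt minimum remainder, the 400,000-nt window and the GT–AG check come from the caption.

```python
def find_junctions(read, genome, min_rest=10, window=400_000):
    """Toy half-read junction search: forward strand, exact matches only.

    Returns sorted (donor, acceptor) pairs in 0-based genome coordinates,
    where donor is the first intron base and acceptor is the first base
    of the downstream exon.
    """
    half = len(read) // 2
    anchor = read[:half]
    found = set()
    start = genome.find(anchor)
    while start != -1:
        # Extend the anchored half-read while read and genome keep agreeing.
        ext = half
        while (ext < len(read) and start + ext < len(genome)
               and read[ext] == genome[start + ext]):
            ext += 1
        if ext < len(read):  # a fully extended read needs no junction
            # The extension may overshoot the true exon end, so try each
            # split point from the extension limit back to the anchor end.
            for split in range(ext, half - 1, -1):
                rest = read[split:]
                donor = start + split
                if len(rest) < min_rest or genome[donor:donor + 2] != "GT":
                    continue
                # Search for the remainder downstream, inside the window.
                limit = min(len(genome), donor + window)
                pos = genome.find(rest, donor + 2, limit)
                while pos != -1:
                    # Keep the match only if the intron ends with AG.
                    if genome[pos - 2:pos] == "AG":
                        found.add((donor, pos))
                    pos = genome.find(rest, pos + 1, limit)
        start = genome.find(anchor, start + 1)
    return sorted(found)
```

A real aligner would also tolerate mismatches, handle the reverse strand and anchor the second half-read; those parts are omitted here for brevity.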
Figure 2.
The direction and positional order of the paired-end reads (R1–R2). If the sequenced sample is the same as the original copy, read R1 should be mappable in the forward direction at the 5′-end and R2 in the reverse direction at the 3′-end. If the sequenced sample is the complementary copy, R1 should be mappable in the reverse direction at the 3′-end and R2 in the forward direction at the 5′-end.
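The orientation rule can be written as a small check: a consistent pair has its forward-mapped read upstream (5′ side) of its reverse-mapped read, and which of R1 or R2 is the forward read indicates which copy was sequenced. The sketch below is an illustration under assumptions: the function name `classify_pair`, the `"+"`/`"-"` strand encoding and the use of leftmost mapping positions are all hypothetical and not taken from SpliceMap.

```python
def classify_pair(r1_strand, r1_pos, r2_strand, r2_pos):
    """Classify a read pair by mapping strand and leftmost genome position.

    Returns "original" when R1 maps forward upstream of a reverse R2,
    "complementary" when R2 maps forward upstream of a reverse R1,
    and None when the pair fits neither arrangement.
    """
    if r1_strand == "+" and r2_strand == "-" and r1_pos <= r2_pos:
        return "original"
    if r1_strand == "-" and r2_strand == "+" and r2_pos <= r1_pos:
        return "complementary"
    return None
```

Pairs that return `None` (both reads on the same strand, or the reverse read upstream of the forward read) are the ones a paired-end filter would treat as suspicious.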
Figure 3.
Schematic of the parameters used to assess junctions in SpliceMap. The deep green reads are uniquely mapped supporting reads (nUM = 4) and the wheat reads are multiply mapped supporting reads, so nR for this junction is 6. Some supporting reads are redundant, so nNR = 4. There are four and three uniquely mapped reads (grey green) in the 40-nt regions adjacent to the junction upstream and downstream, respectively, so nUP = 4 and nDOWN = 3.
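The five counts can be computed from a list of mapped reads, as in the following sketch. Several details are assumptions made for illustration, since the caption does not define them: reads are treated as redundant when they share a start position, a read counts toward nUP or nDOWN when its start falls in the 40-nt flanking region, and junction-spanning reads are excluded from the flanking counts. The `Read` record and the function name `junction_stats` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    start: int            # leftmost mapped position (0-based)
    unique: bool          # True if the read maps to a single location
    spans_junction: bool  # True if the read supports the junction


def junction_stats(reads, donor, acceptor, flank=40):
    """Count nR, nNR, nUM, nUP and nDOWN for one junction.

    donor is the first intron base, acceptor the first base of the
    downstream exon. Redundancy is approximated by identical start
    positions (an assumption, not SpliceMap's stated definition).
    """
    supporting = [r for r in reads if r.spans_junction]
    adjacent = [r for r in reads if not r.spans_junction and r.unique]
    return {
        "nR": len(supporting),
        "nNR": len({r.start for r in supporting}),
        "nUM": sum(1 for r in supporting if r.unique),
        "nUP": sum(1 for r in adjacent if donor - flank <= r.start < donor),
        "nDOWN": sum(1 for r in adjacent
                     if acceptor <= r.start < acceptor + flank),
    }
```

With six supporting reads at four distinct start positions, four of them unique, plus four and three unique reads in the two flanks, this function reproduces the values in the schematic.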
Figure 4.
Venn diagram of the distribution of redundancy, novelty and EST evidence among the junctions predicted by SpliceMap. Only 1389 known junctions are supported by a single non-redundant read and lack EST evidence.
Figure 5.
Filtering by paired-end information. The top two tracks show the results from single-end SpliceMap and paired-end SpliceMap, respectively, before nUP, nDOWN and nUM filtering. Known junctions are shown in black and novel ones in red. Single-read analysis predicts several very long junctions that jump across genes. These are false positives, and the paired-end information helps to remove them.
