Supersplat--spliced RNA-seq alignment
- PMID: 20410051
- PMCID: PMC2881391
- DOI: 10.1093/bioinformatics/btq206
Abstract
Motivation: High-throughput sequencing technologies have recently made deep interrogation of expressed transcript sequences practical, both economically and temporally. Identification of intron/exon boundaries is an essential part of genome annotation, yet remains a challenge. Here, we present supersplat, a method for unbiased splice-junction discovery through empirical RNA-seq data.
Results: Using a genomic reference and high-throughput RNA-seq datasets, supersplat empirically identifies potential splice junctions at a rate of approximately 11.4 million reads per hour. We further benchmark the performance of the algorithm by mapping Illumina RNA-seq reads to identify introns in the genome of the reference dicot plant Arabidopsis thaliana, and we demonstrate the utility of supersplat for de novo empirical annotation of splice junctions using the reference monocot plant Brachypodium distachyon.
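To make the idea of empirical splice-junction discovery concrete, the following is a minimal Python sketch of naive spliced alignment: a read is split into two chunks, and each chunk must match the reference exactly, with the gap between them treated as a candidate intron. The abstract does not describe supersplat's internals, so this should not be read as its actual algorithm. The function name `find_spliced_matches`, the parameters `min_chunk`, `min_intron` and `max_intron`, and the toy sequences are all assumptions made for illustration.

```python
from typing import List, Tuple


def find_spliced_matches(
    read: str,
    ref: str,
    min_chunk: int = 3,    # hypothetical: shortest allowed chunk on either side
    min_intron: int = 2,   # hypothetical: shortest gap accepted as an intron
    max_intron: int = 50,  # hypothetical: longest gap accepted as an intron
) -> List[Tuple[int, int, int]]:
    """Return (left_start, intron_start, right_start) for every way the read
    can be split into two exactly matching chunks separated by a gap whose
    length lies in [min_intron, max_intron]."""
    hits = []
    # Try every split point that leaves at least min_chunk bases on each side.
    for split in range(min_chunk, len(read) - min_chunk + 1):
        left, right = read[:split], read[split:]
        pos = ref.find(left)
        while pos != -1:
            intron_start = pos + split
            # Scan for the right chunk downstream of the left chunk,
            # keeping only gaps inside the allowed intron size range.
            rpos = ref.find(right, intron_start + min_intron)
            while rpos != -1 and rpos <= intron_start + max_intron:
                hits.append((pos, intron_start, rpos))
                rpos = ref.find(right, rpos + 1)
            pos = ref.find(left, pos + 1)
    return hits


# Toy reference: exon "ACGTAC", intron "GTCCCCAG", exon "TTGACA".
ref = "ACGTAC" + "GTCCCCAG" + "TTGACA"
# Toy read spanning the junction: last 4 bases of exon 1 + first 4 of exon 2.
read = "GTAC" + "TTGA"

hits = find_spliced_matches(read, ref)
for left_start, intron_start, right_start in hits:
    print(left_start, intron_start, right_start, ref[intron_start:right_start])
```

This brute-force scan is only meant to show what a candidate junction looks like. A tool processing millions of reads per hour would need an indexed reference rather than repeated string searches.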
Availability: Implemented in C++, supersplat source code and binaries are freely available on the web at http://mocklerlab-tools.cgrb.oregonstate.edu/.