Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2012 Apr 15;28(8):1086-92.
doi: 10.1093/bioinformatics/bts094. Epub 2012 Feb 24.

Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels

Affiliations

Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels

Marcel H Schulz et al. Bioinformatics. .

Abstract

Motivation: High-throughput sequencing has made the analysis of new model organisms more affordable. Although assembling a new genome can still be costly and difficult, it is possible to use RNA-seq to sequence mRNA. In the absence of a known genome, it is necessary to assemble these sequences de novo, taking into account possible alternative isoforms and the dynamic range of expression values.

Results: We present a software package named Oases designed to heuristically assemble RNA-seq reads in the absence of a reference genome, across a broad spectrum of expression values and in presence of alternative isoforms. It achieves this by using an array of hash lengths, a dynamic filtering of noise, a robust resolution of alternative splicing events and the efficient merging of multiple assemblies. It was tested on human and mouse RNA-seq data and is shown to improve significantly on the transABySS and Trinity de novo transcriptome assemblers.

Availability and implementation: Oases is freely available under the GPL license at www.ebi.ac.uk/~zerbino/oases/.

PubMed Disclaimer

Figures

Fig. 1.
Fig. 1.
Schematic overview of the Oases pipeline: (1) Individual reads are sequenced from an RNA sample; (2) Contigs are built from those reads, some of them are labeled as long (clear), others short (dark); (3) Long contigs, connected by single reads or read-pairs are grouped into connected components called loci; (4) Short contigs are attached to the loci; and (5) The loci are transitively reduced. Tranfrags are then extracted from the loci. The loci are divided into four categories: (A) chains, (B) bubbles, (C) forks and (D) complex (i.e. all the loci which did not fit into the previous categories).
Fig. 2.
Fig. 2.
Comparison of single k-mer Oases assemblies and the merged assembly from kMIN=19 to kMAX=35 by Oases-M, on the human dataset. The total number of Ensembl transcripts assembled to 80 of their length is provided by RPKM gene expression quantiles of 1464 genes each.
Fig. 3.
Fig. 3.
Reconstruction efficiency of Ensembl transcripts for different RNA-seq de novo assembly methods (Oases-M, Trinity, and transABySS) on human and mouse datasets. Reference-based assembly results using Cufflinks are provided on the mouse dataset. All annotated genes have been grouped into quantiles by RPKM expression values of 1464 (resp. 1078) genes for the human data (resp. mouse).
Fig. 4.
Fig. 4.
Venn Diagramm that compares mouse Ensembl transcripts reconstructed to full length by Trinity, Trans-AbySS and Oases-M for the mouse RNA-seq data.

References

    1. Birol I., et al. De novo transcriptome assembly with ABySS. Bioinformatics. 2009;25:2872–2877. - PubMed
    1. Blencowe B.J., et al. Current-generation high-throughput sequencing: deepening insights into mammalian transcriptomes. Gene. Dev. 2009;23:1379–1386. - PubMed
    1. Butler J., et al. ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Res. 2008;18:810–820. - PMC - PubMed
    1. Collins L.J., et al. An approach to transcriptome analysis of non-model organisms using short-read sequences. Genome Inform. 2008;21:3–14. - PubMed
    1. Denoeud F., et al. Annotating genomes with massive-scale RNA sequencing. Genome Biol. 2008;9:R175. - PMC - PubMed

Publication types