Genome Biol. 2015 Jan 13;16(1):1.
doi: 10.1186/s13059-014-0572-2.

De novo assembly of bacterial transcriptomes from RNA-seq data

Brian Tjaden. Genome Biol. 2015.

Abstract

Transcriptome assays are increasingly being performed by high-throughput RNA sequencing (RNA-seq). For organisms whose genomes have not been sequenced and annotated, transcriptomes must be assembled de novo from the RNA-seq data. Here, we present novel algorithms, specific to bacterial gene structures and transcriptomes, for analysis of bacterial RNA-seq data and de novo transcriptome assembly. The algorithms are implemented in an open source software system called Rockhopper 2. We find that Rockhopper 2 outperforms other de novo transcriptome assemblers and offers accurate and efficient analysis of bacterial RNA-seq data. Rockhopper 2 is available at http://cs.wellesley.edu/~btjaden/Rockhopper .

Figures

Figure 1
Rockhopper 2 workflow depicting the various phases of Rockhopper 2’s analyses. As input, Rockhopper 2 requires one or more files of sequencing reads from RNA-seq experiments. In the first analysis stage, Rockhopper 2 determines k-mers from the sequencing reads and builds a de Bruijn graph from the k-mers. The de Bruijn graph is used to assemble candidate transcripts, which are stored in a Burrows-Wheeler index. In the second analysis stage, Rockhopper 2 aligns the sequencing reads to the assembled candidate transcripts to determine a final set of high-quality assembled transcripts. After the second stage, transcriptome assembly is complete and Rockhopper 2 performs several downstream analyses, including normalizing data from different experiments, quantifying transcript abundance, and testing for differential gene expression across multiple conditions.
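
The first analysis stage in the caption above (k-mers, then a de Bruijn graph, then candidate transcripts) can be illustrated with a minimal sketch. This is not Rockhopper 2's implementation: the function names, the use of (k-1)-mers as nodes, and the simple rule of extending a path only while a node has exactly one successor are all assumptions made for illustration, and error correction, reverse complements, and the Burrows-Wheeler index are omitted.

```python
from collections import defaultdict


def kmers(read, k):
    """Yield every overlapping substring of length k in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mers that follow it.

    Each k-mer contributes one edge: its prefix -> its suffix.
    """
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_unambiguous(graph, start):
    """Walk from `start` while the current node has exactly one successor.

    Hypothetical simplification: only out-degree is checked, and the walk
    stops if it would revisit a node (a cycle).
    """
    path, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        path += nxt[-1]  # each step adds one new base
        seen.add(nxt)
        node = nxt
    return path


# Two overlapping toy reads are merged into one candidate sequence.
graph = build_de_bruijn(["ATGGCG", "GGCGTA"], k=4)
candidate = extend_unambiguous(graph, "ATG")
```

With these two toy reads, the shared k-mer `GGCG` links the reads in the graph, so the walk from `ATG` spans both of them.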
Figure 2
Performance assembling E. coli genome from DNA-seq data. The performance of Rockhopper 2 as well as two other assemblers, SOAPdenovo2 and Trinity, on three biological replicate DNA-seq experiments from E. coli is illustrated. (A) Specificity is the percentage of assembled contigs that align to the E. coli genome. (B) Sensitivity is the percentage of the E. coli genome sequence that is covered by assembled contigs aligning to the genome. (C) Execution time is the number of minutes that an assembler requires to execute on the DNA-seq data.
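
The specificity and sensitivity definitions in this caption reduce to two ratios, sketched below. The input representation is an assumption: alignments are taken to be half-open `(start, end)` intervals on the genome, already produced by some external aligner, and overlapping intervals are merged so that no base is counted twice. The function names are hypothetical.

```python
def specificity(num_aligned_contigs, num_assembled_contigs):
    """Percentage of assembled contigs that align to the genome."""
    if num_assembled_contigs == 0:
        return 0.0
    return 100.0 * num_aligned_contigs / num_assembled_contigs


def covered_bases(intervals):
    """Length of the union of half-open (start, end) intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged interval and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def sensitivity(intervals, genome_length):
    """Percentage of the genome covered by aligned contigs."""
    return 100.0 * covered_bases(intervals) / genome_length
```

For example, contigs aligning at `(0, 10)`, `(5, 20)`, and `(30, 40)` on a genome of length 100 cover 30 distinct bases.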
Figure 3
Performance assembling transcripts from RNA-seq data. The performance of each of three assemblers on 12 RNA-seq data sets is illustrated. The 12 RNA-seq data sets correspond to nine bacteria, two archaea, and one fungus. (A) Specificity is the percentage of assembled transcripts that align to the genome. (B) Sensitivity is the percentage of the reference gene sequences that is covered by assembled transcripts aligning to the reference genes. (C) Contiguity is the percentage of reference genes that are at least δ = 80% covered by their single longest aligning transcript. (D) RMBT is the percentage of sequencing reads that align to the assembled transcripts. (E) Execution time is the number of minutes that an assembler requires to execute on the RNA-seq data set.
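
The contiguity metric in panel (C) can be sketched as follows. The data layout is an assumption for illustration: each reference gene maps to its length and a list of the number of gene bases covered by each aligning transcript, and the "single longest aligning transcript" is interpreted here as the transcript covering the most bases of that gene. The function name and the input format are hypothetical.

```python
def contiguity(genes, delta=0.80):
    """Percentage of reference genes at least `delta` covered by one transcript.

    genes: dict mapping gene id -> (gene_length, [bases covered per transcript])
    """
    if not genes:
        return 0.0
    hits = sum(
        1
        for gene_length, covered in genes.values()
        if covered and max(covered) / gene_length >= delta
    )
    return 100.0 * hits / len(genes)


# Toy input: two of the four genes reach 80% coverage by a single transcript.
example = {
    "geneA": (1000, [850, 200]),  # 85% by the best transcript
    "geneB": (500, [300]),        # 60%
    "geneC": (200, []),           # no aligning transcript
    "geneD": (100, [90]),         # 90%
}
score = contiguity(example)
```

Note that `geneA` would not qualify if coverage had to come from a single transcript of 800 bases or fewer, even though its two transcripts together might cover the whole gene; that is the distinction between contiguity and sensitivity.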
Figure 4
Rockhopper 2 performance at different expression deciles. For each of the 12 RNA-seq data sets, the set of reference genes was divided into 10 groups based on their expression levels, with the 10% of reference genes with lowest expression in the first group and the 10% of reference genes with highest expression in the last group. The sensitivity (purple) and contiguity (yellow) of Rockhopper 2’s assemblies across all 12 RNA-seq data sets are illustrated.
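
The grouping by expression decile described in this caption could be done along these lines. This is a sketch under assumptions: expression levels are given as a dict from a gene id to a single number, ties are broken by input order, and when the number of genes is not a multiple of 10 the group sizes differ by at most one. The function name is hypothetical.

```python
def expression_deciles(expression, groups=10):
    """Split gene ids into `groups` lists, lowest expression first."""
    ranked = sorted(expression, key=expression.get)
    n = len(ranked)
    return [
        ranked[i * n // groups:(i + 1) * n // groups]
        for i in range(groups)
    ]


# Toy input: 20 genes whose expression equals their index.
levels = {f"g{i}": i for i in range(20)}
deciles = expression_deciles(levels)
```

Sensitivity and contiguity can then be computed separately on each group to see how assembly quality varies with expression level.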

