Comparative Study
BMC Genomics. 2011 Jun 16;12:317.
doi: 10.1186/1471-2164-12-317.

Short read Illumina data for the de novo assembly of a non-model snail species transcriptome (Radix balthica, Basommatophora, Pulmonata), and a comparison of assembler performance

Barbara Feldmeyer et al. BMC Genomics.

Abstract

Background: Until recently, read lengths on the Solexa/Illumina system were too short to reliably assemble transcriptomes without a reference sequence, especially for non-model organisms. However, with read lengths of up to 100 nucleotides available in the current version, an assembly without a reference genome should be possible. For this study we created an EST data set for the common pond snail Radix balthica by Illumina sequencing of a normalized transcriptome. The performance of three different short-read assemblers was compared with respect to the number of contigs, their length, their depth of coverage, their quality in various BLAST searches, and their alignment to mitochondrial genes.

Results: A single sequencing run of a normalized RNA pool resulted in 16,923,850 paired-end reads with a median read length of 61 bases. The assemblies generated by VELVET, OASES, and SeqMan NGEN differed in the total number of contigs, contig length, the number and quality of gene hits obtained by BLAST searches against various databases, and contig performance in the mt genome comparison. While VELVET produced the highest overall number of contigs, a large fraction of these were small (< 200 bp) and gave redundant hits in BLAST searches and the mt genome alignment. The best overall contig performance resulted from the NGEN assembly. It produced the second largest number of contigs, which on average were comparable to the OASES contigs, but gave the highest number of gene hits in two out of four BLAST searches against different reference databases. A subsequent meta-assembly of the four contig sets resulted in larger contigs, less redundancy, and a higher number of BLAST hits.

Conclusion: Our results document the first de novo transcriptome assembly of a non-model species using Illumina sequencing data. We show that de novo transcriptome assembly using this approach yields results useful for downstream applications, in particular if a meta-assembly of contig sets is used to increase contig quality. These results highlight the ongoing need for improvements in assembly methodology.
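
The contig-level metrics compared in the abstract (number of contigs, contig length, and the share of contigs shorter than 200 bp) can be computed from any assembler's FASTA output. The following is a minimal sketch and not the study's pipeline: the function names are hypothetical, the input is assumed to be plain FASTA text, and N50 is included as a common assembly summary even though the abstract does not report it.

```python
from statistics import median


def read_fasta_lengths(lines):
    """Return the length of every sequence in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def contig_summary(lengths, min_len=200):
    """Summarise one contig set; min_len mirrors the 200 bp threshold in the text."""
    count = len(lengths)
    short = sum(1 for n in lengths if n < min_len)
    return {
        "contigs": count,
        "median_length": median(lengths) if lengths else 0,
        "n50": n50(lengths),
        "fraction_short": short / count if count else 0.0,
    }
```

Running `contig_summary(read_fasta_lengths(open(path)))` on each assembler's output, where `path` is a placeholder for a contig FASTA file, gives one comparable row per contig set.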


Figures

Figure 1
Distribution of contig number and length (contigs > 200 bp). Total number of contigs obtained by each assembler: VELVET = 220,154; NGEN = 57,986; OASESkmer-21 = 52,477; OASESkmer-31 = 41,590; Meta-assembly = 54,450.
Figure 2
Contig BLASTX hits against the UniProt database. Overview of BLAST results for contigs produced by three different assemblers. BLASTX cutoff values were set to either < e-5 or < e-10. Results are shown for the total number of hits, the number of UniGen hits, and for contigs larger than 200 bp. (ORK21/31: OASESkmer-21/31).
Figure 3
Contig BLASTN hits against the RefSeq database. Overview of BLAST results for contigs produced by three different assemblers. BLASTN cutoff values were set to either < e-5 or < e-10. Results are shown for the total number of hits, the number of UniGen hits, and for contigs larger than 200 bp. (ORK21/31: OASESkmer-21/31).
Figure 4
Contig BLASTX hits against the Biomphalaria glabrata EST database. Overview of BLAST results for contigs produced by three different assemblers. BLASTX cutoff values were set to either < e-5 or < e-10. Results are shown for the total number of hits, the number of UniGen hits, and for contigs larger than 200 bp. (ORK21/31: OASESkmer-21/31).
Figure 5
Distribution of contig length versus average coverage, shown for the NGEN assembled contig set.
Figure 6
Number of unique and identical UniProt gene hits of the four different contig sets. BLASTX cutoff values applied: a) < e-5 and b) < e-10, for contigs > 200 bp.
Figure 7
Contig quality assessment by BLASTN against Radix mt genes. Relationship between aligned contig length and total contig length (left panel), and proportion of aligned contig length versus total contig length at a cutoff value of < e-5 (right panel). As the patterns of the OASESkmer-21 and OASESkmer-31 contigs are similar, only OASESkmer-31 is depicted here.
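
The BLAST-based comparisons in the captions above rest on two simple quantities: how many contigs and distinct database entries pass an e-value cutoff, and what proportion of each contig is covered by its alignment. The sketch below illustrates both. It assumes BLAST+ 12-column tabular output (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), which may differ from the output format used in the study, and the function name and `contig_lengths` mapping are hypothetical.

```python
def summarize_hits(lines, evalue_cutoff=1e-5, contig_lengths=None):
    """Count hits passing an e-value cutoff in BLAST tabular output.

    contig_lengths: optional dict mapping contig id -> total contig length,
    used to compute the aligned proportion of each contig.
    """
    contigs_with_hit = set()
    subjects = set()
    aligned_fraction = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        qseqid, sseqid = fields[0], fields[1]
        qstart, qend = int(fields[6]), int(fields[7])
        evalue = float(fields[10])
        if evalue >= evalue_cutoff:
            continue  # "< e-5" in the captions corresponds to evalue < 1e-5
        contigs_with_hit.add(qseqid)
        subjects.add(sseqid)
        if contig_lengths and qseqid in contig_lengths:
            # Span of the contig covered by this alignment, in query coordinates
            span = abs(qend - qstart) + 1
            fraction = span / contig_lengths[qseqid]
            # Keep the best-covered alignment per contig
            aligned_fraction[qseqid] = max(aligned_fraction.get(qseqid, 0.0), fraction)
    return {
        "contigs_with_hit": len(contigs_with_hit),
        "unique_subjects": len(subjects),
        "aligned_fraction": aligned_fraction,
    }
```

Calling the function once with `evalue_cutoff=1e-5` and once with `evalue_cutoff=1e-10` reproduces the two-threshold comparison used in the figures; counting distinct subject ids is only one possible reading of "unique gene hits".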
