Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study
- PMID: 22373417
- PMCID: PMC3287467
- DOI: 10.1186/1471-2105-12-S14-S2
Abstract
Background: With rapid advances in next-generation sequencing technology, high-throughput RNA sequencing (RNA-Seq) has emerged as a powerful and cost-effective approach for studying transcriptomes. De novo transcript assembly is an important solution for transcriptome analysis in organisms that lack a reference genome. However, how different variables affect assembly outcomes was poorly understood, and there was no consensus on how to reach an optimal assembly by selecting a software tool and a suitable strategy based on the properties of the RNA-Seq data.
Results: To reveal how different programs perform in transcriptome assembly, this work analyzed several important factors, including k-mer values, genome complexity, coverage depth and directional reads. Seven program conditions were tested: four single k-mer (SK) assemblers (SOAPdenovo, ABySS, Oases and Trinity) and three multiple k-mer (MK) methods (SOAPdenovo-MK, trans-ABySS and Oases-MK). Small k-mer values performed better at reconstructing lowly expressed transcripts and large k-mer values at reconstructing highly expressed ones, whereas the MK strategy worked well across almost all expression quintiles. Among the SK tools, Trinity performed well across various conditions but had the longest running time. Oases consumed the most memory, whereas SOAPdenovo had the shortest runtime but performed poorly at reconstructing full-length CDS. ABySS struck a good balance between resource usage and assembly quality.
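The k-mer trade-off described in the Results can be illustrated with a toy Python sketch. It is a hypothetical example, not the authors' pipeline or any assembler's actual code, and all function names are invented for illustration: `recovered_fraction` counts how many of a transcript's k-mers appear in a set of reads (a de Bruijn-graph assembler cannot traverse a k-mer absent from the reads), `repeated_kmers` counts k-mers occurring more than once in a sequence (a source of graph branching), and `merge_multi_k` is a naive stand-in for multiple k-mer merging that pools contigs from several k values and discards exact sub-contigs. Real MK pipelines such as trans-ABySS or Oases-MK use more elaborate merging.

```python
from __future__ import annotations


def kmers(seq: str, k: int) -> list[str]:
    """Return all overlapping substrings of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def recovered_fraction(transcript: str, reads: list[str], k: int) -> float:
    """Fraction of the transcript's k-mers that occur in at least one read.

    A de Bruijn-graph assembler can only walk through k-mers present in the
    reads, so each missing k-mer breaks the path and fragments the contig.
    """
    read_kmers = {km for read in reads for km in kmers(read, k)}
    transcript_kmers = kmers(transcript, k)
    present = sum(km in read_kmers for km in transcript_kmers)
    return present / len(transcript_kmers)


def repeated_kmers(seq: str, k: int) -> int:
    """Count distinct k-mers that occur more than once in seq.

    Repeated k-mers create branches (ambiguity) in the graph; a larger k
    usually yields fewer of them.
    """
    counts: dict[str, int] = {}
    for km in kmers(seq, k):
        counts[km] = counts.get(km, 0) + 1
    return sum(1 for c in counts.values() if c > 1)


def merge_multi_k(assemblies: dict[int, list[str]]) -> list[str]:
    """Naive multiple-k merge (hypothetical): pool contigs from every k and
    drop any contig that is an exact substring of a longer kept contig."""
    pooled = sorted({c for contigs in assemblies.values() for c in contigs},
                    key=len, reverse=True)
    kept: list[str] = []
    for contig in pooled:
        if not any(contig in longer for longer in kept):
            kept.append(contig)
    return kept


if __name__ == "__main__":
    # Sparse reads mimic a lowly expressed transcript: the first two reads
    # overlap by only 2 bases.
    transcript = "AAACCCGGGTTT"
    reads = ["AAACCC", "CCGGGT", "GGGTTT"]
    for k in (3, 5):
        print(k, recovered_fraction(transcript, reads, k))  # 3 1.0, then 5 0.75
    print(repeated_kmers("ACGTTACGGA", 3), repeated_kmers("ACGTTACGGA", 4))  # 1 0
```

With k = 3 every transcript k-mer is present in the reads, while with k = 5 the two k-mers spanning the short overlap between the first two reads are missing, so a k = 5 graph would break there. Conversely, the repeated 3-mer `ACG` in `ACGTTACGGA` disappears at k = 4. Pooling results from several k values is the intuition behind the MK strategy.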
Conclusions: Our work compared the performance of publicly available transcriptome assemblers and analyzed important factors affecting de novo assembly. We propose practical guidelines for transcript reconstruction from short-read RNA-Seq data. Using optimized methods, de novo assembly of the C. sinensis transcriptome was greatly improved.
Similar articles
- Comparison of De Novo Transcriptome Assemblers and k-mer Strategies Using the Killifish, Fundulus heteroclitus. PLoS One. 2016 Apr 7;11(4):e0153104. doi: 10.1371/journal.pone.0153104. eCollection 2016. PMID: 27054874. Free PMC article.
- Comprehensive evaluation of de novo transcriptome assembly programs and their effects on differential gene expression analysis. Bioinformatics. 2017 Feb 1;33(3):327-333. doi: 10.1093/bioinformatics/btw625. PMID: 28172640.
- Optimizing de novo assembly of short-read RNA-seq data for phylogenomics. BMC Genomics. 2013 May 14;14:328. doi: 10.1186/1471-2164-14-328. PMID: 23672450. Free PMC article.
- Current and Future Methods for mRNA Analysis: A Drive Toward Single Molecule Sequencing. Methods Mol Biol. 2018;1783:209-241. doi: 10.1007/978-1-4939-7834-2_11. PMID: 29767365. Review.
- A simple guide to de novo transcriptome assembly and annotation. Brief Bioinform. 2022 Mar 10;23(2):bbab563. doi: 10.1093/bib/bbab563. PMID: 35076693. Free PMC article. Review.
Cited by
- De novo assembly of bacterial transcriptomes from RNA-seq data. Genome Biol. 2015 Jan 13;16(1):1. doi: 10.1186/s13059-014-0572-2. PMID: 25583448. Free PMC article.
- Evaluation of de novo transcriptome assemblies from RNA-Seq data. Genome Biol. 2014 Dec 21;15(12):553. doi: 10.1186/s13059-014-0553-5. PMID: 25608678. Free PMC article.
- The Complete Chloroplast Genome of Arabidopsis thaliana Isolated in Korea (Brassicaceae): An Investigation of Intraspecific Variations of the Chloroplast Genome of Korean A. thaliana. Int J Genomics. 2020 Sep 5;2020:3236461. doi: 10.1155/2020/3236461. eCollection 2020. PMID: 32964010. Free PMC article.
- Additive multiple k-mer transcriptome of the keelworm Pomatoceros lamarckii (Annelida; Serpulidae) reveals annelid trochophore transcription factor cassette. Dev Genes Evol. 2012 Nov;222(6):325-39. doi: 10.1007/s00427-012-0416-6. Epub 2012 Oct 9. PMID: 23053624.
- Transcriptomic analysis of the red and green light responses in Columba livia domestica. 3 Biotech. 2019 Jan;9(1):20. doi: 10.1007/s13205-018-1551-1. Epub 2019 Jan 2. PMID: 30622858. Free PMC article.