Review

Assembly, Assessment, and Availability of De novo Generated Eukaryotic Transcriptomes

Joanna Moreton et al. Front Genet. 2016 Jan 11;6:361.
doi: 10.3389/fgene.2015.00361. eCollection 2015.

Abstract

De novo assembly of a complete transcriptome without the need for a guiding reference genome is attractive, particularly where the cost and complexity of generating a eukaryotic genome are prohibitive. However, the transcriptome should not be seen merely as a quick and cheap alternative to building a complete genome. Transcriptomics allows spatial and temporal samples within an organism to be understood and compared, and allows multiple individuals or closely related species to be surveyed. In theory, de novo assembly allows a complete transcriptome to be built without any prior knowledge of the genome. It also allows the discovery of alternative splice forms of coding RNAs, as well as non-coding RNAs, which are often missed by proteomic approaches or incompletely annotated in genome studies. The main limitation of the method is that a truly complete assembly is unlikely, so methods are required to assess the quality and appropriateness of a generated transcriptome. Whilst no single pipeline or tool is agreed to be optimal, various algorithms and easy-to-use software packages exist, making transcriptome generation an increasingly common approach. With this expansion of data, questions remain about how to make these datasets fully discoverable, comparable, and most useful for understanding complex biological systems.

Keywords: annotation; assessment; availability; de novo transcriptome assembly; high-throughput sequencing.

Figures

Figure 1
An overview of the two transcriptome assembly pipelines. The key parts of two transcriptome assembly pipelines are shown depending on whether a reference genome is available. This review is focused on de novo transcriptome assembly; more information on the pipeline for reference-based transcriptome assembly can be found in review papers such as Martin and Wang (2011).
Figure 2
An example of a simple de Bruijn graph. (A) Read sequences. (B) All k-mers (subsequences of length 5) from the reads. (C) A de Bruijn graph constructed with unique k-mers as nodes and edges connecting overlapping k-mers (a k-mer shifted by one base overlaps the next k-mer by k-1 bases). (D) Assembled transcripts obtained by traversing the two paths in the graph.
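To make the construction in Figure 2 concrete, the following is a minimal toy sketch in Python. The reads, the function names (`kmers`, `build_de_bruijn`, `assemble`) and the choice k = 5 are illustrative assumptions, not taken from the figure or from any particular assembler. It follows the caption's convention of unique k-mers as nodes, with an edge wherever one k-mer overlaps the next by k-1 bases, and it spells one transcript per source-to-sink path, assuming the graph contains no cycles.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> list[str]:
    """All overlapping subsequences of length k (cf. panel B)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Nodes are unique k-mers; an edge a -> b exists when the last k-1
    bases of a equal the first k-1 bases of b (cf. panel C)."""
    nodes = {km for read in reads for km in kmers(read, k)}
    # Index nodes by their (k-1)-base prefix so successors can be looked up directly.
    by_prefix = defaultdict(set)
    for node in nodes:
        by_prefix[node[:-1]].add(node)
    # The successors of a node are the nodes whose prefix equals its (k-1)-base suffix.
    return {node: set(by_prefix.get(node[1:], set())) for node in nodes}


def assemble(graph: dict[str, set[str]]) -> list[str]:
    """Spell one sequence per source-to-sink path (cf. panel D).
    Assumes an acyclic graph, as in this toy example."""
    indegree = {node: 0 for node in graph}
    for successors in graph.values():
        for s in successors:
            indegree[s] += 1
    sources = [n for n, d in indegree.items() if d == 0]
    transcripts = []

    def walk(node: str, seq: str) -> None:
        successors = graph[node]
        if not successors:  # a sink ends one path
            transcripts.append(seq)
            return
        for nxt in sorted(successors):
            # Each step along an edge adds only the last base of the next k-mer.
            walk(nxt, seq + nxt[-1])

    for src in sorted(sources):
        walk(src, src)
    return sorted(transcripts)


# Hypothetical reads from two transcripts that share a prefix and then diverge.
reads = ["ACGTTGCA", "TTGCAAT", "ACGTTGCC", "TTGCCGA"]
graph = build_de_bruijn(reads, k=5)
print(assemble(graph))  # ['ACGTTGCAAT', 'ACGTTGCCGA']
```

In this hypothetical input the k-mer GTTGC has two successors (TTGCA and TTGCC), so the traversal yields two transcripts, analogous to the two paths in panel D. Enumerating every path grows exponentially with the number of branches, and practical assemblers must also cope with sequencing errors, reverse complements, and cycles, so this sketch illustrates the idea rather than a usable assembly method.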

References

Altschul S. F., Madden T. L., Schäffer A. A., Zhang J., Zhang Z., Miller W., et al. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402. doi: 10.1093/nar/25.17.3389
Arun-Chinnappa K. S., McCurdy D. W. (2015). De novo assembly of a genome-wide transcriptome map of Vicia faba (L.) for transfer cell research. Front. Plant Sci. 6:217. doi: 10.3389/fpls.2015.00217
Aya K., Kobayashi M., Tanaka J., Ohyanagi H., Suzuki T., Yano K., et al. (2015). De novo transcriptome assembly of a fern, Lygodium japonicum, and a web resource database, Ljtrans DB. Plant Cell Physiol. 56, e5. doi: 10.1093/pcp/pcu184
Bao W., Kojima K. K., Kohany O. (2015). Repbase update, a database of repetitive elements in eukaryotic genomes. Mob. DNA 6:11. doi: 10.1186/s13100-015-0041-9
Benson G. (1999). Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580. doi: 10.1093/nar/27.2.573

LinkOut - more resources