Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
Comparative Study
. 2014 Feb 1:7:69.
doi: 10.1186/1756-0500-7-69.

RNA sequencing read depth requirement for optimal transcriptome coverage in Hevea brasiliensis

Affiliations
Comparative Study

RNA sequencing read depth requirement for optimal transcriptome coverage in Hevea brasiliensis

Keng-See Chow et al. BMC Res Notes. .

Abstract

Background: One of the concerns of assembling de novo transcriptomes is determining the amount of read sequences required to ensure a comprehensive coverage of genes expressed in a particular sample. In this report, we describe the use of Illumina paired-end RNA-Seq (PE RNA-Seq) reads from Hevea brasiliensis (rubber tree) bark to devise a transcript mapping approach for the estimation of the read amount needed for deep transcriptome coverage.

Findings: We optimized the assembly of a Hevea bark transcriptome based on 16 Gb Illumina PE RNA-Seq reads using the Oases assembler across a range of k-mer sizes. We then assessed assembly quality based on transcript N50 length and transcript mapping statistics in relation to (a) known Hevea cDNAs with complete open reading frames, (b) a set of core eukaryotic genes and (c) Hevea genome scaffolds. This was followed by a systematic transcript mapping process where sub-assemblies from a series of incremental amounts of bark transcripts were aligned to transcripts from the entire bark transcriptome assembly. The exercise served to relate read amounts to the degree of transcript mapping level, the latter being an indicator of the coverage of gene transcripts expressed in the sample. As read amounts or datasize increased toward 16 Gb, the number of transcripts mapped to the entire bark assembly approached saturation. A colour matrix was subsequently generated to illustrate sequencing depth requirement in relation to the degree of coverage of total sample transcripts.

Conclusions: We devised a procedure, the "transcript mapping saturation test", to estimate the amount of RNA-Seq reads needed for deep coverage of transcriptomes. For Hevea de novo assembly, we propose generating between 5-8 Gb reads, whereby around 90% transcript coverage could be achieved with optimized k-mers and transcript N50 length. The principle behind this methodology may also be applied to other non-model plants, or with reads from other second generation sequencing platforms.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Size distributions of clean paired reads from latex, leaf and bark libraries.
Figure 2
Figure 2
Number of transcripts from the optimized 16 Gb bark assembly with hits to rubber genome scaffolds. Hits are classified into transcript-to-scaffold coverage categories describing the extent of alignment from 10-100%. A total of 84,471 bark transcripts (all categories) were mapped to genome scaffolds. The proportion of hits from 84,471 in each category is shown in brackets.
Figure 3
Figure 3
Methodology of the transcript mapping saturation test.
Figure 4
Figure 4
Colour matrix representing the transcript mapping saturation test results. An asterisk corresponds to the assembly with the optimized k-mer and transcript N50 length for a particular datasize (1 Gb assembly transcripts were not included since the transcript N50 length was not optimized by any k-mer for this datasize). See Additional file 2: Table S2 for full details of BlastN matches by subsets of bark transcripts to the optimized 16 Gb bark transcriptome.

Similar articles

Cited by

References

    1. Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10:57–63. doi: 10.1038/nrg2484. - DOI - PMC - PubMed
    1. Ozsolak F, Milos PM. RNA sequencing: advances, challenges and opportunities. Nat Rev Genet. 2011;12:87–98. doi: 10.1038/nrg2934. - DOI - PMC - PubMed
    1. Van Verk MC, Hickman R, Pieterse CMJ, Van Wees SCM. RNA-Seq: revelation of the messengers. Trends Plant Sci. 2013;18:175–179. doi: 10.1016/j.tplants.2013.02.001. - DOI - PubMed
    1. Zerbino DR, Birney E. Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008;18:821–829. doi: 10.1101/gr.074492.107. - DOI - PMC - PubMed
    1. Schulz MH, Zerbino DR, Vingron M, Birney E. Oases: robust de novo RNA-Seq assembly across the dynamic range of expression labels. Bioinformatics. 2012;28:1086–1092. doi: 10.1093/bioinformatics/bts094. - DOI - PMC - PubMed

Publication types

MeSH terms