BMC Genomics. 2017 May 22;18(1):395.
doi: 10.1186/s12864-017-3757-8.

A survey of the complex transcriptome from the highly polyploid sugarcane genome using full-length isoform sequencing and de novo assembly from short read sequencing

Nam V Hoang et al. BMC Genomics.

Abstract

Background: Despite the economic importance of sugarcane for sugar and bioenergy production, no reference genome is yet available. Most sugarcane transcriptomic studies have been based on Saccharum officinarum gene indices (SoGI), expressed sequence tags (ESTs) and de novo assembled transcript contigs from short reads; hence knowledge of the sugarcane transcriptome is limited with respect to transcript length and the number of transcript isoforms.

Results: The sugarcane transcriptome was sequenced using PacBio isoform sequencing (Iso-Seq) of a pooled RNA sample derived from leaf, internode and root tissues at different developmental stages from 22 varieties, to explore the potential for capturing full-length transcript isoforms. A total of 107,598 unique transcript isoforms were obtained, representing about 71% of the total number of predicted sugarcane genes. Most of this dataset (92%) matched the plant protein database, while just over 2% consisted of novel transcripts and over 2% of putative long non-coding RNAs. About 56% and 23% of all sequences were annotated against the Gene Ontology and KEGG pathway databases, respectively. Comparison with de novo contigs assembled from Illumina RNA sequencing (RNA-Seq) of the internode samples from the same experiment, and with public databases, showed that the Iso-Seq method recovered more full-length transcript isoforms and yielded a higher N50 and a higher average length of the 1,000 largest proteins, whereas RNA-Seq captured a greater representation of gene content and RNA diversity. Only 62% of PacBio transcript isoforms matched 67% of the de novo contigs. The unmatched PacBio isoforms were attributed to the inclusion of leaf and root tissues and to normalization in the PacBio data, and the unmatched de novo contigs to the greater representation of gene content and RNA classes in the de novo assembly. About 69% of PacBio transcript isoforms and 41% of de novo contigs aligned with the sorghum genome, indicating high conservation of orthologs in the genic regions of the two genomes.
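The two length statistics used above to compare Iso-Seq isoforms with de novo contigs, N50 and the average length of the largest 1,000 proteins, can be sketched as follows. This is a minimal illustration of the conventional definitions applied to a list of sequence lengths, not the authors' pipeline; the function names `n50` and `mean_of_longest` are hypothetical.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50: after sorting lengths from longest to shortest,
    the length at which the running total first reaches half of all bases."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid floating-point division.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def mean_of_longest(lengths: Sequence[int], k: int = 1000) -> float:
    """Average length of the k longest sequences (all of them if fewer than k)."""
    if not lengths or k < 1:
        raise ValueError("need at least one length and k >= 1")
    top = sorted(lengths, reverse=True)[:k]
    return sum(top) / len(top)


if __name__ == "__main__":
    example = [100, 200, 300, 400, 500]
    print(n50(example))                   # 400
    print(mean_of_longest(example, k=2))  # 450.0
```

For example, `n50([100, 200, 300, 400, 500])` returns 400, because the two longest sequences (500 + 400 = 900 bases) already cover at least half of the 1,500-base total. In the study the second statistic is applied to predicted protein lengths rather than transcript lengths, but the arithmetic is the same.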

Conclusions: The transcriptome dataset should contribute to improved sugarcane gene models and protein predictions, and will serve as a reference database for analysis of transcript expression in sugarcane.

Keywords: De novo assembly; Hybrid assembly; Isoform sequencing; Polyploid transcriptome; SUGIT database; Sugarcane; Transcriptome assembly.


Figures

Fig. 1. Comparison between the sugarcane PacBio transcript isoforms and de novo transcript contigs
Fig. 2. Average coverage of sugarcane de novo contigs and PacBio isoforms obtained from read mapping. a, Coverage of de novo transcript contigs. b, Coverage of PacBio transcript isoforms
Fig. 3. Full-length analysis of sugarcane PacBio transcript isoforms and de novo transcript contigs. a, Counts of proteins covered by transcripts at different thresholds. b, Comparison between the protein hits from PacBio and de novo transcripts that covered at least 70% of the Viridiplantae protein length
Fig. 4. Evidence of different transcript isoforms of the sugarcane transcriptome present in the PacBio transcript dataset. a, Isoforms aligned against sorghum chromosome 1. b, Isoforms aligned to contigs of our in-house sugarcane whole-genome de novo assembly. c, Different transcript isoforms aligned to sucrose phosphate synthase gene and cellulase 6 gene contigs. d, Average number of exons per transcript, estimated from the transcript isoforms aligned against the sorghum genome
Fig. 5. Analysis of ORFs and transcript prediction in the sugarcane transcriptome. a, Length distribution of ORF-containing transcripts obtained from TransDecoder and Evigene. b, Length distribution of transcripts predicted by Evigene in the PacBio data. c, Length distribution of transcripts predicted by Evigene in the de novo contig data
Fig. 6. Gene Ontology enrichment analysis of sugarcane transcript sequences. For the de novo transcript contigs, only GO terms from 100,000 sequences were used
Fig. 7. KEGG metabolic pathway classification of sugarcane PacBio transcript isoforms and de novo transcript contigs
Fig. 8. PacBio transcript isoforms aligned against the sorghum chromosomes. Purple blocks represent the distribution of transcript isoforms along the sorghum chromosomes
Fig. 9. Sugarcane sample collection from leaf, internode and root tissues used in this study
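Fig. 3 counts reference proteins covered by transcripts at different length thresholds, with coverage of at least 70% of the Viridiplantae protein length used in panel b. The sketch below shows one way such a tally can be computed. It assumes that each alignment hit is reduced to the span it covers on the reference protein and that a protein's coverage is that of its best single hit. The tuple layout, the thresholds other than 0.7 and the function names are hypothetical, and this is not the authors' script.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple

# Hypothetical hit layout: (protein_id, start, end, protein_length), with
# 1-based inclusive start/end coordinates on the reference protein.
Hit = Tuple[str, int, int, int]


def best_coverage(hits: Iterable[Hit]) -> Dict[str, float]:
    """Fraction of each protein's length covered by its best single hit."""
    best: Dict[str, float] = defaultdict(float)
    for protein_id, start, end, protein_length in hits:
        covered = (end - start + 1) / protein_length
        # Keep only the best hit per protein, capped at full length.
        best[protein_id] = max(best[protein_id], min(covered, 1.0))
    return dict(best)


def count_by_threshold(
    coverage: Dict[str, float], thresholds: Sequence[float] = (0.5, 0.7, 0.9)
) -> Dict[float, int]:
    """Number of proteins whose best coverage reaches each threshold."""
    return {t: sum(1 for c in coverage.values() if c >= t) for t in thresholds}


if __name__ == "__main__":
    hits = [
        ("P1", 1, 95, 100),
        ("P1", 10, 50, 100),  # weaker hit on P1; superseded by the best one
        ("P2", 1, 80, 100),
        ("P3", 1, 30, 100),
    ]
    print(count_by_threshold(best_coverage(hits)))  # {0.5: 2, 0.7: 2, 0.9: 1}
```

Taking only the best single hit per protein keeps the count conservative: a protein is counted as covered at a threshold only if one transcript spans that fraction of it, rather than several fragments combined.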
