PeerJ. 2018 Oct 30:6:e5818.
doi: 10.7717/peerj.5818. eCollection 2018.

Uncovering full-length transcript isoforms of sugarcane cultivar Khon Kaen 3 using single-molecule long-read sequencing


Jittima Piriyapongsa et al. PeerJ.

Abstract

Background: Sugarcane is an important global food crop and energy resource. To facilitate the sugarcane improvement program, genome and gene information are important for studying traits at the molecular level. Most currently available transcriptome data for sugarcane were generated using second-generation sequencing platforms, which provide short reads. The de novo assembled transcripts from these data are limited in length, and hence may be incomplete and inaccurate, especially for long RNAs.

Methods: We generated a transcriptome dataset from leaf tissue of the commercial Thai sugarcane cultivar Khon Kaen 3 (KK3) using PacBio RS II single-molecule long-read sequencing with the Iso-Seq method. Short-read RNA-Seq data were generated from the same RNA sample on the Ion Proton platform to reduce base-calling errors.

Results: A total of 119,339 error-corrected transcripts were generated with an N50 length of 3,611 bp; these transcripts are longer on average than those in any previously reported sugarcane transcriptome dataset. Of these, 110,253 sequences (92.4%) contain an open reading frame (ORF) of at least 300 bp, with an ORF N50 of 1,416 bp. The mean lengths of the 5' and 3' untranslated regions in the 73,795 sequences with complete ORFs are 1,249 and 1,187 bp, respectively. A total of 4,774 transcripts are putatively novel full-length transcripts that do not match transcripts from a previous Iso-Seq study of sugarcane. We annotated the functions of 68,962 putative full-length transcripts with at least 90% coverage when compared with homologous protein-coding sequences in other plants.
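For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed, not the authors' pipeline. The function name `n50` and the toy length list are illustrative assumptions; the sketch uses the usual definition (the length L such that sequences of length L or longer hold at least half of the total bases).

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumed definition: sort lengths from longest to shortest and return
    the length at which the running sum first reaches half of the total.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if running * 2 >= total:
            return length


# Hypothetical toy lengths in bp, not data from the study.
toy_lengths = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_lengths)
```

With the toy list, the total is 20 bp, and the two longest sequences (6 and 5 bp) together reach 11 bp, so the N50 is 5. Applied to real transcript lengths, the same function would yield values comparable to the transcript N50 and ORF N50 reported here.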

Discussion: The new catalog of transcripts will be useful for genome annotation, identification of splicing variants, SNP identification, and other research pertaining to the sugarcane improvement program. The putatively novel transcripts suggest unique features of KK3, although more data from different tissues and stages of development are needed to establish a reference transcriptome of this cultivar.

Keywords: Full-length transcripts; Iso-Seq; KK3; Khon Kaen 3; PacBio sequencing; Single-molecule long-read sequencing; Sugarcane; Transcriptome.

Conflict of interest statement

Warodom Wirojsirasak, Prapat Punpee and Peeraya Klomsa-ard are employed by Mitr Phol Sugarcane Research Center Co., Ltd.

Figures

Figure 1. Distribution of hit plant species from BLAST search of PacBio-isoforms.
Pie chart shows the fraction of hit plant species based on the best hit obtained from a BLASTX search of PacBio transcripts against Phytozome plant proteins.
Figure 2. Comparison of BLAST hits from different sequence databases.
Venn diagram shows overlaps of BLAST analysis results among the compared databases, namely the sugarcane nucleotide, Phytozome plant protein, and NCBI nr protein databases.
Figure 3. Length distribution of predicted ORFs of PacBio-isoforms.
Frequency distribution graphs are displayed for (A) ORF length of complete ORFs (green) and partial ORFs (blue) and (B) UTR length calculated from the complete ORFs, separated into 5′ UTR (green) and 3′ UTR (blue).
Figure 4. COG classification of sugarcane PacBio transcripts in comparison to sorghum transcripts.
The frequency distributions of transcripts assigned to each functional class of the KOG database are displayed for sugarcane PacBio transcripts (black) and sorghum transcripts available from the Phytozome database (blue).
Figure 5. Length comparison of PacBio transcripts and their matched sequences.
The graph shows the frequency distribution of the ratio of the length of each PacBio transcript to that of its matched sequence from (A) sugarcane and (B) Phytozome plant transcript databases. The percentages of coverage of the hit plant CDS sequences are shown in (C).
Figure 6. Length distribution of PacBio transcripts.
Distributions of sequence length are displayed for sugarcane PacBio transcripts generated in the present study (green) and in the Hoang et al. (2017) study (blue).
Figure 7. Alternative splicing patterns of PacBio transcripts.
SpliceGrapher diagrams illustrate the splicing patterns of transcripts compared among the sorghum transcript annotation, the matched PacBio transcripts from the Hoang et al. (2017) study, the matched PacBio transcripts generated in the present study, and the matched PacBio transcripts combined from both studies. (A) Peptidase S24/S26A/S26B/S26C family protein (Sobic.002G223200) and (B) sucrose-phosphatase (Sobic.004G151800). Each color represents a type of splicing event according to the data label.


