PeerJ. 2018 Oct 30;6:e5818.

doi: 10.7717/peerj.5818. eCollection 2018.

Uncovering full-length transcript isoforms of sugarcane cultivar Khon Kaen 3 using single-molecule long-read sequencing

Affiliations

¹ National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand.
² Mitr Phol Sugarcane Research Center Co., Ltd., Chaiyaphum, Thailand.

PMID: 30397543
PMCID: PMC6214230
DOI: 10.7717/peerj.5818

Jittima Piriyapongsa et al. PeerJ. 2018.

Abstract

Background: Sugarcane is an important global food crop and energy resource. To facilitate the sugarcane improvement program, genome and gene information are important for studying traits at the molecular level. Most currently available transcriptome data for sugarcane were generated using second-generation sequencing platforms, which provide short reads. The de novo assembled transcripts from these data are limited in length, and hence may be incomplete and inaccurate, especially for long RNAs.

Methods: We generated a transcriptome dataset of leaf tissue from the commercial Thai sugarcane cultivar Khon Kaen 3 (KK3) using PacBio RS II single-molecule long-read sequencing with the Iso-Seq method. Short-read RNA-Seq data were generated from the same RNA sample on the Ion Proton platform and used to reduce base-calling errors.

Results: A total of 119,339 error-corrected transcripts were generated with an N50 length of 3,611 bp, making them on average longer than those in any previously reported sugarcane transcriptome dataset. Of these, 110,253 sequences (92.4%) contain an open reading frame (ORF) of at least 300 bp, with an ORF N50 of 1,416 bp. In the 73,795 sequences with complete ORFs, the mean lengths of the 5' and 3' untranslated regions are 1,249 and 1,187 bp, respectively. A total of 4,774 transcripts are putatively novel full-length transcripts that have no match in a previous Iso-Seq study of sugarcane. We annotated the functions of 68,962 putative full-length transcripts that have at least 90% coverage when compared with homologous protein-coding sequences in other plants.
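The N50 values quoted above follow the usual definition: the length L such that sequences of length L or longer together contain at least half of all bases in the dataset. The sketch below illustrates that calculation only. The function name `n50` and the toy lengths are assumptions made for illustration and are not taken from the study or its pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total number of bases.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy lengths in bp (hypothetical values, not data from the study).
toy_lengths = [2, 3, 4, 5, 6, 10]
print(n50(toy_lengths))
```

Because the statistic is weighted by bases rather than by sequence count, a handful of long transcripts can raise the N50 well above the median length.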

Discussion: The new catalog of transcripts will be useful for genome annotation, identification of splicing variants, SNP identification, and other research pertaining to the sugarcane improvement program. The putatively novel transcripts suggest unique features of KK3, although more data from different tissues and stages of development are needed to establish a reference transcriptome of this cultivar.

Keywords: Full-length transcripts; Iso-Seq; KK3; Khon Kaen 3; PacBio sequencing; Single-molecule long-read sequencing; Sugarcane; Transcriptome.

Conflict of interest statement

Warodom Wirojsirasak, Prapat Punpee and Peeraya Klomsa-ard are employed by Mitr Phol Sugarcane Research Center Co., Ltd.

Figures

**Figure 1. Distribution of hit plant species from BLAST search of PacBio-isoforms.**
Pie chart shows the fraction of hit plant species based on the best hit obtained from BLASTX search of PacBio transcripts against Phytozome plant proteins.

**Figure 2. Comparison of BLAST hits from different sequence databases.**
Venn diagram shows overlaps of BLAST analysis results among compared databases, namely sugarcane nucleotide, Phytozome plant protein, and NCBI nr protein databases.

**Figure 3. Length distribution of predicted ORFs of PacBio-isoforms.**
Frequency distribution graphs are displayed for (A) ORF length of complete ORFs (green) and partial ORFs (blue) and (B) UTR length calculated from the complete ORFs separated into 5′ UTR (green) and 3′ UTR (blue).

**Figure 4. COG classification of sugarcane PacBio transcripts in comparison to sorghum transcripts.**
The frequency distributions of transcripts assigned to each functional class of the KOG database are displayed for sugarcane PacBio transcripts (black) and sorghum transcripts available from the Phytozome database (blue).

**Figure 5. Length comparison of PacBio transcripts and their matched sequences.**
The graph shows the frequency distribution for the ratio of the length of PacBio transcript to its matched sequences from (A) sugarcane and (B) Phytozome plant transcript databases. The percentages of coverage on hit plant CDS sequence are shown in (C).

**Figure 6. Length distribution of PacBio transcripts.**
Distributions of sequence length are displayed for sugarcane PacBio transcripts generated in the present study (green) and in the Hoang et al. (2017) study (blue).

**Figure 7. Alternative splicing patterns of PacBio transcripts.**
SpliceGrapher diagrams illustrate the splicing patterns of transcripts compared among the sorghum transcript annotation, the matched PacBio transcripts from the Hoang et al. (2017) study, the matched PacBio transcripts generated in the present study, and the matched PacBio transcripts combined from both studies. (A) Peptidase S24/S26A/S26B/S26C family protein (Sobic.002G223200) and (B) sucrose-phosphatase (Sobic.004G151800). Each color represents a type of splicing event according to the data label.

Cited by

Taxonomically Restricted Genes Are Associated With Responses to Biotic and Abiotic Stresses in Sugarcane (Saccharum spp.).
Cardoso-Silva CB, Aono AH, Mancini MC, Sforça DA, da Silva CC, Pinto LR, Adams KL, de Souza AP. Front Plant Sci. 2022 Jun 30;13:923069. doi: 10.3389/fpls.2022.923069. eCollection 2022. PMID: 35845637.
Characterization of full-length transcriptome in Saccharum officinarum and molecular insights into tiller development.
Yan H, Zhou H, Luo H, Fan Y, Zhou Z, Chen R, Luo T, Li X, Liu X, Li Y, Qiu L, Wu J. BMC Plant Biol. 2021 May 22;21(1):228. doi: 10.1186/s12870-021-02989-5. PMID: 34022806.
A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis.
Zhang R, Kuo R, Coulter M, Calixto CPG, Entizne JC, Guo W, Marquez Y, Milne L, Riegler S, Matsui A, Tanaka M, Harvey S, Gao Y, Wießner-Kroh T, Paniagua A, Crespi M, Denby K, Hur AB, Huq E, Jantsch M, Jarmolowski A, Koester T, Laubinger S, Li QQ, Gu L, Seki M, Staiger D, Sunkar R, Szweykowska-Kulinska Z, Tu SL, Wachter A, Waugh R, Xiong L, Zhang XN, Conesa A, Reddy ASN, Barta A, Kalyna M, Brown JWS. Genome Biol. 2022 Jul 7;23(1):149. doi: 10.1186/s13059-022-02711-0. PMID: 35799267.
Amino Acid and Carbohydrate Metabolism Are Coordinated to Maintain Energetic Balance during Drought in Sugarcane.
Diniz AL, da Silva DIR, Lembke CG, Costa MDL, Ten-Caten F, Li F, Vilela RD, Menossi M, Ware D, Endres L, Souza GM. Int J Mol Sci. 2020 Nov 30;21(23):9124. doi: 10.3390/ijms21239124. PMID: 33266228.
A SNP variation in the Sucrose synthase (SoSUS) gene associated with sugar-related traits in sugarcane.
Khanbo S, Somyong S, Phetchawang P, Wirojsirasak W, Ukoskit K, Klomsa-Ard P, Pootakham W, Tangphatsornruang S. PeerJ. 2023 Dec 15;11:e16667. doi: 10.7717/peerj.16667. eCollection 2023. PMID: 38111652.
