Front Genet. 2019 Jul 23;10:654.

doi: 10.3389/fgene.2019.00654. eCollection 2019.

The Impact of cDNA Normalization on Long-Read Sequencing of a Complex Transcriptome

Nam V Hoang¹, Agnelo Furtado², Virginie Perlo², Frederik C Botha²,³, Robert J Henry²

Affiliations

¹ College of Agriculture and Forestry, Hue University, Hue, Vietnam.
² Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St. Lucia, QLD, Australia.
³ Sugar Research Australia, Indooroopilly, QLD, Australia.

PMID: 31396260
PMCID: PMC6664245
DOI: 10.3389/fgene.2019.00654

Nam V Hoang et al. Front Genet. 2019.

Abstract

Normalization of cDNA is widely used to improve the coverage of rare transcripts in analysis of transcriptomes employing next-generation sequencing. Recently, long-read technology has been emerging as a powerful tool for sequencing and construction of transcriptomes, especially for complex genomes containing highly similar transcripts and transcript-spliced isoforms. Here, we analyzed the transcriptome of sugarcane, a plant with a highly polyploid genome, by PacBio isoform sequencing (Iso-Seq) of two different cDNA library preparations, with and without a normalization step. The results demonstrated that, while the two libraries included many of the same transcripts, normalization removed many longer transcripts and revealed many new, generally shorter transcripts. For the same input cDNA and data yield, the normalized library recovered more total transcript isoforms and a greater number of predicted gene families and orthologous groups, resulting in a higher representation of the sugarcane transcriptome, compared to the non-normalized library. The non-normalized library, on the other hand, included a wider transcript length range, with more transcripts longer than ∼1.25 kb, more transcript isoforms per gene family, and more gene ontology terms per transcript. A large proportion of the unique transcripts, comprising ∼52% of the normalized library, were expressed at a lower level than the unique transcripts from the non-normalized library across the three tissue types tested: leaf, stalk, and root. About 83% of the total 5,348 predicted long noncoding transcripts were derived from the normalized library, of which ∼80% were derived from the lowly expressed fraction. Functional annotation of the unique transcripts suggested that each library enriched different functional transcript fractions. This demonstrated the complementarity of the two approaches in obtaining a complete transcriptome of a complex genome at the sequencing depth used in this study.

Keywords: isoform sequencing; normalization impact; polyploid transcriptome; sugarcane transcriptome; transcript enrichment; transcriptome normalization.

Figures

**Figure 1**
A summary of the analysis workflow used in this study. The RNA sample pooling, cDNA synthesis, normalization, size fractionation, and sequencing data processing were previously reported in Hoang et al. (2017a), in which the data from the two libraries were combined and analyzed. The original gel images of the sugarcane non-normalized and normalized cDNA libraries resolved on 1.2% agarose were adapted from Hoang et al. (2017a). NN denotes the sugarcane non-normalized PacBio Iso-Seq isoforms, while NO denotes the sugarcane normalized PacBio Iso-Seq isoforms.

**Figure 2**
Summary statistics of data and comparison between datasets. **(A)** Length distribution of combined data from two bins, 0.2–2.5 kb and 2–3.5 kb, from each library. For visualization, only transcripts ≤4 kb were used. **(B)** Two-directional comparison between sequences from the two libraries by CD-HIT-EST-2D. The upper number in the intersection of the Venn diagrams represents the transcripts from the non-normalized dataset, while the lower number is from the normalized dataset. NN, sugarcane non-normalized PacBio Iso-Seq isoforms; NO, sugarcane normalized PacBio Iso-Seq isoforms.

**Figure 3**
Transcriptome quality assessment *via* BUSCO, OrthoMCL, and Cogent packages. **(A)** BUSCO completeness assessment of the two datasets, the combined data, and three reference transcriptome databases: SoGI, unigenes, and SUGIT. In the bar charts, C, S, D, F, and M denote complete, single, duplicate, fragmented, and missing BUSCOs. **(B)** Venn diagram showing BUSCOs recovered in each of the datasets. **(C)** Number of isoforms per gene family identified by the Cogent pipeline. **(D)** Length distribution of extracted ORF sequences from the two datasets. **(E)** Venn diagram showing a comparison of orthologous groups between the two datasets. **(F)** Long noncoding transcripts identified in the NO dataset (NO_lnc) compared against the unique fraction of transcripts from the NO dataset (NO_uni). NN, sugarcane non-normalized PacBio Iso-Seq isoforms; NO, sugarcane normalized PacBio Iso-Seq isoforms; SoGI, *Saccharum officinarum* gene indices; SUGIT, sugarcane Iso-Seq transcriptome; aa, amino acid.

**Figure 4**
Expression analysis. **(A)** Percentage of expressed transcripts from the NN and NO datasets detected in three different tissues: leaf, stalk, and root. **(B)** Comparison of expressed transcripts in each tissue for the NN dataset. **(C)** Comparison of expressed transcripts in each tissue for the NO dataset. **(D)** Mean expression level across all three tissues of the NN and NO datasets. The expression level (RPKM) was log10 transformed for visualization purposes. **(E)** Length distribution of the two fractions of unique transcripts in the NN and NO datasets. **(F)** Percentage of expressed unique transcripts from the NN and NO datasets detected in three different tissues: leaf, stalk, and root. **(G)** Comparison of expression level between unique transcripts from the NN and NO datasets across three tissues: leaf, stalk, and root. The expression level (RPKM) was log10 transformed for visualization purposes. NN, sugarcane non-normalized PacBio Iso-Seq isoforms; NO, sugarcane normalized PacBio Iso-Seq isoforms.
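
As a reading aid for the log10-transformed RPKM values in Figure 4, the following is a minimal sketch of the standard RPKM definition (reads per kilobase of transcript per million mapped reads) followed by a log10 transform. It is not the authors' pipeline. The function names, the example read counts, and the choice to return `None` for zero expression (rather than adding a pseudocount) are all assumptions made for illustration.

```python
import math


def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def log10_rpkm(value, pseudocount=0.0):
    """log10 transform of an RPKM value.

    Assumption: non-positive inputs (after adding the optional
    pseudocount) are reported as None instead of raising an error.
    """
    shifted = value + pseudocount
    return math.log10(shifted) if shifted > 0 else None


# Hypothetical read counts for one 2,000 bp transcript in three tissues,
# each library assumed to have 10 million mapped reads.
counts = {"leaf": 500, "stalk": 50, "root": 0}
for tissue, n in counts.items():
    value = rpkm(n, 2000, 10_000_000)
    print(tissue, value, log10_rpkm(value))
```

Transcripts with zero reads have no defined logarithm, so a plotting script has to either drop them or pass a small `pseudocount`; which option was used for the published figure is not stated in the caption.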

**Figure 5**
Functional annotation of transcript isoforms from the two datasets. **(A)** Taxonomic distribution of BLASTX hits of the two datasets. **(B)** GO terms per transcript. **(C)** Significantly different GO terms between the two datasets. **(D)** Significantly different GO terms with the highest log₁₀(p value) identified from the two datasets. **(E)** Unique bins annotated using *Arabidopsis* genes that matched the unique fractions of the NN and NO datasets. **(F)** MapMan functional bins identified from the two datasets; the red heatmap represents transcripts from the NN dataset, while blue represents those from the NO dataset. Two bins (15 and 18) were added to the bottom of the panel listing all matched genes from the two datasets. Each heatmap represents one annotated transcript. NN, sugarcane non-normalized PacBio Iso-Seq isoforms; NO, sugarcane normalized PacBio Iso-Seq isoforms.

**Figure 6**
Transcript isoform distribution on the sugarcane genome **(A)** and the sorghum genome **(B)** using Circos. Outer graphs present mappable transcripts from the NO dataset, while the inner graphs present mappable transcripts from the NN dataset on the genomes. NN, sugarcane non-normalized PacBio Iso-Seq isoforms; NO, sugarcane normalized PacBio Iso-Seq isoforms.

Cited by

Unraveling the Risk Factors and Etiology of the Canine Oral Mucosal Melanoma: Results of an Epidemiological Questionnaire, Oral Microbiome Analysis and Investigation of Papillomavirus Infection.
de Carvalho JP, Carrilho MC, Dos Anjos DS, Hernandez CD, Sichero L, Dagli MLZ. Cancers (Basel). 2022 Jul 13;14(14):3397. doi: 10.3390/cancers14143397. PMID: 35884456.
De novo transcriptome assembly of Dalbergia sissoo Roxb. (Fabaceae) under Botryodiplodia theobromae-induced dieback disease.
Zafar UB, Shahzaib M, Atif RM, Khan SH, Niaz MZ, Shahzad K, Chughtai N, Awan FS, Azhar MT, Rana IA. Sci Rep. 2023 Nov 22;13(1):20503. doi: 10.1038/s41598-023-45982-8. PMID: 37993468.
An improved repertoire of splicing variants and their potential roles in Arabidopsis photomorphogenic development.
Huang CK, Lin WD, Wu SH. Genome Biol. 2022 Feb 9;23(1):50. doi: 10.1186/s13059-022-02620-2. PMID: 35139889.
