Front Genet. 2019 Jul 23:10:654.
doi: 10.3389/fgene.2019.00654. eCollection 2019.

The Impact of cDNA Normalization on Long-Read Sequencing of a Complex Transcriptome


Nam V Hoang et al. Front Genet.

Abstract

Normalization of cDNA is widely used to improve the coverage of rare transcripts in next-generation sequencing analyses of transcriptomes. Recently, long-read technology has emerged as a powerful tool for sequencing and constructing transcriptomes, especially for complex genomes containing highly similar transcripts and spliced transcript isoforms. Here, we analyzed the transcriptome of sugarcane, a plant with a highly polyploid genome, by PacBio isoform sequencing (Iso-Seq) of two different cDNA library preparations, with and without a normalization step. The results demonstrated that, while the two libraries included many of the same transcripts, normalization removed many longer transcripts and detected many new, generally shorter transcripts. For the same input cDNA and data yield, the normalized library recovered more total transcript isoforms and more predicted gene families and orthologous groups than the non-normalized library, resulting in a higher representation of the sugarcane transcriptome. The non-normalized library, on the other hand, covered a wider transcript length range, with more transcripts longer than ∼1.25 kb, more transcript isoforms per gene family, and more gene ontology terms per transcript. A large proportion of the unique transcripts, which comprised ∼52% of the normalized library, were expressed at a lower level than the unique transcripts from the non-normalized library across the three tissue types tested: leaf, stalk, and root. About 83% of the total 5,348 predicted long noncoding transcripts were derived from the normalized library, of which ∼80% were derived from the lowly expressed fraction. Functional annotation of the unique transcripts suggested that each library enriched different functional transcript fractions. These results demonstrate that the two approaches complement each other in obtaining a complete transcriptome of a complex genome at the sequencing depth used in this study.

Keywords: isoform sequencing; normalization impact; polyploid transcriptome; sugarcane transcriptome; transcript enrichment; transcriptome normalization.

Figures

Figure 1
A summary of the analysis workflow used in this study. The RNA sample pooling, cDNA synthesis, normalization, size fractionation, and sequencing data processing were previously reported in Hoang et al. (2017a), in which the data from the two libraries were combined and analyzed. The original gel images of the sugarcane non-normalized and normalized cDNA libraries resolved on 1.2% agarose were adapted from Hoang et al. (2017a). NN denotes the sugarcane non-normalized PacBio Iso-Seq isoforms, while NO denotes the sugarcane normalized PacBio Iso-Seq isoforms.
Figure 2
Summary statistics of data and comparison between datasets. (A) Length distribution of the combined data from two bins, 0.2–2.5 kb and 2–3.5 kb, from each library. For visualization, only transcripts ≤4 kb were used. (B) Two-directional comparison between sequences from the two libraries by CD-HIT-EST-2D. The upper number in the intersection of the Venn diagrams represents the transcripts from the non-normalized dataset, while the lower number is from the normalized dataset. NN, sugarcane non-normalized PacBio Iso-Seq isoforms; NO, sugarcane normalized PacBio Iso-Seq isoforms.
Figure 3
Transcriptome quality assessment via the BUSCO, OrthoMCL, and Cogent packages. (A) BUSCO completeness assessment of the two datasets, the combined data, and three reference transcriptome databases: SoGI, unigenes, and SUGIT. In the bar charts, C, S, D, F, and M denote complete, single, duplicate, fragmented, and missing BUSCOs. (B) Venn diagram showing BUSCOs recovered in each of the datasets. (C) Number of isoforms per gene family identified by the Cogent pipeline. (D) Length distribution of extracted ORF sequences from the two datasets. (E) Venn diagram showing a comparison of orthologous groups between the two datasets. (F) Long noncoding transcripts identified in the NO dataset (NO_lnc) compared against the unique fraction of transcripts from the NO dataset (NO_uni). NN, sugarcane non-normalized PacBio Iso-Seq isoforms; NO, sugarcane normalized PacBio Iso-Seq isoforms; SoGI, Saccharum officinarum gene indices; SUGIT, sugarcane Iso-Seq transcriptome; aa, amino acid.
Figure 4
Expression analysis. (A) Percentage of expressed transcripts from the NN and NO datasets detected in three different tissues: leaf, stalk, and root. (B) Comparison of transcripts from the NN dataset expressed in each tissue. (C) Comparison of transcripts from the NO dataset expressed in each tissue. (D) Mean expression level across all three tissues of the NN and NO datasets. The expression level (RPKM) was log10 transformed for visualization. (E) Length distribution of the two fractions of unique transcripts in the NN and NO datasets. (F) Percentage of expressed unique transcripts from the NN and NO datasets detected in three different tissues: leaf, stalk, and root. (G) Comparison of expression level between unique transcripts from the NN and NO datasets across three tissues: leaf, stalk, and root. The expression level (RPKM) was log10 transformed for visualization. NN, sugarcane non-normalized PacBio Iso-Seq isoforms; NO, sugarcane normalized PacBio Iso-Seq isoforms.
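The Figure 4 caption reports expression as log10-transformed RPKM. As a minimal sketch of what that quantity means, the following Python computes RPKM (reads per kilobase of transcript per million mapped reads) from its standard definition and then applies a log10 transform. The function names, the example numbers, and the pseudocount of 1 (used here so that zero expression stays defined) are illustrative assumptions, not details taken from the study.

```python
import math


def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    Standard definition: reads * 1e9 / (length in bp * total mapped reads).
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def log10_rpkm(value, pseudocount=1.0):
    """log10 transform for plotting.

    The pseudocount is an assumption made for this sketch; the source does
    not state how zero values were handled.
    """
    return math.log10(value + pseudocount)


# Hypothetical example: 500 reads on a 2,000 bp transcript,
# with 10 million mapped reads in the library.
example = rpkm(500, 2000, 10_000_000)
print(example, log10_rpkm(example))
```

Because RPKM divides by transcript length, it allows comparison between the shorter transcripts enriched in the NO dataset and the longer ones in the NN dataset without length alone driving the difference.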
Figure 5
Functional annotation of transcript isoforms from the two datasets. (A) Taxonomic distribution of BLASTX hits of the two datasets. (B) GO terms per transcript. (C) Significantly different GO terms between the two datasets. (D) Significantly different GO terms with the highest log10(p value) identified from the two datasets. (E) Unique bins annotated using Arabidopsis genes that matched the unique fractions of the NN and NO datasets. (F) MapMan functional bins identified from the two datasets; the red heatmap represents transcripts from the NN dataset, while blue represents those from the NO dataset. Two bins (15 and 18) were added to the bottom of the panel, listing all matched genes from the two datasets. Each heatmap represents one annotated transcript. NN, sugarcane non-normalized PacBio Iso-Seq isoforms; NO, sugarcane normalized PacBio Iso-Seq isoforms.
Figure 6
Transcript isoform distribution on the sugarcane genome (A) and the sorghum genome (B) using Circos. Outer graphs present mappable transcripts from the NO dataset, while the inner graphs present mappable transcripts from the NN dataset on the genomes. NN, sugarcane non-normalized PacBio Iso-Seq isoforms; NO, sugarcane normalized PacBio Iso-Seq isoforms.
