Nucleic Acids Res. 2020 Aug 20;48(14):7700-7711.
doi: 10.1093/nar/gkaa588.

New insights into Arabidopsis transcriptome complexity revealed by direct sequencing of native RNAs

Shoudong Zhang et al. Nucleic Acids Res.

Abstract

Arabidopsis thaliana transcriptomes have been extensively studied and characterized under different conditions. However, most of the current 'RNA-sequencing' technologies produce a relatively short read length and demand a reverse-transcription step, preventing effective characterization of transcriptome complexity. Here, we performed Direct RNA Sequencing (DRS) using the latest Oxford Nanopore Technology (ONT) with exceptional read length. We demonstrate that the complexity of the A. thaliana transcriptomes has been substantially under-estimated. The ONT direct RNA sequencing identified novel transcript isoforms at both the vegetative (14-day old seedlings, stage 1.04) and reproductive stages (stage 6.00-6.10) of development. Using in-house software called TrackCluster, we determined alternative transcription initiation (ATI), alternative polyadenylation (APA), alternative splicing (AS), and fusion transcripts. More than 38 500 novel transcript isoforms were identified, including six categories of fusion-transcripts that may result from differential RNA processing mechanisms. Aided by the Tombo algorithm, we found an enrichment of m5C modifications in the mobile mRNAs, consistent with a recent finding that m5C modification in mRNAs is crucial for their long-distance movement. In summary, ONT DRS offers an advantage in the identification and functional characterization of novel RNA isoforms and RNA base modifications, significantly improving annotation of the A. thaliana genome.
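The three single-gene event types named in the abstract (ATI, APA, AS) can be illustrated by comparing a long read's exon structure with a reference isoform. The following is a minimal sketch only, not TrackCluster's actual algorithm: the function names, the 50-nt tolerance, and the plus-strand-only coordinates are all assumptions made for illustration.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), plus strand only (assumption)


def introns(exons: List[Exon]) -> List[Tuple[int, int]]:
    """Return the intron chain implied by a sorted list of exons."""
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]


def classify(read: List[Exon], ref: List[Exon], tol: int = 50) -> List[str]:
    """Label how a read differs from a reference isoform.

    tol is a hypothetical tolerance (in nt) for calling a different
    transcript start or end; it is not a value taken from the paper.
    """
    labels = []
    if abs(read[0][0] - ref[0][0]) > tol:
        labels.append("ATI")  # alternative transcription initiation
    if abs(read[-1][1] - ref[-1][1]) > tol:
        labels.append("APA")  # alternative polyadenylation
    if introns(read) != introns(ref):
        labels.append("AS")   # alternative splicing
    return labels or ["match"]


ref = [(100, 200), (300, 400), (500, 700)]
print(classify([(100, 200), (300, 400), (500, 700)], ref))
print(classify([(100, 200), (500, 900)], ref))
```

A real classifier would also need to handle the minus strand, truncated reads, and fuzzy splice-site alignment, none of which this sketch attempts.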

Figures

Figure 1.
Statistical analysis of Nanopore reads and coverage as well as replicate reproducibility. (A) Read length distribution. Grey curve: the length distribution of Nanopore reads; yellow curve: the reference length distribution, counting the longest existing isoform of each gene; blue curve: the length distribution of Nanopore reads from two of the seedling samples with 5′ adaptor from the publication of Parker et al. (4). (B) The proportion of the existing isoforms or genes in the Araport11 reference covered by all our long reads (right) or by the same amount of NGS reads (left). Red: genes or isoforms not discovered; green: genes or isoforms discovered by Nanopore reads, but with <5 reads of support; blue: discovered genes or isoforms with >5 reads of support. (C) Correlation between two replicates of sequenced libraries. Shown are the biological replicates from floral buds (left) and seedlings (right).
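The three coverage categories in panel B can be reproduced from a table of per-isoform read counts. This is a minimal sketch under stated assumptions: the input dictionary and function names are hypothetical, and because the caption does not say which class an isoform with exactly 5 reads belongs to, the sketch places it in the supported class.

```python
from collections import Counter
from typing import Dict


def support_category(n_reads: int, threshold: int = 5) -> str:
    """Assign one of the three panel-B style categories.

    Assumption: exactly `threshold` reads counts as supported; the
    caption only specifies <5 and >5.
    """
    if n_reads == 0:
        return "not discovered"
    if n_reads < threshold:
        return "low support"
    return "supported"


def summarize(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Return the fraction of reference isoforms in each category."""
    counts = Counter(support_category(n) for n in read_counts.values())
    total = len(read_counts)
    return {category: n / total for category, n in counts.items()}


# Hypothetical read counts keyed by made-up isoform IDs.
example = {"iso1": 0, "iso2": 3, "iso3": 10, "iso4": 7}
print(summarize(example))
```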
Figure 2.
Correction of existing misannotated transcripts of At4g17140 by Nanopore long reads. (A) Diagram showing the difference between three reference isoforms (1–3, black) and our novel transcripts/isoforms (A and B, gray). Six different regions (DRs) are highlighted with boxes, and the primers used to amplify each region are shown near the DRs. DR 1–3 are the differences between the reference isoforms and both novel isoforms A and B, while DR 4–6 are the differences between isoform B and isoform A. (B–D) A magnified view of DR 1–3 shown in ‘A’. Shown on the top are the primers used for PCR validation and the mapping result of Sanger sequencing. (E) RT-PCR results for the confirmation of DR 1–3. Only one major band for each PCR can be found, and the isoforms supported by the Sanger sequencing results for each PCR are shown under the band. (F) A magnified view of DR 4–6 shown in ‘A’. (G) RT-PCR results for the confirmation of DR 4–6. The weaker bands for the expected product of isoform B are indicated by red arrows. B, floral buds; S, seedlings; marker, the DL2000 DNA ladder. DR, different region.
Figure 3.
TrackCluster identified transcript isoforms and the distributions of read count and length for each category of isoforms. (A) Schematics of each category of novel isoform identified with Nanopore reads and their distribution. (B) Abundance of Nanopore reads mapped to various categories of isoform. (C) Distribution of read length for various categories of isoform in the study.
Figure 4.
Unexpected fusion transcripts identified with Nanopore reads are confirmed with RT-PCR. (A) Multiple fusion transcripts derived from two discrete existing loci; (B) a single-exon transcript covering four gene loci; (C) a transcript covering two distal multi-exon genes together with seven single-exon genes; (D) fusion transcripts covering two overlapped genes with the same splicing patterns, with no splicing in the overlapped regions of the two covered genes; (E) antisense transcripts covering two adjacent genes; (F) transcripts with different splicing patterns covering parts of two proximal gene loci. Existing exon and intron are depicted as red thin and thin bar respectively, exon and intron derived from Nanopore reads are depicted as blue thin line and thin bar respectively. The positions of the forward (F) and reverse (R) primers are shown in the diagram. The RT-PCR confirmation results of the fusion transcripts are shown on the right of each panel. B, floral buds; S, seedlings; Marker, the 1 kb DNA ruler.
Figure 5.
Non-annotated DNA sequence confirmed in the second intron of At2g40980. (A) The IGV track view of reads mapping to gene At2g40980. The unannotated sequences fall in the second intron as shown along the red line. (B) PCR confirmation with the primers flanking the unannotated regions, the red line indicates the expected size according to TAIR10 genome. (C) The IGV track view of reads with the corrected Arabidopsis genome. Note: The expected size in TAIR10 is 176 bp, while with the non-annotated DNA fragment, the expected size should be 278 bp.
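The two amplicon sizes in the note imply the length of the unannotated fragment. As a small worked example, assuming the same primer pair is used and the flanking sequence is otherwise unchanged (the function name is hypothetical):

```python
def implied_insert_bp(expected_bp: int, observed_bp: int) -> int:
    """Length of extra sequence implied by a larger-than-expected amplicon."""
    return observed_bp - expected_bp


# 176 bp expected from TAIR10; 278 bp expected with the extra fragment.
print(implied_insert_bp(176, 278))  # prints 102
```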
Figure 6.
DEG and DEI between seedlings and floral buds. (A) Venn diagram showing the intersection of DEI and DEG, (B) counts of the up- and down-regulated DEG and DEI in seedlings compared with floral buds, (C) an example of an IGV track showing floral bud-specific ATI in the gene At5g57300, (D) experimentally confirmed floral bud-specific ATI events. The existing and novel isoforms are depicted in black (1–4) and green (a–c) respectively. A previously undefined ATI that is bud-specific is highlighted by a box. The primers used to validate these isoforms are labelled in the diagram. F, forward primer; R1, reverse primer 1; R2, reverse primer 2.
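The overlap shown in a Venn diagram like panel A reduces to set operations once the differentially expressed isoforms are mapped back to their genes. The sketch below assumes the comparison is made at the gene level, which the caption does not state; all identifiers and function names are hypothetical.

```python
from typing import Dict, Set


def genes_with_dei(isoform_to_gene: Dict[str, str], dei: Set[str]) -> Set[str]:
    """Collapse differentially expressed isoforms onto their parent genes."""
    return {isoform_to_gene[isoform] for isoform in dei}


def venn_counts(deg: Set[str], dei_genes: Set[str]) -> Dict[str, int]:
    """Counts for the three regions of a two-set Venn diagram."""
    return {
        "DEG only": len(deg - dei_genes),
        "both": len(deg & dei_genes),
        "DEI only": len(dei_genes - deg),
    }


# Made-up identifiers for illustration only.
isoform_to_gene = {"geneA.1": "geneA", "geneA.2": "geneA", "geneB.1": "geneB"}
deg = {"geneA", "geneC"}
dei = {"geneA.2", "geneB.1"}
print(venn_counts(deg, genes_with_dei(isoform_to_gene, dei)))
```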
Figure 7.
Comparison of m5C modifications between mobile mRNAs, total mRNAs and tRNA-like structure (TLS)-containing mRNAs. (A) Comparison of m5C modification among mobile mRNAs, total mRNAs and mRNAs containing TLS in floral buds. (B) Comparison of m5C modification among mobile mRNAs, total mRNAs and mRNAs containing TLS in seedlings. Red: total mRNAs; light yellow: mobile mRNAs; green: mRNAs containing TLS in the gene body; light blue line: mRNAs containing TLS in the 3′UTR; purple line: mRNAs containing TLS in the 5′UTR.

References

    1. Kukurba K.R., Montgomery S.B. RNA sequencing and analysis. Cold Spring Harb. Protoc. 2015; 2015:951–969.
    2. Garalde D.R., Snell E.A., Jachimowicz D., Sipos B., Lloyd J.H., Bruce M., Pantic N., Admassu T., James P., Warland A. et al. Highly parallel direct RNA sequencing on an array of nanopores. Nat. Methods. 2018; 15:201–206.
    3. Goodwin S., McPherson J.D., McCombie W.R. Coming of age: ten years of next-generation sequencing technologies. Nat. Rev. Genet. 2016; 17:333–351.
    4. Parker M.T., Knop K., Sherwood A.V., Schurch N.J., Mackinnon K., Gould P.D., Hall A.J.W., Barton G.J., Simpson G.G. Nanopore direct RNA sequencing maps the complexity of Arabidopsis mRNA processing and m(6)A modification. Elife. 2020; 9:e49658.
    5. Mullen M.A., Olson K.J., Dallaire P., Major F., Assmann S.M., Bevilacqua P.C. RNA G-Quadruplexes in the model plant species Arabidopsis thaliana: prevalence and possible functional roles. Nucleic Acids Res. 2010; 38:8149–8163.