Gigascience. 2017 Nov 1;6(11):1-13.
doi: 10.1093/gigascience/gix086.

Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts

Bing Cheng et al. Gigascience. 2017.

Abstract

Polyploidization contributes to the complexity of gene expression, resulting in numerous related but different transcripts. This study explored the transcriptome diversity and complexity of the tetraploid Arabica coffee (Coffea arabica) bean. Long-read sequencing (LRS) by PacBio Isoform sequencing (Iso-seq) was used to obtain full-length transcripts without the difficulty and uncertainty of the assembly required for reads from short-read technologies. The tetraploid transcriptome was annotated and compared with data from the sub-genome progenitors. Caffeine and sucrose genes were targeted for case analysis. An isoform-level tetraploid coffee bean reference transcriptome with 95 995 distinct transcripts (average 3236 bp) was obtained. A total of 88 715 sequences (92.42%) were annotated with BLASTx against NCBI non-redundant plant proteins, including 34 719 high-quality annotations. Further BLASTn analysis against NCBI non-redundant nucleotide sequences, Coffea canephora coding sequences with UTR, C. arabica ESTs, and Rfam left 1213 sequences without hits; these are potential novel genes in coffee. Longer UTRs were captured, especially 5′ UTRs, facilitating the identification of upstream open reading frames. LRS also revealed more and longer transcript variants in key caffeine and sucrose metabolism genes from this polyploid genome. Long sequences (>10 kb) were poorly annotated. LRS technology reveals the limitations of previous studies. It provides an important tool for producing a reference transcriptome that includes more of the diversity of full-length transcripts, helping to understand the biology and support the genetic improvement of polyploid species such as coffee.
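The annotation figures reported in the abstract can be checked with simple arithmetic. The sketch below is illustrative only: the three counts come from the abstract, while the function name `annotation_summary` and the derived quantities are assumptions made for this example and are not part of the authors' pipeline.

```python
def annotation_summary(total, annotated, high_quality):
    """Summarise an annotation run as percentages and leftover counts.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if total <= 0 or not (0 <= high_quality <= annotated <= total):
        raise ValueError("counts must satisfy 0 <= high_quality <= annotated <= total")
    return {
        # Share of all transcripts that received a BLASTx annotation.
        "annotated_pct": round(100 * annotated / total, 2),
        # Transcripts with no BLASTx annotation.
        "unannotated": total - annotated,
        # Share of annotated transcripts flagged as high quality.
        "high_quality_pct_of_annotated": round(100 * high_quality / annotated, 2),
    }


# Counts as reported in the abstract.
summary = annotation_summary(total=95_995, annotated=88_715, high_quality=34_719)
print(summary)
```

The computed `annotated_pct` reproduces the 92.42% stated in the abstract.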

Keywords: UTR; coffee; full-length cDNA; isoform; long sequences; polyploid; transcriptome.


Figures

Figure 1:
Coffee fruits of immature, intermediate, and mature stages.
Figure 2:
Putative transcript variants from long-read sequencing aligned to reference caffeine genes. (a) Main caffeine biosynthesis pathway in coffee, adapted from Cheng, Furtado [31]. (b) Alignment of three Arabica putative XMT1 variants from long-read sequencing (c69597/f1p2/1412, c154338/f1p2/1360 and c71416/f3p3/1376), Coffea arabica and Coffea canephora XMT1 (CaXMT1 and CcXMT1) to the Arabica XMT1 genomic DNA sequence (G-CaXMT1). (c) Possible alternative polyadenylation of a putative XMT1 Iso-seq variant (c25904/f2p0/977) from long-read sequencing; G-CaDXMT1, Arabica DXMT1 genomic DNA sequence; CaDXMT1, DXMT1 coding sequence. (d) Two polyadenylation signals were identified in the 3′ end of c25904/f2p0/977. (e) Possible alternative splicing (intron retention) in one of the putative DXMT2 variants (c48759/f1p1/1517) from long-read sequencing transcripts; G-CaDXMT2, Arabica DXMT2 genomic DNA sequence; CaDXMT2, Arabica DXMT2 coding sequence. (Note: black in the alignment marks nucleotides that differ from the reference sequence, Arabica genomic XMT1, while grey marks nucleotides identical to the reference.)
Figure 3:
Motif search results for putative sucrose synthase gene 1 from long-read sequencing. (a) Ten motifs were annotated in 9 putative sucrose synthase 1 variants from long-read sequencing, analysed with MEME 4.11.2. (b) Motif locations in the 9 putative sucrose synthase 1 variants. Different motifs are highlighted with red arrows and intron retention is shown with dashed boxes.
Figure 4:
Putative variants from long-read sequencing aligned to the reference sucrose genes. (a) Possible sucrose metabolism in coffee; SS, sucrose synthase; SPS, sucrose phosphate synthase; SP, sucrose phosphatase; INV, invertase; CINV, cell wall invertase (modified from Cheng B. et al. (2016)). (b) Alignment of nine putative sucrose synthase variants from long-read sequencing and C. arabica sucrose synthase gene 1 (CaSS1) to Coffea canephora genomic sucrose synthase 1 (exons 1–13) (G-CcSS1 (1–13)); green boxes highlight variants resulting from different sub-genome copies, while intron retention events are marked with blue boxes. (c) Polyploid expression, shown by zooming in on the green area at 100%. (d) Possible alternative splicing (intron retention) from a C. canephora sub-genome copy, shown by zooming in on the blue box at 100%. (e) Possible intron retention from a C. eugenioides sub-genome copy, shown by zooming in on the blue area at 100%. The red line separates two groups of variants as different sub-genome copies. Nucleotides that differ from the consensus are highlighted in black in the alignment. (f) Putative variants from long-read sequencing aligned with the C. canephora genomic sucrose phosphate synthase 2 sequence (G-CcSPS2); FWD, forward sequence; REV, reverse sequence. Nucleotides that differ from the consensus are highlighted in black in the alignment.
Figure 5:
Length distribution of coffee long-read sequencing sequences (coffee LRS-sequences), C. canephora coding sequences with UTR, and C. arabica contigs. The horizontal axis uses a logarithmic scale.
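A length distribution drawn on a logarithmic axis, as in Figure 5, amounts to counting sequences in log-spaced length bins. The following is a minimal sketch of that binning, not the authors' plotting code: the function name `log_length_histogram`, the `bins_per_decade` parameter, and the example lengths are all invented for illustration and are not data from the study.

```python
import math
from collections import Counter


def log_length_histogram(lengths, bins_per_decade=4):
    """Count sequences in log10-spaced length bins.

    Returns a dict mapping each occupied bin's lower edge
    (in bp, rounded to an integer) to the number of sequences in it.
    """
    counts = Counter()
    for length in lengths:
        if length <= 0:
            raise ValueError("sequence lengths must be positive")
        # Bin index grows by bins_per_decade for every tenfold increase in length.
        index = math.floor(math.log10(length) * bins_per_decade)
        counts[index] += 1
    return {round(10 ** (i / bins_per_decade)): n for i, n in sorted(counts.items())}


# Made-up lengths in bp, for illustration only.
example_lengths = [150, 980, 1200, 3236, 3300, 10500]
histogram = log_length_histogram(example_lengths, bins_per_decade=1)
print(histogram)
```

With one bin per decade, each key is a power of ten; raising `bins_per_decade` gives a finer view of the same distribution.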

References

    1. Yoo M, Liu X, Pires JC et al. Nonadditive gene expression in polyploids. Annu Rev Genet 2014;48(1):485–517.
    2. Adams KL, Cronn R, Percifield R et al. Genes duplicated by polyploidy show unequal contributions to the transcriptome and organ-specific reciprocal silencing. Proc Natl Acad Sci USA 2003;100(8):4649–54.
    3. Levasseur A, Pontarotti P. The role of duplications in the evolution of genomes highlights the need for evolutionary-based approaches in comparative genomics. Biol Direct 2011;6(1):11.
    4. Wang B, Tseng E, Regulski M et al. Unveiling the complexity of the maize transcriptome by single-molecule long-read sequencing. Nat Commun 2016;7:11708.
    5. Abdel-Ghany SE, Hamilton M, Jacobi JL et al. A survey of the sorghum transcriptome using single-molecule long reads. Nat Commun 2016;7:11706.
