BMC Genomics. 2011 Feb 28;12:131.

doi: 10.1186/1471-2164-12-131.

Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds

Cheng-Ying Shi¹, Hua Yang, Chao-Ling Wei, Oliver Yu, Zheng-Zhu Zhang, Chang-Jun Jiang, Jun Sun, Ye-Yun Li, Qi Chen, Tao Xia, Xiao-Chun Wan

Affiliation

¹ Key Laboratory of Tea Biochemistry and Biotechnology, Ministry of Education, Ministry of Agriculture, Anhui Agricultural University, Hefei, 230036, PR China.

PMID: 21356090
PMCID: PMC3056800
DOI: 10.1186/1471-2164-12-131

Cheng-Ying Shi et al. BMC Genomics. 2011.

Abstract

Background: Tea is one of the most popular non-alcoholic beverages worldwide. However, the tea plant, Camellia sinensis, is difficult to culture in vitro and to transform, and it has a large genome; as a result, little genomic information is available. Recent advances in large-scale RNA sequencing (RNA-seq) provide a fast, cost-effective, and reliable way to generate large expression datasets for functional genomic analysis, and the approach is especially suited to non-model species with unsequenced genomes.

Results: Using high-throughput Illumina RNA-seq, the transcriptome from poly(A)+ RNA of C. sinensis was analyzed at an unprecedented depth (2.59 gigabase pairs). Approximately 34.5 million reads were obtained, trimmed, and assembled into 127,094 unigenes (788 contig clusters and 126,306 singletons), with an average length of 355 bp and an N50 of 506 bp. This number of unigenes is 10-fold higher than the number of C. sinensis sequences deposited in GenBank as of August 2010. Sequence similarity searches against six public databases (UniProt, NR and COGs at NCBI, Pfam, InterPro, and KEGG) annotated 55,088 unigenes with gene descriptions, conserved protein domains, or Gene Ontology terms. Some of the unigenes were assigned to putative metabolic pathways. Targeted searches using these annotations identified the majority of genes associated with several primary metabolic pathways and with natural product pathways important to tea quality, such as the flavonoid, theanine, and caffeine biosynthesis pathways. Novel candidate genes in these secondary pathways were discovered. Comparisons with four previously prepared cDNA libraries showed that this transcriptome dataset is highly consistent with previous EST data while providing approximately 20-fold greater coverage. Thirteen unigenes related to theanine and flavonoid synthesis were validated, and their expression patterns in different organs of the tea plant were analyzed by RT-PCR and quantitative real-time PCR (qRT-PCR).
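
The assembly is summarized by two length statistics: the average unigene length (355 bp) and the N50 (506 bp). N50 is the length L such that unigenes of length at least L together contain at least half of all assembled bases. A minimal Python sketch of both statistics follows; the length list is made up for illustration and is not the study's data. The last line is a back-of-envelope check on the abstract's own numbers, assuming the 2.59 Gbp and the ~34.5 million reads describe the same read set.

```python
from statistics import mean

def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L cover at least half of all assembled bases."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest to the shortest sequence, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

# Hypothetical unigene lengths in bp (illustrative only, not study data).
toy_lengths = [1200, 800, 506, 400, 300, 250, 200, 150, 100]
print("mean length:", round(mean(toy_lengths), 1))  # mean length: 434.0
print("N50:", n50(toy_lengths))                     # N50: 800

# 2.59 Gbp over ~34.5 million reads implies ~75 bp per read,
# assuming both figures refer to the same set of reads.
print("approx. read length (bp):", round(2.59e9 / 34.5e6))
```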

Conclusions: An extensive transcriptome dataset has been obtained from deep sequencing of the tea plant. The transcriptome coverage is comprehensive enough to discover all known genes of several major metabolic pathways. This dataset can serve as an important public information platform for gene expression, genomics, and functional genomic studies in C. sinensis.


Figures

**Figure 1**
**Overview of the *C. sinensis* transcriptome assembly**. (a) Size distribution of the contigs obtained from *de novo* assembly of high-quality clean reads. (b) Size distribution of the unigenes produced by further assembly of contigs through contig joining, gap filling, and scaffold clustering. (c) Distribution of the ratio of gap length to unigene length. The x-axis indicates the ratio of gap length to the length of the assembled unigene; the y-axis indicates the number of unigenes containing gaps. (d) Random distribution of Illumina sequencing reads along the assembled unigenes. The x-axis indicates the relative position of sequencing reads in the assembled unigenes, which are oriented from the 5' end to the 3' end.
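
Panels (c) and (d) rest on two simple per-unigene quantities: the fraction of a unigene's length that is unresolved gap, and where along the unigene each read starts, scaled to the unigene's length. The sketch below assumes gaps are written as runs of `N` in the scaffolded sequence (a common convention, not one stated in the caption); the sequence, read positions, and bin count are all hypothetical.

```python
def gap_ratio(seq):
    """Fraction of positions in a unigene that are gap characters ('N' or 'n').
    Assumes gaps are encoded as Ns, which the caption does not specify."""
    if not seq:
        raise ValueError("empty sequence")
    gaps = sum(1 for base in seq if base in "Nn")
    return gaps / len(seq)

def relative_positions(read_starts, unigene_length, bins=10):
    """Histogram of 0-based read start positions, scaled to 0..1 along the
    unigene from the 5' end (0.0) toward the 3' end (1.0)."""
    counts = [0] * bins
    for start in read_starts:
        frac = start / unigene_length
        idx = min(int(frac * bins), bins - 1)  # clamp into the last bin
        counts[idx] += 1
    return counts

# Hypothetical 20-bp scaffold containing a 4-bp gap, plus a few read starts.
toy_seq = "ACGTACGTNNNNACGTACGT"
print(gap_ratio(toy_seq))                                 # 0.2
print(relative_positions([0, 3, 9, 15, 19], 20, bins=4))  # [2, 1, 0, 2]
```

Pooling such histograms over all unigenes gives a coverage profile like panel (d); a roughly flat profile suggests no strong 5' or 3' positional bias.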

**Figure 2**
**Characteristics of homology search of unigenes against the NR database**. (a) Effect of query sequence length on the percentage of significant matches (E-value cut-off 1.0e-5). Longer assembled sequences are more likely to have matches in the NCBI NR database. (b) E-value distribution of the top BLAST hit for each unigene (E-value cut-off 1.0e-5). (c) Similarity distribution of the best BLAST hit for each unigene. (d) Species distribution, shown as the percentage of all homologous sequences with an E-value ≤ 1.0e-5. We used all plant proteins in the NCBI NR database for the homology search and extracted the best hit for each sequence for analysis.
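
The analyses in this figure start from two steps: keep only hits passing the E-value cut-off, take one best hit per unigene, and then, for panel (a), compute the share of unigenes with a hit within each length bin. A minimal sketch, assuming BLAST+ tabular output (`-outfmt 6` with its default 12 columns) and "best" meaning highest bit score; the 500-bp bin width and the toy hits are hypothetical.

```python
import csv
import io
from collections import defaultdict

EVALUE_CUTOFF = 1e-5  # cut-off quoted in the caption

def best_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Highest-bitscore hit per query among hits with E-value <= cutoff.
    Assumes BLAST+ '-outfmt 6' default columns: qseqid sseqid pident length
    mismatch gapopen qstart qend sstart send evalue bitscore."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > cutoff:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best

def match_rate_by_length(lengths, hits, bin_size=500):
    """Percentage of queries with a significant hit in each length bin."""
    totals, matched = defaultdict(int), defaultdict(int)
    for query, length in lengths.items():
        bin_start = (length // bin_size) * bin_size
        totals[bin_start] += 1
        matched[bin_start] += int(query in hits)
    return {b: 100.0 * matched[b] / totals[b] for b in sorted(totals)}

# Hypothetical BLAST rows: u1 has two hits, u2 fails the cut-off, u3 has none.
toy_blast = (
    "u1\tsp1\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-30\t120.0\n"
    "u1\tsp2\t75.0\t90\t22\t0\t1\t270\t1\t90\t1e-20\t90.0\n"
    "u2\tsp3\t60.0\t40\t16\t0\t1\t120\t1\t40\t0.01\t35.0\n"
)
hits = best_hits(toy_blast)
print(hits)  # {'u1': ('sp1', 1e-30, 120.0)}
print(match_rate_by_length({"u1": 650, "u2": 220, "u3": 1400}, hits))
```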

**Figure 3**
**Venn diagram showing the distribution of similarity search results**. (a) The number of unique sequence-based annotations combines the unique best BLASTX hits (E-value ≤ 1.0e-5) from the NR, UniProt, and KEGG databases; the NR counts include the unique BLASTX hits from both the plant proteins and the *Arabidopsis* proteins. Each overlapping region among the three circles gives the number of unigenes with BLASTX hits in all of the databases that overlap there. (b) The number of unique domain-based annotations combines the unique similarity search results (E-value ≤ 1.0e-5) against the InterPro, Pfam, and COGs databases. (c) The total number of annotated *C. sinensis* unigenes, computed as the union of the unique sequence-based and unique domain-based annotations. Circles "a" and "b" represent the two subsets of *C. sinensis* unigenes with sequence-based annotations (53,966 in Figure 3a) and domain-based annotations (44,705 in Figure 3b), respectively.
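
Because a unigene annotated by both routes is counted once, the panel (c) total is a set union, not an arithmetic sum of the two subsets. A small Python sketch with made-up unigene IDs illustrates the region counts. The final line applies inclusion-exclusion to the figures quoted here and in the abstract; it holds only under the assumption that the 55,088 annotated unigenes are exactly the union of the 53,966 and 44,705 subsets.

```python
def venn_counts(a, b):
    """Region sizes for a two-set Venn diagram, plus the union size."""
    a, b = set(a), set(b)
    return {
        "only_a": len(a - b),
        "only_b": len(b - a),
        "both": len(a & b),
        "union": len(a | b),
    }

# Hypothetical IDs: a = sequence-based hits, b = domain-based hits.
seq_based = {"u1", "u2", "u3", "u4"}
domain_based = {"u3", "u4", "u5"}
print(venn_counts(seq_based, domain_based))
# {'only_a': 2, 'only_b': 1, 'both': 2, 'union': 5}

# Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|, so the implied overlap
# (assuming 55,088 is exactly the union of the two subsets) is:
print(53_966 + 44_705 - 55_088)  # 43583
```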

**Figure 4**
**COG Functional Classification of the *C. sinensis* transcriptome**. A total of 11,241 unigenes with significant homology to the NCBI COGs database (E-value ≤ 1.0e-5) were classified into 24 COG categories.
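
A COG classification like this is a tally of one-letter functional category codes per unigene. The sketch below assumes each unigene maps to a string of such codes and that a unigene carrying several letters is counted once in each category, so the category totals can exceed the number of classified unigenes. The IDs and assignments are hypothetical.

```python
from collections import Counter

def tally_cog_categories(assignments):
    """Count unigenes per COG functional category.
    `assignments` maps unigene ID -> string of one-letter COG category codes.
    A unigene with several letters is counted once in each of its categories."""
    counts = Counter()
    for codes in assignments.values():
        counts.update(set(codes))  # set() avoids double-counting a repeated letter
    return counts

# Hypothetical assignments (IDs and codes are illustrative only).
toy = {"u1": "J", "u2": "KL", "u3": "R", "u4": "J"}
cog = tally_cog_categories(toy)
print(sorted(cog.items()))  # [('J', 2), ('K', 1), ('L', 1), ('R', 1)]
print("unigenes with any COG class:", len(toy))  # 4, while the category total is 5
```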

**Figure 5**
**Gene Ontology Classification of the *C. sinensis* transcriptome**. (a) Gene Ontology (GO) term assignments to *C. sinensis* unigenes, based on significant hits to plant species in the NR database, summarized into three main GO categories (biological process, cellular component, molecular function) and 43 sub-categories. (b) GO term assignments to *C. sinensis* unigenes, based on high-scoring BLASTX matches to the *Arabidopsis* proteins in the NR database, classified into three main GO categories and 41 sub-categories. The left y-axis indicates the percentage of genes in a sub-category relative to its main category; the right y-axis indicates the number of genes in that sub-category.
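
The two y-axes report the same sub-category in two ways: as a gene count and as a percentage of its main GO category. A minimal sketch of that computation follows. The denominator (distinct unigenes with any term in the main category) is an assumption, since the caption does not define it, and the annotation triples are hypothetical.

```python
from collections import defaultdict

def go_percentages(annotations):
    """Map (main_category, sub_category) -> (gene_count, percent).
    `annotations` is an iterable of (unigene_id, main_category, sub_category).
    Assumed denominator: distinct unigenes with any sub-category in that
    main category, so one gene can appear in several sub-categories."""
    genes_in_main = defaultdict(set)
    genes_in_sub = defaultdict(set)
    for gene, main, sub in annotations:
        genes_in_main[main].add(gene)
        genes_in_sub[(main, sub)].add(gene)
    result = {}
    for (main, sub), genes in genes_in_sub.items():
        n = len(genes)
        result[(main, sub)] = (n, 100.0 * n / len(genes_in_main[main]))
    return result

# Hypothetical GO annotations (not study data).
toy = [
    ("u1", "biological process", "metabolic process"),
    ("u1", "biological process", "cellular process"),
    ("u2", "biological process", "metabolic process"),
    ("u3", "molecular function", "catalytic activity"),
]
for key, (count, pct) in sorted(go_percentages(toy).items()):
    print(key, count, f"{pct:.1f}%")
```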

**Figure 6**
**Overall flow chart of the analysis of metabolic pathway genes using *C. sinensis* unigenes**.

**Figure 7**
***C. sinensis* unigenes involved in three secondary metabolic pathways**. (a) *C. sinensis* unigenes involved in the flavonoid biosynthesis pathway. (b) *C. sinensis* unigenes involved in the theanine biosynthesis pathway; the putative pathway is based on Sasaoka K (No. 48 in References). (c) *C. sinensis* unigenes involved in the caffeine biosynthesis pathway. The red number in brackets after each gene name indicates the number of corresponding *C. sinensis* unigenes.
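
The bracketed numbers amount to counting how many unigenes were annotated as each enzyme of a pathway. A minimal sketch, assuming each unigene has already been reduced to a single gene symbol by the similarity search; the enzyme list is an illustrative subset of the flavonoid pathway, and the unigene annotations are hypothetical.

```python
from collections import Counter

# Illustrative subset of flavonoid-pathway enzyme abbreviations.
FLAVONOID_GENES = ["PAL", "C4H", "4CL", "CHS", "CHI", "F3H", "DFR", "ANS"]

def unigenes_per_gene(annotation, pathway_genes):
    """Count unigenes whose annotated gene symbol belongs to the pathway.
    `annotation` maps unigene ID -> gene symbol (one symbol per unigene is
    an assumption). Pathway genes with no matching unigene get a count of 0."""
    counts = Counter(sym for sym in annotation.values() if sym in pathway_genes)
    return {gene: counts.get(gene, 0) for gene in pathway_genes}

# Hypothetical annotations; IDs and symbol assignments are made up.
toy_annotation = {"u1": "CHS", "u2": "CHS", "u3": "DFR", "u4": "GS", "u5": "PAL"}
print(unigenes_per_gene(toy_annotation, FLAVONOID_GENES))
# "GS" is not in the flavonoid list, so u4 is not counted here.
```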

**Figure 8**
**BLAST comparisons of the *C. sinensis* transcriptome with four UniEST databases**. (a) Comparison of the *C. sinensis* transcriptome against the four UniEST databases using BLASTN. (b) Comparison of the *C. sinensis* transcriptome against the four UniEST databases using TBLASTX. (c) Comparison of the four UniEST databases against the *C. sinensis* transcriptome using both BLASTN and TBLASTX.

**Figure 9**
**Validation of candidate unigenes in the *C. sinensis* transcriptome by qRT-PCR**. (a) Seven candidate unigenes involved in the theanine metabolic pathway show differential expression across three organs by qRT-PCR. (b) Six candidate unigenes involved in flavonoid biosynthesis show differential expression across three organs by qRT-PCR. Results represent the mean (± SD) of three experiments.
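
The caption does not say how relative expression was quantified. One common choice is the 2^-ΔΔCt (Livak) method, which assumes roughly 100% amplification efficiency for both the target and the reference gene. The sketch below uses that method as an assumption and summarizes three replicate runs as mean ± sample SD, matching the caption's reporting format; all Ct values are hypothetical.

```python
from statistics import mean, stdev

def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Fold change by the 2^-ddCt (Livak) method, assuming ~100% efficiency.
    ct_target, ct_ref: Ct of target and reference gene in the sample;
    ct_target_cal, ct_ref_cal: the same two Ct values in the calibrator."""
    delta_sample = ct_target - ct_ref
    delta_calibrator = ct_target_cal - ct_ref_cal
    return 2 ** -(delta_sample - delta_calibrator)

# Hypothetical (target Ct, reference Ct) for three replicate runs in one organ,
# calibrated against another organ with target Ct 26.0 and reference Ct 18.0.
replicates = [(24.0, 18.0), (24.2, 18.1), (23.9, 18.0)]
folds = [relative_expression(t, r, 26.0, 18.0) for t, r in replicates]
print(f"mean fold change {mean(folds):.2f} ± {stdev(folds):.2f} (SD)")
```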


References

1. Yamamoto T, Juneja LR, Chu DC, Kim M, editors. Chemistry and Application of Green Tea. New York: CRC Press; 1998.
2. Rogers PJ, Smith JE, Heatherley SV, Pleydell-Pearce CW. Time for tea: mood, blood pressure and cognitive performance effects of caffeine and theanine administered alone and together. Psychopharmacology. 2008;195:569–577. doi: 10.1007/s00213-007-0938-1.
3. Wang Y, Jiang CJ, Zhang HY. Observation on the Self-incompatibility of Pollen Tubes in Self-pollination of Tea Plant in Style in vivo. Tea Sci. 2008;28:429–435.
4. Tanaka J, Taniguchi F. Estimation of the genome size of tea (Camellia sinensis), camellia (C. japonica), and their interspecific hybrids by flow cytometry. Journal of the Remote Sensing Society of Japan. 2006;101:1–7.
5. Park JS, Kim JB, Hahn BS, Kim KH, Ha SH, Kim YH. EST analysis of genes involved in secondary metabolism in Camellia sinensis (tea), using suppression subtractive hybridization. Plant Sci. 2004;166:953–961. doi: 10.1016/j.plantsci.2003.12.010.
