BMC Genomics. 2008 Oct 30;9:512.
doi: 10.1186/1471-2164-9-512.

Towards the understanding of the cocoa transcriptome: Production and analysis of an exhaustive dataset of ESTs of Theobroma cacao L. generated from various tissues and under various conditions

Xavier Argout et al. BMC Genomics. 2008.

Abstract

Background: Theobroma cacao L. is a tree that originated in the tropical rainforest of South America. It is one of the major cash crops for many tropical countries. T. cacao is mainly produced on smallholdings, providing resources for 14 million farmers. Disease resistance and quality improvement are two important challenges for all actors in cocoa and chocolate production. T. cacao is seriously affected by pests and fungal diseases, which are responsible for yield losses of more than 40%; improving nutritional and organoleptic quality is also important for consumers. An international collaboration was formed to develop an EST genomic resource database for cacao.

Results: Fifty-six cDNA libraries were constructed from different organs, different genotypes and different environmental conditions. A total of 149,650 valid EST sequences were generated, corresponding to 48,594 unigenes (12,692 contigs and 35,902 singletons). A total of 29,849 unigenes shared significant homology with public sequences from other species. Gene Ontology (GO) annotation was applied to distribute the ESTs among the main GO categories. A dedicated information system (ESTtik) was constructed to process, store and manage this EST collection and to allow users to query the database. To check the representativeness of our EST collection, we looked for the genes known to be involved in two metabolic pathways that have been extensively studied in other plant species and are important for T. cacao qualities: the flavonoid and the terpene pathways. Most of the enzymes described in other crops for these two metabolic pathways were found in our EST collection. This EST collection also provided a large set of new genetic markers.
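The counts reported in the Results can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the authors' pipeline; the constant and function names are hypothetical, and it assumes that every valid EST ends up either in a contig or as a singleton, with each singleton being exactly one EST.

```python
# Counts taken from the Results paragraph of the abstract.
TOTAL_ESTS = 149_650
CONTIGS = 12_692
SINGLETONS = 35_902
UNIGENES_WITH_HOMOLOGY = 29_849


def summarize_assembly():
    """Derive summary figures from the reported counts (hypothetical helper)."""
    # A unigene is either a contig or a singleton.
    unigenes = CONTIGS + SINGLETONS
    # Assumption: each singleton is one EST, so the rest sit in contigs.
    ests_in_contigs = TOTAL_ESTS - SINGLETONS
    return {
        "unigenes": unigenes,
        "ests_in_contigs": ests_in_contigs,
        "mean_ests_per_contig": ests_in_contigs / CONTIGS,
        "fraction_with_homology": UNIGENES_WITH_HOMOLOGY / unigenes,
    }


if __name__ == "__main__":
    for name, value in summarize_assembly().items():
        print(f"{name}: {value}")
```

Under these assumptions, the contig and singleton counts sum to the reported 48,594 unigenes, and roughly 61% of the unigenes have a significant match in another species.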

Conclusion: This EST collection displays a good representation of the T. cacao transcriptome, suitable for analysis of biochemical pathways based on oligonucleotide microarrays derived from these ESTs. It will provide numerous genetic markers that will allow the construction of a high density gene map of T. cacao. This EST collection represents a unique and important molecular resource for T. cacao study and improvement, facilitating the discovery of candidate genes for important T. cacao trait variation.

Figures

Figure 1
Distribution of T. cacao EST members in contigs after the assembly process.
Figure 2
Number of contigs composed of sequences originating from one or more libraries.
Figure 3
Species distribution among the Blast results of T. cacao unigenes. A – Distribution of species represented in the first 10 Blast hits against the NCBI non-redundant protein database. B – Number of best Blast hits against Arabidopsis thaliana and Vitis vinifera proteomes. C – Arabidopsis thaliana (black columns) and Vitis vinifera (grey columns) proteome coverage.
Figure 4
Gene Ontology annotation results. A – Distribution of the unigenes among the main Gene Ontology categories (Biological Process, Cellular Component and Molecular Function). B – Distribution of the unigenes among the 10 best Gene Ontology terms.
Figure 5
Schematic overview of the general flavonoid biosynthesis pathway (according to Schijlen et al., 2004; Marles et al., 2003). The number of contigs and singletons present in our EST dataset was added between brackets for each enzyme.
Figure 6
The biosynthesis pathway of isoprenoids (according to Liu et al., 2005). The cytoplasmic mevalonate (MVA) pathway is shown on the left and the chloroplastic 1-deoxyxylulose-5-phosphate (DXP) pathway on the right. AACT, acetoacetyl-coenzyme A (CoA) thiolase; CMS, 2-C-methyl-D-erythritol 4-phosphate cytidyl transferase; DTS, diterpene synthase; DXR, 1-deoxy-D-xylulose 5-phosphate reductoisomerase; DXS, 1-deoxy-D-xylulose 5-phosphate synthase; FPPS, farnesyl diphosphate synthase; GGPPS, geranylgeranyl diphosphate synthase; GPPS, geranyl diphosphate synthase; HMGR, 3-hydroxy-3-methylglutaryl coenzyme A (HMG-CoA) reductase; IPPi, isopentenyl diphosphate isomerase; MTS, monoterpene synthase; SES, sesquiterpene synthase; SQS, squalene synthase; MK, mevalonate kinase; MPK, mevalonate-5-phosphate kinase; CMK, 4-(cytidine 5'-diphospho)-2-C-methyl-D-erythritol kinase; MDD, mevalonate diphosphate decarboxylase; IDS, isopentenyl diphosphate/dimethylallyl diphosphate synthase; MCS, 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase; HDS, 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase; PSY, phytoene synthase; HMGS, HMG-CoA synthase; HMG-CoA, 3S-hydroxy-3-methylglutaryl coenzyme A; DXP, 1-deoxy-D-xylulose 5-phosphate; MVA, 3R-mevalonic acid; MEP, 2-C-methyl-D-erythritol 4-phosphate; CDP-ME, 4-(cytidine 5'-diphospho)-2-C-methyl-D-erythritol; CDP-MEP, 4-(cytidine 5'-diphospho)-2-C-methyl-D-erythritol 2-phosphate; cMEPP, 2-C-methyl-D-erythritol 2,4-cyclodiphosphate; DMAPP, dimethylallyl diphosphate; HMBPP, 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate; IPP, isopentenyl diphosphate; GPP, geranyl diphosphate; FPP, farnesyl diphosphate; GGPP, geranylgeranyl diphosphate. The number of contigs and singletons present in our EST dataset is given in brackets for each enzyme.
Figure 7
Schematic overview of the ESTtik information system.
