BMC Genomics. 2017 Sep 15;18(1):730.
doi: 10.1186/s12864-017-4120-9.

The cacao Criollo genome v2.0: an improved version of the genome for genetic and functional genomic studies

X Argout et al. BMC Genomics. 2017.

Abstract

Background: Theobroma cacao L., native to the Amazon basin of South America, is an economically important fruit tree crop for tropical countries because it is the source of chocolate. The first draft genome of the species, from a Criollo cultivar, was published in 2011. Although that draft is a useful resource, it can be improved by identifying misassemblies, reducing the number of scaffolds and gaps, and anchoring unanchored sequences to the 10 chromosomes.

Methods: We used an NGS-based approach to significantly improve the assembly of the Belizean Criollo B97-61/B2 genome. We combined four Illumina large-insert mate-pair libraries with 52x coverage of Pacific Biosciences long reads to correct misassembled regions and reduce the number of scaffolds. We then used genotyping by sequencing (GBS) to increase the proportion of the assembly anchored to chromosomes.
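The 52x figure is a fold coverage: total sequenced bases divided by genome size. A minimal sketch of that arithmetic follows; the function names are hypothetical, and the read lengths and genome size in the usage lines are toy values, not data from this study.

```python
def fold_coverage(read_lengths, genome_size):
    """Sequencing depth: total read bases divided by genome size (both in bp)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


def bases_for_depth(target_depth, genome_size):
    """Total read bases needed to reach a target fold coverage."""
    return target_depth * genome_size


# Toy values for illustration only (not from the paper).
print(fold_coverage([120, 250, 150], 100))  # 520 bp over a 100 bp genome -> 5.2
print(bases_for_depth(52, 1_000_000))       # 52x of a 1 Mb genome -> 52000000
```

Because depth is an average, local coverage along the genome still varies around this value.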

Results: The number of scaffolds decreased from 4,792 in assembly V1 to 554 in V2, while the scaffold N50 increased from 0.47 Mb in V1 to 6.5 Mb in V2. A total of 96.7% of the assembly was anchored to the 10 chromosomes, compared with 66.8% in the previous version. Unknown sites (Ns) were reduced from 10.8% to 5.7%. In addition, we updated the functional annotations and performed a new RefSeq structural annotation based on RNA-seq evidence.
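The scaffold N50 is the length at which scaffolds, sorted from longest to shortest, first account for at least half of the total assembly length; the proportion of unknown sites is the share of N bases. A minimal sketch of both metrics, assuming scaffolds are supplied as plain lengths or sequence strings (the function names are illustrative):

```python
def n50(lengths):
    """Length at which the running total of sorted lengths reaches half the assembly."""
    if not lengths:
        raise ValueError("empty assembly")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):  # longest first
        running += length
        if 2 * running >= total:  # integer test for running >= total / 2
            return length


def n_fraction(sequences):
    """Fraction of bases that are unknown (N or n) across all sequences."""
    total = sum(len(s) for s in sequences)
    unknown = sum(s.upper().count("N") for s in sequences)
    return unknown / total if total else 0.0


# Toy data: total 20 bp, and 6 + 5 = 11 bp is the first running total >= 10.
print(n50([2, 3, 4, 5, 6]))             # 5
print(n_fraction(["ACGTNN", "NNAC"]))   # 4 of 10 bases are N -> 0.4
```

Fewer, longer scaffolds raise N50 even when total assembly size barely changes, which is why it is reported alongside scaffold count.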

Conclusion: The Theobroma cacao Criollo genome version 2 will be a valuable resource for investigating complex traits at the genomic level and for future comparative genomics and genetics studies in the cacao tree. New functional tools and annotations are available on the Cocoa Genome Hub (http://cocoa-genome-hub.southgreen.fr).

Keywords: Criollo B97–61/B2 genome; GBS; Genome Assembly; Mate Paired sequences; Theobroma cacao.

Conflict of interest statement

Ethics approval and consent to participate

The T. cacao mapping population used in this study was planted at the CIRAD experimental station of Paracou-Combi. The clones ICS95 and UF676, used as parents of the progeny, came from the CIRAD cocoa collection in French Guiana.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
CIRCOS graphical representation of paired reads mapping to a misassembled contig. The blue circle represents the contig sequence. In the inner circle, grey lines represent concordant links (orientation and insert size) between read pairs. The black arrow points to the misassembled region.
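Concordant links in Fig. 1 are read pairs whose orientation and insert size match the library. A simplified sketch of how such links can expose a misassembly: classify each pair, count how many concordant pairs span each position, and flag interior stretches that none span. The `ReadPair` record, the orientation labels, and the insert-size thresholds are hypothetical toy choices (the expected orientation depends on the library protocol, so it is a parameter); this is not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Two mates aligned to the same contig (hypothetical representation)."""
    start_a: int   # 0-based leftmost aligned position of mate A
    strand_a: str  # "+" or "-"
    start_b: int   # 0-based leftmost aligned position of mate B
    strand_b: str
    read_len: int


def classify(pair, expected, min_insert, max_insert):
    """Return (is_concordant, left, right) with [left, right) the spanned interval."""
    # Order mates left to right so the orientation label is well defined.
    (s1, st1), (s2, st2) = sorted(
        [(pair.start_a, pair.strand_a), (pair.start_b, pair.strand_b)]
    )
    orientation = {("+", "-"): "FR", ("-", "+"): "RF"}.get((st1, st2), "same-strand")
    right = s2 + pair.read_len
    insert = right - s1
    concordant = orientation == expected and min_insert <= insert <= max_insert
    return concordant, s1, right


def spanning_coverage(pairs, contig_len, **criteria):
    """Number of concordant pairs physically spanning each contig position."""
    diff = [0] * (contig_len + 1)  # difference array over half-open intervals
    for pair in pairs:
        concordant, left, right = classify(pair, **criteria)
        if concordant:
            diff[max(left, 0)] += 1
            diff[min(right, contig_len)] -= 1
    coverage, running = [], 0
    for i in range(contig_len):
        running += diff[i]
        coverage.append(running)
    return coverage


def candidate_breakpoints(coverage, margin):
    """Positions, away from the contig ends, that no concordant pair spans."""
    return [i for i in range(margin, len(coverage) - margin) if coverage[i] == 0]


# Toy 100 bp contig, 10 bp reads, toy insert range 20-60 bp (illustrative only).
pairs = [
    ReadPair(0, "+", 30, "-", 10),   # concordant, spans [0, 40)
    ReadPair(20, "+", 40, "-", 10),  # concordant, spans [20, 50)
    ReadPair(60, "+", 80, "-", 10),  # concordant, spans [60, 90)
    ReadPair(10, "+", 85, "+", 10),  # same-strand: discordant, ignored
]
cov = spanning_coverage(pairs, 100, expected="FR", min_insert=20, max_insert=60)
print(candidate_breakpoints(cov, margin=10))  # [50, 51, ..., 59]
```

The `margin` skips contig ends, where low spanning coverage is expected even in a correct assembly.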
Fig. 2
Chromosome reconstruction. Linkage dot plots between markers along non-ordered scaffolds (a) and ordered scaffolds (b) on chromosome 1. Each dot represents the recombination frequency between two markers. The intensity of the linkage is color coded: warm colors indicate strong linkage and cold colors indicate weak linkage. Grey bars in the dot plots delimit groups of markers belonging to the same scaffold.
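Each dot in Fig. 2 is a pairwise recombination frequency. A simplified sketch of how such a matrix could be computed, assuming markers are already coded 0/1 for the parental allele inherited in a testcross-like configuration and placed in the same (coupling) phase, with `None` for missing calls. This is an illustration of the quantity, not necessarily the mapping software or model the authors used.

```python
def recombination_fraction(m1, m2):
    """Share of recombinant individuals between two markers.

    Assumes 0/1 codes for the parental allele inherited in a testcross-like
    (backcross) configuration, both markers in coupling phase; None = missing.
    """
    informative = [(a, b) for a, b in zip(m1, m2) if a is not None and b is not None]
    if not informative:
        return float("nan")
    recombinants = sum(1 for a, b in informative if a != b)
    return recombinants / len(informative)


def rf_matrix(markers):
    """Symmetric matrix of pairwise recombination fractions."""
    n = len(markers)
    return [[recombination_fraction(markers[i], markers[j]) for j in range(n)]
            for i in range(n)]


# Toy genotypes for 8 progeny (illustrative only).
markers = [
    [0, 0, 1, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 1, 1, 1],  # one recombinant vs marker 0 -> 0.125
    [1, 0, 0, 1, 1, 0, 0, 1],  # four recombinants vs marker 0 -> 0.5
]
for row in rf_matrix(markers):
    print(row)
```

Values near 0 indicate tight linkage (warm colors in the figure), values near 0.5 indicate no detectable linkage, and values well above 0.5 usually mean the phase assumption is wrong for that pair.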
Fig. 3
Scaffolds anchored to the 10 Theobroma cacao chromosomes. Black boxes represent scaffolds and their orientation. Gene and SNP marker densities are shown in blue and orange, respectively, and were computed with a window size of 400 kb.
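The densities in Fig. 3 are feature counts per 400 kb window. A minimal sketch, assuming consecutive non-overlapping windows and one coordinate per feature (the caption gives only the window size, so both choices are assumptions, and `window_density` is a hypothetical name):

```python
from collections import Counter


def window_density(positions, chrom_len, window=400_000):
    """Count features per consecutive, non-overlapping window of `window` bp.

    positions: 0-based feature coordinates (e.g. gene starts or SNP markers)
    on a single chromosome; the last window may be shorter than `window`.
    """
    n_windows = -(-chrom_len // window)  # ceiling division
    counts = Counter(p // window for p in positions if 0 <= p < chrom_len)
    return [counts.get(i, 0) for i in range(n_windows)]


# Toy 1.2 Mb chromosome with four features (illustrative only).
print(window_density([10, 399_999, 400_000, 1_000_000], 1_200_000))  # [2, 1, 1]
```

Running it once on gene positions and once on SNP marker positions gives the two density tracks for a chromosome.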
Fig. 4
Comparison of Theobroma cacao Criollo assembly version 1 and version 2. a Graphical representation of the reduction of the version 1 unknown chromosome (Tc00) through insertion of its sequences into the version 2 chromosomes (chr1–10). b Graphical representation of regions that were anchored to a different chromosome in the first version of the assembly. “Tc” chromosomes refer to assembly version 1 and “chr” chromosomes to assembly version 2.
Fig. 5
Dot plot comparing the Criollo B97–61/B2 version 2 and Amelonado Matina 1–6 genomes, computed with LAST [29]. Red and blue dots indicate forward and reverse alignments, respectively.
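In a dot plot such as Fig. 5, each aligned block becomes a short diagonal: forward alignments run bottom-left to top-right, and reverse alignments run top-left to bottom-right. A minimal sketch of that mapping, assuming alignments have already been reduced to hypothetical gap-free blocks in forward-strand coordinates; the `Block` record is not LAST's output format, and parsing LAST output is not shown.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One gap-free aligned block (hypothetical record, not LAST's output format)."""
    ref_start: int  # 0-based start on genome A (x axis)
    qry_start: int  # 0-based start on genome B (y axis), forward-strand coordinates
    length: int
    strand: str     # "+" forward, "-" reverse


def to_segment(block):
    """Return (x0, y0, x1, y1, color) for drawing one block in a dot plot."""
    x0, x1 = block.ref_start, block.ref_start + block.length
    y_lo, y_hi = block.qry_start, block.qry_start + block.length
    if block.strand == "+":
        # Forward alignment: both coordinates increase together.
        return (x0, y_lo, x1, y_hi, "red")
    # Reverse alignment: genome B coordinates decrease as genome A increases.
    return (x0, y_hi, x1, y_lo, "blue")


print(to_segment(Block(0, 0, 100, "+")))    # (0, 0, 100, 100, 'red')
print(to_segment(Block(100, 500, 50, "-")))  # (100, 550, 150, 500, 'blue')
```

Long red diagonals therefore indicate collinear regions, while blue segments mark regions aligned in the opposite orientation between the two assemblies.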

References

    1. Argout X, Salse J, Aury JM, Guiltinan MJ, Droc G, Gouzy J, et al. The genome of Theobroma cacao. Nat Genet. 2011;43:101–108. doi: 10.1038/ng.736.
    2. Motamayor JC, Lachenaud P, da Silva E Mota JW, Loor R, Kuhn DN, Brown JS, et al. Geographic and Genetic Population Differentiation of the Amazonian Chocolate Tree (Theobroma cacao L). PLoS One. 2008;3:e3311. doi: 10.1371/journal.pone.0003311.
    3. Argout X, Fouet O, Wincker P, Gramacho K, Legavre T, Sabau X, et al. Towards the understanding of the cocoa transcriptome: Production and analysis of an exhaustive dataset of ESTs of Theobroma cacao L. generated from various tissues and under various conditions. BMC Genomics. 2008;9:512.
    4. Motamayor JC, Mockaitis K, Schmutz J, Haiminen N, III DL, Cornejo O, et al. The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color. Genome Biol. 2013;14:r53. doi: 10.1186/gb-2013-14-6-r53.
    5. Chain PSG, Grafham DV, Fulton RS, FitzGerald MG, Hostetler J, Muzny D, et al. Genome Project Standards in a New Era of Sequencing. Science. 2009;326:236–237. doi: 10.1126/science.1180614.
