Gigascience. 2024 Jan 2:13:giae027.
doi: 10.1093/gigascience/giae027.

Genomic decoding of Theobroma grandiflorum (cupuassu) at chromosomal scale: evolutionary insights for horticultural innovation



Rafael Moysés Alves et al. Gigascience.

Abstract

Background: Theobroma grandiflorum (Malvaceae), known as cupuassu, is a tree indigenous to the Amazon basin, valued for its large fruits and seed pulp, contributing notably to the Amazonian bioeconomy. The seed pulp is utilized in desserts and beverages, and its seed butter is used in cosmetics. Here, we present the sequenced telomere-to-telomere genome of cupuassu, disclosing its genomic structure, evolutionary features, and phylogenetic relationships within the Malvaceae family.

Findings: The cupuassu genome spans 423 Mb, encodes 31,381 genes distributed across 10 chromosomes, and shares approximately 65% gene synteny with the Theobroma cacao genome, reflecting a conserved evolutionary history punctuated by unique genomic variations. The main changes are marked by bursts of long-terminal repeat retrotransposons after species divergence, by retrocopied and singleton genes, and by gene families displaying distinctive patterns of expansion and contraction. Furthermore, positively selected genes are evident, particularly among retained dispersed, tandem, and proximal duplicated genes associated with general fruit and seed traits and defense mechanisms. This supports the hypothesis of episodes of subfunctionalization and neofunctionalization following duplication, as well as an impact of distinct domestication processes. These genomic variations may underpin the differences in fruit and seed morphology, ripening, and disease resistance observed between cupuassu and the other Malvaceae species.

Conclusions: The cupuassu genome offers a foundational resource for both breeding improvement and conservation biology, yielding insights into the evolution and diversity within the genus Theobroma.

Keywords: Amazon basin; bioeconomy; cupuassu; fruit pulp and seed development; gene loss and retention; genome evolution; plant secondary metabolites; positive selection.


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1:
(A) T. grandiflorum tree displaying fruits. (B) Detailed view of a cupuassu fruit. (C) Cupuassu fruit opened to reveal the internal pulp. Image credits: Ronaldo Rosas.
Figure 2:
(A) Genomic landscape of T. grandiflorum, showing gene and TE density across the 10 chromosomes. (B) High-throughput chromosome conformation capture (Hi-C) contact map revealing the assembled chromosomes of T. grandiflorum. (C) Whole-genome duplication (WGD) analyses indicating the shared whole-genome triplication among T. grandiflorum, T. cacao, and H. umbratica and confirming the absence of additional WGD events in these species.
Figure 3:
Transposable element distribution in T. grandiflorum. (A) Distribution of autonomous and nonautonomous TEs from class I and class II. (B) Distribution of all evolutionary lineages of LTR elements. (C) Phylogenetic analysis and distribution of each full-length LTR element identified in T. grandiflorum. The age of LTR insertions was estimated using the default rate of 1.3 × 10⁻⁸ substitutions per site per year, so the resulting ages are approximate.
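The caption states only the substitution rate. A common way to turn such a rate into an insertion age, assumed here rather than taken from the paper, is T = K / (2μ): K is the divergence between the 5′ and 3′ LTRs of a single element (the two copies are identical at insertion and then diverge independently, hence the factor 2) and μ is the per-site, per-year rate. The minimal Python sketch below applies a Jukes-Cantor correction to the raw mismatch proportion; the function names and the choice of distance model are illustrative assumptions, and dedicated LTR annotation tools may use a different correction.

```python
import math

# Rate stated in the Figure 3 caption: substitutions per site per year.
MU = 1.3e-8


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance K from the raw proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    # Equivalent to -3/4 * ln(1 - 4p/3), written so that p = 0 gives exactly 0.0.
    return 0.75 * math.log(1.0 / (1.0 - 4.0 * p / 3.0))


def ltr_insertion_age(ltr5: str, ltr3: str, mu: float = MU) -> float:
    """Approximate insertion age in years from two aligned LTRs, T = K / (2 * mu)."""
    if len(ltr5) != len(ltr3):
        raise ValueError("the two LTRs must be aligned to the same length")
    # Keep only alignment columns where neither LTR has a gap.
    columns = [(a, b) for a, b in zip(ltr5.upper(), ltr3.upper()) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no ungapped aligned sites")
    p = sum(a != b for a, b in columns) / len(columns)
    # Divide by 2 because both LTR copies accumulate substitutions independently.
    return jukes_cantor(p) / (2.0 * mu)


# Hypothetical 100-bp LTR pair differing at one site (p = 0.01).
age = ltr_insertion_age("A" * 99 + "C", "A" * 100)  # about 3.87e5 years
```

Because μ enters as a single divisor, any uncertainty in the assumed rate scales every estimated age by the same factor, which is why such ages are best read as relative rather than absolute.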
Figure 4:
Comparative genomic analysis of T. grandiflorum with T. cacao and H. umbratica. (A) Macrosyntenic patterns between T. grandiflorum and T. cacao, revealing conserved genome structures. (B) Comparative idiogram map between T. grandiflorum and T. cacao, as well as between T. grandiflorum and H. umbratica. The idiograms illustrate gene-rich regions (blue), TE-rich regions (red), and potential location of centromeres (black circles) identified by the quarTeT and Centromics tools. Blue bars on the left of each idiogram represent microsynteny between T. grandiflorum and T. cacao, while red bars on the right indicate microsynteny between T. grandiflorum and H. umbratica.
Figure 5:
Microsyntenic analysis of the self-incompatibility loci (CH1 and CH4) in Theobroma and Herrania. (A) CH1 loci. (B) CH4 loci. Genes marked in bold are considered central to self-incompatibility reactions, as previously described [79]. The GEX1 locus, containing the complete and homologous genes, is marked with dotted lines.
Figure 6:
Comparative analyses across Malvaceae species focusing on functions related to plant differentiation, fruit and seed development, and organoleptic and physicochemical qualities. (A) A Venn diagram illustrates the shared and exclusive orthologous clusters (gene families) identified across 4 Malvaceae species and Arabidopsis thaliana. (B) The identified gene families and singletons encompass a range of functions with predicted roles in various aspects of plant and fruit development. These include cytochrome P450 and ABC transporters, which are pivotal in synthesizing secondary metabolites and in nutrient uptake, respectively, influencing plant growth and fruit quality. Plant mobile domain (PMD) proteins and disease resistance genes play roles in stress response and plant health, indirectly impacting fruit quality. Serine/threonine kinase, protein kinase domain–containing proteins, and several metabolism-related genes (flavonoid, chalcone, terpene, sesquiterpenes) regulate pathways critical for plant growth, development, and the organoleptic properties of fruits. Genes related to defense mechanisms (chitin receptor/chitinase, defensin, ubiquitin-like protease) and cell wall composition (methylesterase, polygalacturonase, pectinesterase, expansin, laccase, xyloglucan endotransglucosylase/hydrolase) are also identified, reflecting their roles in maintaining plant health and influencing fruit texture and firmness. Furthermore, genes involved in seed development (vicilin, legume-related protein, lipid storage) and various transcription factors (including MADS-box) are noted for their influence on plant growth and developmental processes. (C) A phylogenetic tree delineates the evolutionary timeline of the Malvaceae species, with A. thaliana serving as the outgroup. An accompanying pie chart displays the proportions of gene families that have expanded or contracted, indicating evolutionary dynamics. The divergence time and its confidence interval, when available, were obtained from the TimeTree5 database. (D) The analysis of expanded and contracted gene families focuses on their common functions and roles, as detailed in B, shedding light on the evolutionary adaptations of these species.
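Each region of a Venn diagram such as panel A corresponds to one exact pattern of species membership. Assuming orthologous clusters have already been inferred upstream and are supplied as a set of cluster IDs per species, the regions can be counted as in this minimal sketch; the function name, species labels, and cluster IDs are hypothetical placeholders, not data from the paper.

```python
from collections import Counter


def venn_regions(clusters: dict[str, set[str]]) -> Counter:
    """Count orthologous clusters by the exact set of species that contain them.

    Each key of the returned Counter is a frozenset of species (one Venn region);
    its value is the number of clusters found in exactly those species.
    """
    membership: dict[str, set[str]] = {}
    for species, cluster_ids in clusters.items():
        for cid in cluster_ids:
            membership.setdefault(cid, set()).add(species)
    return Counter(frozenset(members) for members in membership.values())


# Toy input: hypothetical cluster IDs for three placeholder species labels.
toy = {
    "Tgr": {"OG1", "OG2", "OG3"},
    "Tca": {"OG1", "OG2", "OG4"},
    "Ath": {"OG1"},
}
regions = venn_regions(toy)
shared_by_all = regions[frozenset(toy)]       # clusters present in every species: 1
exclusive_tgr = regions[frozenset({"Tgr"})]   # clusters found only in "Tgr": 1
```

Counting membership patterns in one pass yields every Venn region at once, including the core shared by all species and the species-exclusive clusters, without intersecting sets pairwise.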
Figure 7:
Gene Ontology (GO) enrichment and comparative analysis across T. grandiflorum, T. cacao, and H. umbratica. Black arrows highlight GO terms that are exclusively enriched in T. grandiflorum, in either duplicated genes or singletons. These terms point to unique biological processes, cellular components, and molecular functions associated with fruit and seed quality and defense mechanisms that are particularly prominent in T. grandiflorum compared with the other species.
Figure 8:
Positive selection analysis of duplicated genes in T. grandiflorum, T. cacao, and H. umbratica. (A) A violin plot displays the distribution of Ka/Ks ratios for gene pairs resulting from dispersed, proximal, and tandem duplication in T. grandiflorum, T. cacao, and H. umbratica. The number above each plot indicates the percentage of duplicated genes under purifying selection. (B) A swarm plot illustrates the Ka/Ks ratio distributions for selected Gene Ontology (GO) terms associated with fruit traits and defense mechanisms in T. grandiflorum, showing the selective pressures acting on genes with these functions.
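Panel A reports the share of duplicate pairs with Ka/Ks below 1, the conventional signature of purifying selection; a ratio above 1 suggests positive selection and a ratio of 1 is consistent with neutral evolution. A minimal sketch of that tally, assuming Ka and Ks have already been estimated for each pair by an upstream tool; the `GenePair` record, the function names, and the example values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class GenePair:
    """One duplicated gene pair with precomputed Ka and Ks (hypothetical record)."""
    pair_id: str
    mode: str   # duplication mode, e.g. "dispersed", "proximal", "tandem"
    ka: float   # nonsynonymous substitutions per nonsynonymous site
    ks: float   # synonymous substitutions per synonymous site


def selection_regime(ka: float, ks: float) -> str:
    """Classify a pair by Ka/Ks: < 1 purifying, > 1 positive, == 1 neutral."""
    if ks <= 0:
        raise ValueError("Ka/Ks is undefined when Ks is zero")
    ratio = ka / ks
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "positive"
    return "neutral"


def percent_purifying(pairs: list[GenePair]) -> dict[str, float]:
    """Percentage of pairs under purifying selection, per duplication mode."""
    totals: dict[str, int] = defaultdict(int)
    purifying: dict[str, int] = defaultdict(int)
    for pair in pairs:
        if pair.ks <= 0:
            continue  # skip pairs whose ratio cannot be computed
        totals[pair.mode] += 1
        if selection_regime(pair.ka, pair.ks) == "purifying":
            purifying[pair.mode] += 1
    return {mode: 100.0 * purifying[mode] / n for mode, n in totals.items()}


# Hypothetical pairs; the numbers are illustrative only.
pairs = [
    GenePair("p1", "tandem", ka=0.10, ks=0.50),     # Ka/Ks = 0.2
    GenePair("p2", "tandem", ka=0.60, ks=0.40),     # Ka/Ks about 1.5
    GenePair("p3", "dispersed", ka=0.05, ks=0.30),  # Ka/Ks about 0.17
    GenePair("p4", "proximal", ka=0.20, ks=0.0),    # skipped: Ks = 0
]
summary = percent_purifying(pairs)
```

Real analyses often also drop pairs with saturated Ks before computing ratios; that filter is left out here to keep the classification step visible.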

References

1. Cuatrecasas J. Cacao and Its Allies: A Taxonomic Revision of the Genus Theobroma. Washington, DC: Smithsonian Institution; 1964.
2. The Angiosperm Phylogeny Group. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IV. Bot J Linn Soc. 2016;181:1–20. doi: 10.1111/boj.12385.
3. da Silva RA, Souza G, Lemos LSL, et al. Genome size, cytogenetic data and transferability of EST-SSRs markers in wild and cultivated species of the genus Theobroma L. (Byttnerioideae, Malvaceae). PLoS One. 2017;12:e0170799. doi: 10.1371/journal.pone.0170799.
4. Freitas ÍR, Pirani JR, Colli-Silva M. Cacau para quê? Levantamento bibliográfico sobre os usos materiais e simbólicos das espécies de cacaus do Brasil [Cacao for what? A bibliographic survey of the material and symbolic uses of the cacao species of Brazil]. Ethnoscientia. 2023;8:127. doi: 10.18542/ethnoscientia.v8i1.12940.
5. Garcia TB, de V Potiguara RC, Kikuchi TYS, et al. Leaf anatomical features of three Theobroma species (Malvaceae s.l.) native to the Brazilian Amazon. Acta Amaz. 2014;44:291–300. doi: 10.1590/1809-4392201300653.
