G3 (Bethesda). 2025 Sep 3;15(9):jkaf147.
doi: 10.1093/g3journal/jkaf147.

Local gene duplications drive extensive NLR copy number variation across multiple genotypes of Theobroma cacao

Noah P Winters et al. G3 (Bethesda). 2025.

Abstract

Nucleotide-binding leucine-rich repeat receptors (NLRs) are an essential component of plant immunity. NLR evolution is complex and dynamic, with rapid expansions, contractions, and polymorphism. Hundreds of high-quality plant genomes generated over the last 2 decades provide substantial insight into the evolutionary dynamics of NLR genes. Despite steadily decreasing sequencing costs, the difficulty of sequencing, assembling, and annotating high-quality genomes has left comparatively little genome-wide information on intraspecies NLR diversity in long-lived perennial species. In this study, we investigated the evolution of NLR genes across 11 high-quality genomes of the chocolate tree, Theobroma cacao L. We found 3-fold variation in NLR copy number across genotypes, a pattern driven primarily by expansion of NLR clusters via tandem and proximal duplication. Our results indicate that local duplications can radically reshape gene families over short evolutionary time scales, creating extensive intraspecific variation and a source of NLR diversity that could deepen our understanding of plant-pathogen interactions and inform resistance breeding.

Keywords: Theobroma cacao; NLR; disease resistance; molecular evolution.

Conflict of interest statement

The authors declare that they have no conflicts of interest.

Figures

Fig. 1.
NLR architecture and copy number across cacao genomes. a) The 4 canonical NLR architectures. NLR genes were classified as NL, CNL, TNL, or RNL according to their domain architecture. All NLRs contain an NB-ARC and an LRR domain. This NB-ARC/LRR backbone either occurs in isolation (NL) or with 1 of 3 other domains: CC (CNL), TIR (TNL), or CCR (RNL). b) NLR copy number variation (CNV) across all classes and genotypes. NL, CNL, TNL, and RNLs are shown along the x-axis, and as blue, yellow, teal, and black points, respectively. Each point represents the number of NLR copies for a particular genotype. Means are represented by diamonds. Lines represent 95% confidence intervals. High CNV genotypes had significantly more NLR genes than Low CNV genotypes (mean difference = 225.11, Mann–Whitney test: P-value <0.01). Except for the RNL class, differences in mean NLR number between Low CNV and High CNV genotypes were significant for all classes (negative binomial GLM: NLR # ∼ CNV Group + NLR Class + CNV Group * NLR Class, adjusted P-values <0.01).
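The classification rule in panel a) can be written as a small lookup on each gene's set of predicted domains. The sketch below is a hypothetical illustration, not the authors' annotation pipeline: it assumes domain hits have already been reduced to the plain labels used in the caption ("NB-ARC", "LRR", "CC", "TIR", "CCR"), and it leaves genes with more than one N-terminal domain unclassified, a choice the caption does not address.

```python
from __future__ import annotations

# Hypothetical domain labels matching the caption; a real pipeline would first
# map InterPro/Pfam hits onto these names.
NB_ARC, LRR, CC, TIR, CCR = "NB-ARC", "LRR", "CC", "TIR", "CCR"

# N-terminal domain -> NLR class, following the Fig. 1a scheme.
N_TERMINAL_CLASS = {CC: "CNL", TIR: "TNL", CCR: "RNL"}


def classify_nlr(domains: set[str]) -> str | None:
    """Return "NL", "CNL", "TNL", or "RNL" for a gene's set of domains.

    Returns None if the gene lacks the shared NB-ARC/LRR backbone, or if it
    carries more than one N-terminal domain (left unclassified in this sketch).
    """
    if not {NB_ARC, LRR} <= domains:
        return None
    n_terminal = domains & set(N_TERMINAL_CLASS)
    if not n_terminal:
        return "NL"  # the backbone occurs in isolation
    if len(n_terminal) > 1:
        return None  # ambiguous architecture
    (domain,) = n_terminal
    return N_TERMINAL_CLASS[domain]
```

For example, `classify_nlr({"CC", "NB-ARC", "LRR"})` returns `"CNL"`, while a gene with only `{"TIR", "NB-ARC"}` returns `None` because it lacks the LRR half of the backbone.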
Fig. 2.
Distribution of NLR genes across each genome. a and b) Number of NLR genes or NLR pseudogenes on each chromosome. Orange depicts High CNV genotypes and purple depicts Low CNV genotypes. Each point represents the number of NLR genes or NLR pseudogenes for a particular genotype. Chr0 holds NLR genes or NLR pseudogenes that were not placed on any of the 10 chromosome-oriented scaffolds. Means are represented by diamonds. Lines represent 95% confidence intervals. Where the lower tail of the confidence interval is negative, the line is truncated at zero. Where a group shows no variance, no confidence interval is shown.
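The caption's plotting rules (truncate the lower bound at zero, omit the interval when there is no variance) can be sketched as follows. The caption does not say how the 95% intervals were computed, so this sketch assumes a Student's t interval on the per-genotype counts, with a hard-coded table of critical values for small samples; the function name `mean_ci95` is hypothetical.

```python
from __future__ import annotations

import math
import statistics

# Two-sided 95% critical values of Student's t for 1-10 degrees of freedom.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def mean_ci95(counts: list[float]) -> tuple[float, tuple[float, float] | None]:
    """Mean and assumed t-based 95% CI of per-genotype counts.

    The interval is None when there is no variance (or only one value), and
    its lower bound is truncated at zero because counts cannot be negative.
    """
    mean = statistics.mean(counts)
    if len(counts) < 2:
        return mean, None
    sd = statistics.stdev(counts)
    if sd == 0:
        return mean, None
    df = len(counts) - 1
    # Beyond df = 10 fall back to the normal value (slightly too narrow).
    t_crit = T_975.get(df, 1.96)
    half_width = t_crit * sd / math.sqrt(len(counts))
    return mean, (max(0.0, mean - half_width), mean + half_width)
```

With 4 Low CNV genotypes the interval uses 3 degrees of freedom, so a sparse chromosome with counts such as `[0, 0, 1, 3]` gets a lower bound that would fall below zero and is clipped to `0.0`.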
Fig. 3.
Phylogeny of cacao genotypes sampled for this study. Phylogenetic tree of the 11 cacao genotypes used in this study, constructed using 1,364 single copy genes. Four noncacao species of Theobroma were additionally used as outgroups. Numbers on each node represent posterior probability support values calculated by ASTRAL. With the exception of CCN-51 and ICS-1, both of which are hybrids, tip colors indicate population membership. Noncacao Theobroma spp. are shown in gray. CNV class (High, Low, or No Information) of each genotype is shown in orange, purple, or white, respectively. Disease phenotypes are shown for WBD, FPR, CWC, and BPR. Blue indicates resistant, red indicates susceptible, and white indicates no information was available.
Fig. 4.
Genome annotation quality metrics. a) The total number of genes annotated in each of the 11 genomes used in this study, separated by Low CNV (left) and High CNV (right). NLR genes are shown in black and non-NLR genes in white. Gene number did not differ significantly between Low CNV and High CNV genotypes (mean difference = 1080.89 genes, Mann–Whitney test: P-value >0.05). b) Distribution of annotation edit distance (AED) scores for each genotype's classified NLR genes. Mean AED score differed significantly between Low CNV and High CNV groups (mean difference = 0.018, t-test: P-value <0.001). c) BUSCO completeness for each genome used in this study, separated by Low CNV (left) and High CNV (right). The proportions of complete, fragmented, and missing BUSCOs are shown in green, orange, and beige, respectively. Differences in the mean proportion of complete, fragmented, and missing genes between Low CNV and High CNV genotypes were significant (two-way ANOVA: Proportion ∼ CNV Group + BUSCO Class + CNV Group * BUSCO Class, P-value <0.001; Tukey's HSD, adjusted P-value <0.01).
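Several captions compare Low CNV and High CNV genotypes with a Mann–Whitney test. With only 4 and 7 genomes per group, the test can be run exactly by enumerating every way of splitting the pooled values into groups of those sizes (330 splits). The sketch below illustrates that technique with the standard library; it is not the authors' code, and because the captions do not state tie handling or whether exact or asymptotic p-values were used, its p-values may differ from those reported. SciPy's `scipy.stats.mannwhitneyu` is the usual library routine.

```python
from __future__ import annotations

from itertools import combinations


def mann_whitney_u(x: list[float], y: list[float]) -> float:
    """U statistic for x: pairs (xi, yj) with xi > yj, ties counted as 1/2."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def exact_two_sided_p(x: list[float], y: list[float]) -> float:
    """Exact permutation p-value for U, enumerating every split of the pooled
    values into groups of sizes len(x) and len(y). Feasible for small samples."""
    pooled = list(x) + list(y)
    n_x = len(x)
    center = n_x * len(y) / 2  # expected U under the null hypothesis
    observed = abs(mann_whitney_u(x, y) - center)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_x):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mann_whitney_u(gx, gy) - center) >= observed:
            extreme += 1
    return extreme / total
```

U values are multiples of 0.5, so the `>=` comparison against the observed deviation is exact in floating point.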
Fig. 5.
Transposable element (TE) abundance for High and Low CNV genotypes. Abundance of the 5 most common TEs in cacao genomes. High CNV genotypes (orange, left) and Low CNV genotypes (purple, right) are shown along the x-axis of each panel. Each point represents the number of TEs for a particular genotype. Means are represented by diamonds. Lines represent 95% confidence intervals. Differences in mean TE abundance between Low and High CNV genotypes were not significant (negative binomial GLM: # TE ∼ CNV + TE Class + CNV*TE Class, adjusted P-values >0.05).
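The model formulas in these captions use R-style notation, where `a*b` already expands to `a + b + a:b`, so `# TE ∼ CNV + TE Class + CNV*TE Class` amounts to two main effects plus their interaction. Written out as a negative binomial GLM with a log link, and assuming the common NB2 variance parameterization (the captions do not state which was used), the model is roughly as follows. Here $Y_{gk}$ is the count of TE class $k$ in genotype $g$, $x_g = 1$ for High CNV and $0$ for Low CNV, and the reference class has $\gamma_1 = \delta_1 = 0$.

```latex
\begin{aligned}
Y_{gk} &\sim \mathrm{NB}(\mu_{gk}, \theta), \qquad
  \operatorname{Var}(Y_{gk}) = \mu_{gk} + \frac{\mu_{gk}^{2}}{\theta},\\
\log \mu_{gk} &= \beta_0 + \beta_1\, x_g + \gamma_k + \delta_k\, x_g,\\
\frac{\mu_{\mathrm{High},k}}{\mu_{\mathrm{Low},k}} &= \exp(\beta_1 + \delta_k).
\end{aligned}
```

The last line is the quantity behind each per-class High versus Low comparison: a class shows no CNV-group difference when $\beta_1 + \delta_k = 0$.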
Fig. 6.
Density of the 5 most common TEs across each cacao chromosome. Orange depicts High CNV genotypes and purple depicts Low CNV genotypes. Each point represents the number of TEs for a particular genotype. Means are represented by diamonds. Lines represent 95% confidence intervals. Stars indicate significant differences in mean TE abundance between chromosomes (negative binomial GLM: # TE ∼ Chrom + TE Class + Chrom*TE Class, adjusted P-values <0.05). Differences in mean TE abundance between Low and High CNV genotypes on each chromosome were not significant (negative binomial GLM: # TE ∼ CNV + Chrom + TE Class + Chrom*TE Class*CNV, adjusted P-values >0.05).
Fig. 7.
Patterns of NLR duplication across each genome. a) The proportion of NLR (black, left) and non-NLR (gray, right) genes in each duplication class. Each point represents the proportion of NLR or non-NLR genes for a particular genotype. Means are represented by diamonds. Lines represent 95% confidence intervals. All differences in mean proportion between NLR and non-NLR genes were significant (two-way ANOVA: Proportion ∼ Gene Type + Duplicate Type + Gene Type * Duplicate Type, P-value <0.001; Tukey's HSD, adjusted P-values <0.001). b) The number of NLR genes belonging to each duplication class, for both Low CNV (purple, left) and High CNV (orange, right) genotypes. Points represent the number of NLR genes for a particular genotype. Means are represented by diamonds. Lines represent 95% confidence intervals. All differences in mean NLR number between Low CNV and High CNV groups were significant (negative binomial GLM: # NLR Duplicates ∼ Duplicate Type + CNV Group + Duplicate Type * CNV Group, adjusted P-values <0.001).
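The duplication classes rest on how far apart paralogous genes sit in the genome. The captions and abstract name tandem and proximal duplication but do not give the exact rules, so the sketch below is a hypothetical scheme based on gene-order distance: adjacent paralogs are tandem, paralogs on the same chromosome within `MAX_PROXIMAL_GAP` genes (an assumed value of 10) are proximal, and everything else falls into a catch-all "dispersed" class. Segmental or whole-genome duplicates require collinearity detection and are omitted. A gene with several paralogs takes its most local class.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical threshold: same-chromosome paralogs separated by at most this
# many positions in gene order (but not adjacent) are called "proximal".
MAX_PROXIMAL_GAP = 10

# Lower value = more local duplication; used to pick one class per gene.
PRIORITY = {"tandem": 0, "proximal": 1, "dispersed": 2}


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    index: int  # 0-based rank of the gene along its chromosome


def classify_pair(a: Gene, b: Gene, max_gap: int = MAX_PROXIMAL_GAP) -> str:
    """Classify a paralogous pair by how far apart the genes are in gene order."""
    if a.chrom != b.chrom:
        return "dispersed"
    distance = abs(a.index - b.index)
    if distance == 0:
        raise ValueError("a gene cannot be its own paralog")
    if distance == 1:
        return "tandem"
    return "proximal" if distance <= max_gap else "dispersed"


def classify_gene(gene: Gene, paralogs: list[Gene]) -> str:
    """Assign a gene the most local class among all of its paralogous pairs."""
    if not paralogs:
        return "singleton"
    return min((classify_pair(gene, p) for p in paralogs),
               key=PRIORITY.__getitem__)
```

Counting genes per class with this kind of function, separately for NLR and non-NLR genes, would yield proportions of the sort compared in panel a).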
Fig. 8.
NLR duplications across domain architectures. NL, CNL, TNL, and RNLs are shown along the x-axis of each panel, and as blue, yellow, teal, and black points, respectively. Each point represents the number of NLR copies for a particular genotype. Means are represented by diamonds. Lines represent 95% confidence intervals.
Fig. 9.
Number, size, and location of NLR clusters. a) The genomic distribution of NLRs in each duplicate type, for Low CNV (purple) and High CNV (orange) genotypes. Each point represents the number of NLRs for a particular genotype. Boxes outline the 4 chromosomes with the highest NLR density. Means are represented by diamonds. Lines represent 95% confidence intervals. b) Number and size of NLR clusters for each genotype, for Low CNV (purple) genotypes, shown as the 4 leftmost positions, and High CNV (orange) genotypes, shown as the 7 rightmost positions. Each point represents a single NLR cluster. Mean cluster size for each genotype is represented by a diamond. Lines represent 95% confidence intervals. Boxed values indicate the number of NLR clusters for each genotype. Differences in both mean cluster number and mean cluster size between Low CNV and High CNV genotypes were significant (cluster number: mean difference = 6.32, Mann–Whitney test, P-value <0.01; cluster size: mean difference = 11.48, Mann–Whitney test, P-value <0.05).
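Panel b) reports cluster number and mean cluster size per genotype, but this caption does not define what makes NLRs a cluster. One common style of definition is single linkage along each chromosome: consecutive NLRs stay in the same cluster while the gap between them is below a distance cutoff. The sketch below assumes a hypothetical 200 kb cutoff and a minimum of 2 NLRs per cluster; published definitions often also cap the number of intervening non-NLR genes, which is not modeled here.

```python
from __future__ import annotations

from itertools import groupby
from statistics import mean

Locus = tuple[str, int, int]  # (chromosome, start, end) of one NLR gene


def call_clusters(loci: list[Locus], max_gap_bp: int = 200_000,
                  min_size: int = 2) -> list[list[Locus]]:
    """Group NLR loci into clusters by single linkage along each chromosome.

    A new cluster starts whenever the next locus begins more than max_gap_bp
    after the furthest end seen so far in the current cluster. Groups smaller
    than min_size (isolated NLRs) are discarded.
    """
    clusters: list[list[Locus]] = []
    for _, group in groupby(sorted(loci), key=lambda locus: locus[0]):
        current: list[Locus] = []
        current_end = 0
        for locus in group:
            if current and locus[1] - current_end > max_gap_bp:
                if len(current) >= min_size:
                    clusters.append(current)
                current = []
            current_end = max(current_end, locus[2]) if current else locus[2]
            current.append(locus)
        if len(current) >= min_size:
            clusters.append(current)
    return clusters


def summarize(clusters: list[list[Locus]]) -> tuple[int, float]:
    """Number of clusters and mean cluster size (NLRs per cluster)."""
    if not clusters:
        return 0, 0.0
    return len(clusters), mean(len(c) for c in clusters)
```

Tracking the furthest end seen so far, rather than only the previous gene's end, keeps overlapping or nested gene models from splitting a cluster prematurely.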