PLoS One. 2022 Oct 6;17(10):e0260907.
doi: 10.1371/journal.pone.0260907. eCollection 2022.

Genome-wide association studies and genomic selection assays made in a large sample of cacao (Theobroma cacao L.) germplasm reveal significant marker-trait associations and good predictive value for improving yield potential

Frances L Bekele et al. PLoS One. 2022.

Abstract

A genome-wide association study (GWAS) was undertaken to identify marker-trait associations (MTAs) between SNP markers and phenotypic traits. It involved a subset of 421 cacao accessions from the large and diverse collection conserved ex situ at the International Cocoa Genebank Trinidad. A Mixed Linear Model (MLM) in TASSEL was used for the GWAS, followed by confirmatory analyses using GAPIT FarmCPU. An average linkage disequilibrium (r²) of 0.10 at 5.2 Mb was found across several chromosomes. Seventeen significant MTAs of interest (P ≤ 8.17 × 10⁻⁵; −log10(P) = 4.088), including six that pertained to yield-related traits, were identified using TASSEL MLM. These six yield-related MTAs accounted for 5 to 17% of the phenotypic variation expressed. The highly significant association (P ≤ 8.17 × 10⁻⁵) between seed length to width ratio and TcSNP 733 on chromosome 5 was verified with FarmCPU (P ≤ 1.12 × 10⁻⁸). Fourteen MTAs were common to both the TASSEL and FarmCPU models at P ≤ 0.003. Based on the TASSEL MLM, the most significant yield-related MTAs involved seed number and seed length on chromosome 7 (P ≤ 1.15 × 10⁻¹⁴ and P ≤ 6.75 × 10⁻⁵, respectively) and seed number on chromosome 1 (P ≤ 2.38 × 10⁻⁵). Notably, seed length, seed length to width ratio and seed number were associated with markers at different loci, indicating their polygenic nature. Approximately 40 candidate genes involved in embryo and seed development, protein synthesis, carbohydrate transport, and lipid biosynthesis and transport were identified in the flanking regions of the significantly associated SNPs and in linkage disequilibrium with them. A significant MTA for fruit surface anthocyanin intensity co-localised with a gene encoding MYB-related protein 308 on chromosome 4.
Testing a genomic selection approach revealed good predictive value of genomic estimated breeding values (GEBVs) for economic traits such as seed number (0.611), seed length (0.6199), seed width (0.5435), seed length to width ratio (0.5503), seed/cotyledon mass (0.6014) and ovule number (0.6325). These findings could facilitate genomic selection and marker-assisted breeding of cacao, thereby expediting improvement in the yield potential of cacao planting material.
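The abstract does not state which genomic selection model produced these predictive values, so the following is only a hypothetical illustration of how such a figure is commonly obtained. It swaps in ridge regression on 0/1/2 SNP dosages (an rrBLUP-style shrinkage model) and scores predictive ability as the Pearson correlation between GEBVs and observed phenotypes in held-out cross-validation folds. The simulated data, the penalty `lam`, the fold count and the function names are all assumptions, not the authors' pipeline.

```python
import numpy as np


def ridge_marker_effects(X, y, lam):
    """Estimate SNP effects by ridge regression (rrBLUP-style shrinkage)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean          # centre marker dosages
    yc = y - y_mean          # centre phenotypes
    n_markers = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(n_markers), Xc.T @ yc)
    return beta, x_mean, y_mean


def predictive_ability(X, y, lam, k=5, seed=0):
    """Mean Pearson r between GEBVs and observed phenotypes across k CV folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        beta, x_mean, y_mean = ridge_marker_effects(X[train_idx], y[train_idx], lam)
        # Genomic estimated breeding values for the held-out accessions
        gebv = (X[test_idx] - x_mean) @ beta + y_mean
        scores.append(np.corrcoef(gebv, y[test_idx])[0, 1])
    return float(np.mean(scores))


# Hypothetical simulated data: 421 accessions x 612 SNPs coded as 0/1/2 dosages.
rng = np.random.default_rng(42)
X = rng.integers(0, 3, size=(421, 612)).astype(float)
effects = np.zeros(612)
effects[rng.choice(612, size=20, replace=False)] = rng.normal(0.0, 1.0, size=20)
y = X @ effects + rng.normal(0.0, 3.0, size=421)

r = predictive_ability(X, y, lam=250.0)  # lam is an assumed penalty, not tuned
print(f"cross-validated predictive ability: {r:.3f}")
```

Because a single penalty shrinks all marker effects toward zero, this kind of model is often used for polygenic traits; in practice `lam` would be tuned or derived from estimated variance components rather than fixed as here.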

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig 1. Box and whisker plots, histograms and individual value plot for pod index showing variation in the yield-related traits.
The asterisks in the box and whisker plots represent outliers.
Fig 2. Correlograms showing Pearson correlations for quantitative traits and Spearman correlations for anthocyanin intensity in various plant organs.
Positive correlations are displayed as blue circles and negative correlations as orange circles. The sizes of the circles are proportional to the correlation coefficients. The plant organs for which anthocyanin intensity was measured were the flower ligule, filament and pedicel, and the mature fruit ridges and seed cotyledons.
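Figure 2 pairs Pearson coefficients for quantitative traits with Spearman coefficients for anthocyanin intensity, presumably because intensity is recorded as ordinal scores. A minimal sketch of the distinction, assuming tie-averaged ranks: Spearman's coefficient is Pearson's coefficient computed on ranks rather than raw values. The function names and example scores below are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ordinal anthocyanin scores (0 = absent to 3 = intense) for six accessions.
filament = [0, 1, 1, 2, 3, 3]
pedicel = [0, 0, 1, 2, 2, 3]
print(round(spearman(filament, pedicel), 3))
```

Ranking makes the coefficient depend only on the order of the scores, so it does not assume equal spacing between ordinal intensity classes.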
Fig 3. Plot of estimated LnP(K) versus number of clusters (K) based on STRUCTURE analysis.
Analysis of population structure of 421 cacao accessions using STRUCTURE: estimated LnP(K) for possible numbers of clusters (K) from 2 to 15. As K approaches the true value, L(K) plateaus (or continues to increase slightly).
Fig 4. Neighbour-joining tree based on UPGMA of 421 cacao genotypes.
The tree was generated in DARwin Version 6 and rendered in iTOL version 6 (https://itol.embl.de/). Seven admixed groups are evident.
Fig 5. Plots modelling the decay in pairwise linkage disequilibrium coefficients (r²) as a function of the distance between markers in megabases (Mb).
Panels show pairwise linkage disequilibrium coefficients (r²) on chromosomes 1, 4, 5, 7 and 9, and a heatmap of linkage disequilibrium (r²) across chromosomes 4 and 5 based on data for 421 cacao accessions genotyped using 612 filtered SNPs. In the heatmap, markers are ordered on the x and y axes according to their location along the chromosomes, and each cell represents a single marker pair. The upper triangle, above the black diagonal, is colour-coded by the r² value between SNPs, while the colours in the lower triangle show the P-values for the corresponding r² values.
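Figure 5 summarises how r² decays with the physical distance between markers. As a hedged illustration (not necessarily how TASSEL estimates r²), the sketch below uses a common approximation for unphased genotypes: the squared Pearson correlation between two SNPs' 0/1/2 allele-dosage vectors, then averaged within distance bins along one chromosome. The bin width, function names and example genotypes are assumptions.

```python
from statistics import mean


def genotype_r2(g1, g2):
    """Squared Pearson correlation between two SNPs' 0/1/2 allele-dosage vectors."""
    m1, m2 = mean(g1), mean(g2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return float("nan")  # monomorphic marker: r2 is undefined
    return cov * cov / (v1 * v2)


def ld_decay(positions_mb, genotypes, bin_mb=1.0):
    """Mean pairwise r2 per distance bin (bin index -> mean r2) for one chromosome."""
    bins = {}
    n = len(positions_mb)
    for i in range(n):
        for j in range(i + 1, n):
            r2 = genotype_r2(genotypes[i], genotypes[j])
            if r2 != r2:  # NaN check: skip undefined pairs
                continue
            b = int(abs(positions_mb[j] - positions_mb[i]) // bin_mb)
            bins.setdefault(b, []).append(r2)
    return {b: mean(v) for b, v in sorted(bins.items())}


# Hypothetical genotypes for three SNPs in six accessions.
snp_a = [0, 1, 2, 0, 1, 2]
snp_b = [2, 1, 0, 2, 1, 0]   # perfectly (negatively) correlated with snp_a
snp_c = [0, 2, 1, 1, 0, 2]
print(genotype_r2(snp_a, snp_b))
print(ld_decay([0.0, 0.5, 3.0], [snp_a, snp_b, snp_c]))
```

Squaring the correlation discards its sign, which is why perfectly opposed dosage patterns still count as complete linkage disequilibrium.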
Fig 6. Manhattan plots from genome-wide association analysis.
Genome-wide association plots across 8 cacao chromosomes for seven phenotypic traits that had statistically significant MTAs: filament anthocyanin intensity, fruit surface (ridges) anthocyanin intensity, log fruit length, log seed length, log seed number, seed length to width ratio, seed number.
  1. Based on TASSEL version 5.2.50 MLM results for 421 cacao accessions (612 SNPs).

  2. Chromosome “11” was designated for unmapped SNP markers (some of which have recently been mapped).

  3. X- and Y-axes represent the SNP markers along each chromosome and the -log10(P-value), respectively.

  4. The red horizontal line corresponds to the Bonferroni significance threshold of P ≤ 8.17 × 10⁻⁵ (−log10(P) = 4.088), and the blue line corresponds to a significance level of 0.005.
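The threshold in item 4 is consistent with a Bonferroni correction of a family-wise α of 0.05 across the 612 filtered SNPs. The value of α is an assumption here, since the legend gives only the resulting threshold. A minimal check of the arithmetic:

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value cut-off that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Assumption: alpha = 0.05 and one test per filtered SNP (612 SNPs, per the legend above).
threshold = bonferroni_threshold(0.05, 612)
neg_log10 = -math.log10(threshold)

print(f"P <= {threshold:.2e}")          # P <= 8.17e-05
print(f"-log10(P) = {neg_log10:.3f}")   # -log10(P) = 4.088
```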

Fig 6. Quantile–quantile plots of estimated −log10(P) from genome-wide association studies using TASSEL MLM for filament anthocyanin intensity, fruit surface (ridges) anthocyanin intensity, log fruit length, log seed length, log seed number, seed length to width ratio and seed number. The plots provide no evidence of bias in the GWAS, such as that due to genotyping artifacts, and show the extent to which the observed distribution of the test statistic followed the expected (null) distribution. The red line represents expected P-values with no associations.
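A quantile–quantile plot of this kind compares sorted observed −log10(P) values with the values expected if every P-value were drawn from a uniform null distribution. A minimal sketch follows, assuming the common plotting positions (i − 0.5)/n. It also computes the genomic inflation factor λ, a standard companion diagnostic that the legend does not report. Function names and the example P-values are hypothetical, and all P-values must be strictly positive.

```python
import math
from statistics import NormalDist, median


def qq_points(pvalues):
    """(expected, observed) -log10(P) pairs, both sorted from largest to smallest."""
    n = len(pvalues)
    observed = sorted((-math.log10(p) for p in pvalues), reverse=True)
    # Uniform null quantiles (i - 0.5) / n for i = 1..n; the smallest gives the largest -log10.
    expected = [-math.log10((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, observed))


def genomic_inflation(pvalues):
    """lambda_GC: median 1-df chi-square statistic divided by its null median (~0.455)."""
    nd = NormalDist()
    # Two-sided P -> |z| -> chi-square(1); using p/2 keeps tiny P-values numerically safe.
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in pvalues]
    null_median = nd.inv_cdf(0.75) ** 2
    return median(chi2) / null_median


# Hypothetical P-values from a GWAS scan.
pvals = [0.9, 0.42, 0.31, 0.07, 0.004, 3e-6]
for exp_, obs in qq_points(pvals):
    print(f"{exp_:.3f}\t{obs:.3f}")
print(f"lambda_GC = {genomic_inflation(pvals):.2f}")
```

Points lying on the identity line except for a few at the upper tail, together with λ near 1, are the usual pattern when population structure and relatedness have been adequately controlled.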
Fig 7. Physical map of T. cacao L. showing annotated candidate genes, which were co-localised with SNP markers associated with yield-related and other traits.
Gene loci and proteins are shown on the right and physical positions (Mb) are shown on the left. No candidate genes were identified on chromosomes 2 and 10.
