Plant Biotechnol J. 2024 May;22(5):1312-1324.
doi: 10.1111/pbi.14267. Epub 2024 Jan 11.

Genomic basis of seed colour in quinoa inferred from variant patterns using extreme gradient boosting

Felix L Sandell et al. Plant Biotechnol J. 2024 May.

Abstract

Quinoa is an agriculturally important crop species originally domesticated in the Andes of central South America. One of its most important phenotypic traits is seed colour. Seed colour variation is determined by the contrasting abundance of betalains, a class of strongly antioxidant, free-radical-scavenging pigments found only in plants of the order Caryophyllales. However, the genetic basis for these pigments in seeds remains to be identified. Here we demonstrate the application of machine learning (extreme gradient boosting) to identify genetic variants predictive of seed colour. We show that extreme gradient boosting outperforms the classical genome-wide association approach. We provide re-sequencing and phenotypic data for 156 South American quinoa accessions and identify candidate genes potentially controlling betalain content in quinoa seeds. Genes identified include novel cytochrome P450 genes and known members of the betalain synthesis pathway, as well as genes annotated as being involved in seed development. Our work showcases the power of modern machine learning methods to extract biologically meaningful information from large sequencing data sets.

Keywords: betalain synthesis pathway; genome sequencing; genotype‐phenotype relationships; machine learning; quinoa; seed colour.

Conflict of interest statement

None declared.

Figures

Figure 1
Representative quinoa seeds classified as white (a), beige (b), greenish‐red (c, d), yellow (e), orange (f), red (g) and black (h) at 50‐fold magnification.
Figure 2
Principal component analysis (PCA) and linear discriminant analysis (LDA) of 129 SNP positions that increased the quality of at least nine independent XGBoost models using accessions with beige, orange and white seeds.
Figure 3
(a) Principal component analysis (PCA) and (b) linear discriminant analysis (LDA) of 129 SNP positions that increased the quality of at least nine independent XGBoost models using accessions with beige, orange, white and yellow seeds.
Figure 4
Hierarchical clustering of the genotypes at 129 variant positions discriminating beige‐, orange‐ and white‐seeded accessions. Dark blue: homozygous alternative; turquoise: heterozygous; light green: homozygous reference; black: missing data.
Figure 5
Linear discriminant analysis (LDA) of 129 SNP positions that increased the quality of at least nine independent XGBoost models. In addition to the four larger seed groups (beige, orange, white and yellow) we also included data from quinoa accessions with red, black, brown and greenish‐red seeds.
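
The captions above refer to SNP positions that "increased the quality of at least nine independent XGBoost models". The following is a minimal sketch of that kind of consensus filter, not the authors' code. It assumes each independent model run has already been reduced to a set of variant IDs that improved it. The function name `consensus_variants`, the variant IDs, and the toy threshold are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Iterable, List, Set


def consensus_variants(per_model_variants: Iterable[Set[str]], min_models: int) -> List[str]:
    """Return variant IDs that were selected by at least `min_models` models.

    Each element of `per_model_variants` is the set of variants that improved
    one independently trained model (an assumption of this sketch).
    """
    counts = Counter()
    for selected in per_model_variants:
        # Convert to a set so each model votes at most once per variant.
        counts.update(set(selected))
    return sorted(v for v, n in counts.items() if n >= min_models)


# Toy data: three hypothetical model runs with made-up variant IDs.
runs = [
    {"chr1:100", "chr2:200", "chr3:300"},
    {"chr1:100", "chr2:200"},
    {"chr1:100", "chr4:400"},
]

# Keep variants supported by at least two of the three runs.
selected = consensus_variants(runs, min_models=2)
print(selected)
```

With real data, `min_models` would be set to the threshold named in the captions (nine). Requiring agreement across independently trained models is one way to reduce the chance that a variant is retained because of a single run's random seed or data split.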
