Front Bioeng Biotechnol. 2015 Mar 24:3:35.
doi: 10.3389/fbioe.2015.00035. eCollection 2015.

Learning to Classify Organic and Conventional Wheat - A Machine Learning Driven Approach Using the MeltDB 2.0 Metabolomics Analysis Platform

Nikolas Kessler et al. Front Bioeng Biotechnol. 2015.

Abstract

We present the results of a machine learning approach to classifying GC-MS data from wheat grains grown under different farming systems. The aim is to investigate whether learning algorithms can classify GC-MS data as originating from conventionally or organically grown samples while taking different cultivars into account. The work is motivated by the increased demand for organic food in post-industrialized societies and the resulting need to prove organic food authenticity. The data set comprises up to 11 wheat cultivars that were cultivated in both farming systems, organic and conventional, over 3 years. More than 300 GC-MS measurements were recorded and subsequently processed and analyzed in the MeltDB 2.0 metabolomics analysis platform, which is briefly outlined in this paper. We further describe how unsupervised (t-SNE, PCA) and supervised (SVM) methods can be applied for sample visualization and classification. Our results show that the year has the strongest and the wheat cultivar the second-strongest influence on the metabolic composition of a sample. We also show that, for a given year and cultivar, organic and conventional cultivation can be distinguished by machine learning algorithms.

Keywords: computational metabolomics; food authentication; machine learning; metabolome informatics; metabolomics; organic farming; statistics.
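
The supervised step described in the abstract (classifying farming system within a given year and cultivar) can be illustrated with a minimal sketch. It assumes scikit-learn is available and uses purely synthetic data: the sample counts, feature count, effect size, linear kernel, and 5-fold cross-validation are all illustrative assumptions, not settings taken from the paper or from MeltDB 2.0.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_per_class, n_features = 15, 40  # hypothetical sizes, not from the paper

# Synthetic stand-in for one stratum: a single year and a single cultivar.
# The farming-system effect is an assumed constant shift of the feature means.
shift = rng.normal(0.0, 1.0, n_features)
organic = rng.normal(0.0, 1.0, (n_per_class, n_features)) + shift
conventional = rng.normal(0.0, 1.0, (n_per_class, n_features))

X = np.vstack([organic, conventional])
y = np.array([1] * n_per_class + [0] * n_per_class)  # 1 = organic, 0 = conventional

# Scale features, then fit a linear SVM; accuracy is estimated per fold.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
mean_accuracy = scores.mean()
print(f"mean 5-fold accuracy: {mean_accuracy:.2f}")
```

Restricting the data to one year and one cultivar before training mirrors the abstract's statement that the farming system is distinguishable once the two stronger factors are held fixed.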

Figures

Figure 1
The principal component analysis on the entire dataset of all samples across all years, cultivars, and treatments shows that the first two components mainly separate samples by the factor year. A separation by the factor farming system is not possible.
Figure 2
A principal component analysis performed on a dataset from 1 year only mainly clusters samples by their cultivar, regardless of the applied farming system. This PCA is based on samples from the year 2007.
Figure 3
As in Figure 1, a principal component analysis on a dataset of only one cultivar (here “Runal”) separates samples by the factor year along the first principal components.
Figure 4
Plotting samples from one cultivar (here “Runal”) along principal components two and four shows that a separation by farming system might be possible, even though the main variance is caused by the factor year.
Figure 5
The t-SNE method applied to all samples results in clusters and subclusters formed according to the factors year and cultivar, respectively.
Figure 6
The same t-SNE result as in Figure 5, but colored by farming system: clusters representing cultivars form subclusters according to the factor farming system.
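
The pattern in the PCA captions (year dominating the leading components, with farming system visible only after restricting the data) can be reproduced on synthetic data. The sketch below is an illustration under stated assumptions: the effect sizes are chosen by hand so that year > cultivar > farming system, the helper names `pc_scores` and `explained_by` are hypothetical, and the between-group variance share (eta squared) is a diagnostic added here, not one reported in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years, n_cultivars, n_systems, n_reps, n_features = 3, 4, 2, 5, 50

# Assumed additive effects, sized only to mimic the reported ordering.
year_fx = rng.normal(0.0, 3.0, (n_years, n_features))
cultivar_fx = rng.normal(0.0, 1.5, (n_cultivars, n_features))
system_fx = rng.normal(0.0, 0.5, (n_systems, n_features))

rows, year, cultivar, system = [], [], [], []
for yr in range(n_years):
    for cv in range(n_cultivars):
        for fs in range(n_systems):
            for _ in range(n_reps):
                noise = rng.normal(0.0, 0.3, n_features)
                rows.append(year_fx[yr] + cultivar_fx[cv] + system_fx[fs] + noise)
                year.append(yr)
                cultivar.append(cv)
                system.append(fs)
X = np.array(rows)
year, cultivar, system = np.array(year), np.array(cultivar), np.array(system)


def pc_scores(data, k):
    """Scores on the k-th principal component (0-based) via SVD of centered data."""
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, k] * s[k]


def explained_by(scores, labels):
    """Share of the score variance that lies between label groups (eta squared)."""
    grand = scores.mean()
    between = sum(
        (labels == g).sum() * (scores[labels == g].mean() - grand) ** 2
        for g in np.unique(labels)
    )
    return between / ((scores - grand) ** 2).sum()


# All samples: the first component should mostly reflect the year.
pc1_all = pc_scores(X, 0)
year_share = explained_by(pc1_all, year)
system_share = explained_by(pc1_all, system)

# One year and one cultivar: the farming system becomes the main axis.
stratum = (year == 0) & (cultivar == 0)
pc1_stratum = pc_scores(X[stratum], 0)
system_share_stratum = explained_by(pc1_stratum, system[stratum])

print(f"PC1, all samples: year {year_share:.2f}, farming system {system_share:.2f}")
print(f"PC1, one year and cultivar: farming system {system_share_stratum:.2f}")
```

Because the synthetic effects are additive and hand-tuned, the sketch only shows why a weak factor can be hidden by stronger ones in a global PCA; it makes no claim about the effect sizes in the real wheat data.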
