Curr Res Food Sci. 2024 Oct 29:9:100913.
doi: 10.1016/j.crfs.2024.100913. eCollection 2024.

Integrating near-infrared hyperspectral imaging with machine learning and feature selection: Detecting adulteration of extra-virgin olive oil with lower-grade olive oils and hazelnut oil

Derick Malavi et al. Curr Res Food Sci.

Abstract

Detecting adulteration in extra virgin olive oil (EVOO) is particularly challenging with oils of similar chemical composition. This study applies near-infrared hyperspectral imaging (NIR-HSI) and machine learning (ML) to detect EVOO adulteration with hazelnut, refined olive, and olive pomace oils at various concentrations (1%, 5%, 10%, 20%, 40%, and 100% m/m). Savitzky-Golay filtering, first and second derivatives, multiplicative scatter correction (MSC), standard normal variate (SNV), and their combinations were used to preprocess the spectral data, with Principal Component Analysis (PCA) reducing dimensionality. Classification was performed using Partial Least Squares-Discriminant Analysis (PLS-DA) and ML algorithms, including k-Nearest Neighbors (k-NN), Naïve Bayes (NB), Random Forest (RF), Support Vector Machine (SVM), and Artificial Neural Networks (ANN). PLS-DA, k-NN, RF, SVM, NB, and ANN models achieved accuracy rates of 97.0-99.0%, 96.2-100%, 96.5-100%, 98.6-99.5%, 93.9-99.7%, and 99.2-100%, respectively, in discriminating between pure EVOO, adulterants, and adulterated oils. PLS-DA, RF, SVM, and ANN significantly outperformed NB (p < 0.05) in binary classification, with Matthews correlation coefficient (MCC) values exceeding 0.90. All the binary classifiers except NB, when coupled with SNV/MSC, Savitzky-Golay smoothing and derivatives, consistently achieved perfect scores (1.0) for accuracy, sensitivity, specificity, F1 score, precision, and MCC in distinguishing pure EVOO from adulterated oils. No significant differences (p > 0.05) in model performance were found between those using full spectra and those based on key variable selection. However, PLS-DA and ANN significantly outperformed k-NN, RF, and SVM (p < 0.05), with MCC values ranging from 0.95 to 1.00, indicating superior classification performance.
These findings demonstrate that combining NIR-HSI with machine learning, along with key variable selection, potentially offers an effective, non-destructive solution for detecting adulteration in EVOO and combating fraud in the olive oil industry.
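
The abstract names SNV and MSC among the scatter-correction steps but does not give their formulas. The following is a minimal sketch of the textbook definitions, not the authors' implementation. It assumes each spectrum is a plain list of reflectance or absorbance values, one per wavelength band, and that MSC uses the mean spectrum as its reference unless another is supplied. The function names `snv` and `msc` and the example values are hypothetical.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre one spectrum on its own mean
    and scale it by its own sample standard deviation."""
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(x - m) / s for x in spectrum]


def msc(spectra, reference=None):
    """Multiplicative scatter correction.

    Each spectrum is regressed on a reference spectrum
    (spectrum ~ intercept + slope * reference), then corrected as
    (spectrum - intercept) / slope. The reference defaults to the
    mean spectrum of the set, which is an assumption of this sketch.
    """
    n_bands = len(spectra[0])
    if reference is None:
        reference = [mean(s[j] for s in spectra) for j in range(n_bands)]
    ref_mean = mean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)

    corrected = []
    for s in spectra:
        s_mean = mean(s)
        # Ordinary least-squares slope and intercept against the reference.
        slope = sum((r - ref_mean) * (x - s_mean)
                    for r, x in zip(reference, s)) / ref_ss
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


# Hypothetical four-band spectra: the second is the first with an
# additive offset and a multiplicative gain, mimicking scatter effects.
base = [1.0, 2.0, 4.0, 3.0]
scattered = [2.0 * x + 0.5 for x in base]
snv_base, snv_scattered = snv(base), snv(scattered)
msc_base, msc_scattered = msc([base, scattered])
```

Both transforms remove an additive offset and a multiplicative gain, so in this toy case the two spectra coincide after correction. SNV works on each spectrum independently, whereas MSC depends on the chosen reference, which is one reason the two are usually treated as alternatives rather than applied together.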

Keywords: Adulteration; Authentication; Classification models; Extra-virgin olive oil (EVOO); Machine learning; Variable selection.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Graphical abstract
Fig. 1
Schematic sampling experimental design.
Fig. 2
Averaged raw hyperspectral imaging spectra of extra virgin olive oil (EVOO), edible oil adulterants, and adulterated olive oils.
Fig. 3
PCA scores plots illustrate the grouping distribution of EVOO, edible oil adulterants, and adulterated olive oil using unprocessed spectra and different sets of preprocessed spectral data.
Fig. 4
(a) The optimum number of latent variables for the MSC + SG + 2nd derivative-PLS-DA model used for binary classification and (b) the optimum number of latent variables for the MSC + SG + 2nd derivative-PLS-DA model for multi-class classification (7 classes). The simplest optimal model (represented by the black dotted line) is selected based on the 'one standard error rule,' meaning it falls within one standard error of the highest accuracy model (represented by the red dotted line). (c) A confusion matrix showing correct classifications and misclassifications by PLS-DA with raw spectral data; (e) PLS-DA plot indicating misclassification of some adulterants as EVOO with data pre-processed by SG smoothing in cross-validation; (f) A confusion matrix indicating perfect classification with one of the binary PLS-DA classification models. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
Fig. 5
(a) Selection of optimal k-nearest neighbors by cross-validation and oneSE rule. The simplest optimal k-NN model (represented by the black dotted line) is selected based on the 'one standard error rule,' meaning it falls within one standard error of the highest accuracy model (represented by the red dotted line). Confusion matrices showing misclassification by (b) ‘seven-class’ k-NN model and Savitzky-Golay data, (c) ‘three-class’ k-NN model using unprocessed spectra and (d) perfect classification by ‘two-class’ k-NN model coupled with MSC + SG + 2nd derivative data. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
Fig. 6
(a) Model selection based on 10-fold cross-validation and oneSE rule. The simplest optimal model (represented by the blue dotted line) is selected based on the 'one standard error rule,' meaning it falls within one standard error of the highest accuracy model (represented by the red dotted line). (b) Number of trees and out-of-bag error (OOB) for RF model classifier; the red line indicates how often the model incorrectly predicts the ‘EVOO’ class while the black line reflects the frequency of incorrect predictions for the 'Adulterated' class. Confusion matrices showing correct and incorrect classifications: (c) RF-Savitzky-Golay and (d) RF-MSC + SG + 2nd derivative preprocessing model for seven-class classification; (e) RF-unprocessed spectra model for three-class discrimination; and (f) RF-SNV model for binary classification. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
Fig. 7
Performance heat map displaying key features selected by models built from spectral data preprocessed with SNV + Savitzky-Golay + second derivative.
Fig. 8
Box plots demonstrating MCC values based on model type and spectral preprocessing techniques. The mean MCC value is marked by a symbol on each plot.
Fig. 9
Pixel-based classification maps showing the detection of EVOO adulteration using Spectral Angle Mapper (SAM) and Spectral Information Divergence (SID) at adulteration levels of 0%, 1%, 5%, 10%, 20%, 40%, and 100%. Blue represents pure EVOO, while red, pink, and brown indicate the presence of adulterants (hazelnut, olive pomace, and refined olive oils, respectively). The progression from blue to adulterant colors illustrates how the classifiers detect increasing levels of adulteration. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
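
The Fig. 9 caption refers to pixel-based maps built with Spectral Angle Mapper (SAM) and Spectral Information Divergence (SID) without describing the computation. The sketch below shows the standard definitions of both measures and one plausible way to label a pixel by its nearest reference spectrum. It is illustrative only: the nearest-reference rule, the function names, the `eps` guard, and the example spectra are assumptions, not details taken from the paper.

```python
import math


def spectral_angle(a, b):
    """SAM: angle in radians between two spectra treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos_theta)


def spectral_information_divergence(a, b, eps=1e-12):
    """SID: symmetric relative entropy between two spectra after each
    is normalised to sum to one. Assumes non-negative band values;
    eps (an assumption of this sketch) avoids log(0)."""
    sum_a, sum_b = sum(a), sum(b)
    p = [x / sum_a + eps for x in a]
    q = [y / sum_b + eps for y in b]
    return sum(pi * math.log(pi / qi) + qi * math.log(qi / pi)
               for pi, qi in zip(p, q))


def classify_pixel(pixel, references, distance):
    """Return the label of the reference spectrum closest to the pixel."""
    return min(references, key=lambda label: distance(pixel, references[label]))


# Hypothetical four-band reference spectra and one pixel spectrum.
references = {
    "EVOO": [0.20, 0.40, 0.60, 0.40],
    "hazelnut": [0.50, 0.30, 0.20, 0.60],
}
pixel = [0.21, 0.41, 0.58, 0.40]
label_sam = classify_pixel(pixel, references, spectral_angle)
label_sid = classify_pixel(pixel, references, spectral_information_divergence)
```

SAM ignores overall brightness because scaling a spectrum does not change its angle, while SID compares the shape of the normalised spectra as probability distributions. Applying `classify_pixel` to every pixel of a hyperspectral cube would yield a label map of the kind the caption describes.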
