Sensors (Basel). 2020 Sep 3;20(17):5001.
doi: 10.3390/s20175001.

Robust Wavelength Selection Using Filter-Wrapper Method and Input Scaling on Near Infrared Spectral Data

Divo Dharma Silalahi et al. Sensors (Basel). 2020.

Abstract

Extracting relevant wavelengths from large Near Infrared Spectroscopy (NIRS) datasets is a significant challenge in vibrational spectroscopy research. Wavelength selection nonetheless improves chemical interpretability by emphasizing the chemical entities related to the chemical parameters of the samples. Because the datasets are complex, irrelevant wavelengths may still be included in the multivariate calibration, which makes the computation unnecessarily complex and decreases the accuracy and robustness of the model. In multivariate analysis, Partial Least Squares Regression (PLSR) is commonly used to build a predictive model from NIR spectral data. However, neither the PLSR method nor common commercial chemometrics software applies a standard wavelength selection procedure to screen out irrelevant wavelengths. This study introduces a new robust wavelength selection procedure, the modified VIP-MCUVE (mod-VIP-MCUVE), which uses a Filter-Wrapper method and an input scaling strategy. The proposed method combines the modified Variable Importance in Projection (VIP) and the modified Monte Carlo Uninformative Variable Elimination (MCUVE) to calculate the scale matrix of the input variables. The modified VIP uses the orthogonal components of Partial Least Squares (PLS) to identify the informative variables in the model by applying the amount of variation in both X and y {SSX, SSY} simultaneously. The modified MCUVE uses a robust reliability coefficient and a robust tolerance interval in the selection procedure. To assess the proposed method against existing approaches, the classical VIP, MCUVE, and the autoscaling procedure in classical PLSR were also included in the evaluation. Using artificial data with Monte Carlo simulation and NIR spectral data of oil palm (Elaeis guineensis Jacq.) fruit mesocarp, the study shows that the proposed method improves model interpretability, is computationally extensive, and produces better model accuracy.
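
For orientation, the two classical baselines named in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' mod-VIP-MCUVE: the abstract does not give the modified formulas, so only the classical VIP score and the classical MCUVE reliability index (mean of the PLS regression coefficient over Monte Carlo subsamples divided by its standard deviation) are shown. The function names `pls1`, `vip_scores`, and `mcuve_reliability`, the synthetic data, the 80% subsample fraction, and the 100 runs are all hypothetical choices made for this example.

```python
import math
import random
import statistics


def pls1(X, y, n_comp):
    """Fit PLS1 (single response) by NIPALS on mean-centered data.

    Returns the unit weight vectors W, the y-variance explained by each
    component (SSY), and the regression coefficients b for centered X.
    """
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    f = [v - y_mean for v in y]
    W, P, R, ssy, b = [], [], [], [], [0.0] * m
    for _ in range(n_comp):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left in y to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Rotated weight r maps the original centered X straight to the score t.
        r = w[:]
        for p_prev, r_prev in zip(P, R):
            proj = sum(a * c for a, c in zip(p_prev, w))
            r = [rj - proj * rp for rj, rp in zip(r, r_prev)]
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(p)
        R.append(r)
        ssy.append(q * q * tt)
        b = [bj + q * rj for bj, rj in zip(b, r)]
    return W, ssy, b


def vip_scores(W, ssy):
    """Classical VIP: squared weights averaged by explained y-variance."""
    m = len(W[0])
    total = sum(ssy)
    return [
        math.sqrt(m * sum(s * w[j] ** 2 for w, s in zip(W, ssy)) / total)
        for j in range(m)
    ]


def mcuve_reliability(X, y, n_comp, n_runs=100, frac=0.8, seed=0):
    """Classical MCUVE index: mean(b_j) / std(b_j) over random subsamples."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    k = int(frac * n)  # assumed subsample size; must stay below n
    coefs = []
    for _ in range(n_runs):
        idx = rng.sample(range(n), k)
        _, _, b = pls1([X[i] for i in idx], [y[i] for i in idx], n_comp)
        coefs.append(b)
    columns = [[c[j] for c in coefs] for j in range(m)]
    return [statistics.mean(col) / statistics.stdev(col) for col in columns]


# Synthetic data (an assumption): only the first two of six predictors matter.
rng = random.Random(1)
n, m = 60, 6
X = [[rng.gauss(0, 1) for _ in range(m)] for _ in range(n)]
y = [3 * row[0] - 2 * row[1] + rng.gauss(0, 0.1) for row in X]

W, ssy, _ = pls1(X, y, n_comp=2)
vip = vip_scores(W, ssy)
reliability = mcuve_reliability(X, y, n_comp=2)
print("VIP:", [round(v, 2) for v in vip])
print("MCUVE reliability:", [round(c, 2) for c in reliability])
```

In classical practice a variable is often retained when its VIP exceeds 1 or when the absolute value of its reliability index exceeds a chosen cut-off; the robust coefficient and tolerance interval that the modified MCUVE substitutes for these are described in the full paper, not in this abstract. The sketch uses only the standard library to stay dependency-free; a NumPy or scikit-learn version would be shorter.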

Keywords: near infrared spectral data; partial least squares; robust statistics; scaling; uninformative variable eliminations; variable importance in projection; variable selection.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Global minimum cross-validation for the optimum number of PLS components on different dataset scenarios.
Figure 2. Comparison of the selected relevant variables based on the cut-off criteria in variable selection methods using different dataset scenarios.
Figure 3. Time-consuming performances between methods during the fitting process using different dataset scenarios (n = number of samples, m = number of predictors, IV = number of important variables).
Figure 4. Twelve sampling positions for fruit mesocarp samples of an oil palm fresh fruit bunch.
Figure 5. NIR spectra on oil palm fruit mesocarp: (a) fresh mesocarp, (b) dried ground mesocarp.
Figure 6. Frequency distribution on response variable: %ODM (red), %OWM (green), and %FFA (blue).
Figure 7. Comparison of selected wavelengths from different wavelength selection methods using spectral data of fresh fruit mesocarp on the %ODM.
Figure 8. Comparison of selected wavelengths from different wavelength selection methods using spectral data of fresh fruit mesocarp on the %OWM.
Figure 9. Comparison of selected wavelengths from different wavelength selection methods using spectral data of ground dried mesocarp on the %FFA.
