RSC Adv. 2020 Apr 23;10(28):16245-16253.
doi: 10.1039/d0ra00922a.

An efficient variable selection method based on random frog for the multivariate calibration of NIR spectra

Jingjing Sun et al. RSC Adv.

Abstract

Variable selection is a critical step in spectral modeling. In this study, a new variable interval selection method based on random frog (RF), called Interval Selection based on Random Frog (ISRF), is developed. In the ISRF algorithm, RF is first used to search for the variables most likely to be informative, and a local search is then applied to widen the interval around each of these variables. The best informative intervals are obtained through multiple runs and visualization of the results. The method was tested on three near-infrared (NIR) datasets and compared with four variable selection methods: genetic algorithm PLS (GA-PLS), random frog, interval random frog (iRF), and the interval variable iterative space shrinkage approach (iVISSA). The results show that the proposed method efficiently finds the best interval variables and improves the model's prediction performance and interpretability.
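The abstract does not spell out how the local search widens an interval, so the following is only a minimal sketch of one plausible greedy variant, not the authors' implementation. It assumes a seed variable has already been chosen (for example, one ranked highly by random frog) and that a scoring function returns an error to minimize, such as the RMSECV of a PLS model. The names `expand_interval`, `toy_score`, `INFORMATIVE` and `max_width` are hypothetical, and `toy_score` is a stand-in for a real cross-validated model error.

```python
from typing import Callable, List

# A score maps a list of variable indices to an error (lower is better).
# In practice this might be the RMSECV of a PLS model; here it is an assumption.
Score = Callable[[List[int]], float]


def expand_interval(seed: int, n_vars: int, score: Score, max_width: int = 20) -> List[int]:
    """Greedily widen an interval around `seed` while the score keeps improving."""
    lo = hi = seed
    best = score([seed])
    while hi - lo + 1 < max_width:
        # Candidate moves: add one variable on the left or one on the right.
        candidates = []
        if lo > 0:
            candidates.append((lo - 1, hi))
        if hi < n_vars - 1:
            candidates.append((lo, hi + 1))
        if not candidates:
            break
        scored = [(score(list(range(a, b + 1))), a, b) for a, b in candidates]
        s, a, b = min(scored)
        if s >= best:
            break  # neither extension lowers the error, so stop
        best, lo, hi = s, a, b
    return list(range(lo, hi + 1))


# Toy stand-in for RMSECV: counts informative variables missed plus
# uninformative variables included. The band 40..45 is made up for illustration.
INFORMATIVE = set(range(40, 46))


def toy_score(selected: List[int]) -> float:
    return float(len(INFORMATIVE ^ set(selected)))


print(expand_interval(42, 100, toy_score))  # [40, 41, 42, 43, 44, 45]
```

With a real scoring function, each call to `score` would refit and cross-validate a model, so the cost grows with interval width; the `max_width` cap is one simple way to bound it.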

Conflict of interest statement

There are no conflicts to declare.

Figures

Fig. 1. The RMSECV of the union of the top ranked wavelengths from 1st to last (175th) on the soy dataset. The top 31 wavelengths are the optimal wavelengths with the lowest RMSECV on the calibration set.
Fig. 2. Wavelengths selected by different methods on the soy dataset. (A) Original spectra, (B) GA-PLS, (C) RF, (D) iRF, (E) iVISSA and (F) ISRF.
Fig. 3. Wavelengths selected by different methods on the corn dataset. (A) Original spectra, (B) GA-PLS, (C) RF, (D) iRF, (E) iVISSA and (F) ISRF.
Fig. 4. The RMSECV of the union of the top ranked wavelengths from 1st to last (175th) on the wheat dataset. The top 22 wavelengths are the optimal wavelengths with the lowest RMSECV on the calibration set.
Fig. 5. Wavelengths selected by different methods on the wheat dataset. (A) Original spectra, (B) GA-PLS, (C) RF, (D) iRF, (E) iVISSA and (F) ISRF.
Fig. 6. Wavelengths selected by ISRF on three datasets. (A) Soy dataset, (B) corn dataset, and (C) wheat dataset. The intervals marked by black blocks are the final variable composition.
