Environ Sci Pollut Res Int. 2024 Dec;31(57):65464-65480.
doi: 10.1007/s11356-024-35481-2. Epub 2024 Nov 25.

Stacked Ensemble with Machine Learning Regressors on Optimal Features (SMOF) of hyperspectral sensor PRISMA for inland water turbidity prediction

Rajarshi Bhattacharjee et al. Environ Sci Pollut Res Int. 2024 Dec.

Abstract

Leveraging hyperspectral data across various domains yields substantial benefits, yet managing many spectral bands and identifying the essential ones poses a formidable challenge. This study identifies the most relevant bands within a hyperspectral data cube for turbidity prediction in inland water. Nine machine learning regressors, namely Cat Boost, Decision Trees, Extra Trees, Gradient Boost, Light Gradient Boost (LightGBM), Recursive Feature Elimination (RFE), Random Forest, Support Vector Regressor (SVR), and Xtreme Gradient Boost (XGBoost), were used to compute the feature importance of the hyperspectral bands for predicting turbidity. Random Forest outperformed the other models, with a mean absolute percentage error (MAPE) of 1.61% and an R² of 0.96 for the linear fit. Band 77, with a central wavelength of 1067.61 nm, is the most dominant band in terms of feature importance. We have also developed a novel framework for turbidity prediction: Stacked Ensemble with Machine Learning Regressors on Optimal Features (SMOF). It employs a stacking ensemble of the nine regressors mentioned above, with Random Forest as both base and meta-model, leveraging the feature selection outputs. With this framework, the MAPE reached 1.21%, while the R² stood at 0.95. The study also presents a simple statistical algorithm to detect noisy bands in the Hyperspectral Precursor of the Application Mission (PRISMA) data cube. The approach assesses quadrat-wise intra-band spatial coherence using Rényi's entropy thresholding to segregate noisy bands. Radiometric calibration error and absorption due to water vapour are the two primary sources of noise within the data cube. Moreover, this research uses the open-source Water Colour Simulator (WASI) to simulate inland water spectra with varied proportions of turbidity. Overall, the study presents an approach to identify noisy bands and integrates the potential wavelengths for turbidity prediction of inland waters.
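The stacking idea described in the abstract can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation: the data are synthetic, the band indices, coefficients, and `top_k` value are hypothetical, and only five regressors available in scikit-learn are stacked (CatBoost, LightGBM, and XGBoost are left out to keep the example self-contained). It ranks features with Random Forest importances, keeps the top few, and then fits a stacking ensemble with Random Forest as the meta-model.

```python
import numpy as np
from sklearn.ensemble import (
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.metrics import mean_absolute_percentage_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for a spectra table: 300 samples x 40 "bands".
# (Assumption for illustration; not PRISMA or WASI data.)
n_samples, n_bands = 300, 40
X = rng.uniform(0.0, 1.0, size=(n_samples, n_bands))
# Hypothetical target driven mainly by bands 5 and 12, plus noise.
y = 10.0 + 8.0 * X[:, 5] + 4.0 * X[:, 12] + rng.normal(0.0, 0.2, n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 1: rank bands by Random Forest feature importance, keep the top k.
ranker = RandomForestRegressor(n_estimators=100, random_state=0)
ranker.fit(X_train, y_train)
top_k = 8  # hypothetical choice
selected = np.argsort(ranker.feature_importances_)[::-1][:top_k]

# Step 2: stack several regressors on the selected bands, with
# Random Forest as both a base model and the meta-model.
base_models = [
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("et", ExtraTreesRegressor(n_estimators=100, random_state=0)),
    ("dt", DecisionTreeRegressor(random_state=0)),
    ("gb", GradientBoostingRegressor(random_state=0)),
    ("svr", SVR()),
]
stack = StackingRegressor(
    estimators=base_models,
    final_estimator=RandomForestRegressor(n_estimators=100, random_state=0),
    cv=5,  # out-of-fold base predictions are used to train the meta-model
)
stack.fit(X_train[:, selected], y_train)
y_pred = stack.predict(X_test[:, selected])

# scikit-learn returns MAPE as a fraction, so scale it to percent.
mape_percent = 100.0 * mean_absolute_percentage_error(y_test, y_pred)
# Coefficient of determination; this may differ from the R² of a
# linear fit between predicted and observed values.
r2 = r2_score(y_test, y_pred)

print("selected bands:", sorted(selected.tolist()))
print(f"MAPE (%): {mape_percent:.2f}, R2: {r2:.3f}")
```

The scores printed here describe only the synthetic data and are not comparable to the values reported in the abstract.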

Keywords: Ensemble stacking; Feature selection; Image fusion; Machine learning; Spectral noise; Water quality.

Conflict of interest statement

Declarations. Ethical approval: Not applicable. Consent to participate: Not applicable. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.
