Environ Sci Pollut Res Int. 2024 Jan;31(1):262-279.
doi: 10.1007/s11356-023-31148-6. Epub 2023 Nov 28.

A novel hybrid model based on two-stage data processing and machine learning for forecasting chlorophyll-a concentration in reservoirs

Wenqing Yu et al. Environ Sci Pollut Res Int. 2024 Jan.

Abstract

The accurate and efficient prediction of chlorophyll-a (Chl-a) concentration is crucial for the early detection of algal blooms in reservoirs. Nevertheless, predicting Chl-a concentration from multivariate time series is challenging because of the complex interrelationships within the aquatic environment and the discrete, non-stationary nature of online water quality monitoring data. To address this issue, this paper proposes a novel prediction model named SGMD-KPCA-BiLSTM (SKB), which combines two-stage data processing with machine learning (ML). To capture nonlinear relationships in multivariate time series data, the optimal data subset is determined by combining symplectic geometry mode decomposition (SGMD) and kernel principal component analysis (KPCA). This subset is then input into a bidirectional long short-term memory (BiLSTM) model, whose hyperparameters are optimized using the sparrow search algorithm (SSA) to improve prediction accuracy. The performance of the model was evaluated at Qiaodian Reservoir in Shandong, China. The evaluation criteria were the root mean square error (RMSE), mean absolute percentage error (MAPE), mean absolute error (MAE), coefficient of determination (R²), frequency histograms of the prediction error, and the Taylor diagram. The SKB model was compared against five single models, namely the back-propagation (BP) neural network, support vector regression (SVR), long short-term memory (LSTM), convolutional neural network with long short-term memory (CNN-LSTM), and BiLSTM, and against three hybrid models, namely SGMD-LSTM, SGMD-KPCA-LSTM, and SGMD-BiLSTM. The results demonstrated that the SKB model performs best in predicting Chl-a concentration (R² = 96.19%, RMSE = 1.05, MAE = 0.65, MAPE = 0.08) and significantly reduces the prediction error relative to the comparison models.
Furthermore, the multi-step predictive capabilities of the SKB model are discussed. The analysis shows that predictive performance declines as the prediction time step increases, and that the SKB model performs slightly better than the other model at corresponding prediction intervals. The model has significant advantages in accurately predicting the non-smooth and nonlinear Chl-a sequences observed by the online monitoring system. This study presents a potential solution for controlling and preventing reservoir eutrophication, as well as an innovative approach for predicting water quality.
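
The four scalar evaluation criteria named in the abstract (RMSE, MAE, MAPE, and R²) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `evaluate` and the sample values are hypothetical, and expressing MAPE as a fraction rather than a percentage is an assumption, since the abstract does not state the convention used.

```python
import math


def evaluate(y_true, y_pred):
    """Return RMSE, MAE, MAPE and R^2 for paired observations and predictions.

    Hypothetical helper for illustration only; it is not taken from the paper.
    MAPE is returned as a fraction (an assumption) and needs nonzero observations.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    # Root mean square error: penalizes large deviations more strongly.
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Mean absolute error: average magnitude of the deviations.
    mae = sum(abs(e) for e in errors) / n

    # Mean absolute percentage error, as a fraction of each observed value.
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    # Coefficient of determination: 1 minus residual over total sum of squares.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return {"rmse": rmse, "mae": mae, "mape": mape, "r2": r2}


# Illustrative values only; these are not measurements from Qiaodian Reservoir.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
print(evaluate(observed, predicted))
```

Lower RMSE, MAE, and MAPE and an R² closer to 1 indicate a better fit, which is the sense in which the abstract ranks the SKB model above the comparison models.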

Keywords: Bidirectional long short-term memory; Chlorophyll-a; Eutrophication; Kernel principal component analysis; Prediction; Symplectic geometry mode decomposition.


