Toxins (Basel). 2023 Oct 10;15(10):608. doi: 10.3390/toxins15100608.

Ensemble Machine Learning of Gradient Boosting (XGBoost, LightGBM, CatBoost) and Attention-Based CNN-LSTM for Harmful Algal Blooms Forecasting

Jung Min Ahn et al.

Abstract

Harmful algal blooms (HABs) are a serious threat to ecosystems and human health, and their accurate prediction is crucial for proactive preparedness and management. While mechanism-based numerical models, such as the Environmental Fluid Dynamics Code (EFDC), have been widely used in the past, recent advances in machine learning with data-driven processing capabilities have opened up new possibilities for HABs prediction. In this study, we developed and evaluated two types of machine learning-based models for HABs prediction: Gradient Boosting models (XGBoost, LightGBM, CatBoost) and attention-based CNN-LSTM models. Hyperparameters were tuned with Bayesian optimization, the final predictions were derived by combining the tuned models with bagging and stacking ensemble techniques, and the applicability of the resulting predictions to HABs was evaluated. When HABs are predicted with an ensemble, overall performance can be improved because the models' strengths complement one another and individual-model errors, such as overfitting, are averaged out. Our study highlights the potential of machine learning-based models for HABs prediction and emphasizes the need to incorporate the latest technology into this important field.
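The bagging and stacking combinations described above can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the three base models (XGBoost, LightGBM, CatBoost in the paper) are replaced here by toy stand-in predictions, bagging is a simple average, and stacking uses an ordinary least-squares meta-learner.

```python
import numpy as np

def bagging_ensemble(preds):
    """Bagging-style combination: average the base-model predictions."""
    return np.mean(preds, axis=0)

def stacking_ensemble(train_preds, y_train, test_preds):
    """Stacking: fit a linear meta-learner (least squares) on the base
    models' training predictions, then apply it to their test predictions."""
    X = np.column_stack(train_preds)                 # (n_samples, n_models)
    w, *_ = np.linalg.lstsq(X, y_train, rcond=None)  # meta-learner weights
    return np.column_stack(test_preds) @ w

# Toy stand-ins for three base models' outputs (HABs, cells/mL):
y_train = np.array([100.0, 220.0, 310.0, 150.0])
train_preds = [y_train * 0.9, y_train * 1.1, y_train + 10.0]
test_preds = [np.array([180.0]), np.array([220.0]), np.array([210.0])]

bagged = bagging_ensemble(test_preds)                       # simple average
stacked = stacking_ensemble(train_preds, y_train, test_preds)
```

In practice the base predictors would be the fitted gradient-boosting models, and the stacking meta-learner could itself be a regularized or nonlinear model.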

Keywords: Bayesian optimization; Gradient Boosting; attention-based CNN-LSTM; ensemble techniques; harmful algal blooms.
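The attention mechanism in an attention-based CNN-LSTM weights the hidden states produced at each time step before the final prediction. The sketch below shows only that pooling step in numpy, under assumed shapes; the hidden states and query vector are random stand-ins, not outputs of the paper's trained network.

```python
import numpy as np

def attention_pool(hidden_states, query):
    """Attention over a sequence of hidden states: score each time step
    against a query vector, softmax the scores over time, and return the
    weighted sum (context vector) plus the attention weights."""
    scores = hidden_states @ query          # (T,) one score per time step
    weights = np.exp(scores - scores.max()) # numerically stable softmax
    weights /= weights.sum()
    context = weights @ hidden_states       # (H,) weighted combination
    return context, weights

T, H = 6, 4                                 # time steps, hidden size (assumed)
rng = np.random.default_rng(0)
h = rng.normal(size=(T, H))                 # stand-in LSTM hidden states
q = rng.normal(size=H)                      # stand-in learned query vector
context, weights = attention_pool(h, q)
```

In a full model, `hidden_states` would come from the LSTM applied to CNN-extracted features, and the context vector would feed a dense output layer.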


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure A1. Training and prediction procedure using the data.
Figure 1. Results of data correlation analysis.
Figure 2. Schematic diagram of ensemble learning using the Gradient Boosting technique.
Figure 3. Results of HABs (cells/mL) predicted via the Gradient Boosting technique: (a) XGBoost; (b) LightGBM; (c) CatBoost; (d) bagging ensemble; (e) stacking ensemble.
Figure 4. Attention-based CNN-LSTM training and prediction networks.
Figure 5. Final HABs (cells/mL) predicted by the attention-based CNN-LSTM technique and the ensemble technique: (a) attention-based CNN-LSTM; (b) final ensemble.
Figure 6. The study area (the red box marks the monitoring point).

References

    1. Aksoy N., Genc I. Predictive models development using gradient boosting based methods for solar power plants. J. Comput. Sci. 2023;67:101958. doi: 10.1016/j.jocs.2023.101958. - DOI
    1. Chen T., Guestrin C. XGBoost: A Scalable Tree Boosting System; Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; San Francisco, CA, USA. 13–17 August 2016; pp. 785–794. - DOI
    1. Ke G., Meng Q., Finley T., Wang T., Chen W., Ma W., Ye Q., Liu T.-Y. Light GBM: A Highly Efficient Gradient Boosting Decision Tree. [(accessed on 4 April 2023)];Adv. Neural Inf. Process. Syst. 2017 30:3146–3154. Available online: https://proceedings.neurips.cc/paper/2017/hash/6449f44a102fde848669bdd9e....
    1. Prokhorenkova L., Gusev G., Vorobev A., Dorogush A.V., Gulin A. CatBoost: Unbiased boosting with categorical features. [(accessed on 4 April 2023)];Adv. Neural Inf. Process. Syst. 2018 31:6638–6648. Available online: https://proceedings.neurips.cc/paper/2018/hash/83b2d666b98a3b304ce08d057....
    1. Werbos P.J. Backpropagation through time: What it does and how to do it. Proc. IEEE. 1986;78:1550–1560. doi: 10.1109/5.58337. - DOI
