Ensemble Machine Learning of Gradient Boosting (XGBoost, LightGBM, CatBoost) and Attention-Based CNN-LSTM for Harmful Algal Blooms Forecasting
- PMID: 37888638
- PMCID: PMC10611362
- DOI: 10.3390/toxins15100608
Abstract
Harmful algal blooms (HABs) are a serious threat to ecosystems and human health. Accurate prediction of HABs is crucial for proactive preparation and management. While mechanism-based numerical modeling, such as the Environmental Fluid Dynamics Code (EFDC), has been widely used in the past, recent advances in machine learning with data-driven processing capabilities have opened up new possibilities for HAB prediction. In this study, we developed and evaluated two types of machine learning-based models for HAB prediction: Gradient Boosting models (XGBoost, LightGBM, CatBoost) and attention-based CNN-LSTM models. We used Bayesian optimization for hyperparameter tuning, applied bagging and stacking ensemble techniques to derive the final predictions, and evaluated their applicability to HAB prediction. Predicting HABs with an ensemble technique is judged to improve overall prediction performance by complementing the strengths of each model and averaging out errors such as the overfitting of individual models. Our study highlights the potential of machine learning-based models for HAB prediction and emphasizes the need to incorporate the latest technology into this important field.
Keywords: Bayesian optimization; Gradient Boosting; attention-based CNN-LSTM; ensemble techniques; harmful algal blooms.
Conflict of interest statement
The authors declare no conflict of interest.
