Comparison of individual and ensemble machine learning models for prediction of sulphate levels in untreated and treated Acid Mine Drainage

doi:10.1007/s10661-024-12467-8

Environ Monit Assess. 2024 Mar 2;196(4):332.

Taskeen Hasrod¹, Yannick B Nuapia², Hlanganani Tutu³

Affiliations

¹ Molecular Sciences Institute, School of Chemistry, University of the Witwatersrand, Private Bag X3, Johannesburg, 2050, South Africa.
² Pharmacy Department, School of Healthcare Sciences, University of Limpopo, Turfloop Campus, Polokwane, 0727, South Africa.
³ Molecular Sciences Institute, School of Chemistry, University of the Witwatersrand, Private Bag X3, Johannesburg, 2050, South Africa. hlanganani.tutu@wits.ac.za.

PMID: 38429461
PMCID: PMC10907470
DOI: 10.1007/s10661-024-12467-8

Taskeen Hasrod et al. Environ Monit Assess. 2024.

Abstract

Machine learning (ML) was used to predict sulphate levels in an Acid Mine Drainage (AMD) water quality dataset, providing data for further evaluation of the potential extraction of octathiocane (S₈), a commercially useful by-product, from AMD. Individual ML regressor models, namely Linear Regression (LR), Least Absolute Shrinkage and Selection Operator (LASSO), Ridge (RD), Elastic Net (EN), K-Nearest Neighbours (KNN), Support Vector Regression (SVR), Decision Tree (DT), Extreme Gradient Boosting (XGBoost), Random Forest (RF) and Multi-Layer Perceptron Artificial Neural Network (MLP), as well as Stacking Ensemble (SE-ML) combinations of these models, were successfully used to predict sulphate levels. The best-performing model was an SE-ML regressor trained on untreated AMD that stacked seven of the best-performing individual models and fed their predictions to an LR meta-learner; it achieved a Mean Squared Error (MSE) of 0.000011, a Mean Absolute Error (MAE) of 0.002617 and an R² of 0.9997. Temperature (°C), Total Dissolved Solids (mg/L) and, importantly, iron (mg/L) were highly correlated with sulphate (mg/L), with iron showing a strong positive linear correlation that indicated dissolved products from pyrite oxidation. Ensemble learning methods (bagging, boosting and stacking) outperformed individual methods owing to their combined predictive accuracies. Surprisingly, SE-ML that combined all models and SE-ML that combined only the best-performing models differed only slightly in accuracy, which indicated that including poorly performing models in the stack had no adverse effect on its predictive performance.

Keywords: Acid Mine Drainage; Environmental chemistry; Machine learning; Regression; Stacking ensemble machine learning; Sulphate.
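The stacking approach described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn's `StackingRegressor` and uses synthetic data; the feature names, coefficients, hyperparameters and the choice of five base models are illustrative assumptions, not the authors' pipeline or dataset. SVR, XGBoost and MLP are left out to keep the example short.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for an AMD water quality table (NOT the study's data).
rng = np.random.default_rng(0)
n = 400
iron = rng.uniform(50, 500, n)         # mg/L
tds = rng.uniform(2000, 6000, n)       # mg/L
temperature = rng.uniform(15, 30, n)   # deg C
X = np.column_stack([iron, tds, temperature])
# Hypothetical relationship, chosen only so the models have something to learn.
y = 3.0 * iron + 0.4 * tds + 5.0 * temperature + rng.normal(0, 20, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Base (level-0) regressors; distance- and penalty-based models get scaling.
base_models = [
    ("lr", LinearRegression()),
    ("ridge", make_pipeline(StandardScaler(), Ridge(alpha=1.0))),
    ("knn", make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))),
    ("dt", DecisionTreeRegressor(random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
]

# The meta-learner (level-1) is a linear regression fitted on the
# cross-validated predictions of the base models.
stack = StackingRegressor(
    estimators=base_models,
    final_estimator=LinearRegression(),
    cv=5,
)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)

stack_mse = mean_squared_error(y_test, y_pred)
stack_mae = mean_absolute_error(y_test, y_pred)
stack_r2 = r2_score(y_test, y_pred)
print(f"MSE={stack_mse:.3f}  MAE={stack_mae:.3f}  R2={stack_r2:.4f}")
```

Because the meta-learner is trained on out-of-fold predictions, it can assign small weights to weak base models, which is one plausible reason a stack may tolerate poorly performing members.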

Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1**
Scatter matrix (below main diagonal) and Pearson’s correlation matrix (above main diagonal) of Pump A indicate the interrelationships between water quality parameters. Diagonal histogram and density plots indicate the distribution of each parameter. For the correlation matrix, red circles are positive correlations, blue circles are negative correlations and larger circles indicate more strongly correlated variables

**Fig. 2**
Dimensionality reduction and feature extraction results obtained from PCA for Pump A. (a) Bi-plot indicating the clustering of individual observations and its relation to the loadings plot. (b) Expanded view of the loadings plot indicating the relationship between parameters. (c) Elbow method plot indicating the optimal number of clusters. (d) Scree plot indicating the amount of variance explained by each principal component

**Fig. 3**
Regression algorithm NMSE comparison for Pump A showing (a) all individual baseline models, (b) all models and a stacking regressor containing all the models and (c) all well-performing models and a stacking regressor containing all the best-performing models

**Fig. 4**
Testing statistics (MSE, MAE and R²) accuracy comparison of regression models trained on Pump A. (a, b) Stacking regressor using all models. (c, d) Stacking regressor using only the best-performing models

**Fig. 5**
Scatter matrix (below main diagonal) and Pearson’s correlation matrix (above main diagonal) of Pump B indicate the interrelationships between water quality parameters. Diagonal histogram and density plots indicate the distribution of each parameter. For the correlation matrix, red circles are positive correlations, blue circles are negative correlations and larger circles indicate more strongly correlated variables

**Fig. 6**
Dimensionality reduction and feature extraction results obtained from PCA for Pump B. (a) Bi-plot indicating the clustering of individual observations and its relation to the loadings plot. (b) Expanded view of the loadings plot indicating the relationship between parameters. (c) Elbow method plot indicating the optimal number of clusters. (d) Scree plot indicating the amount of variance explained by each principal component

**Fig. 7**
Regression algorithm NMSE comparison for Pump B showing (a) all individual baseline models, (b) all models and a stacking regressor containing all the models and (c) all well-performing models and a stacking regressor containing all the best-performing models

**Fig. 8**
Testing statistics (MSE, MAE and R²) accuracy comparison of regression models trained on Pump B. (a, b) Stacking regressor using all models. (c, d) Stacking regressor using only the best-performing models

**Fig. 9**
Scatter matrix (below main diagonal) and Pearson’s correlation matrix (above main diagonal) of Treated Water indicate the interrelationships between water quality parameters. Diagonal histogram and density plots indicate the distribution of each parameter. For the correlation matrix, red circles are positive correlations, blue circles are negative correlations and larger circles indicate more strongly correlated variables

**Fig. 10**
Dimensionality reduction and feature extraction results obtained from PCA for the Treated Water. (a) Bi-plot indicating the clustering of individual observations and its relation to the loadings plot. (b) Expanded view of the loadings plot indicating the relationship between parameters. (c) Elbow method plot indicating the optimal number of clusters. (d) Scree plot indicating the amount of variance explained by each principal component

**Fig. 11**
Regression algorithm NMSE comparison for Treated Water showing (a) all individual baseline models, (b) all models and a stacking regressor containing all the models and (c) all well-performing models and a stacking regressor containing all the best-performing models

**Fig. 12**
Testing statistics (MSE, MAE and R²) accuracy comparison of regression models trained on Treated Water. (a, b) Stacking regressor using all models. (c, d) Stacking regressor using only the best-performing models
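The testing statistics named in Figs. 4, 8 and 12 (MSE, MAE and R²) follow standard definitions, sketched below in plain Python. The function names and the four-point example are hypothetical and are not taken from the study.

```python
def mse(y_true, y_pred):
    """Mean Squared Error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean Absolute Error: average of absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (not measurements from the paper).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]

print(mse(y_true, y_pred), mae(y_true, y_pred), r_squared(y_true, y_pred))
```

MSE and MAE depend on the scale of the target, so values are only comparable between models evaluated on the same (identically scaled) sulphate variable; R² is scale-free.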

Cited by

An online explainable ensemble machine learning model for predicting epidermal growth factor receptor mutation status in lung adenocarcinoma.
Song Q, Li X, Song B, Zhang T, Hu X, Li A, Ma D, Min X, Yu Y. Transl Lung Cancer Res. 2025 Jul 31;14(7):2670-2687. doi: 10.21037/tlcr-2025-237. Epub 2025 Jul 28. PMID: 40799429. Free PMC article.
