Front Artif Intell. 2024 Nov 1;7:1455331. doi: 10.3389/frai.2024.1455331. eCollection 2024.

Enhancing random forest predictive performance for foot and mouth disease outbreaks in Uganda: a calibrated uncertainty prediction approach for varying distributions

Geofrey Kapalaga et al. Front Artif Intell. 2024.

Abstract

Foot-and-mouth disease poses a significant threat to both domestic and wild cloven-hoofed animals, leading to severe economic losses and jeopardizing food security. While machine learning models have become essential for predicting foot-and-mouth disease outbreaks, their effectiveness is often compromised by distribution shifts between training and target datasets, especially in non-stationary environments. Despite the critical impact of these shifts, their implications for foot-and-mouth disease outbreak prediction have been largely overlooked. This study introduces the Calibrated Uncertainty Prediction approach, designed to enhance the performance of Random Forest models in predicting foot-and-mouth disease outbreaks across varying distributions. The Calibrated Uncertainty Prediction approach addresses distribution shifts by calibrating uncertain instances for pseudo-label annotation, allowing the active learner to generalize more effectively to the target domain. Using a probabilistic calibration model, Calibrated Uncertainty Prediction pseudo-annotates the most informative instances, refining the active learner iteratively, minimizing the need for human annotation, and outperforming existing methods known to mitigate distribution shifts. This reduces costs, saves time, and lessens dependence on domain experts while achieving strong predictive performance. The results demonstrate that Calibrated Uncertainty Prediction significantly enhances predictive performance in non-stationary environments, achieving an accuracy of 98.5%, an area under the curve of 0.842, a recall of 0.743, a precision of 0.855, and an F1 score of 0.791. These findings underscore Calibrated Uncertainty Prediction's ability to overcome the vulnerabilities of existing ML models, offering a robust solution for foot-and-mouth disease outbreak prediction and contributing to the broader field of predictive modeling in infectious disease management.

Keywords: calibrated uncertainty prediction; distribution shifts; foot-and-mouth disease; performance improvement rates; random forest.
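
To make the approach described in the abstract concrete, the Python sketch below illustrates the general pattern it describes: a random forest active learner queries its most uncertain instances from a distribution-shifted target pool, a probabilistic calibrator (here scikit-learn's CalibratedClassifierCV, used as a stand-in for the paper's calibration model) scores those instances, and only confidently calibrated predictions are accepted as pseudo-labels for retraining. This is a minimal, assumption-laden illustration rather than the authors' CUP implementation; the toy data, the 0.9 confidence cutoff, the batch size of 50, and the variable names are all illustrative choices.

# Hypothetical sketch of a calibrated pseudo-labeling loop in the spirit of CUP.
# Parameter values and variable names are illustrative, not taken from the paper.
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy stand-in for the FMD outbreak data: a labeled source set and an
# unlabeled target pool whose distribution may have shifted.
X, y = make_classification(n_samples=2000, n_features=12, weights=[0.8, 0.2],
                           random_state=0)
X_src, X_tgt, y_src, _ = train_test_split(X, y, test_size=0.5, random_state=0)

UNCERTAIN_BATCH = 50      # instances queried per iteration (assumed value)
CONFIDENCE_CUTOFF = 0.9   # calibrated probability required to accept a pseudo-label
N_ITERATIONS = 5

X_train, y_train, pool = X_src.copy(), y_src.copy(), X_tgt.copy()

for _ in range(N_ITERATIONS):
    # 1. Fit the active learner (a random forest) on the current labeled set.
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

    # 2. Query the pool instances the learner is least sure about
    #    (predicted probability closest to 0.5).
    proba = rf.predict_proba(pool)[:, 1]
    uncertain_idx = np.argsort(np.abs(proba - 0.5))[:UNCERTAIN_BATCH]

    # 3. Fit a probabilistic calibrator and pseudo-label only the queried
    #    instances it scores with high confidence.
    calibrator = CalibratedClassifierCV(rf, method="sigmoid", cv=3).fit(X_train, y_train)
    cal_proba = calibrator.predict_proba(pool[uncertain_idx])
    confident = cal_proba.max(axis=1) >= CONFIDENCE_CUTOFF
    if not confident.any():
        break

    accepted = uncertain_idx[confident]
    pseudo_labels = cal_proba[confident].argmax(axis=1)

    # 4. Grow the training set with the pseudo-labeled instances, shrink the pool,
    #    and repeat so the learner adapts toward the target distribution.
    X_train = np.vstack([X_train, pool[accepted]])
    y_train = np.concatenate([y_train, pseudo_labels])
    pool = np.delete(pool, accepted, axis=0)

In the paper itself the calibrator, query strategy, and stopping rule follow the CUP procedure evaluated in the figures below rather than these default choices.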

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Model accuracy degradation under varying distributions. RF, random forest; SVM, support vector machine; kNN, k-nearest neighbors; GBM, gradient boosting machine; AdaBoost, adaptive boosting; LR, logistic regression; CART, classification and regression tree.
Figure 2
Variability in the rainfall (A) and maximum temperature (B) features, highlighting their varying distributions (Kapalaga et al., 2024).
Figure 3
A general framework for handling distribution shifts in ML. ACC, accuracy; AUC, area under curve; PR, precision; DS, distribution shifts; ML, machine learning.
Figure 4
Experimental design to guide the CUP development and evaluation. RF, random forest; CUP, calibrated uncertainty prediction; FMD, foot-and-mouth disease.
Figure 5
Map of Uganda showing districts affected by FMD outbreaks between 2011 and 2022.
Figure 6
Prevalence of FMD outbreaks by district.
Figure 7
Map of Uganda with purposively selected study districts.
Figure 8
Model performance across oversampling techniques with a balanced dataset (Kapalaga et al., 2024). RF, random forest; SVM, support vector machine; kNN, k-nearest neighbors; GBM, gradient boosting machine; AdaBoost, adaptive boosting; LR, logistic regression; CART, classification and regression tree; SMOTE, synthetic minority over-sampling technique; ADASYN, adaptive synthetic sampling.
Figure 9
Comparative model performance across oversampling techniques. RF, random forest; SVM, support vector machine; kNN, k-nearest neighbors; GBM, gradient boosting machine; AdaBoost, adaptive boosting; LR, logistic regression; CART, classification and regression tree; SMOTE, synthetic minority over-sampling technique; ADASYN, adaptive synthetic sampling.
Figure 10
Visual overview of the CUP approach. ACC, accuracy; AUC, area under curve; PR, precision; L0, training dataset for training the initial active learner (A); U, validation dataset; Q0, queried uncertainty samples; L1, dataset for training the model calibrator (M); Q1, pseudo-labeled uncertainty samples.
Figure 11
Workflow for evaluating the performance of CUP against existing methods. AUC of ROC, area under curve of receiver operating characteristic; RF, random forest; DWL, dynamic weighted learning; STar, select TARgets; LAAL-ELM, less annotated active learning extreme learning machine; RLLS, regularized learning under label shifts; CUP, calibrated uncertainty prediction; FMD, foot-and-mouth disease.
Figure 12
Visual overview of the iterative probabilistic calibration process applied to uncertainty instances. (A) Depicts the distribution of uncertainty samples before calibration, while (B–F) illustrate the status of uncertainty examples after iterative calibrations. (G) Showcases a scenario where uncertainty samples are perfectly calibrated.
Figure 13
Performance improvement with the CUP approach. AUC, area under curve.
Figure 14
Comparative performance analysis of various methods. ACC, accuracy; AUC, area under curve; RF, random forest; DWL, dynamic weighted learning; STar, select TARgets; LAAL-ELM, less annotated active learning extreme learning machine; RLLS, regularized learning under label shifts; CUP, calibrated uncertainty prediction.
Figure 15
Weighted average performance of evaluated methods. RF, random forest; DWL, dynamic weighted learning; STar, select TARgets; LAAL-ELM, less annotated active learning extreme learning machine; RLLS, regularized learning under label shifts; CUP, calibrated uncertainty prediction.

