Front Artif Intell. 2024 Nov 1;7:1455331. doi: 10.3389/frai.2024.1455331. eCollection 2024.

Enhancing random forest predictive performance for foot and mouth disease outbreaks in Uganda: a calibrated uncertainty prediction approach for varying distributions

Geofrey Kapalaga et al. Front Artif Intell. 2024.

Abstract

Foot-and-mouth disease poses a significant threat to both domestic and wild cloven-hoofed animals, leading to severe economic losses and jeopardizing food security. While machine learning models have become essential for predicting foot-and-mouth disease outbreaks, their effectiveness is often compromised by distribution shifts between training and target datasets, especially in non-stationary environments. Despite the critical impact of these shifts, their implications for foot-and-mouth disease outbreak prediction have been largely overlooked. This study introduces the Calibrated Uncertainty Prediction approach, designed to enhance the performance of Random Forest models in predicting foot-and-mouth disease outbreaks across varying distributions. The Calibrated Uncertainty Prediction approach addresses distribution shifts by calibrating uncertain instances for pseudo-label annotation, allowing the active learner to generalize more effectively to the target domain. Using a probabilistic calibration model, Calibrated Uncertainty Prediction pseudo-annotates the most informative instances, refining the active learner iteratively, minimizing the need for human annotation, and outperforming existing methods known to mitigate distribution shifts. This reduces costs, saves time, and lessens dependence on domain experts while achieving strong predictive performance. The results demonstrate that Calibrated Uncertainty Prediction significantly enhances predictive performance in non-stationary environments, achieving an accuracy of 98.5%, an area under the curve of 0.842, a recall of 0.743, a precision of 0.855, and an F1 score of 0.791. These findings underscore Calibrated Uncertainty Prediction's ability to overcome the vulnerabilities of existing ML models, offering a robust solution for foot-and-mouth disease outbreak prediction and contributing to the broader field of predictive modeling in infectious disease management.

Keywords: calibrated uncertainty prediction; distribution shifts; foot-and-mouth disease; performance improvement rates; random forest.
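
To make the approach described in the abstract concrete, the Python sketch below illustrates the general pattern it describes: a random forest active learner queries its most uncertain instances from a distribution-shifted target pool, a probabilistic calibrator (here scikit-learn's CalibratedClassifierCV, used as a stand-in for the paper's calibration model) scores those instances, and only confidently calibrated predictions are accepted as pseudo-labels for retraining. This is a minimal, assumption-laden illustration rather than the authors' CUP implementation; the toy data, the 0.9 confidence cutoff, the batch size of 50, and the variable names are all illustrative choices.

# Hypothetical sketch of a calibrated pseudo-labeling loop in the spirit of CUP.
# Parameter values and variable names are illustrative, not taken from the paper.
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy stand-in for the FMD outbreak data: a labeled source set and an
# unlabeled target pool whose distribution may have shifted.
X, y = make_classification(n_samples=2000, n_features=12, weights=[0.8, 0.2],
                           random_state=0)
X_src, X_tgt, y_src, _ = train_test_split(X, y, test_size=0.5, random_state=0)

UNCERTAIN_BATCH = 50      # instances queried per iteration (assumed value)
CONFIDENCE_CUTOFF = 0.9   # calibrated probability required to accept a pseudo-label
N_ITERATIONS = 5

X_train, y_train, pool = X_src.copy(), y_src.copy(), X_tgt.copy()

for _ in range(N_ITERATIONS):
    # 1. Fit the active learner (a random forest) on the current labeled set.
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

    # 2. Query the pool instances the learner is least sure about
    #    (predicted probability closest to 0.5).
    proba = rf.predict_proba(pool)[:, 1]
    uncertain_idx = np.argsort(np.abs(proba - 0.5))[:UNCERTAIN_BATCH]

    # 3. Fit a probabilistic calibrator and pseudo-label only the queried
    #    instances it scores with high confidence.
    calibrator = CalibratedClassifierCV(rf, method="sigmoid", cv=3).fit(X_train, y_train)
    cal_proba = calibrator.predict_proba(pool[uncertain_idx])
    confident = cal_proba.max(axis=1) >= CONFIDENCE_CUTOFF
    if not confident.any():
        break

    accepted = uncertain_idx[confident]
    pseudo_labels = cal_proba[confident].argmax(axis=1)

    # 4. Grow the training set with the pseudo-labeled instances, shrink the pool,
    #    and repeat so the learner adapts toward the target distribution.
    X_train = np.vstack([X_train, pool[accepted]])
    y_train = np.concatenate([y_train, pseudo_labels])
    pool = np.delete(pool, accepted, axis=0)

In the paper itself the calibrator, query strategy, and stopping rule follow the CUP procedure evaluated in the figures below rather than these default choices.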

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Model accuracy degradation under varying distributions. RF, random forest; SVM, support vector machine; kNN, k-nearest neighbors; GBM, gradient boosting machine; AdaBoost, adaptive boosting; LR, logistic regression; CART, classification and regression tree.
Figure 2
Variability in the rainfall (A) and maximum temperature (B) features, highlighting their varying distributions (Kapalaga et al., 2024).
Figure 3
A general framework for handling distribution shifts in ML. ACC, accuracy; AUC, area under curve; PR, precision; DS, distribution shifts; ML, machine learning.
Figure 4
Experimental design to guide the CUP development and evaluation. RF, random forest; CUP, calibrated uncertainty prediction; FMD, foot-and-mouth disease.
Figure 5
Map of Uganda showing districts affected by FMD outbreaks between 2011 and 2022.
Figure 6
Prevalence of FMD outbreaks by district.
Figure 7
Map of Uganda with purposively selected study districts.
Figure 8
Model performance across oversampling techniques with a balanced dataset (Kapalaga et al., 2024). RF, random forest; SVM, support vector machine; kNN, k-nearest neighbors; GBM, gradient boosting machine; AdaBoost, adaptive boosting; LR, logistic regression; CART, classification and regression tree; SMOTE, synthetic minority over-sampling technique; ADASYN, adaptive synthetic sampling.
Figure 9
Comparative model performance across oversampling techniques. RF, random forest; SVM, support vector machine; kNN, k-nearest neighbors; GBM, gradient boosting machine; AdaBoost, adaptive boosting; LR, logistic regression; CART, classification and regression tree; SMOTE, synthetic minority over-sampling technique; ADASYN, adaptive synthetic sampling.
Figure 10
Visual overview of the CUP approach. ACC, accuracy; AUC, area under curve; PR, precision; L0, training dataset for training the initial active learner (A); U, validation dataset; Q0, queried uncertainty samples; L1, dataset for training the model calibrator (M); Q1, pseudo-labeled uncertainty samples.
Figure 11
Workflow for evaluating the performance of CUP against existing methods. AUC of ROC, area under curve of receiver operating characteristic; RF, random forest; DWL, dynamic weighted learning; STar, select TARgets; LAAL-ELM, less annotated active learning extreme learning machine; RLLS, regularized learning under label shifts; CUP, calibrated uncertainty prediction; FMD, foot-and-mouth disease.
Figure 12
Visual overview of the iterative probabilistic calibration process applied to uncertainty instances. (A) Depicts the distribution of uncertainty samples before calibration, while (B–F) illustrate the status of uncertainty examples after iterative calibrations. (G) Showcases a scenario where uncertainty samples are perfectly calibrated.
Figure 13
Performance improvement with the CUP approach. AUC, area under curve.
Figure 14
Comparative performance analysis of various methods. ACC, accuracy; AUC, area under curve; RF, random forest; DWL, dynamic weighted learning; STar, select TARgets; LAAL-ELM, less annotated active learning extreme learning machine; RLLS, regularized learning under label shifts; CUP, calibrated uncertainty prediction.
Figure 15
Weighted average performance of evaluated methods. RF, random forest; DWL, dynamic weighted learning; STar, select TARgets; LAAL-ELM, less annotated active learning extreme learning machine; RLLS, regularized learning under label shifts; CUP, calibrated uncertainty prediction.

