Malar J. 2023 Nov 21;22(1):356.

doi: 10.1186/s12936-023-04760-7.

Comparison of new computational methods for spatial modelling of malaria

Spencer Wong¹, Jennifer A Flegg², Nick Golding³,⁴, Sevvandi Kandanaarachchi⁵

Affiliations

¹ School of Mathematics and Statistics, The University of Melbourne, Parkville, VIC, 3010, Australia.
² School of Mathematics and Statistics, The University of Melbourne, Parkville, VIC, 3010, Australia. jennifer.flegg@unimelb.edu.au.
³ Telethon Kids Institute, Perth Children's Hospital, 15 Hospital Ave, Nedlands, WA, 6009, Australia.
⁴ Curtin University, Kent St, Bentley, WA, 6102, Australia.
⁵ CSIRO's Data61, Research Way, Clayton, VIC, 3168, Australia.

PMID: 37990242
PMCID: PMC10664662
DOI: 10.1186/s12936-023-04760-7

Spencer Wong et al. Malar J. 2023.

Abstract

Background: Geostatistical analysis of health data is increasingly used to model spatial variation in malaria prevalence, burden, and other metrics. Traditional inference methods for geostatistical modelling are notoriously computationally intensive, motivating the development of newer, approximate methods for geostatistical analysis or, more broadly, computational modelling of spatial processes. The appeal of faster methods is particularly great as the size of the region and the number of spatial locations being modelled increase.

Methods: This work presents an applied comparison of four proposed 'fast' computational methods for spatial modelling and the software provided to implement them: Integrated Nested Laplace Approximation (INLA), tree boosting with Gaussian processes and mixed effect models (GPBoost), Fixed Rank Kriging (FRK) and Spatial Random Forests (SpRF). The four methods are illustrated by estimating malaria prevalence on two different spatial scales, country and continent. The performance of the four methods is compared on these data in terms of accuracy, computation time, and ease of implementation.

Results: Two of these methods, SpRF and GPBoost, do not scale well as the data size increases, and so are likely to be infeasible for larger-scale analysis problems. The two remaining methods, INLA and FRK, do scale well computationally; however, the resulting model fits are very sensitive to the user's modelling assumptions and parameter choices. The binomial observation distribution commonly used for disease prevalence mapping with INLA fails to account for small-scale overdispersion present in the malaria prevalence data, which can lead to poor predictions. Selection of an appropriate alternative such as the Beta-binomial distribution is required to produce a reliable model fit. The small-scale random effect term in FRK overcomes this pitfall, but FRK model estimates rely heavily on a sufficient number and appropriate configuration of basis functions. Unfortunately, the computation time for FRK increases rapidly with increasing basis resolution.
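The overdispersion issue described in the Results can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a hypothetical mean prevalence of 0.2, 50 people examined per survey site, and a Beta-distributed site-level prevalence with an arbitrarily chosen concentration of 5, and it uses only the Python standard library. It compares the variance of simulated positive counts with the variance a pure binomial model implies.

```python
import random
import statistics

def simulate_counts(n_sites, n_examined, mean_prev, concentration, rng):
    """Draw the number of positives at each site.

    concentration=None gives a pure binomial model. Otherwise each site's
    prevalence is drawn from Beta(mean*c, (1-mean)*c), which makes the
    counts Beta-binomial with the same mean but extra variance.
    """
    counts = []
    for _ in range(n_sites):
        if concentration is None:
            p = mean_prev
        else:
            p = rng.betavariate(mean_prev * concentration,
                                (1.0 - mean_prev) * concentration)
        # Binomial draw as a sum of Bernoulli trials.
        counts.append(sum(rng.random() < p for _ in range(n_examined)))
    return counts

rng = random.Random(1)
n_sites, n_examined, mean_prev = 2000, 50, 0.2  # hypothetical values

binomial_counts = simulate_counts(n_sites, n_examined, mean_prev, None, rng)
betabin_counts = simulate_counts(n_sites, n_examined, mean_prev, 5.0, rng)

# Variance implied by a binomial likelihood: n * p * (1 - p).
binomial_theory_var = n_examined * mean_prev * (1.0 - mean_prev)
binomial_var = statistics.variance(binomial_counts)
betabin_var = statistics.variance(betabin_counts)

print("binomial theory variance:", binomial_theory_var)
print("simulated binomial variance:", binomial_var)
print("simulated Beta-binomial variance:", betabin_var)
```

Under these assumptions the Beta-binomial counts are several times more variable than the binomial formula allows, which is the kind of mismatch a binomial observation model cannot absorb.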

Conclusions: INLA and FRK both enable scalable geostatistical modelling of malaria prevalence data. However, care must be taken with both methods to assess the fit of the model to the data and the plausibility of predictions, in order to select appropriate model assumptions and parameters.

Keywords: Geostatistics; Predictive modelling; Risk mapping; Spatial modelling.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Fig. 1**
2009 *P. falciparum* prevalence data in Kenya. **a** shows prevalence survey results, while **b** shows the Malaria Atlas Project predicted prevalence

**Fig. 2**
*P. falciparum* prevalence survey locations in Kenya for 2009. Colours represent different cross-validation folds. **a** and **b** show 10-fold and 50-fold cross-validation locations, respectively

**Fig. 3**
*P. falciparum* prevalence data used to fit the four models at the continental scale. **a** shows the 2009 observed data at 868 locations. **b** shows the prevalence generated from binomial samples at the observation locations. **c** shows the prevalence generated by binomial samples at 1000 uniformly random locations. **d** is the Malaria Atlas Project predicted prevalence raster from 2009 used to generate the samples in **b** and **c**
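The simulated datasets in Fig. 3 are described as binomial samples drawn from a prevalence raster. A minimal sketch of that idea follows, with several loudly labelled assumptions: a small toy grid stands in for the Malaria Atlas Project raster, locations are drawn as uniformly random grid cells, and 100 people are examined per location (the sample sizes used in the paper are not stated in this record). Function names are hypothetical.

```python
import random

def make_toy_raster(n_rows, n_cols):
    """Hypothetical prevalence surface rising from 0.05 to 0.6 across columns."""
    return [[0.05 + 0.55 * col / (n_cols - 1) for col in range(n_cols)]
            for _ in range(n_rows)]

def sample_prevalence(raster, n_locations, n_examined, rng):
    """Pick uniformly random cells and draw a binomial count at each one."""
    n_rows, n_cols = len(raster), len(raster[0])
    samples = []
    for _ in range(n_locations):
        row = rng.randrange(n_rows)
        col = rng.randrange(n_cols)
        true_prev = raster[row][col]
        # Binomial draw as a sum of Bernoulli trials.
        positives = sum(rng.random() < true_prev for _ in range(n_examined))
        samples.append({
            "row": row,
            "col": col,
            "examined": n_examined,
            "positives": positives,
            "observed_prev": positives / n_examined,
        })
    return samples

rng = random.Random(42)
raster = make_toy_raster(20, 30)
samples = sample_prevalence(raster, n_locations=1000, n_examined=100, rng=rng)
print(samples[0])
```

Because the underlying surface is known, model predictions fitted to such samples can be compared directly against the raster that generated them.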

**Fig. 4**
Predicted prevalences and uncertainties for **a** INLA, **b** GPBoost, **c** SpRF, and **d** FRK when trained on *P. falciparum* prevalence data from Kenya in 2009. Note that these maps are intended only to illustrate differences in model predictions when fit to a small data sample, and are not likely to accurately represent malaria prevalence across the country in this year

**Fig. 5**
Interval predictions for 10-fold cross-validation for **a** INLA, **b** GPBoost, **c** SpRF, and **d** FRK using the national level Kenya data. Points show the predicted mean from each model, and intervals show one standard deviation above and below the mean
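One way to summarise interval predictions like those in Fig. 5 is the fraction of held-out observations that fall inside the mean plus or minus one standard deviation. The sketch below is an illustration only; the function name and the five data points are hypothetical and are not results from the paper.

```python
def interval_coverage(observed, predicted_mean, predicted_sd, n_sd=1.0):
    """Fraction of observations inside [mean - n_sd*sd, mean + n_sd*sd]."""
    covered = 0
    for y, mu, sd in zip(observed, predicted_mean, predicted_sd):
        lower = mu - n_sd * sd
        upper = mu + n_sd * sd
        if lower <= y <= upper:
            covered += 1
    return covered / len(observed)

# Hypothetical held-out prevalences and model predictions.
observed = [0.00, 0.10, 0.25, 0.40, 0.05]
predicted_mean = [0.05, 0.12, 0.10, 0.35, 0.20]
predicted_sd = [0.06, 0.05, 0.08, 0.10, 0.05]

coverage = interval_coverage(observed, predicted_mean, predicted_sd)
print(coverage)  # 3 of the 5 points are covered
```

If predictive errors were roughly Gaussian and well calibrated, about 68% of points would be expected to lie inside a one-standard-deviation interval, so coverage far from that value suggests over- or under-confident uncertainty estimates.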

**Fig. 6**
*P. falciparum* prevalence predictions when fit using three different datasets. In column (i), models are fit using the survey data from Africa in 2009, shown in Fig. 3a. In column (ii), the models are fit to binomial samples drawn from the Malaria Atlas prevalence raster at the same survey locations, shown in Fig. 3b. In column (iii), they are fit to binomial samples drawn from the raster at 1000 uniformly selected locations across the continent, shown in Fig. 3c. Outputs have been masked by the Malaria Atlas Project raster in Fig. 3d. Note that these maps are intended only to illustrate differences in model predictions and are not likely to accurately represent malaria prevalence in this year

**Fig. 7**
Times taken by each model on uniformly distributed simulated datasets. GPBoost was not run with 5000 or 10,000 points due to the likely long computation time

**Fig. 8**
Prevalence predictions from an INLA-based model with a Beta-binomial response, fit to the observation data in Fig. 3a

**Fig. 9**
*P. falciparum* prevalence predictions from the FRK model with **a** nres = 3 and regular = 1, and **b** nres = 2 and regular = 2

**Fig. 10**
Model predictions of the four models vs actual prevalences using 10-fold CV

**Fig. 11**
Model predictions of the four models vs actual prevalences using 50-fold CV with folds in different colours

**Fig. 12**
Interval predictions for 50-fold cross-validation for **a** INLA, **b** GPBoost, **c** SpRF and **d** FRK

**Fig. 13**
**a** *P. falciparum* prevalence in Kenya for 2009 and **b** the kernel density estimates of the sampled locations

**Fig. 14**
*P. falciparum* prevalence and kernel density estimates of different clusters (folds) for **a** 10-fold cross-validation and **b** 50-fold cross-validation, with zero prevalence observations taken out

**Fig. 15**
Kernel density estimates of the locations and the absolute errors of the four models for **a** 10-fold cross-validation and **b** 50-fold cross-validation

**Fig. 16**
Kernel density estimates of the locations and the interval widths of the four models for **a** 10-fold cross-validation and **b** 50-fold cross-validation

**Fig. 17**
Predicted prevalence from INLA when fit to simulated data at observation locations with **a** no added noise, **b** added Gaussian noise of standard deviation 0.4 and **c** added Gaussian noise of standard deviation 1.2
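The caption does not say on which scale the Gaussian noise was added. The sketch below assumes, purely for illustration, that noise is added on the logit scale (a standard deviation of 1.2 would not make sense directly on a prevalence in [0, 1]); the helper names and the constant prevalence of 0.3 are hypothetical.

```python
import math
import random
import statistics

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def expit(x):
    """Inverse of logit, mapping any real number back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def add_logit_noise(prevalences, noise_sd, rng):
    """ASSUMPTION: Gaussian noise is applied on the logit scale."""
    return [expit(logit(p) + rng.gauss(0.0, noise_sd)) for p in prevalences]

rng = random.Random(7)
true_prev = [0.3] * 2000  # hypothetical constant prevalence

noisy_none = add_logit_noise(true_prev, 0.0, rng)
noisy_small = add_logit_noise(true_prev, 0.4, rng)
noisy_large = add_logit_noise(true_prev, 1.2, rng)

for label, values in [("sd=0", noisy_none), ("sd=0.4", noisy_small),
                      ("sd=1.2", noisy_large)]:
    print(label, "spread of prevalence:", round(statistics.pstdev(values), 3))
```

Working on the logit scale keeps every perturbed value a valid prevalence, while larger noise standard deviations produce visibly more small-scale variation for the model to explain.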

**Fig. 18**
Posterior means of the intercept, range and variance for the INLA-based model fit using simulated data at the observation locations with added Gaussian noise of varying standard deviation. The bottom right plot shows the time taken to fit the model to each of the datasets. Error bars show posterior interquartile ranges

**Fig. 19**
Predictions from an INLA-based model with a Gaussian response fit to the observation data. Values have been clipped to lie within $[0, 1]$

**Fig. 20**
Predicted standard deviations for each of the maps shown in Fig. 6

**Fig. 21**
*P. falciparum* prevalence predictions for GPBoost when using the Vecchia approximation for various values of the nearest neighbour parameters, $m_{v}$ and $m_{v, p}$. **a** uses $m_{v} = m_{v, p} = 30$, **b** uses $m_{v} = 30$ and $m_{v, p} = 150$, while **c** uses $m_{v} = m_{v, p} = 150$

**Fig. 22**
Time taken for GPBoost with the Vecchia approximation, using nrounds = 247 and nrounds = 5, on simulated datasets of various sizes, compared to the times in Fig. 7
