2007 Sep 24:6:44.
doi: 10.1186/1476-072X-6-44.

Developing a spatial-statistical model and map of historical malaria prevalence in Botswana using a staged variable selection procedure

Marlies H Craig et al. Int J Health Geogr.

Abstract

Background: Several malaria risk maps have been developed in recent years, many from the prevalence of infection data collated by the MARA (Mapping Malaria Risk in Africa) project, and using various environmental data sets as predictors. Variable selection is a major obstacle due to analytical problems caused by over-fitting, confounding and non-independence in the data. Testing and comparing every combination of explanatory variables in a Bayesian spatial framework remains unfeasible for most researchers. The aim of this study was to develop a malaria risk map using a systematic and practicable variable selection process for spatial analysis and mapping of historical malaria risk in Botswana.

Results: Of 50 potential explanatory variables from eight environmental data themes, 42 were significantly associated with malaria prevalence in univariate logistic regression and were ranked by the Akaike Information Criterion. Variables that were correlated with higher-ranking relatives of the same environmental theme were temporarily excluded. The remaining 14 candidates were ranked by selection frequency after running automated step-wise selection procedures on 1000 bootstrap samples drawn from the data. A non-spatial multiple-variable model was developed through step-wise inclusion in order of selection frequency. Previously excluded variables were then re-evaluated for inclusion, using further step-wise bootstrap procedures, resulting in the exclusion of another variable. Finally, a Bayesian geo-statistical model using Markov Chain Monte Carlo simulation was fitted to the data, resulting in a final model of three predictor variables, namely summer rainfall, mean annual temperature and altitude. Each was independently and significantly associated with malaria prevalence after allowing for spatial correlation. This model was used to predict malaria prevalence at unobserved locations, producing a smooth risk map for the whole country.
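The ranking and bootstrap stages described above can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic, the outcome is binary rather than survey counts, and the use of AIC as the backward-elimination criterion is an assumption (the abstract does not state which criterion the automated step-wise procedure used). The helper names `fit_logistic`, `aic`, `backward_stepwise` and `selection_frequency` are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, ridge=1e-8):
    """Fit a logistic regression by Newton-Raphson; return (beta, log-likelihood)."""
    X = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        # Tiny ridge term keeps the Hessian invertible in degenerate resamples.
        hessian = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1 - 1e-12)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, loglik

def aic(X, y):
    """Akaike Information Criterion: 2k - 2*logL, with k counting the intercept."""
    _, loglik = fit_logistic(X, y)
    return 2 * (X.shape[1] + 1) - 2 * loglik

def backward_stepwise(X, y):
    """Drop one variable at a time for as long as doing so lowers the AIC."""
    selected = list(range(X.shape[1]))
    best = aic(X[:, selected], y)
    while selected:
        trials = [(aic(X[:, [j for j in selected if j != d]], y), d)
                  for d in selected]
        trial_aic, drop = min(trials)
        if trial_aic >= best:
            break
        best = trial_aic
        selected.remove(drop)
    return selected

def selection_frequency(X, y, n_boot=100, seed=0):
    """Fraction of bootstrap samples in which each variable survives selection."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        for j in backward_stepwise(X[idx], y[idx]):
            counts[j] += 1
    return counts / n_boot

# Synthetic data (assumption): variables 0 and 1 drive the outcome, 2 and 3 are noise.
rng = np.random.default_rng(42)
n = 400
X = rng.normal(size=(n, 4))
logit = -0.5 + 2.0 * X[:, 0] - 1.5 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

# Rank variables by univariate AIC (lower is better).
univariate_aic = [aic(X[:, [j]], y) for j in range(X.shape[1])]
ranking = np.argsort(univariate_aic)

# Rank candidates by how often they are retained across bootstrap samples.
freq = selection_frequency(X, y, n_boot=100)
print("AIC ranking:", ranking)
print("Selection frequency:", freq)
```

In this sketch the truly predictive variables should be retained in nearly every bootstrap sample, while the noise variables are retained only occasionally, which is the behaviour that makes selection frequency usable as a ranking. A spatial (Bayesian geo-statistical) stage, as in the study, is outside the scope of this example.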

Conclusion: We have produced a highly plausible and parsimonious model of historical malaria risk for Botswana from point-referenced data from a 1961/2 prevalence survey of malaria infection in 1- to 14-year-old children. Starting from a list of 50 potential variables, we ended with three highly plausible predictors by applying a systematic and repeatable staged variable selection procedure that included a spatial analysis; this procedure has application for other environmentally determined infectious diseases. All this was accomplished using general-purpose statistical software.

Figures

Figure 1
Malaria prevalence data. Malaria prevalence of infection in 1 to 14 year old children, in Botswana, during the 1961/62 national survey.
Figure 2
Month of survey during the 1961/62 national malaria survey.
Figure 3
Flow diagram of staged variable selection procedure.
Figure 4
Plots of malaria prevalence against fourteen potential explanatory variables. Scatter and box plots of candidate environmental explanatory variables used in step-wise procedures. Malaria prevalence in 1 to 14 year old children, Botswana, 1961/62, is shown on the Y axis on a logit scale. (A) annual maximum rainfall (mm); (B) winter (April – October) total rainfall (mm); (C) rainfall concentration (%); (D) winter (April – October) mean temperature (°C); (E) annual maximum temperature (°C); (F) temperature proportional standard deviation (°C); (G) elevation (m); (H) annual maximum NDVI; (I) NDVI standard deviation; (J) summer (December–March) mean vapour pressure (hPa); (K) vapour pressure standard deviation (hPa); (L) log distance to permanent water (m); (M) land cover: dry/low risk, moist/high risk areas; (N) start month of survey.
Figure 5
Distribution of coefficients of fourteen candidate variables in 1000 stepwise bootstrap models. Frequency histograms of coefficients obtained in automated backward stepwise exclusion regression analysis against 1000 bootstrap samples of the malaria prevalence data in Stage 3. In each case the vertical black line indicates coefficient = 0. (A) annual maximum rainfall (mm); (B) winter (April – October) total rainfall (mm); (C) rainfall concentration (%); (D) winter (April – October) mean temperature (°C); (E) annual maximum temperature (°C); (F) temperature proportional standard deviation (°C); (G) elevation (m); (H) annual maximum NDVI; (I) NDVI standard deviation; (J) summer (December–March) mean vapour pressure (hPa); (K) vapour pressure standard deviation (hPa); (L) log distance to permanent water (m); (M) land cover: dry/low risk, moist/high risk areas; (N) start month of survey: main season (April–May).
Figure 6
Predicted versus observed prevalence. Predicted versus observed prevalence, on a logit scale, for the derivation (crosses) and validation (squares) data of the Stage 5 non-spatial model, and for the median (closed circles) and upper/lower confidence interval (spikes) of the Stage 6 spatial model.
Figure 7
Maps of predicted malaria prevalence and covariates. Predicted pre-control childhood malaria prevalence maps for Botswana, resulting from (A) the stage 5 non-spatial model and (B) the stage 6 spatial model; 118 survey sites are shown; (C) the upper and lower 95% CI of the spatial model. Covariates used in the models: (D) annual mean temperature, °C; (E) summer total rainfall, mm; (F) elevation, m; (G) land cover categories, high-risk/low-risk. Lines represent district boundaries.
