J R Soc Interface. 2017 Sep;14(134):20170520.
doi: 10.1098/rsif.2017.0520.

Improved prediction accuracy for disease risk mapping using Gaussian process stacked generalization

Samir Bhatt et al. J R Soc Interface. 2017 Sep.

Abstract

Maps of infectious disease, charting spatial variations in the force of infection, degree of endemicity and the burden on human health, provide an essential evidence base to support planning towards global health targets. Contemporary disease mapping efforts have embraced statistical modelling approaches to properly acknowledge uncertainties in both the available measurements and their spatial interpolation. The most common such approach is Gaussian process regression, a mathematical framework composed of two components: a mean function harnessing the predictive power of multiple independent variables, and a covariance function yielding spatio-temporal shrinkage against residual variation from the mean. Though many techniques have been developed to improve the flexibility and fitting of the covariance function, models for the mean function have typically been restricted to simple linear terms. For infectious diseases, known to be driven by complex interactions between environmental and socio-economic factors, improved modelling of the mean function can greatly boost predictive power. Here, we present an ensemble approach based on stacked generalization that allows for multiple nonlinear algorithmic mean functions to be jointly embedded within the Gaussian process framework. We apply this method to mapping Plasmodium falciparum prevalence data in sub-Saharan Africa and show that the generalized ensemble approach markedly outperforms any individual method.

Keywords: Gaussian process; disease mapping; malaria; stacked generalization.
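
The idea in the abstract, several level 0 learners combined into a mean function with a Gaussian process modelling what is left over, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes one-dimensional locations, two toy level 0 learners (a least-squares line and a k-nearest-neighbour average), a grid search for convex stacking weights, and a squared-exponential covariance with fixed hyperparameters. All function names and parameter values are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D location arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def fit_linear(x, y):
    """Toy level 0 learner: ordinary least-squares line."""
    coef = np.polyfit(x, y, deg=1)
    return lambda q: np.polyval(coef, q)

def fit_nearest(x, y, k=3):
    """Toy level 0 learner: mean of the k nearest training targets."""
    def predict(q):
        idx = np.argsort(np.abs(q[:, None] - x[None, :]), axis=1)[:, :k]
        return y[idx].mean(axis=1)
    return predict

def out_of_fold(fit, x, y, n_folds=5):
    """Cross-validated predictions, so weights are not tuned on in-sample fits."""
    preds = np.empty_like(y, dtype=float)
    folds = np.arange(len(x)) % n_folds
    for f in range(n_folds):
        test = folds == f
        preds[test] = fit(x[~test], y[~test])(x[test])
    return preds

def stacking_weights(oof, y, grid=101):
    """Convex weights (non-negative, summing to 1) for two learners via grid search."""
    ws = np.linspace(0.0, 1.0, grid)
    errs = [np.mean((w * oof[0] + (1.0 - w) * oof[1] - y) ** 2) for w in ws]
    w = ws[int(np.argmin(errs))]
    return np.array([w, 1.0 - w])

def stacked_gp_predict(x, y, x_new, noise=0.05, length_scale=0.5):
    """GP posterior mean whose mean function is a weighted stack of level 0 learners."""
    fitters = [fit_linear, fit_nearest]
    oof = np.array([out_of_fold(f, x, y) for f in fitters])
    w = stacking_weights(oof, y)
    # Refit each learner on all data for prediction at new locations.
    models = [f(x, y) for f in fitters]
    mean_new = sum(wi * m(x_new) for wi, m in zip(w, models))
    # The GP models residuals from the stacked (out-of-fold) mean.
    resid = y - w @ oof
    K = rbf_kernel(x, x, length_scale) + noise * np.eye(len(x))
    alpha = np.linalg.solve(K, resid)
    return mean_new + rbf_kernel(x_new, x, length_scale) @ alpha, w
```

Calling `stacked_gp_predict(x, y, x_new)` with NumPy arrays of training locations, responses and prediction locations returns the predicted values and the two stacking weights. Using out-of-fold predictions for the training residuals is a design choice of this sketch, intended to keep an overfitted learner from hiding its own errors from the covariance term.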

Conflict of interest statement

We declare we have no competing interests.

Figures

Figure 1.
(a) Plot of the 23 131 prevalence surveys conducted between 2000 and 2015. The survey data are age- and diagnostic-standardized and presented as a continuum of blue to red from 0 to 1. (b) Study area of stable malaria transmission in sub-Saharan Africa. Our analysis was performed on four zones: western Africa, north eastern Africa, eastern Africa and southern Africa.
Figure 2.
Comparisons of cross-validation MSE versus MAE versus correlation. Level 1 generalizers and the standard Gaussian process are shown in blue and all level 0 generalizers are shown in red. SGP, stacked Gaussian process; CWM, stacked constrained weighted mean; GP, standard Gaussian process; GBM, gradient-boosted trees; GAS, generalized additive splines; FR, random forests; MARS, multivariate adaptive regression splines and LIN, elastic net regularized linear regression. (a) Eastern Africa, (b) southern Africa, (c) north eastern Africa and (d) western Africa. (Online version in colour.)
Figure 3.
Predicted prevalence maps for eastern Africa in 2011 for gradient-boosted trees (GBM), random forests (FR), elastic net regularized linear regression (LIN), multivariate adaptive regression splines (MARS), generalized additive splines (GAS) and the new stacked Gaussian process (SGP).

