2024 Aug 2;4(8):e0002224.
doi: 10.1371/journal.pgph.0002224. eCollection 2024.

Species distribution modeling for disease ecology: A multi-scale case study for schistosomiasis host snails in Brazil

Alyson L Singleton et al. PLOS Glob Public Health.

Abstract

Species distribution models (SDMs) are increasingly popular tools for profiling disease risk in ecology, particularly for infectious diseases of public health importance that include an obligate non-human host in their transmission cycle. SDMs can create high-resolution maps of host distribution across geographical scales, reflecting baseline risk of disease. However, as SDM computational methods have rapidly expanded, there are many outstanding methodological questions. Here we address key questions about SDM application, using schistosomiasis risk in Brazil as a case study. Schistosomiasis is transmitted to humans through contact with the free-living infectious stage of Schistosoma spp. parasites released from freshwater snails, the parasite's obligate intermediate hosts. In this study, we compared snail SDM performance across machine learning (ML) approaches (MaxEnt, Random Forest, and Boosted Regression Trees), geographic extents (national, regional, and state), types of presence data (expert-collected and publicly available), and snail species (Biomphalaria glabrata, B. straminea, and B. tenagophila). We used high-resolution (1 km) climate, hydrology, land-use/land-cover (LULC), and soil property data to describe the snails' ecological niche and evaluated models on multiple criteria. Although all ML approaches produced comparable spatially cross-validated performance metrics, their suitability maps showed major qualitative differences that required validation based on local expert knowledge. Additionally, our findings revealed varying importance of LULC and bioclimatic variables for different snail species at different spatial scales. Finally, we found that models using publicly available data predicted snail distribution with comparable AUC values to models using expert-collected data. This work serves as an instructional guide to SDM methods that can be applied to a range of vector-borne and zoonotic diseases. In addition, it advances our understanding of the relevant environmental and bioclimatic determinants of schistosomiasis risk in Brazil.
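
The "spatially cross-validated" AUC mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the grid-block fold assignment, the function names (`auc`, `spatial_fold_ids`, `spatial_cv_auc`, `mean_and_se`), and the `fit` callback are all assumptions made for illustration, and the abstract does not say how spatial folds were constructed. The idea shown is only that nearby points are kept in the same fold, so each model is scored on locations it has not seen.

```python
import math


def auc(labels, scores):
    """Rank-based AUC: chance that a presence outscores a background point (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one presence and one background point")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def _block(x, y, block_size):
    """Grid cell that contains the point (x, y)."""
    return (math.floor(x / block_size), math.floor(y / block_size))


def spatial_fold_ids(coords, block_size, k):
    """Assign every point to a fold through its grid block (hypothetical blocking scheme)."""
    blocks = sorted({_block(x, y, block_size) for x, y in coords})
    block_to_fold = {b: i % k for i, b in enumerate(blocks)}
    return [block_to_fold[_block(x, y, block_size)] for x, y in coords]


def spatial_cv_auc(coords, features, labels, fit, k, block_size):
    """Out-of-sample AUC per spatial fold; fit(train_X, train_y) must return predict(x)."""
    folds = spatial_fold_ids(coords, block_size, k)
    results = []
    for f in range(k):
        train = [i for i, g in enumerate(folds) if g != f]
        test = [i for i, g in enumerate(folds) if g == f]
        test_y = [labels[i] for i in test]
        if len(set(test_y)) < 2:
            continue  # AUC is undefined when a fold holds a single class
        predict = fit([features[i] for i in train], [labels[i] for i in train])
        results.append(auc(test_y, [predict(features[i]) for i in test]))
    return results


def mean_and_se(values):
    """Mean and standard error of the mean across folds."""
    n = len(values)
    mean = sum(values) / n
    if n < 2:
        return mean, float("nan")
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var / n)
```

In practice `fit` would wrap a real model (for example a random forest), and the per-fold values returned by `spatial_cv_auc` would be summarized with `mean_and_se` to give a mean and standard error per species, scale, and model type.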

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Biomphalaria presence points by species (color) and source (shape), thinned to 1 km.
A) National, B) Minas Gerais, C) São Paulo. Maps were built in R (version 4.2.2) using shapefiles from the geobr package [80].
Fig 2. Large variation in snail suitability probabilities at a national scale.
National prediction maps of B. glabrata (A–C), B. straminea (D–F), and B. tenagophila (G–I) suitability probabilities by model type (MaxEnt: A, D, G; Random Forest: B, E, H; Boosted Regression Tree: C, F, I). Each pixel shows the mean value across 10 bootstrapping iterations in which models were provided 80% of the available species presence records. Maps were built in R (version 4.2.2) using shapefiles from the geobr package [80].
Fig 3. Scale and species drive SDM performance metrics more than model type.
Plots of ten-fold spatially cross-validated, out-of-sample AUC values across species (A, B, C), scales (panels), and model types (colors). Plots display the mean (point) ± standard error (error bars).
Fig 4. State and national models produce substantially different state-level prediction maps.
Minas Gerais prediction maps of B. glabrata suitability probabilities by model type (rows) and model geographic extent (columns). Each pixel shows the mean value across 10 bootstrapping iterations in which models were provided 80% of the species presence records available at a given scale. Parallel prediction maps of B. straminea in Minas Gerais and B. tenagophila in São Paulo can be found in Figs C and D in S1 Text. Compared to national models (Fig 2), at smaller geographic scales it becomes more obvious that suitability probabilities can be highly localized, producing points of high suitability surrounded by areas with low suitability. Maps were built in R (version 4.2.2) using shapefiles from the geobr package [80].
Fig 5. Examples of marginal effects of covariates on suitability probabilities that vary across model type (A), geographic scale (B), and species (C).
Partial dependence plots for three covariates (columns) across model types (color), species (top two rows vs. bottom two rows), and scale (first row vs. second and third row vs. fourth).
Fig 6. Variable importance of land use/land cover (LULC) variables can increase at smaller scales.
Proportion of total variable importance, averaged across all training folds, attributable to distance to high population density and to proportion of temporary crop cover. Displayed for all species (A, B, C) and model types (color).
Fig 7. Expert-collected and public GBIF data produce visually different suitability maps for B. tenagophila in São Paulo across model type.
Predicted suitability maps with varying input data (columns) supplied to all model types (rows). Compared to national models (Fig 2), at smaller geographic scales it becomes more obvious that suitability probabilities can be highly localized, producing points of high suitability surrounded by areas with low suitability. Maps were built in R (version 4.2.2) using shapefiles from the geobr package [80].
Fig 8
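
The per-pixel averaging described in the Fig 2 and Fig 4 captions (mean across 10 iterations, each trained on 80% of the presence records) could be sketched as follows. This is an illustrative sketch only: the function name `bootstrap_mean_map`, the `fit` callback, and the choice to draw the 80% subset without replacement are assumptions, since the captions do not specify the sampling scheme.

```python
import random


def bootstrap_mean_map(presences, pixels, fit, n_iter=10, frac=0.8, seed=0):
    """Average predicted suitability per pixel over repeated fits on random subsets.

    presences: list of presence records (any objects the model can train on)
    pixels:    list of per-pixel covariate records to predict on
    fit:       hypothetical callback; fit(subset) must return predict(pixel) -> float
    """
    rng = random.Random(seed)
    n_keep = max(1, round(frac * len(presences)))
    totals = [0.0] * len(pixels)
    for _ in range(n_iter):
        # Assumption: each iteration draws 80% of the records without replacement.
        subset = rng.sample(presences, n_keep)
        predict = fit(subset)
        for j, pixel in enumerate(pixels):
            totals[j] += predict(pixel)
    return [t / n_iter for t in totals]
```

Averaging over iterations smooths out the dependence of any single map on which presence records happened to be included in training.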
