PLoS One. 2013;8(2):e55158.
doi: 10.1371/journal.pone.0055158. Epub 2013 Feb 14.

The effects of sampling bias and model complexity on the predictive performance of MaxEnt species distribution models

Mindy M Syfert et al. PLoS One. 2013.

Erratum in

  • PLoS One. 2013;8(7). doi:10.1371/annotation/35be5dff-7709-4029-8cfa-f1357e5001f5

Abstract

Species distribution models (SDMs) trained on presence-only data are frequently used in ecological research and conservation planning. However, users of SDM software face a variety of options, and it is not always obvious how selecting one option over another will affect model performance. Working with MaxEnt software and with tree fern presence data from New Zealand, we assessed whether (a) choosing to correct for geographical sampling bias and (b) using complex environmental response curves have strong effects on goodness of fit. SDMs were trained on tree fern data, obtained from an online biodiversity data portal, from two sources that differed in size and geographical sampling bias: a small, widely distributed set of herbarium specimens and a large, spatially clustered set of ecological survey records. We attempted to correct for geographical sampling bias by incorporating sampling bias grids in the SDMs, created from all georeferenced vascular plants in the datasets, and explored model complexity issues by fitting a wide variety of environmental response curves (known as "feature types" in MaxEnt). In each case, goodness of fit was assessed by comparing predicted range maps with tree fern presences and absences from an independent national dataset used to validate the SDMs. We found that correcting for geographical sampling bias led to major improvements in goodness of fit, but did not entirely resolve the problem: predictions made with clustered ecological data were inferior to those made with the herbarium dataset, even after sampling bias correction. We also found that the choice of feature type had negligible effects on predictive performance, indicating that simple feature types may be sufficient once sampling bias is accounted for. Our study emphasizes the importance of reducing geographical sampling bias, where possible, in datasets used to train SDMs, and the effectiveness and necessity of sampling bias correction within MaxEnt.
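
The validation step described above, scoring model predictions against independent presence/absence records, is commonly summarized with the AUC statistic reported in the figures. The following is a minimal sketch of that statistic only, not the authors' code: the function name `auc` and the example scores are hypothetical, and the calculation uses the standard rank-based definition (the probability that a randomly chosen presence site receives a higher predicted score than a randomly chosen absence site, with ties counted as one half).

```python
def auc(presence_scores, absence_scores):
    """Rank-based AUC for presence/absence validation data.

    presence_scores: predicted suitability at sites where the species was found.
    absence_scores:  predicted suitability at sites where it was not found.
    Both inputs are hypothetical illustrations, not data from the study.
    """
    if not presence_scores or not absence_scores:
        raise ValueError("need at least one presence and one absence score")

    wins = 0.0
    # Compare every presence site with every absence site.
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0   # presence ranked above absence
            elif p == a:
                wins += 0.5   # ties count as half a win
    return wins / (len(presence_scores) * len(absence_scores))


# Hypothetical predicted suitabilities at validation plots.
presences = [0.9, 0.7, 0.4]
absences = [0.6, 0.3, 0.2]
print(round(auc(presences, absences), 3))
```

An AUC of 0.5 corresponds to predictions no better than random ranking, and 1.0 to perfect separation of presences from absences. The pairwise loop is quadratic in the number of sites, which is acceptable for a sketch but would be replaced by a sort-based calculation for large validation sets.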

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Data locations in New Zealand.
Tree fern occurrence locations (orange) and “absence” locations (blue) are based on a) herbarium data extracted from GBIF; b) NVS ecological survey data extracted from GBIF, and c) LUCAS plot data. In the case of the herbarium and NVS datasets, “absences” are background locations based on locations of other vascular plants; in the case of the LUCAS dataset, true absences are shown.
Figure 2. Density distribution plots of environmental variables.
Tree fern occurrences (orange) and background locations based on locations of other vascular plants (blue) are compared with all NZ locations (∼1 km resolution; black) for the herbarium dataset (upper row) and the NVS dataset (lower row). Temperature seasonality is represented as standard deviations multiplied by 10.
Figure 3. Comparison of presence-only calibration (POC) plots.
MaxEnt LQ models were trained on (a) herbarium and (b) NVS data, correcting for geographical sampling bias; plots were derived from the average predictions of 40 runs. Values above the diagonal signify model underestimation of species prevalence and values below the diagonal signify overestimation of species prevalence. The calibration curve is shown in cyan and the orange lines represent ±2 standard deviations. Presence and background data are marked at the bottom of each graph at their corresponding predicted probabilities of presence: presences are orange and background data are black.
Figure 4. Box plots of AUC values.
AUC values are derived from MaxEnt models fitted using different functional forms (“feature types”) and two different training datasets: herbarium (a–d) and NVS (e–h). Evaluations using randomly withheld test data, without and with correction for geographical sampling bias, are shown in (a & e) and (b & f), respectively; evaluations using independent LUCAS data, without and with correction for sampling bias, are shown in (c & g) and (d & h), respectively. Box plots indicate variation in AUC among 40 runs (boxes encompass 25th and 75th percentiles, whiskers approximate 99% of the data range, points are outliers).
Figure 5. LUCAS presence/absence locations with predicted presences and absences generated from average LQ model predictions (with geographical sampling bias correction).
Agreement between predicted presences/absences and LUCAS presences/absences is shown in green and disagreement is shown in orange. LUCAS presence locations are shown with predictions from (a) the herbarium dataset and (b) the NVS dataset; LUCAS absence locations are shown with predictions from (c) the herbarium dataset and (d) the NVS dataset.
