Review
Nat Commun. 2024 Dec 19;15(1):10700.
doi: 10.1038/s41467-024-55240-8.

Challenges in data-driven geospatial modeling for environmental research and practice


Diana Koldasbayeva et al., Nat Commun. 2024.

Abstract

Machine learning-based geospatial applications offer unique opportunities for environmental monitoring owing to their adaptability across domains and scales and their computational efficiency. However, the specific characteristics of environmental data introduce biases into straightforward implementations. We identify a streamlined pipeline to enhance model accuracy, addressing issues such as imbalanced data, spatial autocorrelation, prediction errors, and the nuances of model generalization and uncertainty estimation. We examine tools and techniques for overcoming these obstacles and provide insights into future developments in geospatial AI. We round out the big picture of the field with advances in data processing in general, including the demands of industry-related solutions that build on the outcomes of applied sciences.
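The abstract links spatial autocorrelation to the nuances of model generalization: if nearby, autocorrelated points land in both the training and test sets, validation scores tend to be optimistic. One commonly used remedy is spatial block cross-validation. The sketch below assumes that technique (the abstract does not prescribe it) and uses square blocks, a hypothetical function name `spatial_block_folds`, and synthetic coordinates; it is an illustration, not the authors' pipeline.

```python
import numpy as np


def spatial_block_folds(x, y, block_size, n_folds, seed=0):
    """Assign points to cross-validation folds by square spatial blocks.

    All points in the same block share a fold, so test points are never
    interleaved with training points inside one block. This limits (but
    does not remove) optimistic bias caused by spatial autocorrelation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Integer block indices along each axis, counted from the minimum coordinate.
    bx = np.floor((x - x.min()) / block_size).astype(int)
    by = np.floor((y - y.min()) / block_size).astype(int)
    # Combine both indices into one unique integer key per block.
    key = bx * (by.max() + 1) + by
    # Compact block ids 0..n_blocks-1 (reshape guards against NumPy version differences).
    _, block_index = np.unique(key, return_inverse=True)
    block_index = block_index.reshape(-1)
    n_blocks = int(block_index.max()) + 1
    # Shuffle the blocks, then deal them round-robin into folds.
    rng = np.random.default_rng(seed)
    block_fold = rng.permutation(n_blocks) % n_folds
    return block_fold[block_index]


# Usage with hypothetical sample locations on a 100 x 100 area.
rng = np.random.default_rng(1)
xs = rng.uniform(0, 100, 500)
ys = rng.uniform(0, 100, 500)
folds = spatial_block_folds(xs, ys, block_size=25.0, n_folds=5)
for k in range(5):
    train_idx = np.flatnonzero(folds != k)
    test_idx = np.flatnonzero(folds == k)
    # Fit a model on train_idx and evaluate it on test_idx here.
```

The block size is a tuning choice; a common heuristic is to make it at least as large as the range over which residuals remain autocorrelated.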


Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. General workflow for geospatial modeling tasks, showing the modeling process and the common issues relevant to each stage.
The pipeline covers challenges typical of geospatial modeling and the approaches required to handle them. For each step of the modeling process, we identify the main challenges, metrics, and solutions. The illustrated prediction and uncertainty rasters were created from SoilGrids open data distributed under Creative Commons CC-BY 4.0 (https://www.isric.org/explore/soilgrids).
Fig. 2. Handling imbalanced data, illustrated with artificially generated species distribution data.
A Point data generated with the virtualspecies R package from annual mean temperature and annual precipitation obtained from the WorldClim database. B Oversampling of the minority class by the SMOTE method with the smotefamily R package. C A balanced dataset obtained through random undersampling of the prevalent class. The image was created using the open-source Geographic Information System QGIS. Basemap is visualized from tiles by CartoDB, distributed under CC BY 3.0, based on the data from OpenStreetMap, distributed under ODbL (https://cartodb.com/basemaps). Boundaries used are taken from geoBoundaries Global Database (www.geoboundaries.org), distributed under CC BY 4.0.
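Panels B and C can be illustrated with a minimal NumPy sketch. This is a simplified re-implementation of basic SMOTE interpolation and of random undersampling, not the smotefamily or virtualspecies code used for the figure; the function names, `k = 5`, and the toy presence/absence data are assumptions. SMOTE creates each synthetic point on the segment between a minority sample and one of its k nearest minority-class neighbours.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples (basic SMOTE).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)                         # seed samples
    pick = neighbours[base, rng.integers(0, k, size=n_new)]       # one neighbour each
    gap = rng.random((n_new, 1))                                  # position on the segment
    return X_min[base] + gap * (X_min[pick] - X_min[base])


def random_undersample(X, y, majority_label, seed=0):
    """Binary case: randomly drop majority rows until both classes are equal."""
    rng = np.random.default_rng(seed)
    maj = np.flatnonzero(y == majority_label)
    rest = np.flatnonzero(y != majority_label)
    keep = rng.choice(maj, size=len(rest), replace=False)
    idx = np.concatenate([rest, keep])
    return X[idx], y[idx]


# Hypothetical data: two predictors (e.g. temperature, precipitation), 20 presences vs 180 absences.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = np.zeros(200, dtype=int)
y[:20] = 1

# B: oversample presences up to the size of the absence class.
X_syn = smote(X[y == 1], n_new=160)
X_over = np.vstack([X, X_syn])
y_over = np.concatenate([y, np.ones(160, dtype=int)])

# C: undersample absences down to the size of the presence class.
X_bal, y_bal = random_undersample(X, y, majority_label=0)
```

Oversampling keeps every absence but adds interpolated presences that may fall in environmentally implausible regions; undersampling avoids synthetic points but discards most of the majority class.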
Fig. 3. Differences in spatial autocorrelation (SAC), illustrated with geochemical maps; raster and point data were obtained from a USGS Open-File Report.
A The Aluminum map appears to show strong positive SAC, with high concentrations (red) clustered together and low concentrations (blue) clustered together. B The Bismuth distribution map shows more scattered and less distinct clustering, indicating weaker SAC. In the central and eastern regions, high and low values are interspersed, suggesting negative or weaker SAC. The image was created using the open-source Geographic Information System QGIS. Basemap is visualized from tiles by CartoDB, distributed under CC BY 3.0, based on the data from OpenStreetMap, distributed under ODbL (https://cartodb.com/basemaps). Boundaries used are taken from geoBoundaries Global Database (www.geoboundaries.org), distributed under CC BY 4.0.
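The contrast between panels A and B is usually quantified with a spatial autocorrelation statistic. The caption does not name one, so as an assumption here is global Moran's I for a raster with binary rook (4-neighbour) weights: I = (N / W) · Σᵢⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where W is the sum of all weights. Positive values mean similar values cluster; negative values mean neighbours tend to differ. Point data such as the USGS samples would need a distance-based weight matrix instead, which this sketch omits; the function name is hypothetical.

```python
import numpy as np


def morans_i_grid(z):
    """Global Moran's I for a 2-D raster with binary rook-contiguity weights."""
    z = np.asarray(z, dtype=float)
    d = z - z.mean()
    # Cross-products of horizontally and vertically adjacent cells. Each adjacent
    # pair appears twice in the symmetric weight matrix, hence the factor 2.
    num = 2.0 * (np.sum(d[:, :-1] * d[:, 1:]) + np.sum(d[:-1, :] * d[1:, :]))
    rows, cols = z.shape
    # W: total weight = 2 x number of adjacent (undirected) cell pairs.
    w_sum = 2.0 * (rows * (cols - 1) + (rows - 1) * cols)
    return z.size * num / (w_sum * np.sum(d ** 2))


gradient = np.add.outer(np.arange(8), np.arange(8)).astype(float)  # smooth trend
checker = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)       # perfect alternation
noise = np.random.default_rng(0).random((50, 50))                  # no spatial structure

print(morans_i_grid(gradient))  # ~0.857 (6/7): strong positive SAC
print(morans_i_grid(checker))   # -1.0: strong negative SAC
print(morans_i_grid(noise))     # close to 0
```

A clustered map like the Aluminum example would score toward the gradient case, while an interspersed map like parts of the Bismuth example would score near zero or below.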
Fig. 4. Example of uncertainty quantification for spatial mapping from the SoilGrids project.
A Map of one of the target variables, soil pH (water), in the topsoil layer. B Map of the associated uncertainty for the same territory, calculated as the ratio between the inter-quantile range and the median. The image was created using the open-source Geographic Information System QGIS. Basemap is visualized from tiles by CartoDB, distributed under CC BY 3.0, based on the data from OpenStreetMap, distributed under ODbL (https://cartodb.com/basemaps). Boundaries used are taken from geoBoundaries Global Database (www.geoboundaries.org), distributed under CC BY 4.0. SoilGrids data are publicly available under the CC-BY 4.0 (https://www.isric.org/explore/soilgrids).
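The uncertainty measure in panel B can be computed per pixel from any set of predictive draws. A minimal sketch, assuming a stack of predictions (for example from an ensemble or from repeated quantile-regression outputs) and assuming the 5th and 95th percentiles as the bounds of the inter-quantile range; the exact quantiles used by SoilGrids should be checked in its documentation. The function name and the synthetic pH draws are hypothetical.

```python
import numpy as np


def quantile_uncertainty(samples, lower=0.05, upper=0.95):
    """Per-pixel uncertainty as (Q_upper - Q_lower) / Q_0.5.

    samples: array of shape (n_draws, rows, cols), one predicted map per draw.
    Pixels with a zero median are returned as NaN.
    """
    q_lo, q_med, q_hi = np.quantile(samples, [lower, 0.5, upper], axis=0)
    # Suppress warnings for zero medians; those pixels are masked below.
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = (q_hi - q_lo) / q_med
    return np.where(q_med != 0, ratio, np.nan)


# Hypothetical ensemble: 500 draws around pH 6 whose spread grows across a 4 x 4 map.
rng = np.random.default_rng(0)
spread = np.linspace(0.1, 1.0, 16).reshape(4, 4)
draws = 6.0 + spread * rng.standard_normal((500, 4, 4))
uncertainty = quantile_uncertainty(draws)
```

Dividing by the median makes the measure relative, which suits strictly positive targets such as pH but becomes unstable for variables whose median is near zero.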
