Nat Commun. 2025 Mar 18;16(1):2170.
doi: 10.1038/s41467-025-56906-7.

Global gridded population datasets systematically underrepresent rural population

Josias Láng-Ritter et al. Nat Commun.

Abstract

Numerous initiatives towards sustainable development rely on global gridded population data. Such data have been calibrated primarily for urban environments, but their accuracy in the rural domain remains largely unexplored. This study systematically validates global gridded population datasets in rural areas, based on reported human resettlement from 307 large dam construction projects in 35 countries. We find large discrepancies between the examined datasets, and, without exception, significant negative biases of -53%, -65%, -67%, -68%, and -84% for WorldPop, GWP, GRUMP, LandScan, and GHS-POP, respectively. This implies that rural population is, even in the most accurate dataset, underestimated by half compared to reported figures. To ensure equitable access to services and resources for rural communities, past and future applications of the datasets must undergo a critical discussion in light of the identified biases. Improvements in the datasets' accuracies in rural areas can be attained through strengthened population censuses, alternative population counts, and a more balanced calibration of population models.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Examples of five global gridded population datasets.
The map shows part of the rural province of Tuyên Quang in northern Vietnam, with population data for the reference year 2000 from a GWP, b GRUMP, c GHS-POP, d LandScan, and e WorldPop. The Na Hang Reservoir in this area (indicated by the grey polygon) was completed in 2008 and caused resettlement of 4000 people. Supplementary Fig. 1 shows an enlargement of panel d. Country boundary courtesy of ©EuroGeographics.
Fig. 2. Locations of the 307 rural areas analysed in this study.
The reported population numbers are indicated by marker size, while reference years of the rural areas are shown by marker colour. Country boundaries courtesy of ©EuroGeographics.
Fig. 3. Characteristics of evaluated rural areas and population datasets.
a Temporal distribution of the 307 evaluated rural areas. b Temporal coverage and model complexity of the five population datasets examined in this study.
Fig. 4. Comparison of reported rural populations (x-axis) with those predicted by the five population datasets (y-axis) for the reference year 2000.
Each of the vertically aligned groups of five data points represents one of the 33 rural areas evaluated for the reference year 2000. Note the logarithmic scale on both axes.
Fig. 5. Validation scatter plots for the five analysed population datasets, comparing reported rural populations (x-axes) with those predicted by the population datasets (y-axes).
Each data point represents one of the 307 analysed rural areas, coloured by reference year (a–e) and by country income level using the World Bank classification (f–j). The accuracy metrics shown in all plots are bias percentage and symmetric mean absolute percentage error (sMAPE). Note the logarithmic scale on all axes.
Fig. 6. Trend analyses of dataset accuracy over map reference year and country income level.
a, b Influence of map reference year on dataset accuracy. c, d Influence of country income level on dataset accuracy. The analysis in c, d is based solely on the 63 areas with reference years 2000–2010 to exclude effects of different time periods covered by the different population datasets. The accuracy metrics used are bias percentage and symmetric mean absolute percentage error (sMAPE); for both accuracy metrics, the optimal value is zero.
Fig. 7. Mean bias percentages over the five population grids in the 35 countries with evaluated rural areas.
Note that most countries include data points for only some of the five datasets, as indicated in Fig. 8. Country boundaries courtesy of ©EuroGeographics.
Fig. 8. Bias percentages for each of the five population grids in the 35 countries with evaluated rural areas.
Countries are sorted alphabetically by ISO3 country code. The numbers below the ISO3 country codes indicate the totals of rural areas evaluated for each country. The lack of reference years for computing bias percentage of a given dataset is indicated by x-symbols.
Fig. 9. Surface area validation of three reservoir polygon sources combined in GeoDAR (i.e. GRanD, HydroLAKES, and UCLA Circa 2015) against surface areas reported by ICOLD.
The validation shows a systematic underrepresentation of the real reservoir area by GeoDAR polygons (mean bias = −18.8%).
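
The two accuracy metrics named in the captions of Figs. 5 and 6, bias percentage and sMAPE, can be illustrated with a minimal Python sketch. The formulas here are assumptions: this record does not state the study's exact definitions, so the sketch uses one common form of each (aggregate relative bias, and sMAPE with the mean of reported and predicted values in the denominator). The function names and the sample numbers are hypothetical and are not taken from the study's data.

```python
def bias_percentage(reported, predicted):
    """Aggregate bias in percent (assumed definition).

    Negative values mean the dataset predicts fewer people than reported.
    """
    total_reported = sum(reported)
    return 100.0 * (sum(predicted) - total_reported) / total_reported


def smape(reported, predicted):
    """Symmetric mean absolute percentage error in percent (assumed variant).

    Each term is |predicted - reported| divided by the mean of the two
    values, so the result lies between 0 (perfect) and 200.
    """
    terms = []
    for r, p in zip(reported, predicted):
        denom = (abs(r) + abs(p)) / 2.0
        # Treat a pair of zeros as a perfect match instead of dividing by zero.
        terms.append(0.0 if denom == 0 else abs(p - r) / denom)
    return 100.0 * sum(terms) / len(terms)


# Hypothetical example: three rural areas where the gridded dataset
# predicts exactly half of the reported population.
reported = [1000, 2000, 7000]
predicted = [500, 1000, 3500]

print(f"bias:  {bias_percentage(reported, predicted):.1f}%")
print(f"sMAPE: {smape(reported, predicted):.1f}%")
```

Both metrics equal zero for a perfect match, in line with the Fig. 6 caption. Under these assumed definitions, a dataset that predicts half of the reported population in every area has a bias of -50%, which is roughly the magnitude the abstract reports for the most accurate dataset.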
