Sci Data. 2025 Jan 25;12(1):144.
doi: 10.1038/s41597-025-04380-7.

A National Synthetic Populations Dataset for the United States

James Rineer et al. Sci Data. 2025.

Abstract

Geospatially explicit and statistically accurate person and household data allow researchers to study community- and neighborhood-level effects and to design and test hypotheses that would not be possible without synthetic data. In this article, we demonstrate the workflow for generating spatially explicit household- and individual-level synthetic populations for the United States representing the year 2019. We use publicly available U.S. Census American Community Survey (ACS) 5-year estimates from the 2015-2019 ACS. We use Iterative Proportional Fitting (IPF) to create our synthetic population and use the resulting joint counts to sample representative households and people directly from microdata. Our dataset contains records for 120,754,708 households and 303,128,287 individuals across the United States. We spatially allocate households using the Environmental Protection Agency (EPA) Integrated Climate and Land Use Scenarios (ICLUS) project household distribution estimates to create a spatially explicit dataset. Our validation shows strong correlation with original census variables, with many categories reporting a Pearson's r correlation coefficient greater than 0.99.
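As an illustration of the Iterative Proportional Fitting step named in the abstract, the following is a minimal two-dimensional sketch, not the authors' implementation. The function name `ipf`, the seed table, and the target margins are all hypothetical; a real workflow would fit many ACS control variables at once and then sample microdata records from the fitted joint counts, which is not shown here.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Rescale a 2-D seed table until its margins match the targets.

    Hypothetical helper for illustration only; assumes the row and
    column targets sum to the same total.
    """
    table = [row[:] for row in seed]  # work on a copy of the seed
    for _ in range(max_iter):
        # Row step: scale each row so it sums to its target.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [v * target / total for v in table[i]]
        # Column step: scale each column so it sums to its target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # After the column step the columns match exactly, so
        # convergence is judged on the remaining row error.
        err = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if err < tol:
            break
    return table


# Toy example: the seed stands in for a microdata cross-tabulation,
# the targets stand in for published marginal totals.
seed = [[1.0, 2.0], [3.0, 4.0]]
fitted = ipf(seed, row_targets=[40.0, 60.0], col_targets=[30.0, 70.0])
```

The fitted table keeps the interaction structure of the seed while reproducing both sets of marginal totals, which is what makes the resulting joint counts usable as sampling weights.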


Conflict of interest statement

Competing interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Fig. 1
2019 synthetic population generation workflow.
Fig. 2
(a) We distribute our synthetic households according to the EPA ICLUS v1.3 household density grid. For each block group, the synthetic households were distributed according to the probability surface of the density grid, and then individual household points were uniformly distributed within each grid cell to arrive at the final household coordinates. (b) Each household record is linked to one or more person records by a unique household ID. Note: HH = household.
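The two-stage placement described in the Fig. 2 caption can be sketched as follows. This is a simplified illustration under stated assumptions, not the published pipeline: the cell dictionaries, the `density`, `xmin`, `ymin`, and `size` keys, and the function name `place_households` are hypothetical, and square cells in a planar coordinate system are assumed.

```python
import random


def place_households(cells, n_households, seed=0):
    """Draw household coordinates for one block group.

    Stage 1 picks a grid cell with probability proportional to its
    density value; stage 2 draws a uniform point inside that cell.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # seeded for reproducible output
    weights = [c["density"] for c in cells]
    chosen = rng.choices(cells, weights=weights, k=n_households)
    points = []
    for c in chosen:
        x = c["xmin"] + rng.random() * c["size"]
        y = c["ymin"] + rng.random() * c["size"]
        points.append((x, y))
    return points


# Toy block group covered by three 100 m cells; the middle cell has
# zero density and should receive no households.
cells = [
    {"xmin": 0.0, "ymin": 0.0, "size": 100.0, "density": 3.0},
    {"xmin": 100.0, "ymin": 0.0, "size": 100.0, "density": 0.0},
    {"xmin": 200.0, "ymin": 0.0, "size": 100.0, "density": 1.0},
]
points = place_households(cells, n_households=50)
```

Sampling the cell first and the point second keeps the household count per block group fixed while letting the density grid shape where those households fall.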
Fig. 3
Block groups containing group quarters populations show the highest level of error in our dataset. As the percent of group quarters populations within the block group increases, so does our total person count error. The figure compares the percent error in block group synthetic person counts with the percent of the block group population in group quarters.
Fig. 4
The percent difference of household synthetic population variables compared to ACS data, by variable, aggregated at the state level.
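The kind of comparison reported in the validation (percent difference and Pearson's r between synthetic and ACS counts) can be sketched as below. The exact formula the authors use is not given here, so the signed percent difference relative to the ACS value is an assumption, and the counts are made-up toy numbers rather than results from the dataset.

```python
import math


def percent_difference(synthetic, reference):
    """Signed percent difference of a synthetic count from its reference.

    Assumed definition: (synthetic - reference) / reference * 100.
    """
    return (synthetic - reference) / reference * 100.0


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Toy counts for one variable across three areas (illustrative only).
acs_counts = [100.0, 250.0, 400.0]
synthetic_counts = [102.0, 245.0, 400.0]

diffs = [percent_difference(s, a) for s, a in zip(synthetic_counts, acs_counts)]
r = pearson_r(synthetic_counts, acs_counts)
```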
Fig. 5
Our synthetic populations data have been used to identify hotspots of opioid use in relation to the location of treatment facilities.
