A National Synthetic Populations Dataset for the United States
- PMID: 39863626
- PMCID: PMC11762717
- DOI: 10.1038/s41597-025-04380-7
Abstract
Geospatially explicit and statistically accurate person and household data allow researchers to study community- and neighborhood-level effects and to design and test hypotheses that would not be possible without synthetic data. In this article, we demonstrate the workflow for generating spatially explicit household- and individual-level synthetic populations for the United States representing the year 2019. We use publicly available U.S. Census American Community Survey (ACS) 5-year estimates from the 2015-2019 ACS. We use Iterative Proportional Fitting (IPF) to create our synthetic population and use the resulting joint counts to sample representative households and people directly from microdata. Our dataset contains records for 120,754,708 households and 303,128,287 individuals across the United States. We spatially allocate households using the Environmental Protection Agency (EPA) Integrated Climate and Land Use Scenarios (ICLUS) project household distribution estimates to create a spatially explicit dataset. Our validation shows strong correlation with the original census variables, with many categories reporting a Pearson's r correlation coefficient greater than 0.99.
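For readers unfamiliar with IPF, the following is a minimal sketch of the general technique on a two-way table, followed by sampling microdata records from the fitted joint counts. It is not the authors' implementation: the seed table, the marginal targets, the toy microdata, the function name `ipf`, and the simple rounding used to get integer counts are all illustrative assumptions.

```python
import numpy as np


def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Rescale a 2-D seed table until its margins match the targets.

    Hypothetical helper for illustration; assumes both target vectors
    sum to the same total and the seed has no all-zero row or column.
    """
    table = np.asarray(seed, dtype=float).copy()
    for _ in range(max_iter):
        # Row step: scale each row so it sums to its target.
        table *= (row_targets / table.sum(axis=1))[:, None]
        # Column step: scale each column so it sums to its target.
        table *= (col_targets / table.sum(axis=0))[None, :]
        # Columns now match exactly, so convergence is judged on rows.
        if np.abs(table.sum(axis=1) - row_targets).max() < tol:
            break
    return table


# Assumed seed: joint counts of household size (rows) by income band
# (columns) observed in a small microdata sample.
seed = np.array([[30, 10], [20, 20], [10, 30]])
# Assumed marginal totals for one geography (for example, one tract).
row_targets = np.array([50.0, 30.0, 20.0])
col_targets = np.array([60.0, 40.0])

fitted = ipf(seed, row_targets, col_targets)

# Simple rounding to integers; this is an assumption and does not
# guarantee that the rounded margins still match the targets exactly.
counts = np.rint(fitted).astype(int)

# Toy microdata: each record is (size category, income category).
micro = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1], [0, 0], [2, 1]])

rng = np.random.default_rng(0)
synthetic = []  # indices of sampled microdata records
for i in range(counts.shape[0]):
    for j in range(counts.shape[1]):
        # Records whose attributes fall in cell (i, j).
        pool = np.flatnonzero((micro[:, 0] == i) & (micro[:, 1] == j))
        if counts[i, j] > 0 and pool.size > 0:
            # Draw with replacement so small pools can fill large cells.
            picks = rng.choice(pool, size=int(counts[i, j]), replace=True)
            synthetic.extend(picks.tolist())
```

Each entry of `synthetic` indexes a microdata record, so the synthetic households inherit every attribute of the sampled record rather than only the two variables used for fitting.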
© 2025. The Author(s).
Conflict of interest statement
Competing interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
