Simple Strategies for Improving Inference with Linked Data: A Case Study of the 1850-1930 IPUMS Linked Representative Historical Samples
- PMID: 33005066
- PMCID: PMC7523567
- DOI: 10.1080/01615440.2019.1630343
Abstract
New large-scale linked data are revolutionizing quantitative history and demography. This paper proposes two complementary strategies for improving inference with linked historical data: the use of validation variables to identify higher-quality links, and a simple regression-based weighting procedure to increase the representativeness of custom research samples. We demonstrate the potential value of these strategies using the 1850-1930 Integrated Public Use Microdata Series Linked Representative Samples (IPUMS-LRS), a high-quality, publicly available linked historical dataset. We show that, while incorrect linking rates appear low in the IPUMS-LRS, researchers can reduce error rates further using validation variables. We also show how researchers can reweight linked samples so that observed characteristics in the linked sample balance those in a reference population.
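The abstract does not spell out the weighting procedure, so the following is only a minimal sketch of one common regression-based approach, not necessarily the authors' exact method. It assumes a logit model of link status on observed characteristics and uses the estimated odds of belonging to the reference population as the weight for each linked record. The function names (`fit_logit`, `linkage_weights`) and the toy literacy indicator are hypothetical and chosen for illustration.

```python
import numpy as np


def fit_logit(X, y, n_iter=25):
    """Fit a logistic regression by Newton's method.

    X must already contain an intercept column; y is a 0/1 vector.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                          # score vector
        hess = (X * (p * (1.0 - p))[:, None]).T @ X   # observed information
        beta += np.linalg.solve(hess, grad)
    return beta


def linkage_weights(x_linked, x_reference):
    """Return weights for linked records (assumed approach, see lead-in).

    Stacks the linked sample (y = 1) on the reference population (y = 0),
    regresses y on the covariates, and weights each linked record by
    (1 - p) / p, where p is its predicted probability of being linked.
    Weights are normalized to have mean 1.
    """
    x = np.concatenate([x_linked, x_reference])
    y = np.concatenate([np.ones(len(x_linked)), np.zeros(len(x_reference))])
    X = np.column_stack([np.ones(len(x)), x])
    beta = fit_logit(X, y)

    X_linked = np.column_stack([np.ones(len(x_linked)), x_linked])
    p = 1.0 / (1.0 + np.exp(-X_linked @ beta))
    w = (1.0 - p) / p
    return w / w.mean()


# Toy data (hypothetical): a literacy indicator that is over-represented
# among linked records relative to the reference population.
x_linked = np.array([1.0] * 80 + [0.0] * 20)
x_reference = np.array([1.0] * 500 + [0.0] * 500)

w = linkage_weights(x_linked, x_reference)
print("unweighted linked share:", x_linked.mean())
print("weighted linked share:  ", np.average(x_linked, weights=w))
print("reference share:        ", x_reference.mean())
```

With a single binary covariate the logit model is saturated, so the weighted share in the linked sample reproduces the reference share. With several covariates, balance holds only to the extent that the model of link status is well specified, which is why comparing weighted and reference means afterwards is a sensible check.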