Assessing methods for generalizing experimental impact estimates to target populations
- PMID: 27668031
- PMCID: PMC5030077
- DOI: 10.1080/19345747.2015.1060282
Abstract
Randomized experiments are considered the gold standard for causal inference, as they can provide unbiased estimates of treatment effects for the experimental participants. However, researchers and policymakers are often interested in using a specific experiment to inform decisions about other target populations. In education research, increasing attention is being paid to the potential lack of generalizability of randomized experiments, as the experimental participants may be unrepresentative of the target population of interest. This paper examines whether generalization may be assisted by statistical methods that adjust for observed differences between the experimental participants and members of a target population. The methods examined include approaches that reweight the experimental data so that participants more closely resemble the target population and methods that utilize models of the outcome. Two simulation studies and one empirical analysis investigate and compare the methods' performance. One simulation uses purely simulated data while the other draws on data from an evaluation of a school-based dropout prevention program. Our simulations suggest that machine learning methods outperform regression-based methods when the required structural (ignorability) assumptions are satisfied. When these assumptions are violated, all of the methods examined perform poorly. Our empirical analysis uses data from a multi-site experiment to assess how well results from a given site predict impacts in other sites. Using a variety of extrapolation methods, predicted effects for each site are compared to actual benchmarks. Flexible modeling approaches perform best, although linear regression is not far behind. Taken together, these results suggest that flexible modeling techniques can aid generalization while underscoring that even state-of-the-art statistical techniques still rely on strong assumptions.
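The reweighting idea described in the abstract can be sketched concretely: fit a model of each unit's probability of belonging to the experimental sample versus the target population, then weight the experimental units by their odds of resembling the target population before estimating the treatment effect. The code below is a minimal sketch under assumed inputs, not the authors' implementation; the function and variable names (generalized_ate, X_exp, X_pop, t, y) are hypothetical, and the logistic participation model stands in for what could equally be a more flexible machine learning classifier, in the spirit of the methods compared in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def generalized_ate(X_exp, t, y, X_pop):
    """Reweighted average treatment effect aimed at a target population.

    X_exp : (n, p) covariates of the experimental participants
    t     : (n,)  binary treatment indicator within the experiment
    y     : (n,)  observed outcomes
    X_pop : (m, p) covariates of a sample from the target population
    """
    t = np.asarray(t)
    y = np.asarray(y)

    # Fit a "participation" model: 1 = in the experiment, 0 = in the
    # target population sample.
    X = np.vstack([X_exp, X_pop])
    s = np.concatenate([np.ones(len(X_exp)), np.zeros(len(X_pop))])
    model = LogisticRegression(max_iter=1000).fit(X, s)

    # P(in experiment | X) for each experimental unit, converted to the
    # odds of resembling the target population.
    p = model.predict_proba(X_exp)[:, 1]
    w = (1 - p) / p

    # Weighted difference in mean outcomes between treatment arms.
    treated, control = t == 1, t == 0
    return (np.average(y[treated], weights=w[treated])
            - np.average(y[control], weights=w[control]))
```

As the abstract emphasizes, the validity of any such estimator still hinges on a strong ignorability assumption: the covariates must capture all effect-moderating differences between the experimental sample and the target population.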