The use of propensity scores to assess the generalizability of results from randomized trials

Elizabeth A Stuart¹, Stephen R Cole², Catherine P Bradshaw³, Philip J Leaf³

Affiliations

¹ Johns Hopkins Bloomberg School of Public Health, Departments of Mental Health and Biostatistics, 624 N Broadway, 8th Floor, Baltimore, MD; estuart@jhsph.edu; 410-502-6222.
² Department of Epidemiology, Gillings School of Global Public Health and Center for AIDS Research, University of North Carolina, Chapel Hill, NC.
³ Department of Mental Health and Center for the Prevention of Youth Violence, Johns Hopkins Bloomberg School of Public Health, 624 N Broadway, Baltimore, MD.

PMID: 24926156
PMCID: PMC4051511
DOI: 10.1111/j.1467-985X.2010.00673.x

Elizabeth A Stuart et al. J R Stat Soc Ser A Stat Soc. 2011 Apr 1;174(2):369-386.

Abstract

Randomized trials remain the most accepted design for estimating the effects of interventions, but they do not necessarily answer a question of primary interest: Will the program be effective in a target population in which it may be implemented? In other words, are the results generalizable? There has been very little statistical research on how to assess the generalizability, or "external validity," of randomized trials. We propose the use of propensity-score-based metrics to quantify the similarity of the participants in a randomized trial and a target population. In this setting the propensity score model predicts participation in the randomized trial, given a set of covariates. The resulting propensity scores are used first to quantify the difference between the trial participants and the target population, and then to match, subclassify, or weight the control group outcomes to the population, assessing how well the propensity score-adjusted outcomes track the outcomes actually observed in the population. These metrics can serve as a first step in assessing the generalizability of results from randomized trials to target populations. This paper lays out these ideas, discusses the assumptions underlying the approach, and illustrates the metrics using data on the evaluation of a schoolwide prevention program called Positive Behavioral Interventions and Supports.

Keywords: Causal inference; External validity; Positive Behavioral Interventions and Supports; Research synthesis.
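The two uses of the propensity score described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, there is a single covariate, the helper names `propensity` and `fit_logistic` are hypothetical, the standardized difference uses the pooled standard deviation of the scores, and control outcomes are weighted by the odds of non-participation (one possible choice among matching, subclassification, and weighting).

```python
import math
import random
from statistics import mean, pstdev

def propensity(beta, x):
    """P(in trial | x) under a logistic model with coefficients [intercept, slopes...]."""
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, s, lr=0.5, n_iter=500):
    """Logistic regression fitted by plain gradient ascent on the mean log-likelihood."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, si in zip(X, s):
            resid = si - propensity(beta, xi)
            grad[0] += resid
            for j in range(p):
                grad[j + 1] += resid * xi[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Synthetic data (an assumption for illustration, not the PBIS data).
rng = random.Random(0)

def simulate_outcome(x):
    """Hypothetical outcome model: linear in the covariate plus noise."""
    return 50.0 + 10.0 * x[0] + rng.gauss(0.0, 2.0)

pop_x = [[rng.gauss(0.0, 1.0)] for _ in range(1000)]   # target population
trial_x = [[rng.gauss(1.0, 1.0)] for _ in range(400)]  # trial control units
pop_y = [simulate_outcome(x) for x in pop_x]
trial_y = [simulate_outcome(x) for x in trial_x]

# The propensity score model predicts trial participation (1) vs. population (0).
beta = fit_logistic(trial_x + pop_x, [1] * len(trial_x) + [0] * len(pop_x))
e_trial = [propensity(beta, x) for x in trial_x]
e_pop = [propensity(beta, x) for x in pop_x]

# Metric 1: difference in mean propensity scores, raw and standardized.
diff = mean(e_trial) - mean(e_pop)
std_diff = diff / pstdev(e_trial + e_pop)

# Metric 2: reweight trial controls toward the population and compare
# the weighted control mean with the mean actually observed in the population.
weights = [(1.0 - e) / e for e in e_trial]
weighted_mean = sum(w * y for w, y in zip(weights, trial_y)) / sum(weights)

print("standardized difference in propensity scores:", round(std_diff, 3))
print("unweighted control mean:", round(mean(trial_y), 2))
print("weighted control mean:  ", round(weighted_mean, 2))
print("observed population mean:", round(mean(pop_y), 2))
```

In this synthetic setup the trial units differ systematically from the population on the covariate, so the weighted control mean should sit closer to the population mean than the unweighted one. That comparison is the diagnostic the abstract describes.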

Figures

**Figure 1**
Distribution of propensity scores among the schools across the state (density plot) and schools in Project Target trial (vertical lines). State population consists of all elementary schools across the state of Maryland not implementing PBIS and not enrolled in the trial.

**Figure 2**
Distribution of propensity score distances between sampled and unsampled schools, where samples of size 37 are repeatedly drawn from the population of elementary schools in Maryland. The left-hand plot shows simple differences (sampled minus unsampled); the right-hand plot shows standardized differences, standardized by the standard deviation of the propensity scores. The vertical line in each plot shows the value observed for the Project Target trial schools.

**Figure 3**
Observed and predicted outcome values for schools across the state of Maryland. For math and reading scores, numbers shown are percent of children scoring “Proficient” or “Advanced” on the standardized test. Numbers shown for suspensions are the percentage of students suspended in a school year. Black thick line shows observed state averages, where the state population refers to schools across the state not implementing PBIS and not enrolled in the trial. Thin dashed line shows average for control schools in Project Target trial; thin solid line shows weighted average for control schools in trial, with weights calculated from full matching. For all three outcomes, the weighted average tracks the state mean much more closely than the observed average among control schools in the trial.
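The resampling comparison described in the Figure 2 caption can be sketched as below. This is a rough illustration under stated assumptions, not the authors' procedure: the propensity scores are hypothetical draws from a beta distribution, they are held fixed across draws rather than re-estimated, the "trial" is an artificial non-random selection of 37 units, and the function names `mean_difference` and `reference_distribution` are made up for this example.

```python
import random
from statistics import mean, pstdev

def mean_difference(scores, sampled_idx):
    """Mean score of sampled units minus mean score of unsampled units."""
    chosen = set(sampled_idx)
    inside = [s for i, s in enumerate(scores) if i in chosen]
    outside = [s for i, s in enumerate(scores) if i not in chosen]
    return mean(inside) - mean(outside)

def reference_distribution(scores, sample_size, n_draws, rng):
    """Differences obtained when the 'trial' is a simple random sample of the population."""
    idx = range(len(scores))
    return [mean_difference(scores, rng.sample(idx, sample_size))
            for _ in range(n_draws)]

rng = random.Random(1)

# Hypothetical propensity scores for 700 units (not real school data).
scores = [rng.betavariate(2, 8) for _ in range(700)]

# Hypothetical non-random "trial": 37 units drawn only from the upper half of scores.
ranked = sorted(range(len(scores)), key=lambda i: scores[i])
trial_idx = rng.sample(ranked[len(ranked) // 2:], 37)

observed = mean_difference(scores, trial_idx)
draws = reference_distribution(scores, sample_size=37, n_draws=1000, rng=rng)

# Standardize by the standard deviation of the propensity scores.
sd = pstdev(scores)
observed_std = observed / sd
draws_std = [d / sd for d in draws]

# Share of random samples at least as far from zero as the observed trial.
tail_fraction = sum(abs(d) >= abs(observed) for d in draws) / len(draws)

print("observed difference (raw, standardized):",
      round(observed, 3), round(observed_std, 3))
print("share of random samples at least as extreme:", tail_fraction)
```

A small tail share suggests the trial sample differs from the population by more than random sampling would typically produce, which is what the vertical line in each panel of Figure 2 is meant to convey.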
