Confounding and regression adjustment in difference-in-differences studies

Bret Zeldow¹, Laura A Hatfield²

Affiliations

¹ Department of Mathematics and Statistics, Colby College, Waterville, Maine, USA.
² Department of Health Care Policy, Harvard Medical School, Boston, Massachusetts, USA.

PMID: 33978956
PMCID: PMC8522571
DOI: 10.1111/1475-6773.13666

Health Serv Res. 2021 Oct;56(5):932-941. doi: 10.1111/1475-6773.13666. Epub 2021 May 12.

Abstract

Objective: To define confounding bias in difference-in-differences studies and compare regression- and matching-based estimators designed to correct bias due to observed confounders.

Data sources: We simulated data from linear models that incorporated different confounding relationships: time-invariant covariates with a time-varying effect on the outcome, time-varying covariates with a constant effect on the outcome, and time-varying covariates with a time-varying effect on the outcome. We considered a simple setting that is common in the applied literature: treatment is introduced at a single time point and there is no unobserved treatment effect heterogeneity.
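As a rough illustration of the first confounding relationship, the sketch below simulates a two-period panel in which a time-invariant covariate has a time-varying effect on the outcome. The function name `simulate_panel`, the parameter values, and the two-period simplification are assumptions made for this example; they are not taken from the authors' replication code.

```python
import random

def simulate_panel(n_units=800, tau=1.0, beta_pre=1.0, beta_post=2.0,
                   x_shift=1.0, seed=0):
    """Two-period panel: time-invariant covariate x with a time-varying effect.

    All parameter values are illustrative assumptions, not the paper's settings.
    """
    rng = random.Random(seed)
    rows = []
    for unit in range(n_units):
        treated = unit % 2                      # half treated, half comparison
        x = rng.gauss(x_shift * treated, 1.0)   # covariate distribution differs by group
        for post, beta in ((0, beta_pre), (1, beta_post)):
            y = (0.5 * post                     # common time trend
                 + beta * x                     # covariate effect changes over time
                 + tau * treated * post         # constant effect after the single onset
                 + rng.gauss(0.0, 1.0))         # idiosyncratic noise
            rows.append({"unit": unit, "treated": treated,
                         "post": post, "x": x, "y": y})
    return rows

panel = simulate_panel()
print(len(panel), panel[0])
```

Treatment begins at a single time point and the effect `tau` is the same for every treated unit, mirroring the simple setting described above.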

Study design: We compared the bias and root mean squared error of treatment effect estimates from six model specifications, including simple linear regression models and matching techniques.
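A minimal sketch of this kind of comparison, using only two of the possible specifications: an unadjusted difference-in-differences estimate and one that regresses the pre-to-post outcome change on treatment and the covariate, which lets the covariate's effect differ between periods. The data-generating values, the helper names, and the two-period setup are assumptions for illustration and do not reproduce the six specifications in the paper.

```python
import math
import random
from statistics import mean

def simulate_changes(rng, n_units=800, tau=1.0, beta_pre=1.0, beta_post=2.0):
    """Return (treated, x, y_post - y_pre) per unit for one two-period dataset."""
    data = []
    for unit in range(n_units):
        d = unit % 2
        x = rng.gauss(1.0 * d, 1.0)             # covariate mean differs by group
        y_pre = beta_pre * x + rng.gauss(0.0, 1.0)
        y_post = 0.5 + beta_post * x + tau * d + rng.gauss(0.0, 1.0)
        data.append((d, x, y_post - y_pre))
    return data

def unadjusted_did(data):
    """Difference between groups in the mean pre-to-post outcome change."""
    treated = [dy for d, _, dy in data if d == 1]
    comparison = [dy for d, _, dy in data if d == 0]
    return mean(treated) - mean(comparison)

def adjusted_did(data):
    """Coefficient on treatment from OLS of the outcome change on treatment and x."""
    d_bar, x_bar, y_bar = (mean(col) for col in zip(*data))
    s_dd = sum((d - d_bar) ** 2 for d, _, _ in data)
    s_xx = sum((x - x_bar) ** 2 for _, x, _ in data)
    s_dx = sum((d - d_bar) * (x - x_bar) for d, x, _ in data)
    s_dy = sum((d - d_bar) * (dy - y_bar) for d, _, dy in data)
    s_xy = sum((x - x_bar) * (dy - y_bar) for _, x, dy in data)
    return (s_xx * s_dy - s_dx * s_xy) / (s_dd * s_xx - s_dx ** 2)

def bias_and_rmse(estimates, truth):
    """Monte Carlo bias and root mean squared error around the true effect."""
    bias = mean(estimates) - truth
    rmse = math.sqrt(mean((e - truth) ** 2 for e in estimates))
    return bias, rmse

rng = random.Random(2021)
datasets = [simulate_changes(rng) for _ in range(400)]
results = {
    "unadjusted": bias_and_rmse([unadjusted_did(ds) for ds in datasets], truth=1.0),
    "adjusted": bias_and_rmse([adjusted_did(ds) for ds in datasets], truth=1.0),
}
for name, (bias, rmse) in results.items():
    print(f"{name}: bias={bias:.3f}, rmse={rmse:.3f}")
```

Under these assumed values, the unadjusted estimate should be off by roughly `(beta_post - beta_pre)` times the group difference in the covariate mean, while the adjusted estimate should be close to unbiased.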

Data collection: Simulation code is provided for replication.

Principal findings: Confounders in difference-in-differences are covariates that change differently over time in the treated and comparison groups or have a time-varying effect on the outcome. When such a confounder is measured, appropriately adjusting for it (ie, including it in a regression model that is consistent with the causal model) can provide unbiased estimates with optimal SE. However, when a time-varying confounder is affected by treatment, recovering an unbiased estimate of the causal effect using difference-in-differences is difficult.
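One way to see why both conditions matter is a short derivation under an assumed linear model for untreated potential outcomes with two periods. The model and the symbols $\alpha_t$, $\gamma$, $\beta_t$, and $\Delta_t$ are illustrative assumptions introduced here, not notation from the paper.

```latex
\begin{align*}
Y_{it}(0) &= \alpha_t + \gamma D_i + \beta_t X_{it} + \varepsilon_{it},
  \qquad E[\varepsilon_{it} \mid D_i] = 0, \quad t \in \{0, 1\}, \\
\Delta_t &= E[X_{it} \mid D_i = 1] - E[X_{it} \mid D_i = 0], \\
\text{bias} &= E[Y_{i1}(0) - Y_{i0}(0) \mid D_i = 1]
             - E[Y_{i1}(0) - Y_{i0}(0) \mid D_i = 0] \\
  &= \beta_1 \Delta_1 - \beta_0 \Delta_0 \\
  &= \beta_1 (\Delta_1 - \Delta_0) + (\beta_1 - \beta_0)\, \Delta_0 .
\end{align*}
```

The first term is nonzero when the covariate evolves differently in the two groups. The second is nonzero when the covariate's effect changes over time and the groups already differ in the covariate at baseline. When both terms vanish, the unadjusted difference-in-differences is unbiased under this assumed model.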

Conclusions: Confounding in difference-in-differences is more complicated than in cross-sectional settings, from which techniques and intuition to address observed confounding cannot be imported wholesale. Instead, analysts should begin by postulating a causal model that relates covariates, both those that vary over time and those with time-varying effects on the outcome, to treatment. This causal model will then guide the specification of an appropriate analytical model (eg, using regression or matching) that can produce unbiased treatment effect estimates. We emphasize the importance of thoughtful incorporation of covariates to address confounding bias in difference-in-differences studies.

Keywords: difference-in-differences; matching; parallel trends; regression adjustment; time-varying confounding.

Figures

**FIGURE 1**
Adjusting for the main effect of a covariate does not correct for diverging trends, but adjusting for its interaction with time does. Legend: In this simulated example, untreated potential outcomes depend on a time‐invariant covariate with a time‐varying effect. Panel A shows mean untreated potential outcomes by group. Panels B to D show residuals from linear models, denoted using pseudo‐code for the function lm, which fits a linear model for outcome y. In panel B, the only predictor is time. In panel C, the predictors are time and the covariate x. In panel D, the predictors are time, the covariate, and their interaction [Color figure can be viewed at wileyonlinelibrary.com]
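The three residual panels can be approximated with a short sketch. It fits the three linear models (time only; time and x; time, x, and their interaction) by ordinary least squares and reports how much the treated-minus-comparison gap in mean residuals changes from the first to the last period. The four time points, the coefficient values, the treatment of time as a numeric predictor, and the helper names `ols_residuals` and `gap_change` are all assumptions for this example, not details taken from the figure.

```python
import random
from statistics import mean

def ols_residuals(X, y):
    """Residuals from least squares, solved with normal equations (Gauss-Jordan)."""
    k = len(X[0])
    # Augmented system [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    coef = [A[i][k] for i in range(k)]
    return [yi - sum(b * v for b, v in zip(coef, row)) for row, yi in zip(X, y)]

def gap_change(resid, group, time, first=0, last=3):
    """Change in the treated-minus-comparison mean residual between two periods."""
    def gap(t):
        by_group = {0: [], 1: []}
        for e, d, tt in zip(resid, group, time):
            if tt == t:
                by_group[d].append(e)
        return mean(by_group[1]) - mean(by_group[0])
    return gap(last) - gap(first)

rng = random.Random(1)
group, time, x, y = [], [], [], []
for unit in range(800):
    d = unit % 2
    xi = rng.gauss(1.0 * d, 1.0)   # time-invariant covariate, shifted in the treated group
    for t in range(4):
        group.append(d)
        time.append(t)
        x.append(xi)
        # Untreated potential outcome: the effect of x grows with time
        y.append(0.5 * t + (1.0 + 0.5 * t) * xi + rng.gauss(0.0, 1.0))

designs = {
    "time only": [[1.0, t] for t in time],
    "time + x": [[1.0, t, xi] for t, xi in zip(time, x)],
    "time * x": [[1.0, t, xi, t * xi] for t, xi in zip(time, x)],
}
changes = {name: gap_change(ols_residuals(X, y), group, time)
           for name, X in designs.items()}
for name, change in changes.items():
    print(f"{name}: change in residual gap = {change:.2f}")
```

With these assumed values, the gap should keep widening under the first two models and stay roughly flat once the interaction is included, which is the pattern the caption describes.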

**FIGURE 2**
Simulation results for a time‐invariant covariate. Legend: Six regression and matching methods were compared across three simulation scenarios. Each panel shows results from 400 simulated datasets of 800 units each. In Scenario 1, the distribution of the covariate varied by treatment group but the covariate's effect on the outcome did not change (ie, no interaction between the covariate and time). In Scenario 2, the covariate's effect on the outcome changed over time. In the third scenario, the distribution of the covariate was the same in the treated and comparison groups, and the covariate's effect on the outcome changed over time. All analyses were assessed on the mean percent bias and mean standard error (SE) of the effect estimate. CA = Covariate‐adjusted; TVA = Time‐varying adjusted

**FIGURE 3**
Simulation results for a time‐varying covariate with a time‐invariant effect on the outcome. Legend: Six regression and matching methods were compared across three simulation scenarios. Each panel shows results from 400 simulated datasets of 800 units each. For all scenarios, the covariate's effect on the outcome was constant over time. In Scenario 4a, the time‐varying covariate evolved in the same way for the treated and comparison groups. In Scenario 5a, the covariate evolved differently between the two groups starting from the first timepoint (before treatment was implemented). In Scenario 6a, the covariate evolved in the same way in both groups prior to treatment; once treatment was implemented, its evolution diverged between the two groups. All analyses were assessed on the mean percent bias and mean standard error (SE) of the effect estimate. CA = Covariate‐adjusted; TVA = Time‐varying adjusted

**FIGURE 4**
Simulation results for a time‐varying covariate with a time‐varying effect on the outcome. Legend: Six regression and matching methods were compared across three simulation scenarios. Each panel shows results from 400 simulated datasets of 800 units each. For all scenarios, the covariate's effect on the outcome differed across time. In Scenario 4b, the time‐varying covariate evolved in the same way for the treated and comparison groups. In Scenario 5b, the covariate evolved differently between the two groups starting from the first timepoint (before treatment was implemented). In Scenario 6b, the covariate evolved in the same way in both groups prior to treatment; once treatment was implemented, its evolution diverged between the two groups. All analyses were assessed on the mean percent bias and mean standard error (SE) of the effect estimate. CA = Covariate‐adjusted; TVA = Time‐varying adjusted
