Introduction to computational causal inference using reproducible Stata, R, and Python code: A tutorial

Affiliations

¹ Inequalities in Cancer Outcomes Network, Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK.
² Department of Epidemiology and Biostatistics, Tehran University of Medical Sciences, Tehran, Iran.
³ Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.
⁴ Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.
⁵ Non-communicable Disease and Cancer Epidemiology Group, Instituto de Investigacion Biosanitaria de Granada (ibs.GRANADA), Andalusian School of Public Health, University of Granada, Granada, Spain.
⁶ Biomedical Network Research Centers of Epidemiology and Public Health (CIBERESP), Madrid, Spain.

PMID: 34713468
PMCID: PMC11795351
DOI: 10.1002/sim.9234

Observational Study

Matthew J Smith et al. Stat Med. 2022 Jan 30;41(2):407-432. doi: 10.1002/sim.9234. Epub 2021 Oct 28.

Abstract

The main purpose of many medical studies is to estimate the effect of a treatment or exposure on an outcome. However, it is not always possible to randomize study participants to a particular treatment, so observational study designs may be used instead. Observational studies pose major challenges, one of which is confounding. Confounding is commonly controlled by direct adjustment for measured confounders, although this approach is sometimes suboptimal because of modeling assumptions and model misspecification. Recent advances in the field of causal inference have dealt with confounding by building on classical standardization methods. However, these advances have progressed quickly, with a relative paucity of computationally oriented applied tutorials, which has contributed to some confusion about the use of these methods among applied researchers. In this tutorial, we show the computational implementation of different causal inference estimators from a historical perspective, in which each new estimator was developed to overcome the limitations of the previous ones (ie, nonparametric and parametric g-formula, inverse probability weighting, double-robust, and data-adaptive estimators). We illustrate the implementation of the different methods using an empirical example from the Connors study, based on intensive care medicine, and, most importantly, we provide reproducible and commented code in Stata, R, and Python for researchers to adapt to their own observational studies. The code can be accessed at https://github.com/migariane/Tutorial_Computational_Causal_Inference_Estimators.

Keywords: G-methods; causal inference; double-robust methods; g-formula; inverse probability weighting; machine learning; propensity score; regression adjustment; targeted maximum likelihood estimation.
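To make the first two estimators named in the abstract concrete, the following is a minimal Python sketch of the nonparametric g-formula (standardization) and inverse probability weighting. It is not the authors' code, which is available in the repository linked above, and it does not use the Connors data. It assumes simulated data with a single binary confounder `w`, a binary treatment `a`, and a binary outcome `y`; the function names and simulation parameters are hypothetical and chosen only for illustration. The simulated risk difference is 0.1 by construction.

```python
import numpy as np


def simulate(n=20000, seed=0):
    """Simulate confounded data (assumed setup, not the Connors study)."""
    rng = np.random.default_rng(seed)
    w = rng.binomial(1, 0.4, size=n)               # binary confounder
    a = rng.binomial(1, 0.2 + 0.5 * w)             # treatment depends on w
    y = rng.binomial(1, 0.2 + 0.1 * a + 0.3 * w)   # true risk difference = 0.1
    return w, a, y


def naive_difference(a, y):
    """Unadjusted difference in outcome means (confounded by w)."""
    return y[a == 1].mean() - y[a == 0].mean()


def g_formula(w, a, y):
    """Nonparametric g-formula: stratum-specific risk differences
    averaged over the marginal distribution of the confounder."""
    ate = 0.0
    for level in np.unique(w):
        in_stratum = w == level
        mean_treated = y[in_stratum & (a == 1)].mean()
        mean_control = y[in_stratum & (a == 0)].mean()
        ate += (mean_treated - mean_control) * in_stratum.mean()
    return ate


def propensity(w, a):
    """Nonparametric propensity score: P(A = 1 | W = w) within each stratum."""
    ps = np.empty(len(w), dtype=float)
    for level in np.unique(w):
        in_stratum = w == level
        ps[in_stratum] = a[in_stratum].mean()
    return ps


def ipw(w, a, y):
    """Inverse probability weighting with the estimated propensity score."""
    ps = propensity(w, a)
    return np.mean(a * y / ps) - np.mean((1 - a) * y / (1 - ps))


w, a, y = simulate()
ate_naive = naive_difference(a, y)
ate_gformula = g_formula(w, a, y)
ate_ipw = ipw(w, a, y)
print("naive:", ate_naive)
print("g-formula:", ate_gformula)
print("IPW:", ate_ipw)
```

In this saturated setting (one binary confounder, no modeling), the two adjusted estimators coincide algebraically, while the naive difference is biased away from 0.1. With continuous or many confounders, parametric or data-adaptive models for the outcome and the propensity score would replace the stratum means, and the estimators would no longer agree exactly.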

Figures

**FIGURE 1**
$Y$: outcome; $A$: treatment; $W$: sufficient set of variables to control for confounding, as outlined in Connors et al.

**FIGURE 2**
Propensity score overlap by treatment status

Grants and funding

18525/CRUK_/Cancer Research UK/United Kingdom