Machine learning in causal inference for epidemiology

Chiara Moccia¹, Giovenale Moirano², Maja Popovic², Costanza Pizzi², Piero Fariselli³, Lorenzo Richiardi², Claus Thorn Ekstrøm⁴, Milena Maule²

Affiliations

¹ Cancer Epidemiology Unit, Department of Medical Sciences, University of Turin and CPO Piedmont, Via Santena 7, Turin, 10126, Italy. chiara.moccia@unito.it.
² Cancer Epidemiology Unit, Department of Medical Sciences, University of Turin and CPO Piedmont, Via Santena 7, Turin, 10126, Italy.
³ Department of Medical Sciences, University of Turin, Turin, Italy.
⁴ Section of Biostatistics, Department of Public Health, University of Copenhagen, Copenhagen, Denmark.

PMID: 39535572
PMCID: PMC11599438
DOI: 10.1007/s10654-024-01173-x

Eur J Epidemiol. 2024 Oct;39(10):1097-1108. Epub 2024 Nov 13.

Abstract

In causal inference, parametric models are usually employed to answer causal questions by estimating the effect of interest. However, parametric models rely on the assumption of correct model specification, which, if not met, leads to biased effect estimates. Correct model specification is challenging, especially in high-dimensional settings. Incorporating Machine Learning (ML) into causal analyses may reduce the bias arising from model misspecification, since ML methods do not require the functional form of the relationships between variables to be specified. However, when ML predictions are directly plugged into a predefined formula for the effect of interest, there is a risk of introducing a "plug-in bias" in the effect measure. To overcome this problem and to achieve useful asymptotic properties, new estimators have been proposed that combine the predictive power of ML with the ability of traditional statistical methods to make inference about population parameters. For epidemiologists interested in taking advantage of ML in causal inference investigations, we provide an overview of three estimators that represent the current state of the art: Targeted Maximum Likelihood Estimation (TMLE), Augmented Inverse Probability Weighting (AIPW) and Double/Debiased Machine Learning (DML).
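As a minimal sketch of the difference between a naive plug-in estimate and a debiased one, the Python code below estimates the average treatment effect on simulated data in two ways: by averaging ML outcome predictions (the plug-in, g-formula approach) and by the AIPW score, which adds inverse-probability-weighted residuals to the plug-in term: psi = mu1 - mu0 + A(Y - mu1)/e - (1 - A)(Y - mu0)/(1 - e). Nuisance models are fitted on one fold and evaluated on the other (cross-fitting, the sample-splitting device used in DML). Everything here is an illustrative assumption rather than the authors' implementation: the hypothetical helpers `knn_predict` and `aipw_ate`, the k-nearest-neighbour learner standing in for any ML method, the tuning values (k = 25, two folds, propensity clipping at 0.025) and the simulated data-generating process.

```python
import numpy as np


def knn_predict(x_train, y_train, x_test, k=25):
    """k-nearest-neighbour regression: average y over the k closest training rows.

    Hypothetical, deliberately simple nonparametric learner standing in for any
    ML method (random forest, boosting, Super Learner, ...)."""
    d = ((x_test[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argpartition(d, k, axis=1)[:, :k]  # indices of the k smallest distances
    return y_train[nearest].mean(axis=1)


def aipw_ate(y, a, x, n_folds=2, k=25, clip=0.025, seed=0):
    """Cross-fitted AIPW estimate of the average treatment effect E[Y(1) - Y(0)].

    Returns (aipw_estimate, standard_error, plug_in_estimate)."""
    n = len(y)
    rng = np.random.default_rng(seed)
    fold = rng.permutation(n) % n_folds
    mu1, mu0, e = np.empty(n), np.empty(n), np.empty(n)
    for f in range(n_folds):
        test, train = fold == f, fold != f
        xt, yt, at = x[train], y[train], a[train]
        # Nuisance models are fitted on the other folds and evaluated on this fold
        e[test] = np.clip(knn_predict(xt, at.astype(float), x[test], k), clip, 1 - clip)
        mu1[test] = knn_predict(xt[at == 1], yt[at == 1], x[test], k)
        mu0[test] = knn_predict(xt[at == 0], yt[at == 0], x[test], k)
    plug_in = np.mean(mu1 - mu0)  # ML predictions plugged straight into the g-formula
    # Estimated (uncentred) efficient influence function: plug-in term plus weighted residuals
    psi = mu1 - mu0 + a * (y - mu1) / e - (1 - a) * (y - mu0) / (1 - e)
    return psi.mean(), psi.std(ddof=1) / np.sqrt(n), plug_in


# Simulated data (assumed): two confounders, nonlinear outcome, true ATE = 1
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=(n, 2))
p = 1 / (1 + np.exp(-(0.8 * x[:, 0] - 0.5 * x[:, 1])))
a = rng.binomial(1, p)
y = 1.0 * a + np.sin(2 * x[:, 0]) + x[:, 1] ** 2 + rng.normal(size=n)

ate, se, plug = aipw_ate(y, a, x)
print(f"AIPW {ate:.3f} (95% CI {ate - 1.96 * se:.3f} to {ate + 1.96 * se:.3f}); plug-in {plug:.3f}")
```

The standard error comes from the empirical variance of the estimated influence function, which is what allows AIPW to report asymptotic confidence intervals even though the nuisance models are flexible learners; the plug-in estimate has no such built-in variance estimate. TMLE and DML reach a similar goal by different routes (a targeting step on the initial fit, and orthogonal scores with cross-fitting, respectively).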

Keywords: Causal inference; Doubly-robustness; Machine learning; Targeted learning.

Conflict of interest statement

Declarations. Ethics approval: Not applicable. Competing interests: The authors have no relevant financial or non-financial interests to disclose.

Figures

**Fig. 1**
Visual synthesis of the article. (A) The steps of a causal inference framework. (B) Estimators of causal effects that integrate Machine Learning methods, bridging the gap between statistical inference and Machine Learning.

