Random Forests Approach for Causal Inference with Clustered Observational Data
- PMID: 32856937
- DOI: 10.1080/00273171.2020.1808437
Abstract
There is growing interest in using machine learning (ML) methods for causal inference because they can model key quantities, such as the propensity score or the outcome model, flexibly and (nearly) automatically. Unfortunately, most ML methods for causal inference have been studied in single-level settings, where all individuals are independent of each other, and there is little work on using these methods with clustered or nested data, a common setting in education studies. This paper investigates using one particular ML method based on random forests, known as Causal Forests, to estimate treatment effects in multilevel observational data. We conduct simulation studies under different types of multilevel data, including two-level, three-level, and cross-classified data. The simulations show that when the ML method is supplemented with estimated propensity scores from multilevel models that account for the clustered/hierarchical structure, the modified ML method outperforms preexisting methods in a wide variety of settings. We conclude by estimating the effect of private math lessons using data from the Trends in International Mathematics and Science Study (TIMSS), a large-scale educational assessment in which students are nested within schools.
Keywords: Causal inference; hierarchical linear modeling; machine learning methods; multilevel observational data; multilevel propensity score matching.
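The role of cluster-aware propensity scores described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's Causal Forests procedure and does not fit a multilevel model. It is a toy two-level example in which a cluster-level confounder drives both treatment and outcome, and the treated share within each cluster stands in (as an assumption made for illustration) for a propensity score from a multilevel model. All function names, sample sizes, and effect sizes are hypothetical.

```python
import math
import random


def simulate(n_clusters=200, cluster_size=50, tau=1.0, seed=0):
    """Simulate two-level data with a cluster-level confounder u_j."""
    rng = random.Random(seed)
    rows = []  # each row: (cluster id, treatment z, outcome y)
    for j in range(n_clusters):
        u = rng.gauss(0.0, 1.0)             # unobserved cluster effect
        p = 1.0 / (1.0 + math.exp(-u))      # treatment probability rises with u
        for _ in range(cluster_size):
            z = 1 if rng.random() < p else 0
            y = tau * z + 2.0 * u + rng.gauss(0.0, 1.0)
            rows.append((j, z, y))
    return rows


def pooled_pscore(rows):
    """Single-level propensity score: one treated share for everyone."""
    share = sum(z for _, z, _ in rows) / len(rows)
    return [share] * len(rows)


def cluster_pscore(rows):
    """Cluster-aware propensity score: treated share within each cluster."""
    treated, total = {}, {}
    for j, z, _ in rows:
        treated[j] = treated.get(j, 0) + z
        total[j] = total.get(j, 0) + 1
    return [treated[j] / total[j] for j, _, _ in rows]


def hajek_ipw(rows, pscore):
    """Normalized (Hajek) inverse-probability-weighted effect estimate."""
    num1 = den1 = num0 = den0 = 0.0
    for (_, z, y), e in zip(rows, pscore):
        if z == 1:
            num1 += y / e
            den1 += 1.0 / e
        else:
            num0 += y / (1.0 - e)
            den0 += 1.0 / (1.0 - e)
    return num1 / den1 - num0 / den0


rows = simulate()

# Keep only clusters containing both treated and control units (positivity).
kept = [(r, e) for r, e in zip(rows, cluster_pscore(rows)) if 0.0 < e < 1.0]
rows_ok = [r for r, _ in kept]
ps_ok = [e for _, e in kept]

naive = hajek_ipw(rows_ok, pooled_pscore(rows_ok))   # ignores clustering
adjusted = hajek_ipw(rows_ok, ps_ok)                 # accounts for clustering

print(f"true effect: 1.0, pooled: {naive:.2f}, cluster-aware: {adjusted:.2f}")
```

In this toy setup the pooled estimate absorbs the cluster-level confounding, while the cluster-aware weights compare treated and control units within the same cluster. A fuller analysis along the lines of the abstract would replace the within-cluster treated share with propensity scores from a fitted multilevel model and pass them to a forest-based estimator.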