Int J Biostat. 2022 Jun 16;19(1):217-238.

doi: 10.1515/ijb-2020-0127. eCollection 2023 May 1.

The optimal dynamic treatment rule superlearner: considerations, performance, and application to criminal justice interventions

Lina M Montoya¹, Mark J van der Laan², Alexander R Luedtke³, Jennifer L Skeem⁴, Jeremy R Coyle², Maya L Petersen⁵

Affiliations

¹ Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
² Division of Biostatistics, University of California Berkeley, Berkeley, USA.
³ Department of Statistics, University of Washington, Seattle, USA.
⁴ School of Social Work and Goldman School of Public Policy, University of California Berkeley, Berkeley, USA.
⁵ Divisions of Biostatistics and Epidemiology, University of California Berkeley, Berkeley, USA.

PMID: 35708222
PMCID: PMC10238854
DOI: 10.1515/ijb-2020-0127

Lina M Montoya et al. Int J Biostat. 2022.

Abstract

The optimal dynamic treatment rule (ODTR) framework offers an approach for understanding which kinds of patients respond best to specific treatments, in other words, treatment effect heterogeneity. Recently, there has been a proliferation of methods for estimating the ODTR. One such method extends the SuperLearner algorithm, an ensemble method that optimally combines candidate algorithms and is used extensively in prediction problems, to ODTRs. Following the "causal roadmap," we causally and statistically define the ODTR and provide an introduction to estimating it using the ODTR SuperLearner. Additionally, we highlight practical choices when implementing the algorithm, including the choice of candidate algorithms, metalearners to combine the candidates, and risk functions to select the best combination of algorithms. Using simulations, we illustrate how estimating the ODTR using this SuperLearner approach can uncover treatment effect heterogeneity more effectively than traditional approaches based on fitting a parametric regression of the outcome on the treatment, covariates, and treatment-covariate interactions. We investigate the implications of choices in implementing an ODTR SuperLearner at various sample sizes. Our results show the advantages of: (1) including a combination of both flexible machine learning algorithms and simple parametric estimators in the library of candidate algorithms; (2) using an ensemble metalearner to combine candidates rather than selecting only the best-performing candidate; and (3) using the mean outcome under the rule as a risk function. Finally, we apply the ODTR SuperLearner to the "Interventions" study, an ongoing randomized controlled trial, to identify which justice-involved adults with mental illness benefit most from cognitive behavioral therapy to reduce criminal re-offending.
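As a rough illustration of the idea in the abstract, the sketch below estimates a treatment rule from simulated randomized data and evaluates it by the true mean outcome under the rule. Everything in it is an assumption made for illustration: the data-generating process, the helper names (`simulate`, `fit_linear`, `predict_linear`), and the use of one pair of linear outcome regressions as the only candidate. It is not the ODTR SuperLearner itself, which combines many candidates through a metalearner and a cross-validated risk.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n):
    """Hypothetical DGP: one covariate W, treatment A randomized with probability 0.5."""
    W = rng.normal(size=n)
    A = rng.integers(0, 2, size=n)
    # Assumed truth: E[Y_1 - Y_0 | W] = W, so treatment helps only when W > 0.
    Y = 0.5 * W + A * W + rng.normal(scale=0.5, size=n)
    return W, A, Y

def fit_linear(x, y):
    """Ordinary least squares of y on an intercept and x; returns [intercept, slope]."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_linear(coef, x):
    return coef[0] + coef[1] * x

W, A, Y = simulate(2000)

# One simple candidate: fit the outcome regression separately in each arm,
# then take the difference of predictions as the estimated treatment effect given W.
coef_treated = fit_linear(W[A == 1], Y[A == 1])
coef_control = fit_linear(W[A == 0], Y[A == 0])
blip_hat = predict_linear(coef_treated, W) - predict_linear(coef_control, W)

# Estimated rule: treat when the estimated conditional effect is positive.
rule_hat = (blip_hat > 0).astype(int)

# Because the DGP is simulated, the true optimal rule and true conditional
# means are known, so the rule can be scored against the truth.
true_rule = (W > 0).astype(int)
percent_match = 100 * np.mean(rule_hat == true_rule)
value_rule = np.mean(0.5 * W + rule_hat * W)   # mean true outcome under the estimated rule
value_treat_all = np.mean(0.5 * W + W)         # mean true outcome if everyone is treated
value_treat_none = np.mean(0.5 * W)            # mean true outcome if no one is treated

print(f"percent match: {percent_match:.1f}")
print(f"value under rule: {value_rule:.3f}")
print(f"treat all: {value_treat_all:.3f}, treat none: {value_treat_none:.3f}")
```

In this toy setting the individualized rule should yield a higher mean outcome than either static rule, which is the kind of comparison the simulations in the paper report on a much larger scale.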

Keywords: causal roadmap; heterogeneous treatment effects; optimal dynamic treatment rules; precision health; superlearner.

Figures

**Figure 1:**
Performance of candidate estimators of the ODTR. Plot shows the mean and the 2.5th and 97.5th quantiles of the empirical mean of the true expected counterfactual outcome under the estimated ODTR, i.e., $E_{n} [Q_{0} (Y | A = d_{n}^{*} (W), W)]$, an approximation of $E_{0} [Q_{0} (Y | A = d_{n}^{*} (W), W)]$, for DGP 1 (top two) and DGP 2 (bottom two). The horizontal black line depicts $E_{P_{U, X}} [Y_{d_{0}^{*}}]$; the red line depicts $E_{P_{U, X}} [Y_{1}]$; the blue line depicts $E_{P_{U, X}} [Y_{0}]$ (the blue and red lines sometimes coincide and thus overlap). We compare the ODTR SuperLearner to an incorrectly specified GLM (in gray, with N/A as the metalearner and a diamond with no fill). We also compare (1) having a SuperLearner library with (a) only algorithms that estimate the blip (i.e., “Blip only” libraries), containing only parametric algorithms (blue), only machine-learning blip algorithms (red), or both (purple), versus (b) an expanded or “Full” library with blip function regressions estimated via machine learning only (orange-yellow) or machine learning and parametric algorithms (green), together with methods that directly estimate the ODTR and static rules; (2) having a metalearner (depicted on the x-axis) that either chooses one algorithm (i.e., the “discrete” SuperLearner) or combines blip predictions/treatment predictions (i.e., the “continuous” SuperLearner); and (3) using the MSE risk function ($R_{MSE}$ as a square) versus the mean outcome under the candidate rule risk function ($R_{E [Y_{d}]}$ as a triangle). The percent match at the top of each plot reports the average across simulation repetitions of the percent of the sample assigned their true optimal treatment by the estimated rule.

**Figure 2:**
Subgroup plots for each of the covariates for the “Interventions” data. The x-axis of each plot shows the levels of the covariate; the y-axis shows, within that covariate subgroup, the difference in the proportion of people who were not re-arrested between those who received cognitive behavioral therapy (CBT) and those who received treatment as usual (TAU).

**Figure 3:**
Distribution of predicted blip estimates from the ODTR SuperLearner. The frequencies are divided into three groups because the ODTR SuperLearner allocated all coefficient weights to a GLM using substance use, a variable with only three levels. One can interpret the ODTR SuperLearner for this sample as follows: CBT may reduce the probability of re-arrest among justice-involved adults with low levels of substance use. Estimation and inference of the value of the ODTR SuperLearner compared to, for example, treating everyone or no one, tell us whether there is, in fact, a differential effect by substance use, and thus a benefit to assigning CBT in this individualized way.
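The three-group pattern described for Figure 3 can be reproduced in a toy setting. The sketch below is only an assumption-laden illustration: the data are simulated, the covariate `S` is a hypothetical three-level stand-in for substance use, the outcome is continuous rather than binary, and an identity-link linear model with a treatment-by-covariate interaction stands in for the GLM. It shows why a model that uses a single three-level covariate can produce at most three distinct blip predictions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 600

# Hypothetical data: S is a three-level covariate (0, 1, 2), A is randomized.
S = rng.integers(0, 3, size=n)
A = rng.integers(0, 2, size=n)
# Assumed truth: the treatment benefit shrinks as S increases.
Y = 0.3 + A * (0.2 - 0.15 * S) + rng.normal(scale=0.1, size=n)

# Linear model with main terms and an interaction: Y ~ 1 + S + A + A*S.
design = np.column_stack([np.ones(n), S, A, A * S])
beta, *_ = np.linalg.lstsq(design, Y, rcond=None)

# Predicted blip: predicted outcome with A = 1 minus predicted outcome with A = 0.
blip_hat = beta[2] + beta[3] * S

# Only as many distinct predictions as there are levels of S.
distinct_blips = np.unique(np.round(blip_hat, 10))
print(distinct_blips)

# Implied rule: treat where the predicted blip is positive.
rule_hat = (blip_hat > 0).astype(int)
for level in range(3):
    print(level, blip_hat[S == level][0], rule_hat[S == level][0])
```

A histogram of `blip_hat` from this sketch would show three spikes, one per level of `S`, which mirrors the grouping described in the caption.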

Cited by

Adapt for Adolescents: Protocol for a sequential multiple assignment randomized trial to improve retention and viral suppression among adolescents and young adults living with HIV in Kenya.
Abuogi LL, Kulzer JL, Akama E, Odeny TA, Eshun-Wilson I, Petersen M, Shade SB, Montoya LM, Beres LK, Iguna S, Adhiambo HF, Osoro J, Opondo I, Sang N, Kwena Z, Bukusi EA, Geng EH. Contemp Clin Trials. 2023 Apr;127:107123. doi: 10.1016/j.cct.2023.107123. Epub 2023 Feb 20. PMID: 36813086
Learning optimal dynamic treatment regimes from longitudinal data.
Williams NT, Hoffman KL, Díaz I, Rudolph KE. Am J Epidemiol. 2024 Dec 2;193(12):1768-1775. doi: 10.1093/aje/kwae122. PMID: 38879744
Machine learning for estimating and comparing clinical rules for treating diarrheal illness with antibiotics.
Codi A, Kim S, McQuade ER, Benkeser D; AntiBiotics for Children with severe Diarrhea (ABCD) Study Group. medRxiv [Preprint]. 2025 Jan 12:2025.01.10.25320357. doi: 10.1101/2025.01.10.25320357. PMID: 39830249
Estimators for the value of the optimal dynamic treatment rule with application to criminal justice interventions.
Montoya LM, van der Laan MJ, Skeem JL, Petersen ML. Int J Biostat. 2022 Jun 6;19(1):239-259. doi: 10.1515/ijb-2020-0128. eCollection 2023 May 1. PMID: 35659857
