Estimating optimal shared-parameter dynamic regimens with application to a multistage depression clinical trial

Bibhas Chakraborty et al. Biometrics. 2016 Sep;72(3):865-76. doi: 10.1111/biom.12493. Epub 2016 Feb 17.

Abstract

A dynamic treatment regimen consists of decision rules that recommend how to individualize treatment for patients based on their available treatment and covariate history. In many scientific domains, these decision rules are shared across stages of intervention. As an illustrative example, we discuss STAR*D, a multistage randomized clinical trial for treating major depression. Estimating such shared decision rules often amounts to estimating parameters, indexing the decision rules, that are shared across stages. In this article, we propose a novel simultaneous estimation procedure for the shared parameters based on Q-learning. We provide an extensive simulation study illustrating the merit of the proposed method over simple competitors in terms of how closely the procedure's treatment allocations match those of the "oracle" procedure, defined as the one that makes treatment recommendations based on the true parameter values rather than their estimates. We also consider bias and mean squared error of the individual parameter estimates as secondary metrics. Finally, we analyze the STAR*D data using the proposed method.
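As a rough illustration of the idea behind simultaneous estimation of shared parameters, the Python sketch below fits linear Q-functions to a simulated two-stage trial in which the same vector ψ indexes the decision rule at both stages. Everything here is an assumption for illustration: the linear working models, the stacked least-squares update iterated from an initial value of ψ, and all variable and function names are hypothetical, and the sketch is not the paper's algorithm or the STAR*D analysis.

```python
import numpy as np

def q_shared(X1, A1, R1, X2, A2, R2, psi_init, max_iter=200, tol=1e-8):
    """Hypothetical sketch of Q-learning with a shared decision-rule
    parameter psi across two stages (treatments coded -1/+1).

    Working models:
        Q2(x2, a2) = x2'beta2 + (x2'psi) * a2
        Q1(x1, a1) = x1'beta1 + (x1'psi) * a1
    The stage-1 pseudo-outcome plugs in the best stage-2 treatment,
    R1 + max_a2 Q2 = R1 + x2'beta2 + |x2'psi|, so it depends on the very
    parameters being estimated; we therefore iterate a stacked
    least-squares fit from an initial psi until psi stabilizes.
    """
    n, p = X1.shape
    zeros = np.zeros((n, p))
    psi = np.asarray(psi_init, dtype=float)
    beta2 = np.zeros(p)
    for _ in range(max_iter):
        # Stage-1 pseudo-outcome under the current parameter values.
        y1_tilde = R1 + X2 @ beta2 + np.abs(X2 @ psi)
        # Stack both stages; the psi block is common to both sets of rows,
        # so one least-squares solve estimates psi from all the data.
        design = np.vstack([
            np.hstack([X1, zeros, X1 * A1[:, None]]),   # stage-1 rows
            np.hstack([zeros, X2, X2 * A2[:, None]]),   # stage-2 rows
        ])
        response = np.concatenate([y1_tilde, R2])
        theta, *_ = np.linalg.lstsq(design, response, rcond=None)
        beta1, new_beta2, new_psi = theta[:p], theta[p:2 * p], theta[2 * p:]
        done = np.max(np.abs(new_psi - psi)) < tol
        beta2, psi = new_beta2, new_psi
        if done:
            break
    return beta1, beta2, psi

# Simulated two-stage data in which the same psi drives both stages.
rng = np.random.default_rng(0)
n, p = 2000, 2
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = np.column_stack([np.ones(n), rng.normal(size=n)])
A1, A2 = rng.choice([-1.0, 1.0], size=(2, n))
psi_true = np.array([0.3, -0.5])
R1 = X1 @ np.array([1.0, 0.5]) + (X1 @ psi_true) * A1 + rng.normal(size=n)
R2 = X2 @ np.array([0.5, 1.0]) + (X2 @ psi_true) * A2 + rng.normal(size=n)

beta1, beta2, psi_hat = q_shared(X1, A1, R1, X2, A2, R2, psi_init=np.zeros(p))
# Estimated shared rule: at stage t, recommend a_t = sign(x_t @ psi_hat).
```

Rerunning the iteration from several initial values of ψ, as with the five starting points in Figure 2, is a natural check that the fixed point reached does not depend on where the algorithm starts.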

Keywords: Dynamic treatment regimens; Q-learning; STAR*D; Shared parameters.


Figures

Figure 1. A schematic of the treatment assignment algorithm in the STAR*D study. An “R” within a circle denotes randomization.

Figure 2. Convergence patterns of ψ0, ψ1, ψ2, ψ3, and ψ4 for the five versions of the Q-shared method (corresponding to five initial values) in the STAR*D study.

Figure 3. Confidence planes for the contrast functions and the resulting regions of varying recommended optimal treatments in the (QIDS.start, QIDS.slope) plane, for subjects who experienced low side-effect intensity (0) and were treated with combination therapy (−1) at the previous stage, based on the Q-shared method for estimation and the m-out-of-n bootstrap for inference.

