Q- and A-learning Methods for Estimating Optimal Dynamic Treatment Regimes
- PMID: 25620840
- PMCID: PMC4300556
- DOI: 10.1214/13-STS450
Abstract
In clinical practice, physicians make a series of treatment decisions over the course of a patient's disease based on the patient's baseline and evolving characteristics. A dynamic treatment regime is a set of sequential decision rules that operationalizes this process. Each rule corresponds to a decision point and dictates the next treatment action based on the information accrued so far. Using existing data, a key goal is to estimate the optimal regime, that is, the regime that, if followed by the patient population, would yield the most favorable outcome on average. Q- and A-learning are two main approaches for this purpose. We provide a detailed account of these methods, study their performance, and illustrate them using data from a depression study.
Keywords: Advantage learning; bias-variance tradeoff; model misspecification; personalized medicine; potential outcomes; sequential decision making.
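As a companion to the abstract, the following is a minimal sketch of Q-learning by backward induction for a two-stage regime, using simulated data and working linear Q-function models. The data-generating process, variable names, and model specifications are all hypothetical illustrations, not the paper's actual depression-study analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical two-stage data: X1 baseline covariate, A1/A2 binary treatments,
# X2 interim covariate, Y final outcome (larger is better).
X1 = rng.normal(size=n)
A1 = rng.integers(0, 2, size=n)
X2 = 0.5 * X1 + 0.3 * A1 + rng.normal(size=n)
A2 = rng.integers(0, 2, size=n)
Y = X1 + X2 + A1 * (1.0 - X1) + A2 * (0.5 + X2) + rng.normal(size=n)

def ols(design, y):
    # Least-squares fit of a working linear Q-function model.
    return np.linalg.lstsq(design, y, rcond=None)[0]

# Stage 2: regress Y on history (1, X1, A1, A1*X1, X2) plus A2 terms.
D2 = np.column_stack([np.ones(n), X1, A1, A1 * X1, X2, A2, A2 * X2])
b2 = ols(D2, Y)

# Estimated optimal stage-2 rule: treat iff the A2 contrast is positive.
contrast2 = b2[5] + b2[6] * X2
# Pseudo-outcome: predicted Y under the optimal stage-2 action
# (history main effects plus the contrast when it is favorable).
Ytilde = D2[:, :5] @ b2[:5] + np.maximum(contrast2, 0.0)

# Stage 1: regress the pseudo-outcome on (1, X1, A1, A1*X1).
D1 = np.column_stack([np.ones(n), X1, A1, A1 * X1])
b1 = ols(D1, Ytilde)
contrast1 = b1[2] + b1[3] * X1

print("Stage-2 rule: treat iff %.2f + %.2f * X2 > 0" % (b2[5], b2[6]))
print("Stage-1 rule: treat iff %.2f + %.2f * X1 > 0" % (b1[2], b1[3]))
```

With this simulation the stage-2 contrast coefficients should land near their true values (0.5 and 1.0); the stage-1 rule is only an approximation, since the true stage-1 value function is not exactly linear, which is the kind of model misspecification the paper examines.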