Stat Sci. 2014 Nov;29(4):640-661.
doi: 10.1214/13-STS450.

Q- and A-learning Methods for Estimating Optimal Dynamic Treatment Regimes

Phillip J Schulte et al. Stat Sci. 2014 Nov.

Abstract

In clinical practice, physicians make a series of treatment decisions over the course of a patient's disease based on the patient's baseline and evolving characteristics. A dynamic treatment regime is a set of sequential decision rules that operationalizes this process. Each rule corresponds to a decision point and dictates the next treatment action based on the accrued information. Using existing data, a key goal is to estimate the optimal regime: the one that, if followed by the patient population, would yield the most favorable outcome on average. Q- and A-learning are two main approaches for this purpose. We provide a detailed account of these methods, study their performance, and illustrate them using data from a depression study.

Keywords: Advantage learning; bias-variance tradeoff; model misspecification; personalized medicine; potential outcomes; sequential decision making.
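
To make the idea of sequential decision rules concrete, the following is a minimal sketch of Q-learning by backward induction. It is an illustration only, not the authors' implementation: it assumes two decision points, binary treatments, a single covariate per stage, and linear working models, none of which is specified in the abstract. All names (`simulate`, `q_learning_two_stage`, `rule`) and the data-generating model are hypothetical.

```python
import numpy as np


def simulate(n, seed=0):
    """Simulate a hypothetical two-stage randomized study (illustration only)."""
    rng = np.random.default_rng(seed)
    s1 = rng.normal(size=n)                          # baseline covariate
    a1 = rng.integers(0, 2, size=n)                  # stage-1 treatment, randomized
    s2 = 0.5 * s1 + 0.5 * a1 + rng.normal(size=n)    # intermediate covariate
    a2 = rng.integers(0, 2, size=n)                  # stage-2 treatment, randomized
    # Larger outcomes are better; the stage-2 treatment helps when s2 < 0.5.
    y = 1.0 + s1 + 0.5 * a1 + s2 + a2 * (0.5 - 1.0 * s2) + rng.normal(size=n)
    return s1, a1, s2, a2, y


def fit_ols(X, y):
    """Ordinary least squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def q_learning_two_stage(s1, a1, s2, a2, y):
    """Backward-induction Q-learning with assumed linear working models.

    Returns (psi1, psi2): the treatment-contrast coefficients at each stage,
    so that the estimated rule treats when psi[0] + psi[1] * covariate > 0.
    """
    ones = np.ones_like(y)

    # Stage 2: regress the outcome on history, treatment, and interaction.
    X2 = np.column_stack([ones, s1, a1, s2, a2, a2 * s2])
    beta2 = fit_ols(X2, y)
    psi2 = beta2[4:]

    # Pseudo-outcome: predicted outcome under the best stage-2 action,
    # i.e. the a2 = 0 prediction plus the positive part of the contrast.
    contrast2 = psi2[0] + psi2[1] * s2
    v = X2[:, :4] @ beta2[:4] + np.maximum(contrast2, 0.0)

    # Stage 1: regress the pseudo-outcome on baseline history and treatment.
    X1 = np.column_stack([ones, s1, a1, a1 * s1])
    beta1 = fit_ols(X1, v)
    psi1 = beta1[2:]
    return psi1, psi2


def rule(psi, s):
    """Decision rule: recommend treatment (1) when the estimated contrast is positive."""
    return (psi[0] + psi[1] * np.asarray(s, dtype=float) > 0).astype(int)
```

A call such as `psi1, psi2 = q_learning_two_stage(*simulate(20000))` returns the estimated contrast coefficients, and `rule(psi2, s2)` gives the recommended stage-2 action for each patient. In this sketch the stage-1 linear model is only a working approximation, because the maximization in the pseudo-outcome makes the true stage-1 Q-function nonlinear; this is the kind of model misspecification listed among the keywords.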

Figures

Fig 1. Monte Carlo MSE ratios for estimators of components of $\psi_1$ (left and center panels) and efficiencies $R(\hat{d}_Q^{opt})$ and $R(\hat{d}_A^{opt})$ for estimating the true $d^{opt}$ (right panel) under misspecification of the propensity model. MSE ratios > 1 favor Q-learning.
Fig 2. Monte Carlo MSE ratios for estimators of components of $\psi_1$ (left and center panels) and efficiencies $R(\hat{d}_Q^{opt})$ and $R(\hat{d}_A^{opt})$ for estimating the true $d^{opt}$ (right panel) under misspecification of the Q-function. MSE ratios > 1 favor Q-learning.
Fig 3. Monte Carlo MSE ratios for estimators of components of $\psi_1$ (left and center panels) and efficiencies $R(\hat{d}_Q^{opt})$ and $R(\hat{d}_A^{opt})$ for estimating the true $d^{opt}$ (right panel) under misspecification of both the propensity model and the Q-function. MSE ratios > 1 favor Q-learning.
Fig 4. Monte Carlo MSE ratios for estimators of components of $\psi_2$ and $\psi_1$ (upper row and lower row left and center panels) and efficiencies $R(\hat{d}_Q^{opt})$ and $R(\hat{d}_A^{opt})$ for estimating the true $d^{opt}$ (lower right panel) under misspecification of the propensity model. MSE ratios > 1 favor Q-learning.
Fig 5. Monte Carlo MSE ratios for estimators of components of $\psi_2$ and $\psi_1$ (upper row and lower row left and center panels) and efficiencies $R(\hat{d}_Q^{opt})$ and $R(\hat{d}_A^{opt})$ for estimating the true $d^{opt}$ (lower right panel) under misspecification of the Q-functions. MSE ratios > 1 favor Q-learning.
Fig 6. Monte Carlo MSE ratios for estimators of components of $\psi_2$ and $\psi_1$ (upper row and lower row left and center panels) and efficiencies $R(\hat{d}_Q^{opt})$ and $R(\hat{d}_A^{opt})$ for estimating the true $d^{opt}$ (lower right panel) under misspecification of both the propensity models and Q-functions. MSE ratios > 1 favor Q-learning.
