Interactive Q-learning for Quantiles
- PMID: 28890584
- PMCID: PMC5586239
- DOI: 10.1080/01621459.2016.1155993
Abstract
A dynamic treatment regime is a sequence of decision rules, each of which recommends treatment based on features of patient medical history such as past treatments and outcomes. Existing methods for estimating optimal dynamic treatment regimes from data optimize the mean of a response variable. However, the mean may not always be the most appropriate summary of performance. We derive estimators of decision rules for optimizing probabilities and quantiles computed with respect to the response distribution for two-stage, binary treatment settings. This enables estimation of dynamic treatment regimes that optimize the cumulative distribution function of the response at a prespecified point or a prespecified quantile of the response distribution such as the median. The proposed methods perform favorably in simulation experiments. We illustrate our approach with data from a sequentially randomized trial where the primary outcome is remission of depression symptoms.
Keywords: Dynamic Treatment Regime; Personalized Medicine; Sequential Decision Making; Sequential Multiple Assignment Randomized Trial.
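To make the idea concrete, here is a minimal sketch of quantile-based backward induction for a two-stage, binary-treatment setting. This is not the paper's interactive Q-learning estimator; it is a naive stratified plug-in on simulated toy data, where every quantity (the generative model, covariates `x1`/`x2`, treatments `a1`/`a2`, and effect sizes) is invented for illustration. The stage-2 rule picks, within each covariate stratum, the treatment with the larger empirical tau-quantile of the response; the stage-1 rule is then estimated among patients whose observed stage-2 treatment agreed with that rule.

```python
import numpy as np

rng = np.random.default_rng(0)
n, tau = 5000, 0.5  # sample size and target quantile (here, the median)

# Toy two-stage randomized data (hypothetical model, for illustration only).
x1 = rng.integers(0, 2, n)  # baseline covariate
a1 = rng.integers(0, 2, n)  # stage-1 treatment (randomized)
x2 = rng.integers(0, 2, n)  # interim covariate
a2 = rng.integers(0, 2, n)  # stage-2 treatment (randomized)
# Response: A2=1 helps when X2=1, hurts when X2=0; A1 always helps.
y = 1.0 + a1 + 2.0 * a2 * x2 - a2 * (1 - x2) + rng.normal(0, 1, n)

# Stage 2: within each X2 stratum, recommend the treatment with the
# larger empirical tau-quantile of Y (naive stratified plug-in).
d2 = {}
for s in (0, 1):
    q0 = np.quantile(y[(x2 == s) & (a2 == 0)], tau)
    q1 = np.quantile(y[(x2 == s) & (a2 == 1)], tau)
    d2[s] = int(q1 > q0)

# Stage 1: keep patients whose observed A2 matched the estimated rule,
# then repeat the same quantile comparison within each X1 stratum.
consistent = a2 == np.array([d2[0], d2[1]])[x2]
d1 = {}
for s in (0, 1):
    q0 = np.quantile(y[consistent & (x1 == s) & (a1 == 0)], tau)
    q1 = np.quantile(y[consistent & (x1 == s) & (a1 == 1)], tau)
    d1[s] = int(q1 > q0)

print("stage-1 rule:", d1, "stage-2 rule:", d2)
```

With this generative model the sketch should recover the rule "give A2 only when X2 = 1, always give A1"; the paper's method instead models the conditional response distribution so that any quantile (or CDF point) can be optimized from one fit.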