
58 jmlr-2012-Linear Fitted-Q Iteration with Multiple Reward Functions


Source: pdf

Author: Daniel J. Lizotte, Michael Bowling, Susan A. Murphy

Abstract: We present a general and detailed development of an algorithm for finite-horizon fitted-Q iteration with an arbitrary number of reward signals and linear value function approximation using an arbitrary number of state features. This includes a detailed treatment of the 3-reward-function case using triangulation primitives from computational geometry and a method for identifying globally dominated actions. We also present an example of how our methods can be used to construct a real-world decision aid by considering symptom reduction, weight gain, and quality of life in sequential treatments for schizophrenia. Finally, we discuss future directions in which to take this work that will further enable our methods to make a positive impact on the field of evidence-based clinical decision support.

Keywords: reinforcement learning, dynamic programming, decision making, linear regression, preference elicitation
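For a concrete picture of the basic algorithm the abstract refers to, the sketch below illustrates finite-horizon fitted-Q iteration with linear value-function approximation when the multiple reward signals are collapsed into a single scalar reward by a fixed preference weight vector. This is a minimal illustration only, not the authors' implementation: the paper's contribution is to compute optimal policies for all preference weights at once (using the geometric machinery cited in the references below), whereas this sketch fixes one weight vector, and all names here (fitted_q_iteration, phi, the trajectory layout) are assumptions made for the example.

```python
import numpy as np

def fitted_q_iteration(trajectories, phi, n_actions, horizon, weights):
    """Backward-recursive fitted-Q iteration on a batch of trajectories.

    trajectories: list of per-stage batches; trajectories[t] is a tuple
        (S, A, R, S_next) where S is (n, d_state), A is (n,) action indices,
        R is (n, k) with one column per reward signal, and S_next is
        (n, d_state) (ignored at the final stage t = horizon - 1).
    phi: feature map taking (states, actions) to an (n, p) design matrix.
    weights: length-k convex combination used to scalarize the rewards.
    Returns a list of per-stage regression coefficients beta[t], each (p,).
    """
    betas = [None] * horizon
    for t in reversed(range(horizon)):
        S, A, R, S_next = trajectories[t]
        # Scalarize the multiple reward signals with the fixed preference weights.
        y = R @ weights
        if t < horizon - 1:
            # Regression target adds the max over actions of the next stage's fitted Q.
            q_next = np.column_stack([
                phi(S_next, np.full(len(S_next), a)) @ betas[t + 1]
                for a in range(n_actions)
            ])
            y = y + q_next.max(axis=1)
        # Ordinary least squares fit of Q_t(s, a) = phi(s, a)^T beta_t.
        X = phi(S, A)
        betas[t], *_ = np.linalg.lstsq(X, y, rcond=None)
    return betas
```

The design choice that distinguishes the paper from this fixed-weight sketch is that the fitted Q-functions are kept as functions of the preference weights themselves, so that dominated actions can be identified over the whole weight simplex rather than for one elicited preference.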


reference text

D. B. Allison, J. L. Mentore, M. Heo, L. P. Chandler, J. C. Cappelleri, M. C. Infante, and P. J. Weiden. Antipsychotic-induced weight gain: A comprehensive research synthesis. American Journal of Psychiatry, 156:1686–1696, November 1999.
L. Barrett and S. Narayanan. Learning all optimal policies with multiple criteria. In Proceedings of the 25th International Conference on Machine Learning, pages 41–47, 2008.
D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming, chapter 2.1, page 12. Athena Scientific, 1996.
O. Bonnichsen. Elicitation of ostomy pouch preferences: A discrete-choice experiment. Patient, 4(3):163–175, 2011.
C. Boutilier. A POMDP formulation of preference elicitation problems. In Proceedings of the 18th National Conference on Artificial Intelligence, pages 239–246, 2002.
P. Brass. On the size of higher-dimensional triangulations. In J. E. Goodman, J. Pach, and E. Welzl, editors, Combinatorial and Computational Geometry, volume 52, pages 147–152. MSRI Publications, 2005.
A. Castelletti, S. Galelli, M. Restelli, and R. Soncini-Sessa. Tree-based reinforcement learning for optimal water reservoir operation. Water Resources Research, 46(W06502), 2010.
CGAL. CGAL, Computational Geometry Algorithms Library, 2011. URL http://www.cgal.org.
L. P. Chew. Constrained Delaunay triangulations. In Proceedings of the Third Annual Symposium on Computational Geometry, SCG '87, pages 215–222, New York, NY, USA, 1987.
R. D. Cook and S. Weisberg. Applied Regression Including Computing and Graphics. Wiley, August 1999.
D. A. Cox, D. O'Shea, and J. B. Little. Ideals, Varieties, and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra. Springer, 1997.
J. A. Cramer, R. Rosenheck, W. Xu, J. Thomas, W. Henderson, and D. S. Charney. Quality of life in schizophrenia: A comparison of instruments. Schizophrenia Bulletin, 26(3):659–666, 2000.
C. C. Davis, M. Claudius, L. A. Palinkas, J. B. Wong, and L. K. Leslie. Putting families in the center: Family perspectives on decision making and ADHD and implications for ADHD care. Journal of Attention Disorders, October 2011. E-pub ahead of print.
M. de Berg, O. Cheong, M. van Kreveld, and M. Overmars. Computational Geometry: Algorithms and Applications. Springer, 3rd edition, 2008.
M. Ehrgott. Multicriteria Optimization, chapter 3. Springer, 2nd edition, 2005.
Z. Gábor, Z. Kalmár, and C. Szepesvári. Multi-criteria reinforcement learning. In Proceedings of the 15th International Conference on Machine Learning, pages 197–205, 1998.
G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins Studies in the Mathematical Sciences. Johns Hopkins University Press, 1996. ISBN 0471935271.
B. Grünbaum. Convex Polytopes, volume 221 of Graduate Texts in Mathematics. Springer-Verlag, 1967. ISBN 0387004246.
D. W. Heinrichs, T. E. Hanlon, and W. T. C. Jr. The Quality of Life Scale: An instrument for rating the schizophrenic deficit syndrome. Schizophrenia Bulletin, 10(3):388–398, 1984.
L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1–2):99–134, 1998.
S. R. Kay, A. Fiszbein, and L. A. Opfer. The Positive and Negative Syndrome Scale (PANSS) for schizophrenia. Schizophrenia Bulletin, 13(2):261–276, 1987.
P. W. Keller, S. Mannor, and D. Precup. Automatic basis function construction for approximate dynamic programming and reinforcement learning. In Proceedings of the 23rd International Conference on Machine Learning, ICML '06, pages 449–456, 2006.
A. Y. Kuk, J. Li, and A. J. Rush. Recursive subsetting to identify patients in the STAR*D: A method to enhance the accuracy of early prediction of treatment outcome and to inform personalized care. Journal of Clinical Psychiatry, 71(11):1502–1508, November 2010.
E. B. Laber, M. Qian, D. J. Lizotte, and S. A. Murphy. Statistical inference in dynamic treatment regimes. Technical Report 50, Univ. of Michigan Statistics Department, 2009.
E. B. Laber, D. J. Lizotte, and B. Ferguson. Set-valued dynamic treatment regimes for competing outcomes. arXiv:1207.3100v2 [stat.ME], 2012.
D. J. Lizotte, M. Bowling, and S. A. Murphy. Efficient reinforcement learning with multiple reward functions for randomized controlled trial analysis. In J. Fürnkranz and T. Joachims, editors, Proceedings of the 27th International Conference on Machine Learning, pages 695–702, Haifa, Israel, June 2010. Omnipress.
S. Mannor and N. Shimkin. A geometric approach to multi-criterion reinforcement learning. Journal of Machine Learning Research, 5:325–360, 2004.
S. Mannor and J. N. Tsitsiklis. Mean-variance optimization in Markov decision processes. In Proceedings of the 28th International Conference on Machine Learning, pages 177–184, 2011.
M. S. McDonagh, K. Peterson, S. Carson, R. Fu, and S. Thakurta. Drug class review: Atypical antipsychotic drugs. Technical report, Oregon Health & Science University, July 2010. Drug Effectiveness Review Project, Update 3.
J. R. McKay. Treating Substance Use Disorders with Adaptive Continuing Aftercare. American Psychological Association, 2009.
A. L. Miller, J. A. Chiles, J. K. C., et al. The Texas Medication Algorithm Project (TMAP) schizophrenia algorithms. Journal of Clinical Psychiatry, 60:649–657, 1999.
S. A. Murphy. An experimental design for the development of adaptive treatment strategies. Statistics in Medicine, 24:1455–1481, 2005.
S. A. Murphy, S. W. Oslin, A. J. Rush, and J. Zhu. Methodological challenges in constructing effective treatment sequences for chronic psychiatric disorders. Neuropsychopharmacology, 32:257–262, 2007.
S. Natarajan and P. Tadepalli. Dynamic preferences in multi-criteria reinforcement learning. In Proceedings of the 22nd International Conference on Machine Learning, pages 601–608, 2005.
National Institutes of Health. Clinical Guidelines on the Identification, Evaluation, and Treatment of Overweight and Obesity in Adults: The Evidence Report. National Heart, Lung, and Blood Institute, September 1998. NIH Publication no. 98-4083.
A. Y. Ng and S. J. Russell. Algorithms for inverse reinforcement learning. In Proceedings of the 17th International Conference on Machine Learning, pages 663–670, 2000.
J. Pineau, G. Gordon, and S. Thrun. Point-based value iteration: An anytime algorithm for POMDPs. In International Joint Conference on Artificial Intelligence, pages 1025–1032, 2003.
J. Pineau, G. Gordon, and S. Thrun. Anytime point-based approximations for large POMDPs. Journal of Artificial Intelligence Research, 27:335–380, 2006.
J. Pineau, M. G. Bellemare, A. J. Rush, A. Ghizaru, and S. A. Murphy. Constructing evidence-based treatment strategies using methods from computer science. Drug and Alcohol Dependence, 88(Suppl 2):S52–S60, May 2007.
P. Poupart and C. Boutilier. Value-directed compression of POMDPs. Advances in Neural Information Processing Systems, 15:1547–1554, 2002.
D. L. Sackett. Evidence-based medicine: What it is and what it isn't. British Medical Journal, 312(7023):71–72, 1996.
S. Shortreed, E. B. Laber, D. J. Lizotte, T. S. Stroup, J. Pineau, and S. A. Murphy. Informing sequential clinical decision-making through reinforcement learning: An empirical study. Machine Learning, pages 1–28, 2010. ISSN 0885-6125.
T. S. Stroup, J. P. McEvoy, M. S. Swartz, M. J. Byerly, I. D. Glick, J. M. Canive, M. McGee, G. M. Simpson, M. D. Stevens, and J. A. Lieberman. The National Institute of Mental Health Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) project: Schizophrenia trial design and protocol development. Schizophrenia Bulletin, 29(1):15–31, 2003.
B. Sturmfels. Solving Systems of Polynomial Equations. Regional Conference Series in Mathematics. Published for the Conference Board of the Mathematical Sciences by the American Mathematical Society, 2002. ISBN 9780821832516.
R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
M. S. Swartz, D. O. Perkins, T. S. Stroup, J. P. McEvoy, J. M. Nieri, and D. D. Haal. Assessing clinical and functional outcomes in the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) schizophrenia trial. Schizophrenia Bulletin, 29(1):33–43, 2003.
C. Szepesvári. Algorithms for Reinforcement Learning. Morgan & Claypool, July 2010.
P. F. Thall. Some geometric methods for constructing decision criteria based on two-dimensional parameters. Journal of Statistical Planning and Inference, 138(2):516–527, 2008.
D. J. Uherka and A. M. Sergott. On the continuous dependence of the roots of a polynomial on its coefficients. The American Mathematical Monthly, 84(5):368–370, 1977. ISSN 0002-9890.
P. Vamplew, R. Dazeley, A. Berry, R. Issabekov, and E. Dekker. Empirical evaluation methods for multiobjective reinforcement learning algorithms. Machine Learning, 84:51–80, 2011. ISSN 0885-6125. doi:10.1007/s10994-010-5232-5.
T. Wang, P. Poupart, M. Bowling, and D. Schuurmans. Compact, convex upper bound iteration for approximate POMDP planning. In Proceedings of the 21st National Conference on Artificial Intelligence, pages 1245–1251, 2006.
J. R. Weisz, B. C. Chu, and A. J. Polo. Treatment dissemination and evidence-based practice: Strengthening intervention through clinician-researcher collaboration. Clinical Psychology: Science and Practice, 11(3):300–307, 2004. ISSN 1468-2850.
Working Group for the Canadian Psychiatric Association and the Canadian Alliance for Research on Schizophrenia. Canadian clinical practice guidelines for the treatment of schizophrenia. Canadian Journal of Psychiatry, 43(suppl. 2):25–40S, 1998.
Z. Zamani, S. Sanner, and C. Fang. Symbolic dynamic programming for continuous state and action MDPs. In Proceedings of the 26th AAAI Conference on Artificial Intelligence, 2012. To appear.
Y. Zhao, M. R. Kosorok, and D. Zeng. Reinforcement learning design for cancer clinical trials. Statistics in Medicine, 28:3294–3315, 2009.