Author: Mahdi M. Fard, Joelle Pineau
Abstract: Markov Decision Processes (MDPs) have been extensively studied and used in the context of planning and decision-making, and many methods exist to find the optimal policy for problems modelled as MDPs. Although finding the optimal policy is sufficient in many domains, in certain applications such as decision support systems, where the policy is executed by a human rather than a machine, finding all near-optimal policies can be useful because it gives the person executing the policy more flexibility. In this paper we introduce the new concept of non-deterministic MDP policies and address the question of finding near-optimal non-deterministic policies. We propose two solutions to this problem, one based on a Mixed Integer Program and the other on a search algorithm. We include experimental results from applying this framework to optimize treatment choices in the context of a medical decision support system.
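To make the notion of a non-deterministic (set-valued) policy concrete, the sketch below is a minimal toy example in Python: it runs standard value iteration on a hypothetical 2-state, 2-action MDP (the transition and reward numbers, the slack parameter epsilon, and the per-state thresholding rule are illustrative assumptions, not the paper's MIP or search formulation) and keeps, in each state, every action whose optimal Q-value is within epsilon of the best. Note that such naive per-state pruning does not by itself bound the value loss of the resulting non-deterministic policy, which is precisely what the paper's two proposed methods are designed to guarantee.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (numbers are illustrative, not from the paper).
# P[a, s, s'] = transition probability; R[s, a] = expected immediate reward.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.0, 1.0]]])
R = np.array([[1.0, 0.5],
              [0.0, 2.0]])
gamma = 0.95

# Standard value iteration to obtain optimal Q-values.
n_states, n_actions = R.shape
V = np.zeros(n_states)
for _ in range(1000):
    # Q[s, a] = R[s, a] + gamma * sum_{s'} P[a, s, s'] * V[s']
    Q = R + gamma * np.einsum('ast,t->sa', P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

# A non-deterministic policy: for each state, keep every action whose
# Q-value is within a slack epsilon of the best action in that state.
epsilon = 0.1
policy = {s: [a for a in range(n_actions) if Q[s, a] >= Q[s].max() - epsilon]
          for s in range(n_states)}
print(policy)  # e.g. {0: [0], 1: [1]} for a tight epsilon; larger epsilon admits more actions
```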
[1] A. Schaefer, M. Bailey, S. Shechter, and M. Roberts. Handbook of Operations Research / Management Science Applications in Health Care, chapter Medical decisions using Markov decision processes. Kluwer Academic Publishers, 2004.
[2] M. Hauskrecht and H. Fraser. Planning treatment of ischemic heart disease with partially observable Markov decision processes. Artificial Intelligence in Medicine, 18(3):221–244, 2000.
[3] P. Magni, S. Quaglini, M. Marchetti, and G. Barosi. Deciding when to intervene: a Markov decision process approach. International Journal of Medical Informatics, 60(3):237–253, 2000.
[4] D. Ernst, G. B. Stan, J. Gonçalves, and L. Wehenkel. Clinical data based optimal STI strategies for HIV: a reinforcement learning approach. In Proceedings of Benelearn, 2006.
[5] D.P. Bertsekas. Dynamic Programming and Optimal Control, Vol. 2. Athena Scientific, 1995.
[6] R.S. Sutton and A.G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.
[7] M. Kearns and S. Singh. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49, 2002.
[8] S. Mannor, D. Simester, P. Sun, and J.N. Tsitsiklis. Bias and variance in value function estimation. In Proceedings of ICML, 2004.
[9] M. Fava, A.J. Rush, M.H. Trivedi, et al. Background and rationale for the sequenced treatment alternatives to relieve depression (STAR*D) study. Psychiatric Clinics of North America, 26(2):457–494, 2003.