nips nips2010 nips2010-64 nips2010-64-reference knowledge-graph by maker-knowledge-mining

64 nips-2010-Distributionally Robust Markov Decision Processes


Source: pdf

Author: Huan Xu, Shie Mannor

Abstract: We consider Markov decision processes where the values of the parameters are uncertain. This uncertainty is described by a sequence of nested sets (that is, each set contains the previous one), each of which corresponds to a probabilistic guarantee for a different confidence level so that a set of admissible probability distributions of the unknown parameters is specified. This formulation models the case where the decision maker is aware of and wants to exploit some (yet imprecise) a-priori information of the distribution of parameters, and arises naturally in practice where methods to estimate the confidence region of parameters abound. We propose a decision criterion based on distributional robustness: the optimal policy maximizes the expected total reward under the most adversarial probability distribution over realizations of the uncertain parameters that is admissible (i.e., it agrees with the a-priori information). We show that finding the optimal distributionally robust policy can be reduced to a standard robust MDP where the parameters belong to a single uncertainty set, hence it can be computed in polynomial time under mild technical conditions.


reference text

[1] M. L. Puterman. Markov Decision Processes. John Wiley & Sons, New York, 1994.

[2] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[3] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[4] S. Mannor, D. Simester, P. Sun, and J. Tsitsiklis. Bias and variance in value vunction estimation. In Proceedings of the 21th international conference on Machine learning, 2004.

[5] A. Nilim and L. El Ghaoui. Robust control of Markov decision processes with uncertain transition matrices. Operations Research, 53(5):780–798, September 2005.

[6] A. Bagnell, A. Ng, and J. Schneider. Solving uncertain Markov decision problems. Technical Report CMU-RI-TR-01-25, Carnegie Mellon University, August 2001.

[7] C. C. White III and H. K. El Deib. Markov decision processes with imprecise transition probabilities. Operations Research, 42(4):739–748, July 1992.

[8] G. N. Iyengar. Robust dynamic programming. Mathematics of Operations Research, 30(2):257–280, 2005.

[9] L. G. Epstein and M. Schneider. Learning under ambiguity. Review of Economic Studies, 74(4):1275–1303, 2007.

[10] A. Nilim and L. El Ghaoui. Robustness in Markov decision problems with uncertain transition matrices. In Advances in Neural Information Processing Systems 16, 2004.

[11] E. Delage and S. Mannor. Percentile optimization for Markov decision processes with parameter uncertainty. Operations Research, (1):203–213, 2010.

[12] H. Xu and S. Mannor. The robustness-performance tradeoff in Markov decision processes. In B. Sch¨ lkopf, J. C. Platt, and T. Hofmann, editors, Advances in Neural Information Processing o Systems 19, pages 1537–1544. MIT Press, 2007.

[13] H. Scarf. A min-max solution of an inventory problem. In Studies in Mathematical Theory of Inventory and Production, pages 201–209. Stanford University Press, 1958.

[14] J. Dupacov´ . The minimax approach to stochastic programming and an illustrative application. a Stochastics, 20:73–88, 1987.

[15] P. Kall. Stochastic programming with recourse: Upper bounds and moment problems, a review. In Advances in Mathematical Optimization. Academie-Verlag, Berlin, 1988.

[16] A. Shapiro. Worst-case distribution analysis of stochastic programs. Mathematical Programming, 107(1):91–96, 2006.

[17] I. Popescu. Robust mean-covariance solutions for stochastic optimization. Operations Research, 55(1):98–112, 2007.

[18] E. Delage and Y. Ye. Distributional robust optimization under moment uncertainty with applications to data-driven problems. To appear in Operations Research, 2010.

[19] A. Ruszczy´ ski. Risk-averse dynamic programming for Markov decision processes. Mathen matical Programming, Series B, 125:235–261, 2010.

[20] H. F¨ llmer and A. Schied. Stochastic finance: An introduction in discrete time. Berlin: Walter o de Gruyter, 2002.

[21] I. Gilboa and D. Schmeidler. Maxmin expected utility with a non-unique prior. Journal of Mathematical Economics, 18:141–153, 1989.

[22] D. Kelsey. Maxmin expected utility and weight of evidence. Oxford Economic Papers, 46:425– 444, 1994.

[23] J. Baron. Thinking and Deciding. Cambridge University Press, 2000. 9