
NIPS 2012, paper 297: Robustness and risk-sensitivity in Markov decision processes


Source: pdf

Author: Takayuki Osogami

Abstract: We uncover relations between robust MDPs and risk-sensitive MDPs. The objective of a robust MDP is to minimize a function, such as the expectation of cumulative cost, for the worst case when the parameters are uncertain. The objective of a risk-sensitive MDP is to minimize a risk measure of the cumulative cost when the parameters are known. We show that a risk-sensitive MDP of minimizing the expected exponential utility is equivalent to a robust MDP of minimizing the worst-case expectation with a penalty for the deviation of the uncertain parameters from their nominal values, where the deviation is measured with the Kullback-Leibler divergence. We also show that a risk-sensitive MDP of minimizing an iterated risk measure that is composed of certain coherent risk measures is equivalent to a robust MDP of minimizing the worst-case expectation when the possible deviations of uncertain parameters from their nominal values are characterized with a concave function.
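The first equivalence in the abstract rests on the classical variational (Donsker-Varadhan) formula relating the entropic risk measure to a KL-penalized worst-case expectation. The sketch below is a minimal numerical check of that formula on a toy distribution, not code from the paper; the names `theta`, `p`, and `x` are illustrative choices, and the supremum is evaluated at its known optimizer, the exponentially tilted distribution.

```python
import numpy as np

# Donsker-Varadhan duality underlying the exponential-utility / KL-penalty
# equivalence:
#   (1/theta) * log E_P[exp(theta * X)]
#     = sup_Q { E_Q[X] - (1/theta) * KL(Q || P) },
# with the supremum attained at the tilted distribution
# Q*(x) proportional to P(x) * exp(theta * X(x)).

theta = 0.7                            # risk-sensitivity parameter (> 0)
p = np.array([0.5, 0.3, 0.2])          # nominal distribution P (illustrative)
x = np.array([1.0, 4.0, 10.0])         # cumulative costs X (illustrative)

# Left-hand side: entropic risk measure (expected exponential utility).
lhs = np.log(np.sum(p * np.exp(theta * x))) / theta

# Right-hand side: penalized expectation at the tilted optimizer Q*.
q = p * np.exp(theta * x)
q /= q.sum()
kl = np.sum(q * np.log(q / p))         # KL(Q* || P)
rhs = np.sum(q * x) - kl / theta

assert np.isclose(lhs, rhs)            # the two objectives coincide
assert lhs > np.sum(p * x)             # risk-averse value exceeds the mean
```

For `theta > 0` the entropic risk penalizes high-cost outcomes, which is why it sits strictly above the plain expectation here; as `theta` shrinks toward zero the tilted distribution collapses back onto `P` and both sides approach the ordinary expected cost.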


reference text

[1] C. D. Charalambous, F. Rezaei, and A. Kyprianou. Relations between information theory, robustness, and statistical mechanics of stochastic systems. In Proceedings of the 43rd IEEE Conference on Decision and Control, volume 4, pages 3479–3484, 2004.

[2] T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & Sons, Inc., Hoboken, New Jersey, 2nd edition, 2006.

[3] E. Delage and S. Mannor. Percentile optimization in uncertain MDP with application to efficient exploration. In Proceedings of the 24th Annual International Conference on Machine Learning (ICML 2007), pages 225–232, June 2007.

[4] E. Delage and S. Mannor. Percentile optimization for MDP with parameter uncertainty. Operations Research, 58(1):203–213, 2010.

[5] E. V. Denardo and U. G. Rothblum. Optimal stopping, exponential utility, and linear programming. Mathematical Programming, 16:228–244, 1979.

[6] H. Föllmer and A. Schied. Stochastic Finance: An Introduction in Discrete Time. Walter de Gruyter, Berlin, Germany, 3rd edition, 2010.

[7] R. Howard and J. Matheson. Risk-sensitive Markov decision processes. Management Science, 18(7):356–369, 1972.

[8] S. C. Jaquette. A utility criterion for Markov decision processes. Management Science, 23(1):43–49, 1976.

[9] S. Kusuoka. On law invariant coherent risk measures. In S. Kusuoka and T. Maruyama, editors, Advances in Mathematical Economics, volume 3, pages 83–95. Springer, Tokyo, 2001.

[10] S. Mannor, O. Mebel, and H. Xu. Lightning does not strike twice: Robust MDPs with coupled uncertainty. In Proceedings of the International Conference on Machine Learning (ICML 2012), pages 385–392, 2012.

[11] A. Nilim and L. El Ghaoui. Robust control of Markov decision processes with uncertain transition matrices. Operations Research, 53(5):780–798, 2005.

[12] A. Nilim and L. El Ghaoui. Robustness in Markov decision problems with uncertain transition matrices. In S. Thrun, L. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA, 2004.

[13] T. Osogami. Iterated risk measures for risk-sensitive Markov decision processes with discounted cost. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI 2011), pages 567–574, July 2011.

[14] T. Osogami and T. Morimura. Time-consistency of optimization problems. In Proceedings of the 26th Conference on Artificial Intelligence (AAAI-12), July 2012.

[15] S. D. Patek. On terminating Markov decision processes with a risk-averse objective function. Automatica, 37(9):1379–1386, 2001.

[16] I. R. Petersen, M. R. James, and P. Dupuis. Minimax optimal control of stochastic uncertain systems with relative entropy constraints. IEEE Transactions on Automatic Control, 45(3):398–412, 2000.

[17] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley-Interscience, Hoboken, NJ, 2nd edition, 2005.

[18] U. G. Rothblum. Multiplicative Markov decision chains. Mathematics of Operations Research, 9(1):6–24, 1984.

[19] A. Ruszczyński. Risk-averse dynamic programming for Markov decision processes. Mathematical Programming, 125:235–261, 2010.

[20] H. Xu and S. Mannor. The robustness-performance tradeoff in Markov decision processes. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 1537–1544. MIT Press, Cambridge, MA, 2007.

[21] H. Xu and S. Mannor. Distributionally robust Markov decision processes. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 2505–2513. MIT Press, Cambridge, MA, 2010.