nips2006-191: reference knowledge graph
Source: pdf
Authors: Huan Xu, Shie Mannor
Abstract: Computing a satisfactory control policy for a Markov decision process whose parameters are not exactly known is a problem encountered in many practical applications. The traditional robust approach is based on a worst-case analysis and may lead to an overly conservative policy. In this paper we consider the tradeoff between nominal performance and worst-case performance over all possible models. Based on parametric linear programming, we propose a method that computes the whole set of Pareto-efficient policies in the performance-robustness plane when only the reward parameters are subject to uncertainty. In the more general case, when the transition probabilities are also subject to error, we show that the strategy with the "optimal" tradeoff may be non-Markovian and is hence in general not tractable.
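The abstract's approach can be made concrete with a small sketch. The Python snippet below is a minimal illustration under stated assumptions, not the paper's algorithm: it assumes interval reward uncertainty r(s, a) in [r_nom - delta, r_nom + delta] and traces performance-robustness tradeoff points by weighted-sum scalarization of the standard occupation-measure LP, whereas the paper recovers the full Pareto set via parametric linear programming. All model data (transitions, rewards, uncertainty radii) are illustrative placeholders.

```python
# Sketch: tradeoff between nominal and worst-case return in an MDP with
# interval reward uncertainty r(s,a) in [r_nom - delta, r_nom + delta].
# Illustrative stand-in for the paper's parametric-LP method: the frontier
# is traced here by sweeping a scalarization weight lambda over the
# occupation-measure LP; all numbers below are placeholders.
import numpy as np
from scipy.optimize import linprog

n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
r_nom = rng.uniform(0.0, 1.0, size=(n_states, n_actions))         # nominal rewards
delta = rng.uniform(0.0, 0.3, size=(n_states, n_actions))         # uncertainty radii
mu = np.full(n_states, 1.0 / n_states)                            # initial distribution

# Occupation-measure constraints: for each state s',
#   sum_a x(s',a) - gamma * sum_{s,a} P(s'|s,a) x(s,a) = mu(s')
A_eq = np.zeros((n_states, n_states * n_actions))
for s in range(n_states):
    for a in range(n_actions):
        j = s * n_actions + a
        A_eq[s, j] += 1.0
        A_eq[:, j] -= gamma * P[s, a]
b_eq = mu

frontier = []
for lam in np.linspace(0.0, 1.0, 11):
    # Since x >= 0, the worst case over interval rewards is r_nom - delta,
    # so the scalarized objective stays linear in x.
    c = -(r_nom - (1.0 - lam) * delta).ravel()  # linprog minimizes
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    x = res.x
    nominal = float(r_nom.ravel() @ x)
    worst = float((r_nom - delta).ravel() @ x)
    frontier.append((lam, nominal, worst))

for lam, nominal, worst in frontier:
    print(f"lambda={lam:.1f}  nominal={nominal:.3f}  worst-case={worst:.3f}")
```

Because the occupation measure is nonnegative, the worst case over interval rewards is attained at the lower endpoint r_nom - delta, so each scalarized objective remains linear and each tradeoff point costs one LP solve; sweeping lambda from 0 (pure robustness) to 1 (pure nominal performance) sketches the frontier the paper studies.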
[1] A. L. Soyster. Convex programming with set-inclusive constraints and applications to inexact linear programming. Oper. Res., 21(5):1154–1157, 1973.
[2] A. Bagnell, A. Ng, and J. Schneider. Solving uncertain Markov decision processes. Technical Report CMU-RI-TR-01-25, Carnegie Mellon University, August 2001.
[3] A. Ben-Tal and A. Nemirovski. Robust solutions of uncertain linear programs. Oper. Res. Lett., 25(1):1– 13, August 1999.
[4] C. C. White III and H. K. El-Deib. Markov decision processes with imprecise transition probabilities. Oper. Res., 42(4):739–748, July 1994.
[5] A. Nilim and L. El Ghaoui. Robust control of Markov decision processes with uncertain transition matrices. Oper. Res., 53(5):780–798, September 2005.
[6] D. Bertsimas and M. Sim. The price of robustness. Oper. Res., 52(1):35–53, January 2004.
[7] M. Heger. Consideration of risk in reinforcement learning. In Proc. 11th International Conference on Machine Learning, pages 105–111. Morgan Kaufmann, 1994.
[8] R. Neuneier and O. Mihatsch. Risk sensitive reinforcement learning. In Advances in Neural Information Processing Systems 11, pages 1031–1037, Cambridge, MA, USA, 1999. MIT Press.
[9] P. Geibel. Reinforcement learning with bounded risk. In Proc. 18th International Conf. on Machine Learning, pages 162–169. Morgan Kaufmann, San Francisco, CA, 2001.
[10] D. Bertsimas and J. N. Tsitsiklis. Introduction to Linear Optimization. Athena Scientific, 1997.
[11] M. Ehrgott. Multicriteria Optimization. Springer-Verlag, 2000.
[12] K. G. Murty. Linear Programming. John Wiley & Sons, 1983.
[13] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[14] M. L. Puterman. Markov Decision Processes. John Wiley & Sons, 1994.