
The Steering Approach for Multi-Criteria Reinforcement Learning (NIPS 2001, paper 187)


Source: pdf

Authors: Shie Mannor, Nahum Shimkin

Abstract: We consider the problem of learning to attain multiple goals in a dynamic environment, which is initially unknown. In addition, the environment may contain arbitrarily varying elements related to actions of other agents or to non-stationary moves of Nature. This problem is modelled as a stochastic (Markov) game between the learning agent and an arbitrary player, with a vector-valued reward function. The objective of the learning agent is to have its long-term average reward vector belong to a given target set. We devise an algorithm for achieving this task, which is based on the theory of approachability for stochastic games. This algorithm combines, in an appropriate way, a finite set of standard, scalar-reward learning algorithms. Sufficient conditions are given for the convergence of the learning algorithm to a general target set. The specialization of these results to the single-controller Markov decision problem is discussed as well.
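
To illustrate the steering idea sketched in the abstract, the following is a minimal, hypothetical Python sketch of an approachability-style outer loop: maintain the running average reward vector, and whenever it lies outside the target set, steer toward it by running a scalar-reward learner on the reward scalarized along the direction from the average to its projection onto the target set. The box-shaped target set, the function names (steering_loop, run_scalar_learner, closest_point_in_box), and the random placeholder for the scalar learner are all illustrative assumptions, not the paper's actual algorithm or guarantees.

```python
import numpy as np

def closest_point_in_box(x, low, high):
    """Euclidean projection onto an axis-aligned box target set.
    (A box is an illustrative choice; a general convex target set only
    needs a projection oracle.)"""
    return np.clip(x, low, high)

def steering_direction(avg_reward, low, high):
    """Unit vector from the current average reward vector toward its
    closest point in the target set; None if already inside."""
    diff = closest_point_in_box(avg_reward, low, high) - avg_reward
    norm = np.linalg.norm(diff)
    return None if norm < 1e-9 else diff / norm

def run_scalar_learner(direction, phase_len, reward_dim, rng):
    """Placeholder for a standard scalar-reward RL algorithm run for one
    phase on the scalarized reward <direction, r>.  Here it only simulates
    phase_len random vector-valued rewards."""
    return rng.uniform(-1.0, 1.0, size=(phase_len, reward_dim))

def steering_loop(low, high, n_phases=50, phase_len=100, reward_dim=2, seed=0):
    """Hypothetical outer loop: in each phase, steer the average reward
    vector toward the target set along the current approachability
    direction, then update the running average."""
    rng = np.random.default_rng(seed)
    total = np.zeros(reward_dim)
    count = 0
    for _ in range(n_phases):
        avg = total / max(count, 1)
        direction = steering_direction(avg, low, high)
        if direction is None:
            # Already inside the target set: no steering needed this phase.
            direction = np.zeros(reward_dim)
        rewards = run_scalar_learner(direction, phase_len, reward_dim, rng)
        total += rewards.sum(axis=0)
        count += phase_len
    return total / count

if __name__ == "__main__":
    # Example target set: a box in reward space.
    print(steering_loop(low=np.array([0.0, -0.5]), high=np.array([0.5, 0.5])))
```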


reference text

[1] J. Abounadi, D. Bertsekas, and V. Borkar. Learning algorithms for Markov decision processes with average cost. Technical Report LIDS-P-2434, Laboratory for Information and Decision Systems, MIT, October 1998.

[2] R.S. Sutton and A.G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[3] D. Blackwell. An analog of the minimax theorem for vector payoffs. Pacific J. Math., 6(1):1–8, 1956.

[4] R.I. Brafman and M. Tennenholtz. A near-optimal polynomial time algorithm for learning in certain classes of stochastic games. Artificial Intelligence, 121(1-2):31–47, April 2000.

[5] C. Derman. Finite state Markovian decision processes. Academic Press, 1970.

[6] J. Filar and K. Vrieze. Competitive Markov Decision Processes. Springer Verlag, 1996.

[7] L.P. Kaelbling, M. Littman, and A.W. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, May 1996.

[8] M. Kearns and S. Singh. Near-optimal reinforcement learning in polynomial time. In Proc. of the 15th Int. Conf. on Machine Learning, pages 260–268. Morgan Kaufmann, 1998.

[9] M.L. Littman. Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the Eleventh International Conference on Machine Learning, pages 157–163. Morgan Kaufmann, 1994.

[10] S. Mahadevan. Average reward reinforcement learning: Foundations, algorithms, and empirical results. Machine Learning, 22(1):159–196, 1996.

[11] S. Mannor and N. Shimkin. The empirical Bayes envelope approach to regret minimization in stochastic games. Technical Report EE-1262, Faculty of Electrical Engineering, Technion, Israel, October 2000.

[12] J.F. Mertens and A. Neyman. Stochastic games. International Journal of Game Theory, 10(2):53–66, 1981.

[13] A. Schwartz. A reinforcement learning method for maximizing undiscounted rewards. In Proceedings of the Tenth International Conference on Machine Learning, pages 298–305. Morgan Kaufmann, 1993.

[14] N. Shimkin and A. Shwartz. Guaranteed performance regions in Markovian systems with competing decision makers. IEEE Trans. on Automatic Control, 38(1):84–95, January 1993.