
237 nips-2011-Reinforcement Learning using Kernel-Based Stochastic Factorization



Author: Andre S. Barreto, Doina Precup, Joelle Pineau

Abstract: Kernel-based reinforcement learning (KBRL) is a method for learning a decision policy from a set of sample transitions, and it stands out for its strong theoretical guarantees. However, the size of its approximator grows with the number of transitions, which makes the approach impractical for large problems. In this paper we introduce a novel algorithm that improves the scalability of KBRL. We resort to a special decomposition of a transition matrix, called stochastic factorization, to fix the size of the approximator while still incorporating all the information contained in the data. The resulting algorithm, kernel-based stochastic factorization (KBSF), is much faster than KBRL but still converges to a unique solution. We derive a theoretical upper bound on the distance between the value functions computed by KBRL and KBSF. The effectiveness of our method is illustrated with computational experiments on four reinforcement-learning problems, including a difficult task in which the goal is to learn a neurostimulation policy to suppress the occurrence of seizures in epileptic rat brains. We empirically demonstrate that the proposed approach is able to compress the information contained in KBRL's model. Moreover, on the tasks studied, KBSF outperforms two of the most prominent reinforcement-learning algorithms, namely least-squares policy iteration and fitted Q-iteration.
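
To make the idea in the abstract concrete, here is a minimal numerical sketch (not the authors' implementation; the matrix sizes and the random factors are purely illustrative assumptions) of why a stochastic factorization fixes the size of the model: if an n x n transition matrix is written as the product of an n x m and an m x n row-stochastic matrix, swapping the two factors yields an m x m matrix that is itself a valid transition model, so its size is controlled by m rather than by the number of sample transitions n.

```python
import numpy as np

def normalize_rows(M):
    """Rescale a nonnegative matrix so that every row sums to 1 (row-stochastic)."""
    return M / M.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, m = 1000, 50  # n: number of sample transitions; m: fixed size of the compressed model

# Illustrative stochastic factors. In KBSF these would be derived from kernel
# values over the sampled and representative states; here they are just random.
D = normalize_rows(rng.random((n, m)))   # n x m, row-stochastic
K = normalize_rows(rng.random((m, n)))   # m x n, row-stochastic

P_large = D @ K   # n x n model: its size grows with the number of transitions
P_small = K @ D   # m x m model: its size is fixed by m, independent of n

# Both products are valid stochastic matrices, so the m x m model can be
# solved with standard dynamic programming at a fraction of the cost.
assert np.allclose(P_large.sum(axis=1), 1.0)
assert np.allclose(P_small.sum(axis=1), 1.0)
```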


References

[1] D. Ormoneit and S. Sen. Kernel-based reinforcement learning. Machine Learning, 49(2–3):161–178, 2002.

[2] D. Ormoneit and P. Glynn. Kernel-based reinforcement learning in average-cost problems. IEEE Transactions on Automatic Control, 47(10):1624–1636, 2002.

[3] M. G. Lagoudakis and R. Parr. Least-squares policy iteration. Journal of Machine Learning Research, 4:1107–1149, 2003.

[4] D. Ernst, P. Geurts, and L. Wehenkel. Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6:503–556, 2005.

[5] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.

[6] N. Jong and P. Stone. Kernel-based models for reinforcement learning in continuous state spaces. In Proceedings of the International Conference on Machine Learning, Workshop on Kernel Machines and Reinforcement Learning, 2006.

[7] J. E. Cohen and U. G. Rothblum. Nonnegative ranks, decompositions, and factorizations of nonnegative matrices. Linear Algebra and its Applications, 190:149–168, 1993.

[8] A. Cutler and L. Breiman. Archetypal analysis. Technometrics, 36(4):338–347, 1994.

[9] A. M. S. Barreto and M. D. Fragoso. Computing the stationary distribution of a finite Markov chain through stochastic factorization. SIAM Journal on Matrix Analysis and Applications. In press.

[10] J. Sorg and S. Singh. Transfer via soft homomorphisms. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 741–748, 2009.

[11] S. A. Vavasis. On the complexity of nonnegative matrix factorization. SIAM Journal on Optimization, 20:1364–1377, 2009.

[12] W. Whitt. Approximations of dynamic programs, I. Mathematics of Operations Research, 3(3):231–243, 1978.

[13] B. Ravindran. An Algebraic Approach to Abstraction in Reinforcement Learning. PhD thesis, University of Massachusetts, Amherst, MA, 2004.

[14] L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, 1990.

[15] R. S. Sutton. Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Advances in Neural Information Processing Systems, volume 8, pages 1038–1044, 1996.

[16] F. J. Gomez. Robust Non-linear Control Through Neuroevolution. PhD thesis, The University of Texas at Austin, 2003.

[17] A. P. Wieland. Evolving neural network controllers for unstable systems. In Proceedings of the International Joint Conference on Neural Networks, volume 2, pages 667–673, 1991.

[18] F. Gomez, J. Schmidhuber, and R. Miikkulainen. Efficient non-linear control through neuroevolution. In Proceedings of the 17th European Conference on Machine Learning, pages 654–662, 2006.

[19] K. Bush, J. Pineau, and M. Avoli. Manifold embeddings for model-based reinforcement learning of neurostimulation policies. In Proceedings of the ICML / UAI / COLT Workshop on Abstraction in Reinforcement Learning, 2009.

[20] K. Bush and J. Pineau. Manifold embeddings for model-based reinforcement learning under partial observability. In Advances in Neural Information Processing Systems, volume 22, pages 189–197, 2009.

[21] K. Jerger and S. J. Schiff. Periodic pacing an in vitro epileptic focus. Journal of Neurophysiology, 73(2):876–879, 1995.