
160 nips-2010-Linear Complementarity for Regularized Policy Evaluation and Improvement


Source: pdf

Author: Jeffrey Johns, Christopher Painter-Wakefield, Ronald Parr

Abstract: Recent work in reinforcement learning has emphasized the power of L1 regularization to perform feature selection and prevent overfitting. We propose formulating the L1 regularized linear fixed point problem as a linear complementarity problem (LCP). This formulation offers several advantages over the LARS-inspired formulation, LARS-TD. The LCP formulation allows the use of efficient off-the-shelf solvers, leads to a new uniqueness result, and can be initialized with starting points from similar problems (warm starts). We demonstrate that warm starts, as well as the efficiency of LCP solvers, can speed up policy iteration. Moreover, warm starts permit a form of modified policy iteration that can be used to approximate a “greedy” homotopy path, a generalization of the LARS-TD homotopy path that combines policy evaluation and optimization.
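The sketch below (not the authors' code; the variable names and the toy solver are illustrative assumptions) shows how an L1-regularized linear TD fixed point of the kind described in the abstract can be posed as a standard LCP. A simple projected Gauss–Seidel loop stands in for the off-the-shelf pivoting solvers cited in the references; it is not guaranteed to converge for the non-symmetric matrices that arise here, but it makes the formulation concrete.

```python
import numpy as np

def l1_td_lcp(phi, phi_next, r, gamma, lam):
    """Build (M, q) for the LCP encoding of the L1-regularized linear fixed point.

    phi      : (n_samples, k) features of sampled states
    phi_next : (n_samples, k) features of sampled next states
    r        : (n_samples,)   sampled rewards
    """
    A = phi.T @ (phi - gamma * phi_next)   # analogue of X'X in the lasso LCP
    b = phi.T @ r                          # analogue of X'y
    # Split w = w_plus - w_minus with both parts nonnegative.
    M = np.block([[A, -A], [-A, A]])
    q = np.concatenate([lam - b, lam + b])
    return M, q

def projected_gauss_seidel(M, q, iters=5000, tol=1e-8):
    """Toy LCP solver: find x >= 0 with Mx + q >= 0 and x'(Mx + q) = 0."""
    x = np.zeros(len(q))
    for _ in range(iters):
        x_old = x.copy()
        for i in range(len(q)):
            resid = q[i] + M[i] @ x - M[i, i] * x[i]
            x[i] = max(0.0, -resid / M[i, i]) if M[i, i] > 0 else 0.0
        if np.max(np.abs(x - x_old)) < tol:
            break
    return x

# Purely illustrative usage on random data.
rng = np.random.default_rng(0)
n, k = 200, 10
phi, phi_next = rng.normal(size=(n, k)), rng.normal(size=(n, k))
r = rng.normal(size=n)
M, q = l1_td_lcp(phi, phi_next, r, gamma=0.9, lam=1.0)
x = projected_gauss_seidel(M, q)
w = x[:k] - x[k:]  # recover the (typically sparse) weight vector
```

A warm start, as discussed in the abstract, would simply pass the previous policy's solution as the initial `x` instead of the zero vector.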


reference text

[1] J. Kolter and A. Ng. Regularization and feature selection in least-squares temporal difference learning. In Proc. ICML, pages 521–528, 2009.

[2] S. Bradtke and A. Barto. Linear least-squares algorithms for temporal difference learning. Machine Learning, 22(1-3):33–57, 1996.

[3] M. Petrik, G. Taylor, R. Parr, and S. Zilberstein. Feature selection using regularization in approximate linear programs for Markov decision processes. In Proc. ICML, 2010.

[4] R. Tibshirani. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1):267–288, 1996.

[5] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. The Annals of Statistics, 32(2):407–451, 2004.

[6] J. Júdice and F. Pires. A block principal pivoting algorithm for large-scale strictly monotone linear complementarity problems. Computers and Operations Research, 21(5):587–596, 1994.

[7] K. Murty. Linear Complementarity, Linear and Nonlinear Programming. Heldermann Verlag, 1988.

[8] J. Kim and H. Park. Fast active-set-type algorithms for L1-regularized linear regression. In Proc. AISTATS, pages 397–404, 2010.

[9] H. Lee, A. Battle, R. Raina, and A. Ng. Efficient sparse coding algorithms. In Advances in Neural Information Processing Systems 19, pages 801–808, 2007.

[10] S. Mahadevan and M. Maggioni. Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes. JMLR, 8:2169–2231, 2007.

[11] R. Parr, L. Li, G. Taylor, C. Painter-Wakefield, and M. Littman. An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning. In Proc. ICML, 2008.

[12] A. Farahmand, M. Ghavamzadeh, C. Szepesvári, and S. Mannor. Regularized fitted Q-iteration for planning in continuous-space Markovian decision problems. In Proc. ACC. IEEE Press, 2009.

[13] M. Loth, M. Davy, and P. Preux. Sparse temporal difference learning using LASSO. In IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2007.

[14] J. Johns and S. Mahadevan. Sparse approximate policy evaluation using graph-based basis functions. Technical Report UM-CS-2009-041, University of Massachusetts Amherst, Department of Computer Science, 2009.

[15] S. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41(12):3397–3415, 1993.

[16] Y. Pati, R. Rezaiifar, and P. Krishnaprasad. Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In Proceedings of the 27th Annual Asilomar Conference on Signals, Systems, and Computers, volume 1, pages 40–44, 1993.

[17] M. Lagoudakis and R. Parr. Least-squares policy iteration. Journal of Machine Learning Research, 4:1107–1149, 2003.

[18] S. Lee and H. Seol. A survey on the matrix completion problem. Trends in Mathematics, 4(1):38–43, 2001.

[19] J. Júdice and F. Pires. Basic-set algorithm for a generalized linear complementarity problem. Journal of Optimization Theory and Applications, 74(3):391–411, 1992.

[20] M. Puterman and M. Shin. Modified policy iteration algorithms for discounted Markov decision problems. Management Science, 24(11), 1978.

[21] R. Sutton and A. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.