nips nips2011 nips2011-10 nips2011-10-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Oliver B. Kroemer, Jan R. Peters
Abstract: In this paper, we consider the problem of policy evaluation for continuous-state systems. We present a non-parametric approach to policy evaluation, which uses kernel density estimation to represent the system. The true form of the value function for this model can be determined, and can be computed using Galerkin’s method. Furthermore, we also present a unified view of several well-known policy evaluation methods. In particular, we show that the same Galerkin method can be used to derive Least-Squares Temporal Difference learning, Kernelized Temporal Difference learning, and a discrete-state Dynamic Programming solution, as well as our proposed method. In a numerical evaluation of these algorithms, the proposed approach performed better than the other methods.
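To make the Least-Squares Temporal Difference baseline mentioned in the abstract concrete, the following is a minimal Python sketch of LSTD with Gaussian kernel features on a toy one-dimensional continuous-state chain. The chain dynamics, reward, kernel centers, bandwidth, and regularizer are illustrative assumptions, not the paper's actual benchmark or proposed method.

import numpy as np

# Sketch of LSTD(0): solve A w = b from sampled (s, r, s') transitions,
# then approximate the value function as V(s) = w^T phi(s).
# (Toy setup; all constants below are assumptions for illustration.)

def gaussian_features(s, centers, bandwidth):
    # Gaussian kernels centered at `centers`, evaluated at scalar state s.
    return np.exp(-0.5 * ((s - centers) / bandwidth) ** 2)

def lstd(transitions, centers, bandwidth, gamma=0.95, reg=1e-6):
    k = len(centers)
    A = reg * np.eye(k)          # small ridge term for numerical stability
    b = np.zeros(k)
    for s, r, s_next in transitions:
        phi = gaussian_features(s, centers, bandwidth)
        phi_next = gaussian_features(s_next, centers, bandwidth)
        A += np.outer(phi, phi - gamma * phi_next)
        b += phi * r
    return np.linalg.solve(A, b)

# Toy data: a noisy drift toward the origin, rewarded near zero.
rng = np.random.default_rng(0)
s, transitions = 0.0, []
for _ in range(2000):
    s_next = 0.9 * s + rng.normal(scale=0.1)
    transitions.append((s, float(abs(s_next) < 0.1), s_next))
    s = s_next

centers = np.linspace(-1.0, 1.0, 15)
w = lstd(transitions, centers, bandwidth=0.2)
print(gaussian_features(0.0, centers, 0.2) @ w)   # estimated value at s = 0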
[1] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control, Vol. II. Athena Scientific, 2007.
[2] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[3] H. Maei, C. Szepesvari, S. Bhatnagar, D. Precup, D. Silver, and R. Sutton. Convergent temporal-difference learning with arbitrary smooth function approximation. In NIPS, pages 1204–1212, 2009.
[4] Richard Bellman. Bottleneck problems and dynamic programming. Proceedings of the National Academy of Sciences of the United States of America, 39(9):947–951, 1953.
[5] R. E. Kalman. Contributions to the theory of optimal control. Boletín de la Sociedad Matemática Mexicana, 1960.
[6] Warren B. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality (Wiley Series in Probability and Statistics). Wiley-Interscience, 2007.
[7] Rémi Munos. Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation. Journal of Machine Learning Research, 7:413–427, 2006.
[8] Ralf Schoknecht. Optimality of reinforcement learning algorithms with linear function approximation. In NIPS, pages 1555–1562, 2002.
[9] Leemon Baird. Residual algorithms: Reinforcement learning with function approximation. In ICML, 1995.
[10] Christopher G. Atkeson and Juan C. Santamaria. A Comparison of Direct and Model-Based Reinforcement Learning. In ICRA, pages 3557–3564, 1997.
[11] H. Bersini and V. Gorrini. Three connectionist implementations of dynamic programming for optimal control: A preliminary comparative analysis. In NICROSP, 1996.
[12] E. Nadaraya. On estimating regression. Theory of Probability and Its Applications, 9:141–142, 1964.
[13] G. Watson. Smooth regression analysis. Sankhyā: The Indian Journal of Statistics, Series A, 26:359–372, 1964.
[14] Justin A. Boyan. Least-squares temporal difference learning. In ICML, pages 49–56, San Francisco, CA, USA, 1999. Morgan Kaufmann Publishers Inc.
[15] Gavin Taylor and Ronald Parr. Kernelized value function approximation for reinforcement learning. In ICML, pages 1017–1024, New York, NY, USA, 2009. ACM.
[16] Dimitri P. Bertsekas and John N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[17] Murray Rosenblatt. Remarks on Some Nonparametric Estimates of a Density Function. The Annals of Mathematical Statistics, 27(3):832–837, September 1956.
[18] Emanuel Parzen. On Estimation of a Probability Density Function and Mode. The Annals of Mathematical Statistics, 33(3):1065–1076, 1962.
[19] G. S. Kimeldorf and G. Wahba. Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications, 33(1):82–95, 1971.
[20] Rémi Munos. Error bounds for approximate policy iteration. In ICML, pages 560–567, 2003.
[21] Kendall E. Atkinson. The Numerical Solution of Integral Equations of the Second Kind. Cambridge University Press, 1997.
[22] Dominik Wied and Rafael Weissbach. Consistency of the kernel density estimator: a survey. Statistical Papers, pages 1–21, 2010.
[23] Yaakov Engel, Shie Mannor, and Ron Meir. Reinforcement learning with Gaussian processes. In ICML, pages 201–208, New York, NY, USA, 2005. ACM.
[24] Xin Xu, Tao Xie, Dewen Hu, and Xicheng Lu. Kernel least-squares temporal difference learning. International Journal of Information Technology, 11:54–63, 2005.
[25] J. Zico Kolter and Andrew Y. Ng. Regularization and feature selection in least-squares temporal difference learning. In ICML, pages 521–528. ACM, 2009.
[26] Nicholas K. Jong and Peter Stone. Model-based function approximation for reinforcement learning. In AAMAS, May 2007.
[27] Dirk Ormoneit and Śaunak Sen. Kernel-based reinforcement learning. Machine Learning, 49(2):161–178, November 2002.
[28] B. W. Silverman. Density Estimation for Statistics and Data Analysis. Chapman and Hall, London, 1986.