nips nips2012 nips2012-243 nips2012-243-reference knowledge-graph by maker-knowledge-mining

243 nips-2012-Non-parametric Approximate Dynamic Programming via the Kernel Method

Source: pdf

Author: Nikhil Bhat, Vivek Farias, Ciamac C. Moallemi

Abstract: This paper presents a novel non-parametric approximate dynamic programming (ADP) algorithm that enjoys graceful approximation and sample complexity guarantees. In particular, we establish both theoretically and computationally that our proposal can serve as a viable alternative to state-of-the-art parametric ADP algorithms, freeing the designer from carefully specifying an approximation architecture. We accomplish this by developing a kernel-based mathematical program for ADP. Via a computational study on a controlled queueing network, we show that our procedure is competitive with parametric ADP approaches. 1

reference text

[1] D. P. Bertsekas. Dynamic Programming and Optimal Control, Vol. II. Athena Scientiﬁc, 2007.

[2] B. Bethke, J. P. How, and A. Ozdaglar. Kernel-based reinforcement learning using Bellman residual elimination. MIT Working Paper, 2008.

[3] Y. Engel, S. Mannor, and R. Meir. Bayes meets Bellman: The Gaussian process approach to temporal difference learning. In Proceedings of the 20th International Conference on Machine Learning, pages 154–161. AAAI Press, 2003.

[4] X. Xu, D. Hu, and X. Lu. Kernel-based least squares policy iteration for reinforcement learning. IEEE Transactions on Neural Networks, 18(4):973–992, 2007.

[5] D. Ormoneit and S. Sen. Kernel-based reinforcement learning. Machine Learning, 49(2):161– 178, 2002.

[6] D. Ormoneit and P. Glynn. Kernel-based reinforcement learning in average cost poblems. IEEE Transactions on Automatic Control, 47(10):1624–1636, 2002.

[7] A. M. S. Barreto, D. Precup, and J. Pineau. Reinforcement learning using kernel-based stochastic factorization. In Advances in Neural Information Processing Systems, volume 24, pages 720–728. MIT Press, 2011.

[8] T. G. Dietterich and X. Wang. Batch value function approximation via support vectors. In Advances in Neural Information Processing Systems, volume 14, pages 1491–1498. MIT Press, 2002.

[9] J. Kolter and A. Ng. Regularization and feature selection in least-squares temporal difference learning. ICML ’09, pages 521–528. ACM, 2009.

[10] M. Petrik, G. Taylor, R. Parr, and S. Zilberstein. Feature selection using regularization in approximate linear programs for Markov decision processes. ICML ’10, pages 871–879, 2010.

[11] J. Pazis and R. Parr. Non-parametric approximate linear programming for MDPs. AAAI Conference on Artiﬁcial Intelligence. AAAI, 2011.

[12] V. V. Desai, V. F. Farias, and C. C. Moallemi. Approximate dynamic programming via a smoothed linear program. To appear in Operations Research, 2011.

[13] D. P. de Farias and B. Van Roy. The linear programming approach to approximate dynamic programming. Operations Research, 51(6):850–865, 2003.

[14] D. P. de Farias and B. Van Roy. On constraint sampling in the linear programming approach to approximate dynamic programming. Mathematics of Operations Research, 29:3:462–478, 2004.

[15] P. Schweitzer and A. Seidman. Generalized polynomial approximations in Markovian decision processes. Journal of Mathematical Analysis and Applications, 110:568–582, 1985.

[16] E. Osuna, R. Freund, and F. Girosi. An improved training algorithm for support vector machines. In Neural Networks for Signal Processing, Proceedings of the 1997 IEEE Workshop, pages 276 –285, sep 1997.

[17] T. Joachims. Making large-scale support vector machine learning practical, pages 169–184. MIT Press, Cambridge, MA, USA, 1999.

[18] R. R. Chen and S. Meyn. Value iteration and optimization of multiclass queueing networks. In Decision and Control, 1998. Proceedings of the 37th IEEE Conference on, volume 1, pages 50 –55 vol.1, 1998.

[19] C. C. Moallemi, S. Kumar, and B. Van Roy. Approximate and data-driven dynamic programming for queueing networks. Working Paper, 2008.

[20] L. Tassiulas and A. Ephremides. Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks. IEEE Transactions on Automatic Control, 37(12):1936–1948, December 1992.

[21] A. L. Stolyar. Maxweight scheduling in a generalized switch: State space collapse and workload minimization in heavy trafﬁc. The Annals of Applied Probability, 14:1–53, 2004. 9