nips nips2012 nips2012-243 nips2012-243-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nikhil Bhat, Vivek Farias, Ciamac C. Moallemi
Abstract: This paper presents a novel non-parametric approximate dynamic programming (ADP) algorithm that enjoys graceful approximation and sample complexity guarantees. In particular, we establish both theoretically and computationally that our proposal can serve as a viable alternative to state-of-the-art parametric ADP algorithms, freeing the designer from carefully specifying an approximation architecture. We accomplish this by developing a kernel-based mathematical program for ADP. Via a computational study on a controlled queueing network, we show that our procedure is competitive with parametric ADP approaches. 1
[1] D. P. Bertsekas. Dynamic Programming and Optimal Control, Vol. II. Athena Scientific, 2007.
[2] B. Bethke, J. P. How, and A. Ozdaglar. Kernel-based reinforcement learning using Bellman residual elimination. MIT Working Paper, 2008.
[3] Y. Engel, S. Mannor, and R. Meir. Bayes meets Bellman: The Gaussian process approach to temporal difference learning. In Proceedings of the 20th International Conference on Machine Learning, pages 154–161. AAAI Press, 2003.
[4] X. Xu, D. Hu, and X. Lu. Kernel-based least squares policy iteration for reinforcement learning. IEEE Transactions on Neural Networks, 18(4):973–992, 2007.
[5] D. Ormoneit and S. Sen. Kernel-based reinforcement learning. Machine Learning, 49(2):161– 178, 2002.
[6] D. Ormoneit and P. Glynn. Kernel-based reinforcement learning in average cost poblems. IEEE Transactions on Automatic Control, 47(10):1624–1636, 2002.
[7] A. M. S. Barreto, D. Precup, and J. Pineau. Reinforcement learning using kernel-based stochastic factorization. In Advances in Neural Information Processing Systems, volume 24, pages 720–728. MIT Press, 2011.
[8] T. G. Dietterich and X. Wang. Batch value function approximation via support vectors. In Advances in Neural Information Processing Systems, volume 14, pages 1491–1498. MIT Press, 2002.
[9] J. Kolter and A. Ng. Regularization and feature selection in least-squares temporal difference learning. ICML ’09, pages 521–528. ACM, 2009.
[10] M. Petrik, G. Taylor, R. Parr, and S. Zilberstein. Feature selection using regularization in approximate linear programs for Markov decision processes. ICML ’10, pages 871–879, 2010.
[11] J. Pazis and R. Parr. Non-parametric approximate linear programming for MDPs. AAAI Conference on Artificial Intelligence. AAAI, 2011.
[12] V. V. Desai, V. F. Farias, and C. C. Moallemi. Approximate dynamic programming via a smoothed linear program. To appear in Operations Research, 2011.
[13] D. P. de Farias and B. Van Roy. The linear programming approach to approximate dynamic programming. Operations Research, 51(6):850–865, 2003.
[14] D. P. de Farias and B. Van Roy. On constraint sampling in the linear programming approach to approximate dynamic programming. Mathematics of Operations Research, 29:3:462–478, 2004.
[15] P. Schweitzer and A. Seidman. Generalized polynomial approximations in Markovian decision processes. Journal of Mathematical Analysis and Applications, 110:568–582, 1985.
[16] E. Osuna, R. Freund, and F. Girosi. An improved training algorithm for support vector machines. In Neural Networks for Signal Processing, Proceedings of the 1997 IEEE Workshop, pages 276 –285, sep 1997.
[17] T. Joachims. Making large-scale support vector machine learning practical, pages 169–184. MIT Press, Cambridge, MA, USA, 1999.
[18] R. R. Chen and S. Meyn. Value iteration and optimization of multiclass queueing networks. In Decision and Control, 1998. Proceedings of the 37th IEEE Conference on, volume 1, pages 50 –55 vol.1, 1998.
[19] C. C. Moallemi, S. Kumar, and B. Van Roy. Approximate and data-driven dynamic programming for queueing networks. Working Paper, 2008.
[20] L. Tassiulas and A. Ephremides. Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks. IEEE Transactions on Automatic Control, 37(12):1936–1948, December 1992.
[21] A. L. Stolyar. Maxweight scheduling in a generalized switch: State space collapse and workload minimization in heavy traffic. The Annals of Applied Probability, 14:1–53, 2004. 9