
81 jmlr-2011-Robust Approximate Bilinear Programming for Value Function Approximation


Source: pdf

Author: Marek Petrik, Shlomo Zilberstein

Abstract: Value function approximation methods have been successfully used in many applications, but the prevailing techniques often lack useful a priori error bounds. We propose a new approximate bilinear programming formulation of value function approximation, which employs global optimization. The formulation provides strong a priori guarantees on both robust and expected policy loss by minimizing specific norms of the Bellman residual. Solving a bilinear program optimally is NP-hard, but this worst-case complexity is unavoidable because the Bellman-residual minimization itself is NP-hard. We describe and analyze the formulation as well as a simple approximate algorithm for solving bilinear programs. The analysis shows that this algorithm offers a convergent generalization of approximate policy iteration. We also briefly analyze the behavior of bilinear programming algorithms under incomplete samples. Finally, we demonstrate that the proposed approach can consistently minimize the Bellman residual on simple benchmark problems. Keywords: value function approximation, approximate dynamic programming, Markov decision processes


reference text

Pieter Abbeel, Varun Ganapathi, and Andrew Y. Ng. Learning vehicular dynamics, with application to modeling helicopters. In Advances in Neural Information Processing Systems, pages 1–8, 2006.
Daniel Adelman. A price-directed approach to stochastic inventory/routing. Operations Research, 52:499–514, 2004.
Richard Bellman. Dynamic Programming. Princeton University Press, 1957.
Kristin P. Bennett and O. L. Mangasarian. Bilinear separation of two sets in n-space. Computational Optimization and Applications, 2, 1993.
Dimitri P. Bertsekas and Sergey Ioffe. Temporal differences-based policy iteration and applications in neuro-dynamic programming. Technical Report LIDS-P-2349, LIDS, 1997.
Dimitri P. Bertsekas and John N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
Alberto Caprara and Michele Monaci. Bidimensional packing by bilinear programming. Mathematical Programming Series A, 118:75–108, 2009.
Daniela P. de Farias. The Linear Programming Approach to Approximate Dynamic Programming: Theory and Application. PhD thesis, Stanford University, 2002.
Daniela P. de Farias and Benjamin Van Roy. The linear programming approach to approximate dynamic programming. Operations Research, 51:850–856, 2003.
Daniela P. de Farias and Benjamin Van Roy. On constraint sampling in the linear programming approach to approximate dynamic programming. Mathematics of Operations Research, 29(3):462–478, 2004.
Carlos Guestrin, Daphne Koller, Ronald Parr, and Shobha Venkataraman. Efficient solution algorithms for factored MDPs. Journal of Artificial Intelligence Research, 19:399–468, 2003.
Reiner Horst and Hoang Tuy. Global Optimization: Deterministic Approaches. Springer, 1996.
Michail G. Lagoudakis and Ronald Parr. Least-squares policy iteration. Journal of Machine Learning Research, 4:1107–1149, 2003.
Olvi L. Mangasarian. The linear complementarity problem as a separable bilinear program. Journal of Global Optimization, 12:1–7, 1995.
Remi Munos. Error bounds for approximate policy iteration. In International Conference on Machine Learning, pages 560–567, 2003.
Marek Petrik. Optimization-based Approximate Dynamic Programming. PhD thesis, University of Massachusetts Amherst, 2010.
Marek Petrik and Shlomo Zilberstein. Anytime coordination using separable bilinear programs. In Conference on Artificial Intelligence, pages 750–755, 2007.
Marek Petrik and Shlomo Zilberstein. Constraint relaxation in approximate linear programs. In International Conference on Machine Learning, pages 809–816, 2009.
Marek Petrik, Gavin Taylor, Ron Parr, and Shlomo Zilberstein. Feature selection using regularization in approximate linear programs for Markov decision processes. In International Conference on Machine Learning, pages 871–878, 2010.
Warren B. Powell. Approximate Dynamic Programming. Wiley-Interscience, 2007.
Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 2005.
Kenneth O. Stanley and Risto Miikkulainen. Competitive coevolution through evolutionary complexification. Journal of Artificial Intelligence Research, 21:63–100, 2004.
Richard S. Sutton and Andrew Barto. Reinforcement Learning. MIT Press, 1998.
Istvan Szita and Andras Lorincz. Learning Tetris using the noisy cross-entropy method. Neural Computation, 18(12):2936–2941, 2006.
Hua O. Wang, Kazuo Tanaka, and Michael F. Griffin. An approach to fuzzy control of nonlinear systems: Stability and design issues. IEEE Transactions on Fuzzy Systems, 4:14–23, 1996.
Ronald J. Williams and Leemon C. Baird. Tight performance bounds on greedy policies based on imperfect value functions. In Yale Workshop on Adaptive and Learning Systems, 1994.