nips2001-40
Source: pdf
Author: Thomas G. Dietterich, Xin Wang
Abstract: We present three ways of combining linear programming with the kernel trick to find value function approximations for reinforcement learning. One formulation is based on SVM regression; the second is based on the Bellman equation; and the third seeks only to ensure that good moves have an advantage over bad moves. All formulations attempt to minimize the number of support vectors while fitting the data. Experiments in a difficult, synthetic maze problem show that all three formulations give excellent performance, but the advantage formulation is much easier to train. Unlike policy gradient methods, the kernel methods described here can easily adjust the complexity of the function approximator to fit the complexity of the value function.
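As a minimal mathematical sketch of what an advantage-style linear program of this kind might look like (illustrative only: the kernel K, the unit margin, the trade-off constant C, the slack variables \xi_s, and the notation a^{*}_{s} for a known good action at a sampled state s are assumptions, not the paper's exact formulation), the advantage of action a in state s is approximated by a kernel expansion, an L1 penalty on the expansion coefficients stands in for minimizing the number of support vectors, and each constraint asks the good action to beat every alternative by a margin:

\hat{A}(s,a) \;=\; \sum_{j} \alpha_j \, K\big((s,a),\,(s_j,a_j)\big)

\min_{\alpha,\;\xi \ge 0} \;\; \sum_{j} |\alpha_j| \;+\; C \sum_{s} \xi_s
\qquad \text{s.t.} \qquad
\hat{A}(s, a^{*}_{s}) \,-\, \hat{A}(s, a) \;\ge\; 1 - \xi_s
\quad \text{for every sampled } s \text{ and every } a \ne a^{*}_{s}.

Splitting each \alpha_j into nonnegative parts \alpha_j^{+} - \alpha_j^{-} turns the absolute values into linear terms, so the whole problem can be solved as an ordinary linear program.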