Paper: nips2006-202 (NIPS 2006), reference knowledge graph by maker-knowledge-mining
Source: pdf
Author: Alborz Geramifard, Michael Bowling, Martin Zinkevich, Richard S. Sutton
Abstract: We present new theoretical and empirical results with the iLSTD algorithm for policy evaluation in reinforcement learning with linear function approximation. iLSTD is an incremental method for achieving results similar to LSTD, the data-efficient, least-squares version of temporal difference learning, without incurring the full cost of the LSTD computation. LSTD is O(n²), where n is the number of parameters in the linear function approximator, while iLSTD is O(n). In this paper, we generalize the previous iLSTD algorithm and present three new results: (1) the first convergence proof for an iLSTD algorithm; (2) an extension to incorporate eligibility traces without changing the asymptotic computational complexity; and (3) the first empirical results with an iLSTD algorithm for a problem (mountain car) with feature vectors large enough (n = 10,000) to show substantial computational advantages over LSTD.
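For intuition about where the O(n) per-step cost comes from, below is a minimal Python/NumPy sketch of an iLSTD-style update, based on the description in [Geramifard et al., 2006]. The variable names, step-size handling, and dense-array representation are illustrative choices rather than the paper's exact pseudocode. The idea is to maintain A, b, and the residual mu = b - A*theta incrementally, and to descend along only a few components of mu each step, so the O(n²) solve of A*theta = b performed by LSTD is never needed.

    # Illustrative sketch (not the paper's exact pseudocode) of an iLSTD-style
    # per-transition update.  Dense arrays are used for clarity; with sparse
    # features (k nonzero entries), the outer-product update of A touches only
    # about k*k entries and each descent direction costs O(n), versus the
    # O(n^2) work of a full least-squares solve.
    import numpy as np

    def ilstd_step(A, b, mu, theta, phi, phi_next, reward,
                   gamma=0.99, alpha=0.01, m=1):
        """Update statistics and weights for one transition (phi, reward, phi_next)."""
        delta_phi = phi - gamma * phi_next
        # Incrementally update b while keeping the invariant mu = b - A @ theta.
        b += reward * phi
        mu += reward * phi
        # Rank-one update of A, and the matching correction to mu.
        dA = np.outer(phi, delta_phi)
        A += dA
        mu -= dA @ theta
        # Descend along the m components of mu with largest magnitude.
        for _ in range(m):
            j = np.argmax(np.abs(mu))
            step = alpha * mu[j]
            theta[j] += step
            mu -= step * A[:, j]   # keep mu consistent with the new theta
        return A, b, mu, theta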
References:
[Bertsekas and Tsitsiklis, 1996] Dimitri P. Bertsekas and John N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[Boyan, 1999] Justin A. Boyan. Least-squares temporal difference learning. In Proceedings of the Sixteenth International Conference on Machine Learning, pages 49–56. Morgan Kaufmann, San Francisco, CA, 1999.
[Boyan, 2002] Justin A. Boyan. Technical update: Least-squares temporal difference learning. Machine Learning, 49:233–246, 2002.
[Bradtke and Barto, 1996] Steven J. Bradtke and Andrew G. Barto. Linear least-squares algorithms for temporal difference learning. Machine Learning, 22:33–57, 1996.
[Geramifard et al., 2006] Alborz Geramifard, Michael Bowling, and Richard S. Sutton. Incremental least-squares temporal difference learning. In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI), pages 356–361. AAAI Press, 2006.
[RL Library, 2006] RL Library. The University of Alberta reinforcement learning library. http://rlai.cs.ualberta.ca/RLR/environment.html, 2006.
[Stone et al., 2005] Peter Stone, Richard S. Sutton, and Gregory Kuhlmann. Reinforcement learning for RoboCup soccer keepaway. Adaptive Behavior, 13(3):165–188, 2005.
[Sutton and Barto, 1998] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[Sutton, 1988] Richard S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3:9–44, 1988.
[Sutton, 1996] Richard S. Sutton. Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Advances in Neural Information Processing Systems 8, pages 1038–1044. MIT Press, 1996.
[Tsitsiklis and Van Roy, 1997] John N. Tsitsiklis and Benjamin Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5):674–690, 1997.