NIPS 2012, paper nips2012-313: reference knowledge graph
Source: pdf
Authors: Marc Bellemare, Joel Veness, Michael Bowling
Abstract: Hashing is a common method to reduce large, potentially infinite feature vectors to a fixed-size table. In reinforcement learning, hashing is often used in conjunction with tile coding to represent states in continuous spaces. Hashing is also a promising approach to value function approximation in large discrete domains such as Go and Hearts, where feature vectors can be constructed by exhaustively combining a set of atomic features. Unfortunately, the typical use of hashing in value function approximation results in biased value estimates due to the possibility of collisions. Recent work in data stream summaries has led to the development of the tug-of-war sketch, an unbiased estimator for approximating inner products. Our work investigates the application of this new data structure to linear value function approximation. Although in the reinforcement learning setting the use of the tug-of-war sketch leads to biased value estimates, we show that this bias can be orders of magnitude less than that of standard hashing. We provide empirical results on two RL benchmark domains and fifty-five Atari 2600 games to highlight the superior learning performance obtained when using tug-of-war hashing.
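To make the contrast between standard hashing and the tug-of-war sketch concrete, here is a minimal Python sketch of both projections, together with a TD(0) update on the sketched features. It is illustrative only: the salted MD5 hash stands in for the universal hash families the abstract alludes to, and the feature names, sketch size m, and learning parameters are made-up assumptions rather than the authors' implementation.

```python
import hashlib

import numpy as np


def _hash(item, seed, mod):
    """Deterministic salted hash of `item`, reduced modulo `mod`.

    A salted MD5 digest is used purely for illustration; the analysis in
    the paper assumes universal hash function families.
    """
    digest = hashlib.md5(f"{seed}:{item}".encode()).hexdigest()
    return int(digest, 16) % mod


def tug_of_war_sketch(active_features, m, seed=0):
    """Tug-of-war (AMS) sketch of a sparse binary feature vector.

    Each active feature is hashed to one of m buckets and added with a
    pseudo-random sign in {-1, +1}, so colliding features tend to cancel.
    """
    phi = np.zeros(m)
    for f in active_features:
        bucket = _hash(f, seed, m)
        sign = 1.0 if _hash(f, seed + 1, 2) == 0 else -1.0
        phi[bucket] += sign
    return phi


def standard_hash(active_features, m, seed=0):
    """Standard hashing, shown for contrast: same bucket assignment,
    but every collision adds mass with the same sign."""
    phi = np.zeros(m)
    for f in active_features:
        phi[_hash(f, seed, m)] += 1.0
    return phi


def td0_update(theta, phi, phi_next, reward, alpha=0.1, gamma=0.99):
    """One TD(0) step on the sketched features (linear value function)."""
    delta = reward + gamma * theta.dot(phi_next) - theta.dot(phi)
    return theta + alpha * delta * phi


if __name__ == "__main__":
    m = 32  # sketch size (hypothetical)
    theta = np.zeros(m)
    # Hypothetical atomic features for two successive states.
    s0 = ["paddle_x=12", "ball_x=40", "ball_y=7"]
    s1 = ["paddle_x=13", "ball_x=41", "ball_y=8"]
    theta = td0_update(theta, tug_of_war_sketch(s0, m),
                       tug_of_war_sketch(s1, m), reward=1.0)
    print("V(s0) estimate:", theta.dot(tug_of_war_sketch(s0, m)))
```

The random signs are what make the tug-of-war sketch an unbiased estimator of inner products in the data-stream setting; as the abstract notes, some bias remains when the sketch is used inside value function approximation, but it can be orders of magnitude smaller than with standard hashing.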
[1] Dimitris Achlioptas. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4):671–687, 2003.
[2] Marc G. Bellemare, Joel Veness, and Michael Bowling. Investigating contingency awareness using Atari 2600 games. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012.
[3] Michael Bowling and Manuela Veloso. Scalable learning in stochastic games. In AAAI Workshop on Game Theoretic and Decision Theoretic Agents, 2002.
[4] Michael Bowling and Manuela Veloso. Simultaneous adversarial multi-robot learning. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pages 699–704, 2003.
[5] J. Lawrence Carter and Mark N. Wegman. Universal classes of hash functions. Journal of Computer and System Sciences, 18(2):143–154, 1979.
[6] Graham Cormode. Sketch techniques for massive data. In Graham Cormode, Minos Garofalakis, Peter Haas, and Chris Jermaine, editors, Synopses for Massive Data: Samples, Histograms, Wavelets and Sketches, Foundations and Trends in Databases. NOW publishers, 2011.
[7] Graham Cormode and Minos Garofalakis. Sketching streams through the net: Distributed approximate query tracking. In Proceedings of the 31st International Conference on Very Large Data Bases, pages 13–24, 2005.
[8] Anirban Dasgupta, Ravi Kumar, and Tamás Sarlós. A sparse Johnson-Lindenstrauss transform. In Proceedings of the 42nd ACM Symposium on Theory of Computing, pages 341–350, 2010.
[9] Carlos Diuk, A. Andre Cohen, and Michael L. Littman. An object-oriented representation for efficient reinforcement learning. In Proceedings of the Twenty-Fifth International Conference on Machine Learning, pages 240–247, 2008.
[10] Mahdi Milani Fard, Yuri Grinberg, Joelle Pineau, and Doina Precup. Compressed least-squares regression on sparse spaces. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence. AAAI, 2012.
[11] Mohammad Ghavamzadeh, Alessandro Lazaric, Odalric-Ambrym Maillard, and Rémi Munos. LSTD with random projections. In Advances in Neural Information Processing Systems 23, pages 721–729, 2010.
[12] Matthew Hausknecht, Piyush Khandelwal, Risto Miikkulainen, and Peter Stone. HyperNEAT-GGP: A HyperNEAT-based Atari general game player. In Genetic and Evolutionary Computation Conference (GECCO), 2012.
[13] Daniel M. Kane and Jelani Nelson. A derandomized sparse Johnson-Lindenstrauss transform. arXiv preprint arXiv:1006.3585, 2010.
[14] Ping Li, Trevor J. Hastie, and Kenneth W. Church. Very sparse random projections. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 287–296, 2006.
[15] The Reinforcement Learning Library, 2010. http://library.rl-community.org.
[16] Odalric-Ambrym Maillard and Rémi Munos. Compressed least-squares regression. In Advances in Neural Information Processing Systems 22, pages 1213–1221, 2009.
[17] Yavar Naddaf. Game-independent AI agents for playing Atari 2600 console games. PhD thesis, University of Alberta, 2010.
[18] E. Schuitema, D.G.E. Hobbelen, P.P. Jonker, M. Wisse, and J.G.D. Karssen. Using a controller based on reinforcement learning for a passive dynamic walking robot. In Proceedings of the Fifth IEEE-RAS International Conference on Humanoid Robots, pages 232–237, 2005.
[19] David Silver, Richard S. Sutton, and Martin Müller. Reinforcement learning of local shape in the game of Go. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, pages 1053–1058, 2007.
[20] Peter Stone, Richard S. Sutton, and Gregory Kuhlmann. Reinforcement learning for RoboCup soccer keepaway. Adaptive Behavior, 13(3):165, 2005.
[21] Nathan Sturtevant and Adam White. Feature construction for reinforcement learning in Hearts. Computers and Games, pages 122–134, 2006.
[22] Richard S. Sutton. Generalization in reinforcement learning: Successful examples using sparse coarse coding. In David S. Touretzky, Michael C. Mozer, and Michael E. Hasselmo, editors, Advances in Neural Information Processing Systems, volume 8, pages 1038–1044, 1996.
[23] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[24] Russ Tedrake, Teresa Weirui Zhang, and H. Sebastian Seung. Stochastic policy gradient reinforcement learning on a simple 3D biped. In Proceedings of Intelligent Robots and Systems 2004, volume 3, pages 2849–2854, 2004.
[25] John N. Tsitsiklis and Benjamin Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5):674–690, 1997.
[26] Samuel Wintermute. Using imagery to simplify perceptual abstraction in reinforcement learning agents. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, 2010.