nips2007-185: reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tao Wang, Michael Bowling, Dale Schuurmans, Daniel J. Lizotte
Abstract: Recently, we have introduced a novel approach to dynamic programming and reinforcement learning that is based on maintaining explicit representations of stationary distributions instead of value functions. In this paper, we investigate the convergence properties of these dual algorithms both theoretically and empirically, and show how they can be scaled up by incorporating function approximation.
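For concreteness, the following is a minimal Python sketch of the dual policy-evaluation idea summarized in the abstract and developed in [4], assuming a small tabular MDP with a fixed policy; it is a state-based simplification, and the names (dual_policy_evaluation, P_pi, and so on) are illustrative rather than taken from the authors' code. The dual object M is iterated toward the matrix of normalized discounted state-visit distributions, from which the ordinary value function can be recovered.

# Sketch of dual policy evaluation on a tabular MDP (illustrative, not the
# paper's code). Iterating M <- (1 - gamma) * I + gamma * P_pi @ M converges
# to (1 - gamma) * (I - gamma * P_pi)^{-1}, whose rows are normalized
# discounted state-visit distributions; the value function is then
# v = M @ r / (1 - gamma).
import numpy as np

def dual_policy_evaluation(P_pi, r, gamma=0.9, iters=500):
    """P_pi: (n, n) row-stochastic transition matrix under a fixed policy.
    r: (n,) expected reward per state. Returns (M, v)."""
    n = P_pi.shape[0]
    M = np.eye(n)  # any row-stochastic initialization works
    for _ in range(iters):
        M = (1.0 - gamma) * np.eye(n) + gamma * P_pi @ M
    v = M @ r / (1.0 - gamma)  # recover the primal value function
    return M, v

if __name__ == "__main__":
    # Two-state example: state 0 tends to stay put, state 1 drifts to state 0.
    P_pi = np.array([[0.9, 0.1],
                     [0.5, 0.5]])
    r = np.array([1.0, 0.0])
    M, v = dual_policy_evaluation(P_pi, r)
    assert np.allclose(M.sum(axis=1), 1.0)  # rows of M are distributions
    # Cross-check against the primal solution v = (I - gamma * P_pi)^{-1} r.
    v_primal = np.linalg.solve(np.eye(2) - 0.9 * P_pi, r)
    print(np.allclose(v, v_primal))  # True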
[1] M. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, 1994.
[2] D. Bertsekas. Dynamic Programming and Optimal Control, volume 2. Athena Scientific, 1995.
[3] D. Bertsekas and J. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[4] T. Wang, M. Bowling, and D. Schuurmans. Dual representations for dynamic programming and reinforcement learning. In Proceedings of the IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL), pages 44–51, 2007.
[5] L. C. Baird. Residual algorithms: Reinforcement learning with function approximation. In International Conference on Machine Learning, pages 30–37, 1995.
[6] R. Sutton and A. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[7] J. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5):674–690, 1997.
[8] D. de Farias and B. Van Roy. On the existence of fixed points for approximate value iteration and temporal-difference learning. Journal of Optimization Theory and Applications, 105(3):589–608, 2000.
[9] J. A. Boyan and A. W. Moore. Generalization in reinforcement learning: Safely approximating the value function. In Advances in Neural Information Processing Systems 7, pages 369–376, 1995.
[10] R. S. Sutton. Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Advances in Neural Information Processing Systems 8, pages 1038–1044, 1996.