NIPS 2005, Paper 199
Authors: Sridhar Mahadevan, Mauro Maggioni
Abstract: We investigate the problem of automatically constructing efficient representations or basis functions for approximating value functions based on analyzing the structure and topology of the state space. In particular, two novel approaches to value function approximation are explored based on automatically constructing basis functions on state spaces that can be represented as graphs or manifolds: one approach uses the eigenfunctions of the Laplacian, in effect performing a global Fourier analysis on the graph; the second approach is based on diffusion wavelets, which generalize classical wavelets to graphs using multiscale dilations induced by powers of a diffusion operator or random walk on the graph. Together, these approaches form the foundation of a new generation of methods for solving large Markov decision processes, in which the underlying representation and policies are simultaneously learned.
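The two approaches in the abstract can be illustrated with a minimal NumPy sketch. Everything below is an assumption for illustration: a small chain graph stands in for the paper's state spaces, a dense eigensolver stands in for the paper's constructions, and the hypothetical value function `V` is arbitrary. The first part builds a basis from the smoothest eigenfunctions of the combinatorial graph Laplacian (the "global Fourier analysis" view); the second computes the dyadic powers T^(2^j) of the random-walk operator, the multiscale dilations underlying diffusion wavelets (the full construction also compresses each power, which is not shown here).

```python
import numpy as np

# Illustrative chain-graph state space (an assumption, not the paper's domains).
n = 20
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0   # undirected chain graph

# --- Approach 1: Laplacian eigenfunctions as a global Fourier basis ---
D = np.diag(A.sum(axis=1))
L = D - A                              # combinatorial graph Laplacian
eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
k = 5
Phi = eigvecs[:, :k]                   # k smoothest eigenfunctions as basis

# Least-squares projection of a hypothetical value function onto the basis.
V = np.cos(np.linspace(0.0, np.pi, n))
w, *_ = np.linalg.lstsq(Phi, V, rcond=None)
V_hat = Phi @ w                        # low-dimensional approximation of V

# --- Approach 2: multiscale dilations from powers of a diffusion operator ---
# Diffusion wavelets use the dyadic powers T, T^2, T^4, ..., T^(2^j),
# obtained cheaply by repeated squaring.
T = np.linalg.inv(D) @ A               # random-walk (diffusion) operator
powers = [T]
for _ in range(4):
    powers.append(powers[-1] @ powers[-1])  # T^(2^j) by repeated squaring
```

On the chain graph the leading eigenfunctions are smooth cosine-like profiles, so a smooth value function is approximated well by only `k` coefficients; higher powers of `T` are increasingly smooth (low-rank) operators, which is what the diffusion wavelet construction exploits at coarser scales.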
[1] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, Belmont, Massachusetts, 1996.
[2] J. Bremer, R. Coifman, M. Maggioni, and A. Szlam. Diffusion wavelet packets. Technical Report YALE/DCS/TR-1304, Yale University, 2004. To appear in Applied and Computational Harmonic Analysis.
[3] F. Chung. Spectral Graph Theory. American Mathematical Society, 1997.
[4] R. Coifman and M. Maggioni. Diffusion wavelets. Technical Report YALE/DCS/TR-1303, Yale University, 2004. To appear in Applied and Computational Harmonic Analysis.
[5] M. Lagoudakis and R. Parr. Least-squares policy iteration. Journal of Machine Learning Research, 4:1107–1149, 2003.
[6] M. Maggioni and S. Mahadevan. Fast direct policy evaluation using multiscale Markov Diffusion Processes. Technical Report TR-2005-39, University of Massachusetts, 2005.
[7] S. Mahadevan. Proto-value functions: Developmental reinforcement learning. In Proceedings of the 22nd International Conference on Machine Learning, 2005.
[8] S. Mahadevan. Representation policy iteration. In Proceedings of the 21st International Conference on Uncertainty in Artificial Intelligence, 2005.
[9] S. Mahadevan. Samuel meets Amarel: Automating value function approximation using global state space analysis. In National Conference on Artificial Intelligence (AAAI), 2005.
[10] M. L. Puterman. Markov Decision Processes. Wiley Interscience, New York, USA, 1994.
[11] S. Rosenberg. The Laplacian on a Riemannian Manifold. Cambridge University Press, 1997.