
Exponential Family PCA for Belief Compression in POMDPs


Author: Nicholas Roy, Geoffrey J. Gordon

Abstract: Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are intractable for large models. The intractability of these algorithms is due in large part to their generating an optimal policy over the entire belief space. However, in real POMDP problems most belief states are unlikely, and there is a structured, low-dimensional manifold of plausible beliefs embedded in the high-dimensional belief space. We introduce a new method for solving large-scale POMDPs by taking advantage of belief space sparsity. We reduce the dimensionality of the belief space using exponential family Principal Components Analysis [1], which allows us to turn the sparse, high-dimensional belief space into a compact, low-dimensional representation in terms of learned features of the belief state. We then plan directly on the low-dimensional belief features. By planning in a low-dimensional space, we can find policies for POMDPs that are orders of magnitude larger than can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and also on a mobile robot navigation task.
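
As a rough sketch of the compression step described above, the following Python/NumPy fragment (not the authors' implementation; the Poisson-style loss, exponential link, learning rate, and joint-gradient scheme are illustrative assumptions) factors a matrix of sparse belief vectors as the elementwise exponential of a low-rank product, in the spirit of the exponential family PCA of Collins et al. [1]:

import numpy as np

def epca(B, k, iters=300, lr=1e-2):
    """Factor beliefs B (n_beliefs x n_states) as B ~= exp(Z @ U.T).

    Z holds one low-dimensional feature vector per belief; U is a learned
    basis over states. Minimizes the unnormalized-KL / Poisson loss
    sum(R - B * log(R)) with R = exp(Z @ U.T) by gradient descent.
    """
    n, d = B.shape
    rng = np.random.default_rng(0)
    Z = rng.normal(scale=0.1, size=(n, k))   # low-dimensional belief features
    U = rng.normal(scale=0.1, size=(d, k))   # learned basis over states
    for _ in range(iters):
        R = np.exp(Z @ U.T)          # reconstructed (unnormalized) beliefs
        G = R - B                    # gradient of the loss w.r.t. Z @ U.T
        dZ, dU = G @ U, G.T @ Z      # chain rule through the factorization
        Z -= lr * dZ
        U -= lr * dU
    return Z, U

# Example: compress 50 sparse beliefs over 100 states to 3 features.
rng = np.random.default_rng(1)
B = rng.random((50, 100)) ** 8               # sparse, nonnegative rows
B /= B.sum(axis=1, keepdims=True)            # rows are probability vectors
Z, U = epca(B, k=3)
recon = np.exp(Z @ U.T)
recon /= recon.sum(axis=1, keepdims=True)    # renormalize reconstructions

Because the link function is an exponential, every reconstructed belief is automatically nonnegative, which is what makes this family of losses a better match for sparse probability vectors than the squared-error loss of standard PCA; planning then operates on the low-dimensional rows of Z rather than on the full beliefs.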


References

[1] M. Collins, S. Dasgupta, and R. E. Schapire. A generalization of principal components analysis to the exponential family. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems, volume 14, Cambridge, MA, 2002. MIT Press.

[2] Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101:99–134, 1998.

[3] Andrew Ng and Michael Jordan. PEGASUS: A policy search method for large MDPs and POMDPs. In Proceedings of Uncertainty in Artificial Intelligence (UAI), 2000.

[4] Milos Hauskrecht. Value-function approximations for partially observable Markov decision processes. Journal of Artificial Intelligence Research, 13:33–94, 2000.

[5] Andrew Ng, Ron Parr, and Daphne Koller. Policy search via density estimation. In Advances in Neural Information Processing Systems 12, 1999.

[6] Jonathan Baxter and Peter Bartlett. Reinforcement learning in POMDP's via direct gradient ascent. In Proceedings of the 17th International Conference on Machine Learning, 2000.

[7] J. Andrew Bagnell and Jeff Schneider. Autonomous helicopter control using reinforcement learning policy search methods. In Proceedings of the International Conference on Robotics and Automation, 2001.

[8] Anthony R. Cassandra, Leslie Pack Kaelbling, and James A. Kurien. Acting under uncertainty: Discrete Bayesian models for mobile-robot navigation. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 1996.

[9] Nicholas Roy and Sebastian Thrun. Coastal navigation with mobile robots. In Advances in Neural Information Processing Systems 12, pages 1043–1049, 1999.

[10] I. T. Jolliffe. Principal Component Analysis. Springer-Verlag, 1986.

[11] Sam Roweis and Lawrence Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, December 2000.

[12] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, December 2000.

[13] S. T. Roweis, L. K. Saul, and G. E. Hinton. Global coordination of local linear models. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems, volume 14, Cambridge, MA, 2002. MIT Press.

[14] Daniel D. Lee and H. Sebastian Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401:788–791, 1999.

[15] Geoffrey Gordon. Generalized² linear² models. In Suzanna Becker, Sebastian Thrun, and Klaus Obermayer, editors, Advances in Neural Information Processing Systems 15. MIT Press, 2003.

[16] Sebastian Thrun. Monte Carlo POMDPs. In Advances in Neural Information Processing Systems 12, 1999.

[17] S. Thrun, M. Beetz, M. Bennewitz, W. Burgard, A. B. Cremers, F. Dellaert, D. Fox, D. Hähnel, C. Rosenberg, N. Roy, J. Schulte, and D. Schulz. Probabilistic algorithms and the interactive museum tour-guide robot Minerva. International Journal of Robotics Research, 19(11):972–999, 2000.