nips nips2010 nips2010-11 nips2010-11-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mauricio Araya, Olivier Buffet, Vincent Thomas, François Charpillet
Abstract: Partially Observable Markov Decision Processes (POMDPs) model sequential decision-making problems under uncertainty and partial observability. Unfortunately, some problems cannot be modeled with state-dependent reward functions, e.g., problems whose objective explicitly involves reducing the uncertainty about the state. To that end, we introduce ρPOMDPs, an extension of POMDPs where the reward function ρ depends on the belief state. We show that, under the common assumption that ρ is convex, the value function is also convex, which makes it possible to (1) approximate ρ arbitrarily well with a piecewise linear and convex (PWLC) function, and (2) use state-of-the-art exact or approximate solving algorithms with limited changes.
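As a hedged illustration of the PWLC approximation mentioned in the abstract (the helper names, the negative-entropy choice of ρ, and the sampled beliefs below are illustrative assumptions, not taken from the paper), a convex belief-dependent reward can be lower-bounded by tangent hyperplanes at sampled belief points; their pointwise maximum is a piecewise linear and convex function expressible as a set of alpha-vectors:

# A minimal sketch (assumed, not from the paper) of the PWLC idea in the abstract:
# a convex belief-dependent reward rho -- here negative entropy, which rewards
# certainty -- is lower-bounded by tangent hyperplanes at sampled beliefs, and
# their pointwise maximum gives a PWLC approximation usable by standard solvers.
import numpy as np

def rho(b, eps=1e-12):
    """Negative entropy of a belief b: convex, maximal at deterministic beliefs."""
    b = np.clip(b, eps, 1.0)
    return float(np.sum(b * np.log(b)))

def tangent_alpha(b0, eps=1e-12):
    """Alpha-vector of the tangent hyperplane of rho at b0.
    Using sum(b) == 1, the tangent reduces to <alpha, b> with alpha_i = log(b0_i);
    it equals rho(b0) at b0 and lower-bounds rho elsewhere (gap = KL(b || b0))."""
    b0 = np.clip(b0, eps, 1.0)
    return np.log(b0)

def pwlc(alphas):
    """PWLC approximation rho_hat(b) = max over alpha of <alpha, b>."""
    A = np.vstack(alphas)
    return lambda b: float(np.max(A @ np.asarray(b)))

if __name__ == "__main__":
    # Illustrative sample beliefs over 3 states (chosen arbitrarily).
    samples = [np.array([0.8, 0.1, 0.1]),
               np.array([0.1, 0.8, 0.1]),
               np.array([1/3, 1/3, 1/3])]
    rho_hat = pwlc([tangent_alpha(b) for b in samples])

    b = np.array([0.6, 0.3, 0.1])
    print(rho(b), rho_hat(b))  # rho_hat(b) <= rho(b); more samples tighten the bound

Because ρ is convex, each tangent never exceeds ρ, so adding sample points can only tighten the approximation; this is what lets standard PWLC-based POMDP solvers consume the approximated reward with limited changes.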
[1] R. Smallwood and E. Sondik. The optimal control of partially observable Markov decision processes over a finite horizon. Operations Research, 21:1071–1088, 1973.
[2] S. Thrun. Probabilistic algorithms in robotics. AI Magazine, 21(4):93–109, 2000.
[3] L. Mihaylova, T. Lefebvre, H. Bruyninckx, K. Gadeyne, and J. De Schutter. Active sensing for robotics - a survey. In Proc. 5th Intl. Conf. on Numerical Methods and Applications, 2002.
[4] S. Ji and L. Carin. Cost-sensitive feature acquisition and classification. Pattern Recogn., 40(5):1474–1485, 2007.
[5] A. Hero, D. Castañón, D. Cochran, and K. Kastella. Foundations and Applications of Sensor Management. Springer Publishing Company, Incorporated, 2007.
[6] A. Cassandra. Exact and approximate algorithms for partially observable Markov decision processes. PhD thesis, Brown University, Providence, RI, USA, 1998.
[7] R. Bellman. The theory of dynamic programming. Bull. Amer. Math. Soc., 60:503–516, 1954.
[8] G. Monahan. A survey of partially observable Markov decision processes. Management Science, 28:1–16, 1982.
[9] M. Hauskrecht. Value-function approximations for partially observable Markov decision processes. Journal of Artificial Intelligence Research, 13:33–94, 2000.
[10] W. Lovejoy. Computationally feasible bounds for partially observed Markov decision processes. Operations Research, 39(1):162–175, 1991.
[11] J. Pineau, G. Gordon, and S. Thrun. Anytime point-based approximations for large POMDPs. Journal of Artificial Intelligence Research (JAIR), 27:335–380, 2006.
[12] M. Spaan and N. Vlassis. Perseus: Randomized point-based value iteration for POMDPs. Journal of Artificial Intelligence Research, 24:195–220, 2005.
[13] T. Smith and R. Simmons. Point-based POMDP algorithms: Improved analysis and implementation. In Proc. of the Int. Conf. on Uncertainty in Artificial Intelligence (UAI), 2005.
[14] H. Kurniawati, D. Hsu, and W. Lee. SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. In Robotics: Science and Systems IV, 2008.
[15] R. Kaplow. Point-based POMDP solvers: Survey and comparative analysis. Master’s thesis, McGill University, Montreal, Quebec, Canada, 2010.
[16] T. Cover and J. Thomas. Elements of Information Theory. Wiley-Interscience, 1991.
[17] M. Araya-López, O. Buffet, V. Thomas, and F. Charpillet. A POMDP extension with belief-dependent rewards – extended version. Technical Report RR-7433, INRIA, Oct 2010. (See also NIPS supplementary material).
[18] R. Saigal. On piecewise linear approximations to smooth mappings. Mathematics of Operations Research, 4(2):153–161, 1979.
[19] M. Spaan. Cooperative active perception using POMDPs. In AAAI 2008 Workshop on Advancements in POMDP Solvers, July 2008.
[20] S. Ross, J. Pineau, S. Paquet, and B. Chaib-draa. Online planning algorithms for POMDPs. Journal of Artificial Intelligence Research (JAIR), 32:663–704, 2008.