nips nips2010 nips2010-11 nips2010-11-reference knowledge-graph by maker-knowledge-mining

11 nips-2010-A POMDP Extension with Belief-dependent Rewards


Source: pdf

Author: Mauricio Araya, Olivier Buffet, Vincent Thomas, François Charpillet

Abstract: Partially Observable Markov Decision Processes (POMDPs) model sequential decision-making problems under uncertainty and partial observability. Unfortunately, some problems cannot be modeled with state-dependent reward functions, e.g., problems whose objective explicitly implies reducing the uncertainty about the state. To that end, we introduce ρPOMDPs, an extension of POMDPs where the reward function ρ depends on the belief state. We show that, under the common assumption that ρ is convex, the value function is also convex, which makes it possible to (1) approximate ρ arbitrarily well with a piecewise linear and convex (PWLC) function, and (2) use state-of-the-art exact or approximate solving algorithms with limited changes.
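The following is a minimal sketch (not from the paper) of the PWLC-approximation idea mentioned in the abstract, using negative entropy as an example of a convex belief-dependent reward: tangent hyperplanes taken at a few belief points act as alpha-vectors, and their maximum yields a PWLC lower approximation of ρ. The function and variable names (neg_entropy, tangent_alpha, pwlc_reward) are illustrative assumptions, not identifiers from the paper.

import numpy as np

def neg_entropy(b):
    """Convex belief-dependent reward rho(b) = sum_s b(s) log b(s)."""
    b = np.clip(b, 1e-12, 1.0)
    return float(np.sum(b * np.log(b)))

def tangent_alpha(b0):
    """Alpha-vector of the tangent hyperplane of neg_entropy at belief b0.
    For rho(b) = sum_s b(s) log b(s), the tangent at b0 simplifies to
    alpha(s) = log b0(s), since rho(b0) + grad(b0).(b - b0) = sum_s b(s) log b0(s)."""
    return np.log(np.clip(b0, 1e-12, 1.0))

def pwlc_reward(b, alphas):
    """PWLC lower approximation of rho: maximum over tangent hyperplanes."""
    return max(float(alpha @ b) for alpha in alphas)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_states = 4
    # Tangent points: a few beliefs sampled from the simplex.
    base_points = rng.dirichlet(np.ones(n_states), size=32)
    alphas = [tangent_alpha(b0) for b0 in base_points]

    # The PWLC approximation never exceeds the true convex reward
    # (Gibbs' inequality), and the gap shrinks as tangent points are added.
    for b in rng.dirichlet(np.ones(n_states), size=5):
        approx = pwlc_reward(b, alphas)
        exact = neg_entropy(b)
        assert approx <= exact + 1e-9
        print(f"rho(b) = {exact:.4f}, PWLC approx = {approx:.4f}")

In this sketch, each tangent point contributes one alpha-vector, mirroring how the abstract's PWLC representation lets standard vector-based POMDP solvers handle the belief-dependent reward with limited changes.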


reference text

[1] R. Smallwood and E. Sondik. The optimal control of partially observable Markov decision processes over a finite horizon. Operations Research, 21:1071–1088, 1973.

[2] S. Thrun. Probabilistic algorithms in robotics. AI Magazine, 21(4):93–109, 2000.

[3] L. Mihaylova, T. Lefebvre, H. Bruyninckx, K. Gadeyne, and J. De Schutter. Active sensing for robotics - a survey. In Proc. 5th Intl. Conf. On Numerical Methods and Applications, 2002.

[4] S. Ji and L. Carin. Cost-sensitive feature acquisition and classification. Pattern Recogn., 40(5):1474–1485, 2007.

[5] A. Hero, D. Castañón, D. Cochran, and K. Kastella. Foundations and Applications of Sensor Management. Springer Publishing Company, Incorporated, 2007.

[6] A. Cassandra. Exact and approximate algorithms for partially observable Markov decision processes. PhD thesis, Brown University, Providence, RI, USA, 1998.

[7] R. Bellman. The theory of dynamic programming. Bull. Amer. Math. Soc., 60:503–516, 1954.

[8] G. Monahan. A survey of partially observable Markov decision processes. Management Science, 28:1–16, 1982.

[9] M. Hauskrecht. Value-function approximations for partially observable Markov decision processes. Journal of Artificial Intelligence Research, 13:33–94, 2000.

[10] W. Lovejoy. Computationally feasible bounds for partially observed Markov decision processes. Operations Research, 39(1):162–175, 1991.

[11] J. Pineau, G. Gordon, and S. Thrun. Anytime point-based approximations for large POMDPs. Journal of Artificial Intelligence Research (JAIR), 27:335–380, 2006.

[12] M. Spaan and N. Vlassis. Perseus: Randomized point-based value iteration for POMDPs. Journal of Artificial Intelligence Research, 24:195–220, 2005.

[13] T. Smith and R. Simmons. Point-based POMDP algorithms: Improved analysis and implementation. In Proc. of the Int. Conf. on Uncertainty in Artificial Intelligence (UAI), 2005.

[14] H. Kurniawati, D. Hsu, and W. Lee. SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. In Robotics: Science and Systems IV, 2008.

[15] R. Kaplow. Point-based POMDP solvers: Survey and comparative analysis. Master’s thesis, McGill University, Montreal, Quebec, Canada, 2010.

[16] T. Cover and J. Thomas. Elements of Information Theory. Wiley-Interscience, 1991.

[17] M. Araya-López, O. Buffet, V. Thomas, and F. Charpillet. A POMDP extension with belief-dependent rewards – extended version. Technical Report RR-7433, INRIA, Oct 2010. (See also NIPS supplementary material).

[18] R. Saigal. On piecewise linear approximations to smooth mappings. Mathematics of Operations Research, 4(2):153–161, 1979.

[19] M. Spaan. Cooperative active perception using POMDPs. In AAAI 2008 Workshop on Advancements in POMDP Solvers, July 2008.

[20] S. Ross, J. Pineau, S. Paquet, and B. Chaib-draa. Online planning algorithms for POMDPs. Journal of Artificial Intelligence Research (JAIR), 32:663–704, 2008.