38 nips-2012-Algorithms for Learning Markov Field Policies


Source: pdf

Author: Abdeslam Boularias, Jan R. Peters, Oliver B. Kroemer

Abstract: We use a graphical model for representing policies in Markov Decision Processes. This new representation can easily incorporate domain knowledge in the form of a state similarity graph that loosely indicates which states are supposed to have similar optimal actions. A bias is then introduced into the policy search process by sampling policies from a distribution that assigns high probabilities to policies that agree with the provided state similarity graph, i.e., smoother policies. This distribution corresponds to a Markov Random Field. We also present forward and inverse reinforcement learning algorithms for learning such policy distributions. We illustrate the advantage of the proposed approach on two problems: cart-balancing with swing-up, and teaching a robot to grasp unknown objects.
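
To make the abstract's idea concrete, one way such a Markov Random Field distribution over deterministic policies could be written is sketched below. This is an illustrative form inferred from the abstract only, not necessarily the paper's exact formulation; the similarity graph G = (S, E), the edge weights w_ij, and the indicator notation are assumptions introduced here for clarity.

\[
P(\pi) \;\propto\; \exp\Big(-\sum_{(i,j)\in E} w_{ij}\,\mathbf{1}\big[\pi(s_i)\neq\pi(s_j)\big]\Big)
\]

Under a prior of this kind, a policy that assigns different actions to two states joined by a strongly weighted edge pays a large energy penalty, so sampling policies from this distribution biases the policy search toward policies that are smooth with respect to the state similarity graph.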


reference text

Abbeel, Pieter and Ng, Andrew Y. Apprenticeship Learning via Inverse Reinforcement Learning. In Proceedings of the Twenty-first International Conference on Machine Learning (ICML'04), pp. 1–8, 2004.

Boularias, Abdeslam, Krömer, Oliver, and Peters, Jan. Learning robot grasping from 3-D images with Markov Random Fields. In Proceedings of the 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'11), pp. 1548–1553, 2011.

Boularias, Abdeslam, Krömer, Oliver, and Peters, Jan. Structured Apprenticeship Learning. In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD'12), pp. 227–242, 2012.

Boyan, Justin A. Technical Update: Least-Squares Temporal Difference Learning. Machine Learning, 49:233–246, November 2002. ISSN 0885-6125.

Boykov, Yuri, Veksler, Olga, and Zabih, Ramin. Fast Approximate Energy Minimization via Graph Cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11):1222–1239, 2001.

Deisenroth, Marc Peter and Rasmussen, Carl Edward. PILCO: A Model-Based and Data-Efficient Approach to Policy Search. In Proceedings of the Twenty-Eighth International Conference on Machine Learning (ICML'11), pp. 465–472, 2011.

Dudik, Miroslav, Phillips, Steven J., and Schapire, Robert E. Performance guarantees for regularized maximum entropy density estimation. In Proceedings of the 17th Annual Conference on Computational Learning Theory (COLT'04), pp. 472–486, 2004.

Kober, Jens and Peters, Jan. Policy search for motor primitives in robotics. In Advances in Neural Information Processing Systems (NIPS'08), pp. 849–856, 2008.

Kohli, Pushmeet, Kumar, Pawan, and Torr, Philip. P3 and beyond: Solving energies with higher order cliques. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'07), 2007.

Krähenbühl, Philipp and Koltun, Vladlen. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials. In Advances in Neural Information Processing Systems 24, pp. 109–117, 2011.

Munoz, Daniel, Vandapel, Nicolas, and Hebert, Martial. Onboard contextual classification of 3-D point clouds with learned high-order Markov random fields. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA'09), 2009.

Ormoneit, Dirk and Sen, Saunak. Kernel-based reinforcement learning. Machine Learning, 49:161–178, 2002.

Schölkopf, Bernhard, Herbrich, Ralf, and Smola, Alex. A Generalized Representer Theorem. Computational Learning Theory, Lecture Notes in Computer Science 2111:416–426, 2001.

Taskar, Ben. Learning Structured Prediction Models: A Large Margin Approach. PhD thesis, Stanford University, CA, USA, 2004.

Ziebart, Brian D., Maas, Andrew, Bagnell, J. Andrew, and Dey, Anind K. Maximum Entropy Inverse Reinforcement Learning. In Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (AAAI'08), pp. 1433–1438, 2008.