nips nips2012 nips2012-51 nips2012-51-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Feng Cao, Soumya Ray
Abstract: We describe an approach to incorporating Bayesian priors in the MAXQ framework for hierarchical reinforcement learning (HRL). We define priors on the primitive environment model and on task pseudo-rewards. Since models for composite tasks can be complex, we use a mixed model-based/model-free learning approach to find an optimal hierarchical policy. We show empirically that (i) our approach results in improved convergence over non-Bayesian baselines, (ii) using both task hierarchies and Bayesian priors is better than either alone, (iii) taking advantage of the task hierarchy reduces the computational cost of Bayesian reinforcement learning and (iv) in this framework, task pseudo-rewards can be learned instead of being manually specified, leading to hierarchically optimal rather than recursively optimal policies.
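As a rough illustration of the kind of prior the abstract places on the primitive environment model, the sketch below maintains Dirichlet pseudo-counts over next-state outcomes for each primitive (state, action) pair and draws transition models from the posterior. It is a minimal sketch only: the class name, the symmetric prior strength alpha, and the Thompson-style posterior sampling are illustrative assumptions, not details taken from the paper.

    # Minimal sketch (not the authors' code): a Dirichlet-multinomial prior over
    # the transition model of primitive actions, in the style of model-based
    # Bayesian RL. All names and parameters here are illustrative assumptions.
    import numpy as np

    class PrimitiveModelPrior:
        def __init__(self, n_states, n_actions, alpha=1.0):
            # Dirichlet pseudo-counts for P(s' | s, a); alpha is a symmetric prior.
            self.counts = np.full((n_states, n_actions, n_states), alpha)

        def update(self, s, a, s_next):
            # Posterior update: observing (s, a, s') increments the matching count.
            self.counts[s, a, s_next] += 1.0

        def sample_model(self):
            # Draw one transition model from the posterior (Thompson-style sampling).
            flat = self.counts.reshape(-1, self.counts.shape[-1])
            sampled = np.array([np.random.dirichlet(row) for row in flat])
            return sampled.reshape(self.counts.shape)

        def mean_model(self):
            # Posterior-mean transition probabilities.
            return self.counts / self.counts.sum(axis=-1, keepdims=True)

    # Usage: update with observed transitions, then plan with a sampled or mean model.
    prior = PrimitiveModelPrior(n_states=25, n_actions=4)
    prior.update(s=0, a=2, s_next=1)
    P = prior.sample_model()  # shape (25, 4, 25); each P[s, a] sums to 1

A hierarchical learner could plan with the sampled model at primitive tasks while learning composite-task values model-free, which is the flavor of the mixed approach the abstract describes.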
[1] R.S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[2] Leslie Pack Kaelbling, Michael L. Littman, and Andrew W. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.
[3] Andrew G. Barto and Sridhar Mahadevan. Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems, 13(4):341–379, 2003.
[4] Martin Stolle and Doina Precup. Learning options in reinforcement learning. Volume 2371 of Lecture Notes in Computer Science, pages 212–223. Springer, 2002.
[5] Thomas G. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13:227–303, 2000.
[6] D. Andre and S. Russell. State abstraction for programmable reinforcement learning agents. In Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI), 2002.
[7] R. Dearden, N. Friedman, and D. Andre. Model-based Bayesian exploration. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann, 1999.
[8] R. Dearden, N. Friedman, and S. Russell. Bayesian Q-learning. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, 1998.
[9] Y. Engel, S. Mannor, and R. Meir. Bayes meets Bellman: the Gaussian process approach to temporal difference learning. In Proceedings of the Twentieth International Conference on Machine Learning, 2003.
[10] Mohammad Ghavamzadeh and Yaakov Engel. Bayesian policy gradient algorithms. In Advances in Neural Information Processing Systems 19. MIT Press, 2007.
[11] Alessandro Lazaric and Mohammad Ghavamzadeh. Bayesian multi-task reinforcement learning. In Proceedings of the 27th International Conference on Machine Learning, 2010.
[12] Aaron Wilson, Alan Fern, Soumya Ray, and Prasad Tadepalli. Multi-task reinforcement learning: a hierarchical Bayesian approach. In Proceedings of the 24th International Conference on Machine Learning, pages 1015–1022, New York, NY, USA, 2007. ACM.
[13] N. Mehta, S. Ray, P. Tadepalli, and T. Dietterich. Automatic discovery and transfer of MAXQ hierarchies. In Andrew McCallum and Sam Roweis, editors, Proceedings of the 25th International Conference on Machine Learning, pages 648–655. Omnipress, 2008.
[14] Nicholas K. Jong and Peter Stone. Hierarchical model-based reinforcement learning: R-MAX + MAXQ. In Proceedings of the 25th International Conference on Machine Learning, 2008.
[15] Ronen I. Brafman and Moshe Tennenholtz. R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 2001.
[16] B. Marthi, S. Russell, and D. Andre. A compact, hierarchically optimal Q-function decomposition. In Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence, 2006.
[17] M. Ghavamzadeh and Y. Engel. Bayesian actor-critic algorithms. In Zoubin Ghahramani, editor, Proceedings of the 24th Annual International Conference on Machine Learning, pages 297–304. Omnipress, 2007.
[18] W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25:285–294, 1933.
[19] M. J. A. Strens. A Bayesian framework for reinforcement learning. In Proceedings of the 17th International Conference on Machine Learning, 2000.
[20] Zhaohui Dai, Xin Chen, Weihua Cao, and Min Wu. Model-based learning with Bayesian and MAXQ value function decomposition for hierarchical task. In Proceedings of the 8th World Congress on Intelligent Control and Automation, 2010.
[21] Ronald Edward Parr. Hierarchical Control and Learning for Markov Decision Processes. PhD thesis, University of California, Berkeley, 1998.