nips2010-201: reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mahdi M. Fard, Joelle Pineau
Abstract: This paper introduces the first set of PAC-Bayesian bounds for the batch reinforcement learning problem in finite state spaces. These bounds hold regardless of the correctness of the prior distribution. We demonstrate how such bounds can be used for model-selection in control problems where prior information is available either on the dynamics of the environment or on the value of actions. Our empirical results confirm that PAC-Bayesian model-selection is able to leverage prior distributions when they are informative and, unlike standard Bayesian RL approaches, to ignore them when they are misleading.
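As a concrete illustration of bound-based model-selection, below is a minimal Python sketch built on the generic McAllester-style PAC-Bayes bound [10, 21], not the paper's RL-specific bounds: among candidate posteriors, pick the one whose bound value (empirical risk plus a KL-based complexity penalty) is smallest. The function names, the candidate-tuple format, and the exact constants in the penalty are assumptions made for this sketch.

import math

def pac_bayes_bound(empirical_risk, kl, n, delta=0.05):
    # One common form of the McAllester PAC-Bayes bound [10]: with
    # probability at least 1 - delta over an i.i.d. sample of size n,
    # the true risk of the Gibbs predictor is at most the empirical
    # risk plus a penalty that grows with KL(posterior || prior).
    # Exact constants vary across statements of the theorem.
    penalty = math.sqrt((kl + math.log(n / delta)) / (2 * (n - 1)))
    return empirical_risk + penalty

def select_model(candidates, n, delta=0.05):
    # Pick the candidate with the smallest bound. candidates is an
    # iterable of (name, empirical_risk, kl_to_prior) tuples (a
    # hypothetical format for this sketch).
    return min(candidates, key=lambda c: pac_bayes_bound(c[1], c[2], n, delta))

# Example: on n = 500 samples the bound prefers the low-KL candidate,
# even though its empirical risk is slightly higher; as n grows the
# penalty shrinks, so a candidate that disagrees with a misleading
# prior can eventually win on empirical risk alone.
candidates = [("prior-informed", 0.20, 1.5), ("prior-ignoring", 0.18, 40.0)]
print(select_model(candidates, n=500)[0])  # -> prior-informed

The KL term is what lets the procedure exploit an informative prior (small KL, tight bound) while discounting a misleading one (any posterior that fits the data must pay a large KL penalty, which vanishes only as more data arrives).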
[1] L. G. Valiant. A theory of the learnable. Commun. ACM, 27(11):1134–1142, 1984.
[2] M. Kearns and S. Singh. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49(2-3):209–232, 2002.
[3] R. I. Brafman and M. Tennenholtz. R-max – A general polynomial time algorithm for near-optimal reinforcement learning. The Journal of Machine Learning Research, 3:213–231, 2003.
[4] A. L. Strehl and M. L. Littman. A theoretical analysis of model-based interval estimation. In Proceedings of the 22nd International Conference on Machine Learning, pages 856–863, 2005.
[5] S. M. Kakade. On the sample complexity of reinforcement learning. PhD thesis, University College London, 2003.
[6] M. O. G. Duff. Optimal learning: Computational procedures for Bayes-adaptive Markov decision processes. PhD thesis, University of Massachusetts Amherst, 2002.
[7] M. J. A. Strens. A Bayesian Framework for Reinforcement Learning. In Proceedings of the 17th International Conference on Machine Learning, pages 943–950, 2000.
[8] T. Wang, D. Lizotte, M. Bowling, and D. Schuurmans. Bayesian sparse sampling for on-line reward optimization. In Proceedings of the 22nd International Conference on Machine Learning, page 963, 2005.
[9] J. Z. Kolter and A. Y. Ng. Near-Bayesian exploration in polynomial time. In Proceedings of the 26th International Conference on Machine Learning, pages 513–520, 2009.
[10] D. A. McAllester. Some PAC-Bayesian theorems. Machine Learning, 37(3):355–363, 1999.
[11] J. Shawe-Taylor and R. C. Williamson. A PAC analysis of a Bayesian estimator. In Proceedings of the 10th Annual Conference on Computational Learning Theory, pages 2–9, 1997.
[12] P. Germain, A. Lacasse, F. Laviolette, and M. Marchand. PAC-Bayesian learning of linear classifiers. In Proceedings of the 26th International Conference on Machine Learning, pages 353–360, 2009.
[13] J. Langford and J. Shawe-Taylor. PAC-Bayes and margins. In Proceedings of Advances in Neural Information Processing Systems, pages 439–446, 2002.
[14] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.
[15] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming (Optimization and Neural Computation Series, 3). Athena Scientific, 1996.
[16] R. Herbrich and T. Graepel. A PAC-Bayesian margin bound for linear classifiers. IEEE Transactions on Information Theory, 48(12):3140–3150, 2002.
[17] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In Proceedings of Advances in Neural Information Processing Systems 12, pages 1057–1063, 2000.
[18] J. A. Boyan. Technical update: Least-squares temporal difference learning. Machine Learning, 49(2):233–246, 2002.
[19] A. Farahmand, M. Ghavamzadeh, C. Szepesvári, and S. Mannor. Regularized fitted Q-iteration: Application to planning. Recent Advances in Reinforcement Learning, pages 55–68, 2008.
[20] J. Asmuth, L. Li, M. L. Littman, A. Nouri, and D. Wingate. A Bayesian sampling approach to exploration in reinforcement learning. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, 2009.
[21] D. A. McAllester. PAC-Bayesian model averaging. In Proceedings of the 12th Annual Conference on Computational Learning Theory, pages 164–170, 1999.
[22] T. Weissman, E. Ordentlich, G. Seroussi, S. Verdú, and M. J. Weinberger. Inequalities for the L1 deviation of the empirical distribution. Technical report, Information Theory Research Group, HP Laboratories, 2003.