nips2010-201: reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mahdi M. Fard, Joelle Pineau
Abstract: This paper introduces the first set of PAC-Bayesian bounds for the batch reinforcement learning problem in finite state spaces. These bounds hold regardless of the correctness of the prior distribution. We demonstrate how such bounds can be used for model-selection in control problems where prior information is available either on the dynamics of the environment or on the value of actions. Our empirical results confirm that PAC-Bayesian model-selection is able to leverage prior distributions when they are informative and, unlike standard Bayesian RL approaches, to ignore them when they are misleading.
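As a concrete illustration of bound-based model-selection, below is a minimal Python sketch built on the generic McAllester-style PAC-Bayes bound [10, 21], not the paper's RL-specific bounds: among candidate posteriors, pick the one whose bound value (empirical risk plus a KL-based complexity penalty) is smallest. The function names, the candidate-tuple format, and the exact constants in the penalty are assumptions made for this sketch.

import math

def pac_bayes_bound(empirical_risk, kl, n, delta=0.05):
    # One common form of the McAllester PAC-Bayes bound [10]: with
    # probability at least 1 - delta over an i.i.d. sample of size n,
    # the true risk of the Gibbs predictor is at most the empirical
    # risk plus a penalty that grows with KL(posterior || prior).
    # Exact constants vary across statements of the theorem.
    penalty = math.sqrt((kl + math.log(n / delta)) / (2 * (n - 1)))
    return empirical_risk + penalty

def select_model(candidates, n, delta=0.05):
    # Pick the candidate with the smallest bound. candidates is an
    # iterable of (name, empirical_risk, kl_to_prior) tuples (a
    # hypothetical format for this sketch).
    return min(candidates, key=lambda c: pac_bayes_bound(c[1], c[2], n, delta))

# Example: on n = 500 samples the bound prefers the low-KL candidate,
# even though its empirical risk is slightly higher; as n grows the
# penalty shrinks, so a candidate that disagrees with a misleading
# prior can eventually win on empirical risk alone.
candidates = [("prior-informed", 0.20, 1.5), ("prior-ignoring", 0.18, 40.0)]
print(select_model(candidates, n=500)[0])  # -> prior-informed

The KL term is what lets the procedure exploit an informative prior (small KL, tight bound) while discounting a misleading one (any posterior that fits the data must pay a large KL penalty, which vanishes only as more data arrives).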
[1] L. G. Valiant. A theory of the learnable. Commun. ACM, 27(11):1134–1142, 1984.
[2] M. Kearns and S. Singh. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49(2-3):209–232, 2002.
[3] R. I. Brafman and M. Tennenholtz. R-max – A general polynomial time algorithm for near-optimal reinforcement learning. The Journal of Machine Learning Research, 3:213–231, 2003.
[4] A. L. Strehl and M. L. Littman. A theoretical analysis of model-based interval estimation. In Proceedings of the 22nd International Conference on Machine Learning, pages 856–863, 2005.
[5] S. M. Kakade. On the sample complexity of reinforcement learning. PhD thesis, University College London, 2003.
[6] M. O. G. Duff. Optimal learning: Computational procedures for Bayes-adaptive Markov decision processes. PhD thesis, University of Massachusetts Amherst, 2002.
[7] M. J. A. Strens. A Bayesian Framework for Reinforcement Learning. In Proceedings of the 17th International Conference on Machine Learning, pages 943–950, 2000.
[8] T. Wang, D. Lizotte, M. Bowling, and D. Schuurmans. Bayesian sparse sampling for on-line reward optimization. In Proceedings of the 22nd International Conference on Machine Learning, page 963, 2005.
[9] J. Z. Kolter and A. Y. Ng. Near-Bayesian exploration in polynomial time. In Proceedings of the 26th International Conference on Machine Learning, pages 513–520, 2009.
[10] D. A. McAllester. Some PAC-Bayesian theorems. Machine Learning, 37(3):355–363, 1999.
[11] J. Shawe-Taylor and R. C. Williamson. A PAC analysis of a Bayesian estimator. In Proceedings of the 10th Annual Conference on Computational Learning Theory, pages 2–9, 1997.
[12] P. Germain, A. Lacasse, F. Laviolette, and M. Marchand. PAC-Bayesian learning of linear classifiers. In Proceedings of the 26th International Conference on Machine Learning, pages 353–360, 2009.
[13] J. Langford and J. Shawe-Taylor. PAC-Bayes and margins. In Proceedings of Advances in Neural Information Processing Systems, pages 439–446, 2002.
[14] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.
[15] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming (Optimization and Neural Computation Series, 3). Athena Scientific, 1996.
[16] R. Herbrich and T. Graepel. A PAC-Bayesian margin bound for linear classifiers. IEEE Transactions on Information Theory, 48(12):3140–3150, 2002.
[17] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In Proceedings of Advances in Neural Information Processing Systems 12, pages 1057–1063, 2000.
[18] J. A. Boyan. Technical update: Least-squares temporal difference learning. Machine Learning, 49(2):233–246, 2002.
[19] A. Farahmand, M. Ghavamzadeh, C. Szepesvári, and S. Mannor. Regularized fitted Q-iteration: Application to planning. Recent Advances in Reinforcement Learning, pages 55–68, 2008.
[20] J. Asmuth, L. Li, M. L. Littman, A. Nouri, and D. Wingate. A Bayesian sampling approach to exploration in reinforcement learning. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, 2009.
[21] D. A. McAllester. PAC-Bayesian model averaging. In Proceedings of the 12th Annual Conference on Computational Learning Theory, pages 164–170, 1999.
[22] T. Weissman, E. Ordentlich, G. Seroussi, S. Verdú, and M. J. Weinberger. Inequalities for the L1 deviation of the empirical distribution. Technical report, Information Theory Research Group, HP Laboratories, 2003.