
From PAC-Bayes Bounds to KL Regularization (NIPS 2009, paper 98)


Source: pdf

Author: Pascal Germain, Alexandre Lacasse, Mario Marchand, Sara Shanian, François Laviolette

Abstract: We show that convex KL-regularized objective functions are obtained from a PAC-Bayes risk bound when using convex loss functions for the stochastic Gibbs classifier that upper-bound the standard zero-one loss used for the weighted majority vote. By restricting ourselves to a class of posteriors that we call quasi uniform, we propose a simple coordinate descent learning algorithm to minimize the proposed KL-regularized cost function. We show that standard ℓp-regularized objective functions currently used, such as ridge regression and ℓp-regularized boosting, are obtained from a relaxation of the KL divergence between the quasi uniform posterior and the uniform prior. We present numerical experiments where the proposed learning algorithm generally outperforms ridge regression and AdaBoost.
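To make the abstract's pipeline concrete: a Catoni-type PAC-Bayes bound [1, 3] is monotone in the quantity C·m·(empirical Gibbs risk) + KL(Q‖P), so minimizing the bound over posteriors Q amounts to minimizing a KL-regularized objective. The sketch below is a minimal generic illustration of such a minimization by coordinate descent over a finite set of voters, not the authors' algorithm: it assumes ±1-valued voters, an exponential surrogate loss, a uniform prior, and a simple pairwise mass-transfer line search; the names H, y, C and all helper functions are hypothetical.

```python
import numpy as np

def kl_uniform(q, eps=1e-12):
    """KL(q || uniform) for a probability vector q of length n."""
    n = len(q)
    qc = np.clip(q, eps, 1.0)
    return float(np.sum(qc * np.log(qc * n)))

def objective(q, H, y, C):
    """C * (convex surrogate of the empirical risk) + KL(q || uniform).

    H is an (m, n) matrix with H[j, i] = h_i(x_j) in {-1, +1};
    y is the (m,) label vector in {-1, +1}."""
    margins = y * (H @ q)                    # q-weighted margin per example
    surrogate = np.mean(np.exp(-margins))    # exponential loss: convex in q
    return C * surrogate + kl_uniform(q)

def pairwise_coordinate_descent(H, y, C=1.0, n_steps=2000, seed=0):
    """Minimize the objective over the probability simplex by repeatedly
    moving mass between two randomly chosen coordinates."""
    rng = np.random.default_rng(seed)
    n = H.shape[1]
    q = np.full(n, 1.0 / n)                  # start at the uniform posterior
    best = objective(q, H, y, C)
    for _ in range(n_steps):
        i, j = rng.choice(n, size=2, replace=False)
        base = q.copy()
        # Crude line search over the mass delta moved from base[j] to base[i];
        # feasibility (q >= 0, sum q = 1) is preserved by construction.
        for delta in np.linspace(-base[i], base[j], 11):
            trial = base.copy()
            trial[i] += delta
            trial[j] -= delta
            val = objective(trial, H, y, C)
            if val < best:
                best, q = val, trial
    return q
```

The exponential surrogate keeps the objective convex in q, and the KL term pulls the posterior toward the uniform prior, mirroring the bound-minimization structure described above; the paper's quasi-uniform parametrization and exact loss differ in their details.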


References

[1] Olivier Catoni. PAC-Bayesian supervised classification: the thermodynamics of statistical learning. Monograph series of the Institute of Mathematical Statistics, http://arxiv.org/abs/0712.0248, December 2007.

[2] Pascal Germain, Alexandre Lacasse, François Laviolette, and Mario Marchand. A PAC-Bayes risk bound for general loss functions. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 449–456. MIT Press, Cambridge, MA, 2007.

[3] Pascal Germain, Alexandre Lacasse, François Laviolette, and Mario Marchand. PAC-Bayesian learning of linear classifiers. In Léon Bottou and Michael Littman, editors, Proceedings of the 26th International Conference on Machine Learning, pages 353–360, Montreal, June 2009. Omnipress.

[4] John Langford. Tutorial on practical prediction theory for classification. Journal of Machine Learning Research, 6:273–306, 2005.

[5] John Langford and John Shawe-Taylor. PAC-Bayes & margins. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 423–430. MIT Press, Cambridge, MA, 2003.

[6] David McAllester. PAC-Bayesian stochastic model selection. Machine Learning, 51:5–21, 2003.

[7] Robert E. Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26:1651–1686, 1998.

[8] Matthias Seeger. PAC-Bayesian generalization bounds for Gaussian processes. Journal of Machine Learning Research, 3:233–269, 2002.

[9] Manfred K. Warmuth, Karen A. Glocer, and S.V.N. Vishwanathan. Entropy regularized LPBoost. In Proceedings of the 2008 Conference on Algorithmic Learning Theory, Springer LNAI 5254, pages 256–271, 2008.