jmlr jmlr2012 jmlr2012-87 jmlr2012-87-reference knowledge-graph by maker-knowledge-mining

87 jmlr-2012-PAC-Bayes Bounds with Data Dependent Priors

Source: pdf

Author: Emilio Parrado-Hernández, Amiran Ambroladze, John Shawe-Taylor, Shiliang Sun

Abstract: This paper presents the prior PAC-Bayes bound and explores its capabilities as a tool to provide tight predictions of SVMs’ generalization. The computation of the bound involves estimating a prior of the distribution of classiﬁers from the available data, and then manipulating this prior in the usual PAC-Bayes generalization bound. We explore two alternatives: to learn the prior from a separate data set, or to consider an expectation prior that does not need this separate data set. The prior PAC-Bayes bound motivates two SVM-like classiﬁcation algorithms, prior SVM and ηprior SVM, whose regularization term pushes towards the minimization of the prior PAC-Bayes bound. The experimental work illustrates that the new bounds can be signiﬁcantly tighter than the original PAC-Bayes bound when applied to SVMs, and among them the combination of the prior PAC-Bayes bound and the prior SVM algorithm gives the tightest bound. Keywords: PAC-Bayes bound, support vector machine, generalization capability prediction, classiﬁcation

reference text

A. Ambroladze, E. Parrado-Hern´ ndez, and J. Shawe-Taylor. Tighter PAC-Bayes bounds. In a B. Sch¨ lkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Syso tems 19, pages 9–16. MIT Press, Cambridge, MA, 2007. P. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3:463–482, 2002. C. L. Blake and C. J. Merz. UCI Repository of Machine Learning Databases. Department of Information and Computer Sciences, University of California, Irvine, http://www.ics.uci.edu/∼mlearn/MLRepository.html, 1998. B. E. Boser, I. Guyon, and V. Vapnik. A training algorithm for optimal margin classiﬁers. In Proceedings of the 5th Annual Conference on Computational Learning Theory, COLT ’92, pages 144–152, 1992. O. Catoni. PAC-Bayesian Supervised Classiﬁcation: The Thermodynamics of Statistical Learning. Institute of Mathematical Statistics, Beachwood, Ohio, USA, 2007. N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, Cambridge, UK, 2000. P. Germain, A. Lacasse, F. Laviolette, and M. Marchand. PAC-Bayesian learning of linear classiﬁers. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML ’09, pages 353–360, 2009. J. Langford. Tutorial on practical prediction theory for classiﬁcation. Journal of Machine Learning Research, 6(Mar):273–306, 2005. J. Langford and J. Shawe-Taylor. PAC-Bayes and margins. In Advances in Neural Information Processing Systems, volume 14, Cambridge MA, 2002. MIT Press. 3530 PAC-BAYES B OUNDS WITH DATA D EPENDENT P RIORS D. A. McAllester. PAC-Bayesian model averaging. In Proceedings of the 12th Annual Conference on Computational Learning Theory, COLT ’99, pages 164–170, 1999. B. Sch¨ lkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge (MA), 2002. o M. Seeger. PAC-Bayesian generalization error bounds for Gaussian process classiﬁcation. Journal of Machine Learning Research, 3:233–269, 2002. J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge, UK, 2004. J. Shawe-Taylor, P. L. Bartlett, R. C. Williamson, and M. Anthony. Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 44:1926–1940, 1998. V. Vapnik. Statistical Learning Theory. Wiley, New York, 1998. T. Zhang. Covering number bounds of certain regularized linear function classes. Journal of Machine Learning Research, 2:527–550, 2002. 3531