Paper: nips2012-98 (NIPS 2012)
Source: pdf
Author: Chi Jin, Liwei Wang
Abstract: Margin is one of the most important concepts in machine learning. Previous margin bounds, both for SVM and for boosting, are dimensionality independent. A major advantage of this dimensionality independence is that it can explain the excellent performance of SVM, whose feature spaces are often of high or infinite dimension. In this paper we address the question of whether such dimensionality independence is intrinsic to margin bounds. We prove a dimensionality-dependent PAC-Bayes margin bound. The bound is monotonically increasing with respect to the dimension when all other factors are held fixed. We show that our bound is strictly sharper than a previously well-known PAC-Bayes margin bound if the feature space is of finite dimension, and that the two bounds tend to be equivalent as the dimension goes to infinity. In addition, we show that the VC bound for linear classifiers can be recovered from our bound under mild conditions. We conduct extensive experiments on benchmark datasets and find that the new bound is useful for model selection and is usually significantly sharper than both the dimensionality-independent PAC-Bayes margin bound and the VC bound for linear classifiers.
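The dimensionality-independent PAC-Bayes margin bound that the paper compares against (e.g. [2, 4]) controls the true risk of the Gibbs classifier through an implicit inequality of the form kl(empirical risk || true risk) <= (KL(Q||P) + ln((m+1)/delta)) / m, where kl denotes the binary KL divergence between Bernoulli distributions. Reporting a numerical error bound for model selection, as in the experiments described above, requires inverting this inequality. Below is a minimal illustrative sketch of that inversion by bisection, written in Python; it is not the authors' code, and the complexity value C used in the example is an assumed placeholder.

import math

def binary_kl(q, p):
    # Binary KL divergence kl(q || p) between Bernoulli(q) and Bernoulli(p).
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def kl_inverse(q_hat, bound, tol=1e-9):
    # Largest p >= q_hat with kl(q_hat || p) <= bound, found by bisection.
    # A PAC-Bayes theorem of the form kl(empirical risk || true risk) <= B
    # is converted into an explicit upper bound on the true risk this way.
    lo, hi = q_hat, 1.0 - 1e-12
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binary_kl(q_hat, mid) <= bound:
            lo = mid
        else:
            hi = mid
    return lo

if __name__ == "__main__":
    # Hypothetical numbers: empirical Gibbs error 0.05 on m = 10000 examples.
    # C stands in for (KL(Q||P) + ln((m+1)/delta)) / m; KL(Q||P) = 2.0 is assumed.
    m, delta = 10000, 0.05
    C = (2.0 + math.log((m + 1) / delta)) / m
    print("upper bound on true risk:", kl_inverse(0.05, C))

Bisection works here because kl(q, p) is increasing in p for p >= q, so the set of admissible true-risk values is an interval whose right endpoint is the reported bound.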
[1] John Shawe-Taylor, Peter L. Bartlett, Robert C. Williamson, and Martin Anthony. Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 44(5):1926–1940, 1998.
[2] John Langford and John Shawe-Taylor. PAC-Bayes & Margins. In Advances in Neural Information Processing Systems, pages 423–430, 2002.
[3] David A. McAllester. Simplified PAC-Bayesian margin bounds. Learning Theory and Kernel Machines, 2777:203–215, 2003.
[4] John Langford. Tutorial on practical prediction theory for classification. Journal of Machine Learning Research, 6:273–306, 2005.
[5] Matthias Seeger. PAC-Bayesian generalization error bounds for Gaussian process classification. Journal of Machine Learning Research, 3:233–269, 2002.
[6] David A. McAllester. Some PAC-Bayesian theorems. Machine Learning, 37(3):355–363, 1999.
[7] Robert E. Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. Annals of Statistics, 26(5):1651–1686, 1998.
[8] Vladimir Koltchinskii and Dmitry Panchenko. Empirical margin distributions and bounding the generalization error of combined classifiers. Annals of Statistics, 30:1–50, 2002.
[9] Vladimir Koltchinskii and Dmitry Panchenko. Complexities of convex combinations and bounding the generalization error in classification. Annals of Statistics, 33:1455–1496, 2005.
[10] John Langford, Matthias Seeger, and Nimrod Megiddo. An improved predictive accuracy bound for averaging classifiers. In International Conference on Machine Learning, pages 290–297, 2001.
[11] Sanjoy Dasgupta and Philip M. Long. Boosting with diverse base classifiers. In Annual Conference on Learning Theory, pages 273–287, 2003.
[12] Leo Breiman. Prediction games and arcing algorithms. Neural Computation, 11:1493–1518, 1999.
[13] Liwei Wang, Masashi Sugiyama, Zhaoxiang Jing, Cheng Yang, Zhi-Hua Zhou, and Jufu Feng. A refined margin analysis for boosting algorithms via equilibrium margin. Journal of Machine Learning Research, 12:1835–1863, 2011.
[14] Olivier Catoni. PAC-Bayesian supervised classification: The thermodynamics of statistical learning. IMS Lecture Notes–Monograph Series, 56, 2007.
[15] Pascal Germain, Alexandre Lacasse, François Laviolette, and Mario Marchand. PAC-Bayesian learning of linear classifiers. In International Conference on Machine Learning, page 45, 2009.
[16] Pascal Germain, Alexandre Lacasse, François Laviolette, Mario Marchand, and Sara Shanian. From PAC-Bayes bounds to KL regularization. In Advances in Neural Information Processing Systems, pages 603–610, 2009.
[17] Jean-Francis Roy, François Laviolette, and Mario Marchand. From PAC-Bayes bounds to quadratic programs for majority votes. In International Conference on Machine Learning, pages 649–656, 2011.
[18] Amiran Ambroladze, Emilio Parrado-Hernández, and John Shawe-Taylor. Tighter PAC-Bayes bounds. In Advances in Neural Information Processing Systems, pages 9–16, 2006.
[19] John Shawe-Taylor, Emilio Parrado-Hernández, and Amiran Ambroladze. Data dependent priors in PAC-Bayes bounds. In International Conference on Computational Statistics, pages 231–240, 2010.
[20] Peter L. Bartlett. The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Transactions on Information Theory, 44(2):525–536, 1998.
[21] Ralf Herbrich and Thore Graepel. A PAC-Bayesian margin bound for linear classifiers. IEEE Transactions on Information Theory, 48(12):3140–3150, 2002.
[22] Alexandre Lacasse, François Laviolette, Mario Marchand, Pascal Germain, and Nicolas Usunier. PAC-Bayes bounds for the risk of the majority vote and the variance of the Gibbs classifier. In Advances in Neural Information Processing Systems, pages 769–776, 2006.
[23] Vladimir N. Vapnik. Statistical Learning Theory. Wiley-Interscience, 1998.
[24] Andrew Frank and Arthur Asuncion. UCI machine learning repository, 2010.
[25] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011.