Paper: nips2012-98 (NIPS 2012)
Source: pdf
Author: Chi Jin, Liwei Wang
Abstract: Margin is one of the most important concepts in machine learning. Previous margin bounds, both for SVM and for boosting, are dimensionality independent. A major advantage of this dimensionality independence is that it can explain the excellent performance of SVM, whose feature spaces are often of high or infinite dimension. In this paper we address the question of whether such dimensionality independence is intrinsic to margin bounds. We prove a dimensionality-dependent PAC-Bayes margin bound. The bound is monotonically increasing with respect to the dimension when all other factors are held fixed. We show that our bound is strictly sharper than a previously well-known PAC-Bayes margin bound if the feature space is of finite dimension, and that the two bounds tend to be equivalent as the dimension goes to infinity. In addition, we show that the VC bound for linear classifiers can be recovered from our bound under mild conditions. We conduct extensive experiments on benchmark datasets and find that the new bound is useful for model selection and is usually significantly sharper than both the dimensionality-independent PAC-Bayes margin bound and the VC bound for linear classifiers.
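The dimensionality-independent PAC-Bayes margin bound that the paper compares against (e.g. [2, 4]) controls the true risk of the Gibbs classifier through an implicit inequality of the form kl(empirical risk || true risk) <= (KL(Q||P) + ln((m+1)/delta)) / m, where kl denotes the binary KL divergence between Bernoulli distributions. Reporting a numerical error bound for model selection, as in the experiments described above, requires inverting this inequality. Below is a minimal illustrative sketch of that inversion by bisection, written in Python; it is not the authors' code, and the complexity value C used in the example is an assumed placeholder.

import math

def binary_kl(q, p):
    # Binary KL divergence kl(q || p) between Bernoulli(q) and Bernoulli(p).
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def kl_inverse(q_hat, bound, tol=1e-9):
    # Largest p >= q_hat with kl(q_hat || p) <= bound, found by bisection.
    # A PAC-Bayes theorem of the form kl(empirical risk || true risk) <= B
    # is converted into an explicit upper bound on the true risk this way.
    lo, hi = q_hat, 1.0 - 1e-12
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binary_kl(q_hat, mid) <= bound:
            lo = mid
        else:
            hi = mid
    return lo

if __name__ == "__main__":
    # Hypothetical numbers: empirical Gibbs error 0.05 on m = 10000 examples.
    # C stands in for (KL(Q||P) + ln((m+1)/delta)) / m; KL(Q||P) = 2.0 is assumed.
    m, delta = 10000, 0.05
    C = (2.0 + math.log((m + 1) / delta)) / m
    print("upper bound on true risk:", kl_inverse(0.05, C))

Bisection works here because kl(q, p) is increasing in p for p >= q, so the set of admissible true-risk values is an interval whose right endpoint is the reported bound.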
[1] John Shawe-Taylor, Peter L. Bartlett, Robert C. Williamson, and Martin Anthony. Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 44(5):1926–1940, 1998.
[2] John Langford and John Shawe-Taylor. PAC-Bayes & Margins. In Advances in Neural Information Processing Systems, pages 423–430, 2002.
[3] David A. McAllester. Simplified PAC-Bayesian margin bounds. Learning Theory and Kernel Machines, 2777:203–215, 2003.
[4] John Langford. Tutorial on practical prediction theory for classification. Journal of Machine Learning Research, 6:273–306, 2005.
[5] Matthias Seeger. PAC-Bayesian generalization error bounds for Gaussian process classification. Journal of Machine Learning Research, 3:233–269, 2002.
[6] David A. McAllester. Some PAC-Bayesian theorems. Machine Learning, 37(3):355–363, 1999.
[7] Robert E. Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. Annals of Statistics, 26(5):1651–1686, 1998.
[8] Vladimir Koltchinskii and Dmitry Panchenko. Empirical margin distributions and bounding the generalization error of combined classifiers. Annals of Statistics, 30:1–50, 2002.
[9] Vladimir Koltchinskii and Dmitry Panchenko. Complexities of convex combinations and bounding the generalization error in classification. Annals of Statistics, 33:1455–1496, 2005.
[10] John Langford, Matthias Seeger, and Nimrod Megiddo. An improved predictive accuracy bound for averaging classifiers. In International Conference on Machine Learning, pages 290–297, 2001.
[11] Sanjoy Dasgupta and Philip M. Long. Boosting with diverse base classifiers. In Annual Conference on Learning Theory, pages 273–287, 2003.
[12] Leo Breiman. Prediction games and arcing algorithms. Neural Computation, 11:1493–1518, 1999.
[13] Liwei Wang, Masashi Sugiyama, Zhaoxiang Jing, Cheng Yang, Zhi-Hua Zhou, and Jufu Feng. A refined margin analysis for boosting algorithms via equilibrium margin. Journal of Machine Learning Research, 12:1835–1863, 2011.
[14] Olivier Catoni. PAC-Bayesian supervised classification: The thermodynamics of statistical learning. IMS Lecture Notes–Monograph Series, 56, 2007.
[15] Pascal Germain, Alexandre Lacasse, François Laviolette, and Mario Marchand. PAC-Bayesian learning of linear classifiers. In International Conference on Machine Learning, page 45, 2009.
[16] Pascal Germain, Alexandre Lacasse, François Laviolette, Mario Marchand, and Sara Shanian. From PAC-Bayes bounds to KL regularization. In Advances in Neural Information Processing Systems, pages 603–610, 2009.
[17] Jean-Francis Roy, François Laviolette, and Mario Marchand. From PAC-Bayes bounds to quadratic programs for majority votes. In International Conference on Machine Learning, pages 649–656, 2011.
[18] Amiran Ambroladze, Emilio Parrado-Hernández, and John Shawe-Taylor. Tighter PAC-Bayes bounds. In Advances in Neural Information Processing Systems, pages 9–16, 2006.
[19] John Shawe-Taylor, Emilio Parrado-Hernández, and Amiran Ambroladze. Data dependent priors in PAC-Bayes bounds. In International Conference on Computational Statistics, pages 231–240, 2010.
[20] Peter L. Bartlett. The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Transactions on Information Theory, 44(2):525–536, 1998.
[21] Ralf Herbrich and Thore Graepel. A PAC-Bayesian margin bound for linear classifiers. IEEE Transactions on Information Theory, 48(12):3140–3150, 2002.
[22] Alexandre Lacasse, François Laviolette, Mario Marchand, Pascal Germain, and Nicolas Usunier. PAC-Bayes bounds for the risk of the majority vote and the variance of the Gibbs classifier. In Advances in Neural Information Processing Systems, pages 769–776, 2006.
[23] Vladimir N. Vapnik. Statistical Learning Theory. Wiley-Interscience, 1998.
[24] Andrew Frank and Arthur Asuncion. UCI machine learning repository, 2010.
[25] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011.