
58 nips-2000-From Margin to Sparsity


Source: pdf

Author: Thore Graepel, Ralf Herbrich, Robert C. Williamson

Abstract: We present an improvement of Novikoff's perceptron convergence theorem. Reinterpreting this mistake bound as a margin-dependent sparsity guarantee allows us to give a PAC-style generalisation error bound for the classifier learned by the perceptron learning algorithm. The value of the bound crucially depends on the margin a support vector machine would achieve on the same data set using the same kernel. Ironically, the bound yields better guarantees than are currently available for the support vector solution itself.
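For context, the classical perceptron convergence theorem that the abstract improves upon can be stated in its standard textbook form (the notation below is a common formulation and an assumption of this sketch, not quoted from the paper): if every training point lies in a ball of radius R and some unit-norm weight vector separates the data with margin gamma, then the number of perceptron mistakes, and hence the number of training examples appearing in the learned kernel expansion, is bounded as follows.

% Novikoff's classical mistake bound (standard formulation; the symbols R, \gamma, u
% are assumptions of this sketch, not notation taken from the paper itself).
% Assume \|x_i\| \le R for all i, and that some u with \|u\| = 1 satisfies
% y_i \langle u, x_i \rangle \ge \gamma > 0 for all i. Then
\[
  \#\{\text{mistakes made by the perceptron}\} \;\le\; \left(\frac{R}{\gamma}\right)^{2},
\]
% so the learned classifier is a sparse kernel expansion over at most (R/\gamma)^2
% training examples, which is the sparsity guarantee the abstract reinterprets.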


Reference text

[1] M. Aizerman, E. Braverman, and L. Rozonoer. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25:821-837, 1964.

[2] Y. Freund and R. E. Schapire. Large margin classification using the perceptron algorithm. Machine Learning, 1999.

[3] T. Friess, N. Cristianini, and C. Campbell. The Kernel-Adatron: A fast and simple learning procedure for Support Vector Machines. In Proceedings of the 15th International Conference on Machine Learning, pages 188-196, 1998.

[4] T. Graepel, R. Herbrich, and J. Shawe-Taylor. Generalisation error bounds for sparse linear classifiers. In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory, pages 298-303, 2000. In press.

[5] R. Herbrich. Learning Linear Classifiers: Theory and Algorithms. PhD thesis, Technische Universität Berlin, 2000. Accepted for publication by MIT Press.

[6] R. Herbrich and T. Graepel. A PAC-Bayesian margin bound for linear classifiers: Why SVMs work. In Advances in Neural Information Processing Systems 13, 2001.

[7] H. König. Eigenvalue Distribution of Compact Operators. Birkhäuser, Basel, 1986.

[8] N. Littlestone and M. Warmuth. Relating data compression and learnability. Technical report, University of California, Santa Cruz, 1986.

[9] J. Mercer. Functions of positive and negative type and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society of London (A), 209:415-446, 1909.

[10] A. Novikoff. On convergence proofs for perceptrons. In Report at the Symposium on Mathematical Theory of Automata, pages 24-26, Polytechnic Institute of Brooklyn, 1962.

[11] F. Rosenblatt. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan Books, Washington, D.C., 1962.

[12] J. Shawe-Taylor, P. L. Bartlett, R. C. Williamson, and M. Anthony. Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 44(5):1926-1940, 1998.

[13] V. Vapnik. Statistical Learning Theory. John Wiley and Sons, New York, 1998.

[14] V. Vapnik. The Nature of Statistical Learning Theory. Springer, second edition, 1999.

[15] G. Wahba. Support Vector Machines, Reproducing Kernel Hilbert Spaces and the randomized GACV. Technical report, Department of Statistics, University of Wisconsin, Madison, 1997. TR No. 984.