NIPS 2006, Paper 186: Support Vector Machines on a Budget
Source: pdf
Authors: Ofer Dekel, Yoram Singer
Abstract: The standard Support Vector Machine formulation does not provide its user with the ability to explicitly control the number of support vectors used to define the generated classifier. We present a modified version of SVM that allows the user to set a budget parameter B and focuses on minimizing the loss attained by the B worst-classified examples while ignoring the remaining examples. This idea can be used to derive sparse versions of both L1-SVM and L2-SVM. Technically, we obtain these new SVM variants by replacing the 1-norm in the standard SVM formulation with various interpolation-norms. We also adapt the SMO optimization algorithm to our setting and report on some preliminary experimental results.
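To make the abstract's construction concrete, the budgeted objective it describes can be written, for the L1-SVM case with hinge loss over m training examples, as

\min_{w \in \mathcal{H}} \; \frac{1}{2}\|w\|^2 + C \max_{S \subseteq \{1,\dots,m\},\ |S| = B} \sum_{i \in S} \big[ 1 - y_i \langle w, x_i \rangle \big]_+

That is, only the B largest hinge losses enter the objective. This max-over-subsets reading is an illustrative paraphrase of the abstract; the paper itself derives the variants by replacing the 1-norm of the slack vector with interpolation norms, so the display should not be taken as the paper's exact formulation. A minimal numerical sketch of the same idea follows; the function name and the linear-kernel setup are assumptions for illustration, not from the paper:

import numpy as np

def budget_hinge_objective(w, X, y, C, B):
    # Hypothetical helper illustrating the abstract's idea: sum only the
    # B largest hinge losses (the B worst-classified examples) and
    # ignore the remaining examples.
    margins = y * (X @ w)                    # signed margins y_i <w, x_i>
    losses = np.maximum(0.0, 1.0 - margins)  # per-example hinge loss
    worst_b = np.sort(losses)[-B:]           # B largest losses
    return 0.5 * float(w @ w) + C * float(worst_b.sum())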
[1] B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In Proc. of the Fifth Annual ACM Workshop on Computational Learning Theory, pages 144–152, 1992.
[2] V. N. Vapnik. Statistical Learning Theory. Wiley, 1998.
[3] N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, 2000.
[4] P. Bartlett and A. Tewari. Sparseness vs estimating conditional probabilities: Some asymptotic results. In Proc. of the Seventeenth Annual Conference on Computational Learning Theory, pages 564–578, 2004.
[5] J. C. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods: Support Vector Learning. MIT Press, 1998.
[6] C.J.C. Burges. Simplified support vector decision rules. In Proc. of the Thirteenth International Conference on Machine Learning, pages 71–77, 1996.
[7] E. Osuna and F. Girosi. Reducing the run-time complexity of support vector machines. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods: Support Vector Learning, pages 271–284. MIT Press, 1999.
[8] B. Schölkopf, S. Mika, C.J.C. Burges, P. Knirsch, K.-R. Müller, G. Rätsch, and A.J. Smola. Input space versus feature space in kernel-based methods. IEEE Transactions on Neural Networks, 10(5):1000–1017, September 1999.
[9] J.-H. Chen and C.-S. Chen. Reducing SVM classification time using multiple mirror classifiers. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 34(2):1173–1183, April 2004.
[10] M. Wu, B. Schölkopf, and G. Bakir. A direct method for building sparse kernel learning algorithms. Journal of Machine Learning Research, 7:603–624, 2006.
[11] K.P. Bennett. Combining support vector and mathematical programming methods for classification. In Advances in Kernel Methods: Support Vector Learning, pages 307–326. MIT Press, 1999.
[12] Y. Lee and O.L. Mangasarian. RSVM: Reduced support vector machines. In Proc. of the First SIAM International Conference on Data Mining, 2001.
[13] K. Crammer, J. Kandola, and Y. Singer. Online classification on a budget. In Advances in Neural Information Processing Systems 16, 2003.
[14] O. Dekel, S. Shalev-Shwartz, and Y. Singer. The Forgetron: A kernel-based perceptron on a fixed budget. In Advances in Neural Information Processing Systems 18, 2005.
[15] N. Cesa-Bianchi and C. Gentile. Tracking the best hyperplane with a simple budget perceptron. In Proc. of the Nineteenth Annual Conference on Computational Learning Theory, 2006.
[16] N. Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3):337–404, May 1950.
[17] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1985.
[18] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[19] C. Bennett and R. Sharpley. Interpolation of Operators. Academic Press, 1988.
[20] T. Holmstedt. Interpolation of quasi-normed spaces. Mathematica Scandinavica, 26:177–190, 1970.