jmlr jmlr2011 jmlr2011-95 jmlr2011-95-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ingo Steinwart, Don Hush, Clint Scovel
Abstract: We develop, analyze, and test a training algorithm for support vector machine classifiers without offset. Key features of this algorithm are a new, statistically motivated stopping criterion, new warm start options, and a set of inexpensive working set selection strategies that significantly reduce the number of iterations. For these working set strategies, we establish convergence rates that, not surprisingly, coincide with the best known rates for SVMs with offset. We further conduct various experiments that investigate both the run time behavior and the iterations performed by the new training algorithm. It turns out that the new algorithm needs significantly fewer iterations and also runs substantially faster than standard training algorithms for SVMs with offset. Keywords: support vector machines, decomposition algorithms
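As a rough sketch of the setting the abstract refers to (not reproduced from the paper itself, which may use a different parameterization): in the common C-parameterization, the dual problem of a hinge-loss SVM without offset is a box-constrained quadratic program, because dropping the offset removes the usual equality constraint \sum_i y_i \alpha_i = 0. Each dual coordinate can then be updated in closed form by a clipped Newton step, which is what makes very small working sets attractive; the paper's actual stopping criterion, warm starts, and selection rules are not shown here.

\[
  \max_{\alpha \in [0,C]^n} \; \sum_{i=1}^n \alpha_i \;-\; \tfrac{1}{2}\,\alpha^\top Q \alpha ,
  \qquad Q_{ij} = y_i y_j k(x_i, x_j),
\]
\[
  \alpha_i^{\mathrm{new}} \;=\; \min\Bigl\{ C,\; \max\bigl\{ 0,\; \alpha_i + \tfrac{1 - (Q\alpha)_i}{Q_{ii}} \bigr\} \Bigr\}
  \qquad \text{(clipped Newton step in coordinate } i\text{)} .
\]

Here C, the kernel k, and the kernel matrix Q are generic placeholders; the absence of the equality constraint is the structural difference that the working set strategies in the abstract exploit.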
C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines. http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.ps.gz, 2009.
P.-H. Chen, R.-E. Fan, and C.-J. Lin. A study on SMO-type decomposition methods for support vector machines. IEEE Trans. Neural Networks, 17:893–908, 2006.
N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, Cambridge, 2000.
L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, New York, 1996.
R.-E. Fan, P.-H. Chen, and C.-J. Lin. Working set selection using second order information for training support vector machines. J. Mach. Learn. Res., 6:1889–1918, 2005.
G. Fung and O. L. Mangasarian. Proximal support vector machine classifiers. In KDD '01: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 77–86, New York, NY, USA, 2001. ACM.
T. Glasmachers and C. Igel. Maximum-gain working set selection for SVMs. J. Mach. Learn. Res., 7:1437–1466, 2006.
C.-W. Hsu and C.-J. Lin. A simple decomposition method for support vector machines. Mach. Learn., 46:291–314, 2002.
C.-W. Hsu and C.-J. Lin. BSVM. http://www.csie.ntu.edu.tw/~cjlin/bsvm/, 2006.
T.-M. Huang, V. Kecman, and I. Kopriva. Kernel Based Algorithms for Mining Huge Data Sets: Supervised, Semi-supervised, and Unsupervised Learning. Springer, Berlin, 2006.
D. Hush and C. Scovel. Polynomial-time decomposition algorithms for support vector machines. Mach. Learn., 51:51–71, 2003.
D. Hush, P. Kelly, C. Scovel, and I. Steinwart. QP algorithms with guaranteed accuracy and run time for support vector machines. J. Mach. Learn. Res., 7:733–769, 2006.
T. Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods – Support Vector Learning, chapter 11, pages 169–184. MIT Press, Cambridge, MA, 1999.
V. Kecman, T.-M. Huang, and M. Vogt. Iterative single data algorithm for training kernel machines from huge data sets: Theory and performance. In L. Wang, editor, Support Vector Machines: Theory and Applications, pages 255–274. Springer Verlag, 2005.
S. Keerthi, V. Sindhwani, and O. Chapelle. An efficient method for gradient-based adaptation of hyperparameters in SVM models. In Advances in Neural Information Processing Systems 19, pages 673–680. MIT Press, Cambridge, MA, 2007.
S. S. Keerthi, S. K. Shevade, C. Bhattacharyya, and K. R. K. Murthy. Improvements to Platt's SMO algorithm for SVM classifier design. Neural Comput., 13:637–649, 2001.
C.-J. Lin. On the convergence of the decomposition method for support vector machines. IEEE Trans. Neural Networks, 12:1288–1298, 2001.
C.-J. Lin. Asymptotic convergence of an SMO algorithm without any assumptions. IEEE Trans. Neural Networks, 13:248–250, 2002a.
C.-J. Lin. A formal analysis of stopping criteria of decomposition methods for support vector machines. IEEE Trans. Neural Networks, 13:248–250, 2002b.
N. List and H.-U. Simon. A general convergence theorem for the decomposition method. In Proceedings of the 17th Annual Conference on Learning Theory, pages 363–377. Springer, Heidelberg, 2004.
N. List and H.-U. Simon. General polynomial time decomposition algorithms. In S. Ben-David, J. Case, and A. Maruoka, editors, Proceedings of the 18th Annual Conference on Learning Theory, COLT 2005, pages 308–322. Springer, Heidelberg, 2005.
N. List and H.-U. Simon. General polynomial time decomposition algorithms. J. Mach. Learn. Res., 8:303–321, 2007.
N. List, D. Hush, C. Scovel, and I. Steinwart. Gaps in support vector optimization. In N. Bshouty and C. Gentile, editors, Proceedings of the 20th Conference on Learning Theory, pages 336–348. Springer, New York, 2007.
Z.-Q. Luo and P. Tseng. On the convergence of the coordinate descent method for convex differentiable minimization. J. Optimization Theory Appl., 72:7–35, 1992.
O. L. Mangasarian and D. R. Musicant. Lagrangian support vector machines. J. Mach. Learn. Res., 1:161–177, 2001.
I. Steinwart. Sparseness of support vector machines. J. Mach. Learn. Res., 4:1071–1105, 2003.
I. Steinwart and A. Christmann. Support Vector Machines. Springer, New York, 2008.
I. Steinwart, D. Hush, and C. Scovel. An oracle inequality for clipped regularized risk minimizers. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 1321–1328. MIT Press, Cambridge, MA, 2007.
M. Vogt. SMO algorithms for support vector machines without bias. Technical report, University of Darmstadt, 2002. http://www.rtm.tu-darmstadt.de/ehemalige_mitarbeiter/~vogt/docs/vogt_2002_smowob.pdf.