Parallel Support Vector Machines: The Cascade SVM


Author: Hans P. Graf, Eric Cosatto, Léon Bottou, Igor Dourdanovic, Vladimir Vapnik

Abstract: We describe an algorithm for support vector machines (SVM) that can be parallelized efficiently and scales to very large problems with hundreds of thousands of training vectors. Instead of analyzing the whole training set in one optimization step, the data are split into subsets and optimized separately with multiple SVMs. The partial results are combined and filtered again in a ‘Cascade’ of SVMs until the global optimum is reached. The Cascade SVM can be spread over multiple processors with minimal communication overhead and requires far less memory, since the kernel matrices are much smaller than for a regular SVM. Convergence to the global optimum is guaranteed with multiple passes through the Cascade, but even a single pass provides good generalization. A single pass is 5x-10x faster than a regular SVM for problems of 100,000 vectors when implemented on a single processor. Parallel implementations on a cluster of 16 processors were tested with over 1 million vectors (2-class problems), converging in a day or two, while a regular SVM never converged in over a week.
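
To make the cascade's data flow concrete, here is a minimal single-pass sketch in Python. This is not the authors' implementation: scikit-learn's SVC stands in for the subset solver, the RBF kernel and the power-of-two number of subsets are assumptions, and the helper names (train_svm, cascade_pass) are hypothetical.

    import numpy as np
    from sklearn.svm import SVC

    def train_svm(X, y, C=1.0, gamma='scale'):
        # Train one SVM and return its support vectors plus the model.
        clf = SVC(C=C, kernel='rbf', gamma=gamma).fit(X, y)
        sv = clf.support_  # indices of the support vectors
        return X[sv], y[sv], clf

    def cascade_pass(X, y, n_subsets=8):
        # One pass through the Cascade: train SVMs on disjoint subsets,
        # then repeatedly merge pairs of support-vector sets and retrain
        # until a single SVM remains. n_subsets is assumed to be a power
        # of two so that every layer pairs up evenly.
        idx = np.array_split(np.random.permutation(len(X)), n_subsets)
        layer = [train_svm(X[i], y[i])[:2] for i in idx]
        while len(layer) > 1:
            merged = []
            for (Xa, ya), (Xb, yb) in zip(layer[0::2], layer[1::2]):
                Xm = np.vstack([Xa, Xb])
                ym = np.concatenate([ya, yb])
                merged.append(train_svm(Xm, ym)[:2])
            layer = merged
        Xf, yf = layer[0]  # only the surviving vectors reach the final SVM
        return train_svm(Xf, yf)[2]

As the abstract notes, a single pass like this already generalizes well; guaranteeing the global optimum requires repeated passes, feeding the final support vectors back into the first-layer subsets until the support vector set no longer changes.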


References

[1] V. Vapnik, “Statistical Learning Theory”, Wiley, New York, 1998.

[2] B. Boser, I. Guyon, V. Vapnik, “A training algorithm for optimal margin classifiers”, in Proc. 5th Annual Workshop on Computational Learning Theory, Pittsburgh, ACM, 1992.

[3] E. Osuna, R. Freund, F. Girosi, “Training Support Vector Machines: an Application to Face Detection”, in Computer Vision and Pattern Recognition, pp. 130-136, 1997.

[4] T. Joachims, “Making large-scale support vector machine learning practical”, in Advances in Kernel Methods, B. Schölkopf, C. Burges, A. Smola (eds.), Cambridge, MIT Press, 1998.

[5] J.C. Platt, “Fast training of support vector machines using sequential minimal optimization”, in Advances in Kernel Methods, B. Schölkopf, C. Burges, A. Smola (eds.), Cambridge, MIT Press, 1998.

[6] C. Chang, C. Lin, “LIBSVM”, http://www.csie.ntu.edu.tw/~cjlin/libsvm/.

[7] R. Collobert, S. Bengio, J. Mariéthoz, “Torch: a modular machine learning software library”, Technical Report IDIAP-RR 02-46, IDIAP, 2002.

[8] D. DeCoste, B. Schölkopf, “Training Invariant Support Vector Machines”, Machine Learning, Vol. 46, pp. 161-190, 2002.

[9] R. Collobert, Y. Bengio, S. Bengio, “A Parallel Mixture of SVMs for Very Large Scale Problems”, in Neural Information Processing Systems, Vol. 17, MIT Press, 2004.

[10] A. Tveit, H. Engum, “Parallelization of the Incremental Proximal Support Vector Machine Classifier using a Heap-based Tree Topology”, Tech. Report, IDI, NTNU, Trondheim, 2003.

[11] J. X. Dong, A. Krzyzak, C. Y. Suen, “A Fast Parallel Optimization for Training Support Vector Machine”, in Proceedings of the 3rd International Conference on Machine Learning and Data Mining, P. Perner and A. Rosenfeld (eds.), Springer Lecture Notes in Artificial Intelligence (LNAI 2734), pp. 96-105, Leipzig, Germany, July 5-7, 2003.

[12] G. Zanghirati, L. Zanni, “A parallel solver for large quadratic programs in training support vector machines”, Parallel Computing, Vol. 29, pp. 535-551, 2003.