jmlr jmlr2005 jmlr2005-33 jmlr2005-33-reference knowledge-graph by maker-knowledge-mining

33 jmlr-2005-Fast Kernel Classifiers with Online and Active Learning

Source: pdf

Author: Antoine Bordes, Seyda Ertekin, Jason Weston, Léon Bottou

Abstract: Very high dimensional learning systems become theoretically possible when training examples are abundant. The computing cost then becomes the limiting factor. Any efﬁcient learning algorithm should at least take a brief look at each example. But should all examples be given equal attention? This contribution proposes an empirical answer. We ﬁrst present an online SVM algorithm based on this premise. LASVM yields competitive misclassiﬁcation rates after a single pass over the training examples, outspeeding state-of-the-art SVM solvers. Then we show how active example selection can yield faster training, higher accuracies, and simpler models, using only a fraction of the training example labels.

reference text

´ M. A. Aizerman, E. M. Braverman, and L. I. Rozono´ r. Theoretical foundations of the potential e function method in pattern recognition learning. Automation and Remote Control, 25:821–837, 1964. N. Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68:337–404, 1950. G. Bakır, L. Bottou, and J. Weston. Breaking SVM complexity with cross-training. In Lawrence Saul, Bernhard Sch¨ lkopf, and L´ on Bottou, editors, Advances in Neural Information Processing o e Systems, volume 17, pages 81–88. MIT Press, 2005. A. Bordes and L. Bottou. The Huller: a simple and efﬁcient online SVM. In Proceedings of the 16th European Conference on Machine Learning (ECML2005), Lecture Notes in Artiﬁcial Intelligence, to appear. Springer, 2005. L. Bottou, C. Cortes, J. S. Denker, H. Drucker, I. Guyon, L.D. Jackel, Y. LeCun, U. A. Muller, E. Sackinger, P. Simard, and V. Vapnik. Comparison of classiﬁer methods: a case study in handwritten digit recognition. In Proceedings of the 12th IAPR International Conference on Pattern Recognition, Conference B: Computer Vision & Image Processing., volume 2, pages 77– 82, Jerusalem, October 1994. IEEE. L. Bottou and Y. LeCun. On-line learning for very large datasets. Applied Stochastic Models in Business and Industry, 21(2):137–151, 2005. C. Campbell, N. Cristianini, and A. J. Smola. Query learning with large margin classiﬁers. In Proceedings of ICML’2000, 2000. G. Cauwenberghs and T. Poggio. Incremental and decremental support vector machine learning. In Advances in Neural Processing Systems, 2001. N. Cesa-Bianchi, C. Gentile, and L. Zaniboni. Worst-case analysis of selective sampling for linearthreshold algorithms. In L. K. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems 17, pages 241–248. MIT Press, Cambridge, MA, 2005. C.-C. Chang and C.-J. Lin. LIBSVM : a library for support vector machines. Technical report, Computer Science and Information Engineering, National Taiwan University, 2001-2004. http://www.csie.ntu.edu.tw/∼cjlin/libsvm. D. Cohn, L. Atlas, and R. Ladner. Training connectionist networks with queries and selective sampling. In D. Touretzky, editor, Advances in Neural Information Processing Systems 2, San Mateo, CA, 1990. Morgan Kaufmann. 1616 FAST K ERNEL C LASSIFIERS WITH O NLINE AND ACTIVE L EARNING R. Collobert and S. Bengio. SVMTorch: Support vector machines for large-scale regression problems. Journal of Machine Learning Research, 1:143–160, 2001. R. Collobert, S. Bengio, and Y. Bengio. A parallel mixture of SVMs for very large scale problems. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, Cambridge, MA, 2002. MIT Press. C. Cortes and V. Vapnik. Support vector networks. Machine Learning, 20:273–297, 1995. K. Crammer, J. Kandola, and Y. Singer. Online classiﬁcation on a budget. In Sebastian Thrun, Lawrence Saul, and Bernhard Sch¨ lkopf, editors, Advances in Neural Information Processing o Systems 16. MIT Press, Cambridge, MA, 2004. K. Crammer and Y. Singer. Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research, 3:951–991, 2003. N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines and other kernelbased learning methods. Cambridge University Press, Cambridge, UK, 2000. C. Domingo and O. Watanabe. MadaBoost: a modiﬁcation of AdaBoost. In Proceedings of the 13th Annual Conference on Computational Learning Theory, COLT’00, pages 180–189, 2000. B. Eisenberg and R. Rivest. On the sample complexity of PAC learning using random and chosen examples. In M. Fulk and J. Case, editors, Proceedings of the Third Annual ACM Workshop on Computational Learning Theory, pages 154–162, San Mateo, CA, 1990. Kaufmann. V. V. Fedorov. Theory of Optimal Experiments. Academic Press, New York, 1972. Y. Freund and R. E. Schapire. Large margin classiﬁcation using the perceptron algorithm. In J. Shavlik, editor, Machine Learning: Proceedings of the Fifteenth International Conference, San Francisco, CA, 1998. Morgan Kaufmann. T.-T. Frieß, N. Cristianini, and C. Campbell. The kernel Adatron algorithm: a fast and simple learning procedure for support vector machines. In J. Shavlik, editor, 15th International Conf. Machine Learning, pages 188–196. Morgan Kaufmann Publishers, 1998. See (Cristianini and Shawe-Taylor, 2000, section 7.2) for an updated presentation. C. Gentile. A new approximate maximal margin classiﬁcation algorithm. Journal of Machine Learning Research, 2:213–242, 2001. H.-P. Graf, E. Cosatto, L. Bottou, I. Dourdanovic, and V. Vapnik. Parallel support vector machines: The Cascade SVM. In Lawrence Saul, Bernhard Sch¨ lkopf, and L´ on Bottou, editors, Advances o e in Neural Information Processing Systems, volume 17. MIT Press, 2005. I. Guyon, B. Boser, and V. Vapnik. Automatic capacity tuning of very large VC-dimension classiﬁers. In S. J. Hanson, J. D. Cowan, and C. Lee Giles, editors, Advances in Neural Information Processing Systems, volume 5, pages 147–155. Morgan Kaufmann, San Mateo, CA, 1993. T. Joachims. Making large–scale SVM learning practical. In B. Sch¨ lkopf, C. J. C. Burges, and o A. J. Smola, editors, Advances in Kernel Methods — Support Vector Learning, pages 169–184, Cambridge, MA, 1999. MIT Press. 1617 B ORDES , E RTEKIN , W ESTON , AND B OTTOU S. S. Keerthi and E. G. Gilbert. Convergence of a generalized SMO algorithm for SVM classiﬁer design. Machine Learning, 46:351–360, 2002. Y. Li and P. Long. The relaxed online maximum margin algorithm. Machine Learning, 46:361–387, 2002. C.-J. Lin. On the convergence of the decomposition method for support vector machines. IEEE Transactions on Neural Networks, 12(6):1288–1298, 2001. N. Littlestone and M. Warmuth. Relating data compression and learnability. Technical report, University of California Santa Cruz, 1986. G. Loosli, S. Canu, S.V.N. Vishwanathan, A. J. Smola, and M. Chattopadhyay. Une boˆte a outils ı ` rapide et simple pour les SVM. In Michel Liqui` re and Marc Sebban, editors, CAp 2004 e Confrence d’Apprentissage, pages 113–128. Presses Universitaires de Grenoble, 2004. ISBN 9-782706-112249. D. J. C. MacKay. Information based objective functions for active data selection. Neural Computation, 4(4):589–603, 1992. N. Murata and S.-I. Amari. Statistical analysis of learning dynamics. Signal Processing, 74(1): 3–28, 1999. N. J. Nilsson. Learning machines: Foundations of Trainable Pattern Classifying Systems. McGraw– Hill, 1965. A. B. J. Novikoff. On convergence proofs on perceptrons. In Proceedings of the Symposium on the Mathematical Theory of Automata, volume 12, pages 615–622. Polytechnic Institute of Brooklyn, 1962. J. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Sch¨ lkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods — Supo port Vector Learning, pages 185–208, Cambridge, MA, 1999. MIT Press. F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386–408, 1958. G. Schohn and D. Cohn. Less is more: Active learning with support vector machines. In Pat Langley, editor, Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), pages 839–846. Morgan Kaufmann, June 2000. B. Sch¨ lkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002. o A. Schrijver. Theory of Linear and Integer Programming. John Wiley and Sons, New York, 1986. I. Steinwart. Sparseness of support vector machines—some asymptotically sharp bounds. In Sebastian Thrun, Lawrence Saul, and Bernhard Sch¨ lkopf, editors, Advances in Neural Information o Processing Systems 16. MIT Press, Cambridge, MA, 2004. 1618 FAST K ERNEL C LASSIFIERS WITH O NLINE AND ACTIVE L EARNING N. Takahashi and T. Nishi. On termination of the SMO algorithm for support vector machines. In Proceedings of International Symposium on Information Science and Electrical Engineering 2003 (ISEE 2003), pages 187–190, November 2003. S. Tong and D. Koller. Support vector machine active learning with applications to text classiﬁcation. In P. Langley, editor, Proceedings of the 17th International Conference on Machine Learning, San Francisco, California, 2000. Morgan Kaufmann. I. W. Tsang, J. T. Kwok, and P.-M. Cheung. Very large SVM training using core vector machines. In Proceedings of the Tenth International Workshop on Artiﬁcial Intelligence and Statistics (AISTAT’05). Society for Artiﬁcial Intelligence and Statistics, 2005. V. Vapnik. Estimation of Dependences Based on Empirical Data. Springer-Verlag, Berlin, 1982. V. Vapnik and A. Lerner. Pattern recognition using generalized portrait method. Automation and Remote Control, 24:774–780, 1963. V. N. Vapnik. Statistical Learning Theory. John Wiley & Sons, 1998. V. N. Vapnik, T. G. Glaskova, V. A. Koscheev, A. I. Mikhailski, and A. Y. Chervonenkis. Algorihms and Programs for Dependency Estimation. Nauka, 1984. In Russian. S. V. N. Vishwanathan, A. J. Smola, and M. Narasimha Murty. SimpleSVM. In Proceedings of ICML 2003, pages 760–767, 2003. J. Weston, A. Bordes, and L. Bottou. Online (and ofﬂine) on an even tighter budget. In Robert G. Cowell and Zoubin Ghahramani, editors, Proceedings of the Tenth International Workshop on Artiﬁcial Intelligence and Statistics, Jan 6-8, 2005, Savannah Hotel, Barbados, pages 413–420. Society for Artiﬁcial Intelligence and Statistics, 2005. G. Zoutendijk. Methods of Feasible Directions. Elsevier, 1960. 1619