jmlr jmlr2006 jmlr2006-70 jmlr2006-70-reference knowledge-graph by maker-knowledge-mining

70 jmlr-2006-Online Passive-Aggressive Algorithms


Source: pdf

Author: Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer

Abstract: We present a family of margin based online learning algorithms for various prediction tasks. In particular we derive and analyze algorithms for binary and multiclass categorization, regression, uniclass prediction and sequence prediction. The update steps of our different algorithms are all based on analytical solutions to simple constrained optimization problems. This unified view allows us to prove worst-case loss bounds for the different algorithms and for the various decision problems based on a single lemma. Our bounds on the cumulative loss of the algorithms are relative to the smallest loss that can be attained by any fixed hypothesis, and as such are applicable to both realizable and unrealizable settings. We demonstrate some of the merits of the proposed algorithms in a series of experiments with synthetic and real data sets.


reference text

S. Agmon. The relaxation method for linear inequalities. Canadian Journal of Mathematics, 6(3): 382–392, 1954. Y. Altun, I. Tsochantaridis, and T. Hofmann. Hidden markov support vector machines. In Proceedings of the Twentieth International Conference on Machine Learning, 2003. S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004. 583 C RAMMER , D EKEL , K ESHET, S HALEV-S HWARTZ AND S INGER M. Collins. Discriminative reranking for natural language parsing. In Machine Learning: Proceedings of the Seventeenth International Conference, 2000. K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. Jornal of Machine Learning Research, 2:265–292, 2001. K. Crammer and Y. Singer. A new family of online algorithms for category ranking. Jornal of Machine Learning Research, 3:1025–1058, 2003a. K. Crammer and Y. Singer. Ultraconservative online algorithms for multiclass problems. Jornal of Machine Learning Research, 3:951–991, 2003b. N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, 2000. O. Dekel, J. Keshet, and Y. Singer. Large margin hierarchical classification. In Proceedings of the Twenty-First International Conference on Machine Learning, 2004a. O. Dekel, S. Shalev-Shwartz, and Y. Singer. The power of selective memory: Self-bounded learning of prediction suffix trees. In Advances in Neural Information Processing Systems 17, 2004b. A. Elisseeff and J. Weston. A kernel method for multi-labeled classification. In Advances in Neural Information Processing Systems 14, 2001. Y. Freund and R. E. Schapire. Large margin classification using the perceptron algorithm. Machine Learning, 37(3):277–296, 1999. C. Gentile. A new approximate maximal margin classification algorithm. Journal of Machine Learning Research, 2:213–242, 2001. C. Gentile. The robustness of the p-norm algorithms. Machine Learning, 53(3), 2002. D.P Helmbold, J. Kivinen, and M. Warmuth. Relative loss bounds for single neurons. IEEE Transactions on Neural Networks, 10(6):1291–1304, 1999. M. Herbster. Learning additive models online with fast evaluating kernels. In Proceedings of the Fourteenth Annual Conference on Computational Learning Theory, pages 444–460, 2001. J. Kivinen, A. J. Smola, and R. C. Williamson. Online learning with kernels. IEEE Transactions on Signal Processing, 52(8):2165–2176, 2002. J. Kivinen and M. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1–64, January 1997. N. Klasner and H.U. Simon. From noise-free to noise-tolerant and from on-line to batch learning. In Proceedings of the Eighth Annual Conference on Computational Learning Theory, pages 250– 264, 1995. Y. Li and P. M. Long. The relaxed online maximum margin algorithm. Machine Learning, 46(1–3): 361–387, 2002. 584 O NLINE PASSIVE -AGGRESSIVE A LGORITHMS N. Littlestone. Mistake bounds and logarithmic linear-threshold learning algorithms. PhD thesis, U. C. Santa Cruz, March 1989. A. B. J. Novikoff. On convergence proofs on perceptrons. In Proceedings of the Symposium on the Mathematical Theory of Automata, volume XII, pages 615–622, 1962. J. Rocchio. Relevance feedback information retrieval. In Gerard Salton, editor, The Smart retrieval system—experiments in automatic document processing, pages 313–323. Prentice-Hall, Englewood Cliffs, NJ, 1971. F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65:386–407, 1958. (Reprinted in Neurocomputing (MIT Press, 1988).). G. Salton and C. Buckley. Term weighting approaches in automatic text retrieval. Information Processing and Management, 24(5), 1988. R.E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, pages 80–91, 1998. To appear, Machine Learning. R.E. Schapire and Y. Singer. BoosTexter: A boosting-based system for text categorization. Machine Learning, 32(2/3), 2000. B. Sch¨ lkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, o Optimization and Beyond. MIT Press, 2002. S. Shalev-Shwartz, J. Keshet, and Y. Singer. Learning to align polyphonic music. In Proceedings of the 5th International Conference on Music Information Retrieval, 2004a. http://www.cs.huji.ac.il/∼shais/. S. Shalev-Shwartz, Y. Singer, and A. Ng. Online and batch learning of pseudo-metrics. In Proceedings of the Twenty-First International Conference on Machine Learning, 2004b. B. Taskar, C. Guestrin, and D. Koller. Max-margin markov networks. In Advances in Neural Information Processing Systems 17, 2003. I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support vector machine learning for interdependent and structured output spaces. In Proceedings of the Twenty-First International Conference on Machine Learning, 2004. V. N. Vapnik. Statistical Learning Theory. Wiley, 1998. J. Weston and C. Watkins. Support vector machines for multi-class pattern recognition. In Proceedings of the Seventh European Symposium on Artificial Neural Networks, April 1999. 585