
148 nips-2003-Online Passive-Aggressive Algorithms


Source: pdf

Author: Shai Shalev-Shwartz, Koby Crammer, Ofer Dekel, Yoram Singer

Abstract: We present a unified view of online classification, regression, and uniclass problems. This view leads to a single algorithmic framework for the three problems. We prove worst-case loss bounds for various algorithms for both the realizable case and the non-realizable case. A conversion of our main online algorithm to the setting of batch learning is also discussed. The end result is new algorithms and accompanying loss bounds for the hinge-loss.
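To make the framework concrete, here is a minimal sketch of the passive-aggressive update for binary classification that the paper analyzes: on each round the algorithm suffers the hinge loss on the current example and, when that loss is positive, moves the weight vector just enough to satisfy the margin constraint. The function and variable names (pa_update, w, x, y) are illustrative, not taken from the paper.

```python
import numpy as np

def pa_update(w, x, y):
    """One passive-aggressive step for binary classification (hard-margin PA variant).

    w : current weight vector (np.ndarray)
    x : feature vector of the current example
    y : label in {-1, +1}
    Returns the updated weight vector.
    """
    loss = max(0.0, 1.0 - y * np.dot(w, x))       # hinge loss on this round
    norm_sq = np.dot(x, x)
    if loss > 0.0 and norm_sq > 0.0:
        tau = loss / norm_sq                       # smallest step that attains zero loss here
        w = w + tau * y * x                        # aggressive correction along y * x
    return w                                       # passive: unchanged when the margin is already >= 1

# Toy usage: run the online update over a small stream of examples.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = np.zeros(5)
    for _ in range(100):
        x = rng.normal(size=5)
        y = 1.0 if x[0] + x[1] > 0 else -1.0       # linearly separable toy labels
        w = pa_update(w, x, y)
```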


reference text

[1] H. H. Bauschke and J. M. Borwein. On projection algorithms for solving convex feasibility problems. SIAM Review, 1996.

[2] Y. Censor and S. A. Zenios. Parallel Optimization. Oxford University Press, 1997.

[3] K. Crammer and Y. Singer. Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research, 3:951–991, 2003.

[4] Y. Freund and R. E. Schapire. Large margin classification using the perceptron algorithm. Machine Learning, 37(3):277–296, 1999.

[5] C. Gentile. A new approximate maximal margin classification algorithm. Journal of Machine Learning Research, 2:213–242, 2001.

[6] C. Gentile and M. Warmuth. Linear hinge loss and average margin. In NIPS’98.

[7] D. P. Helmbold, R. E. Schapire, Y. Singer, and M. K. Warmuth. A comparison of new and old algorithms for a mixture estimation problem. In COLT’95.

[8] M. Herbster. Learning additive models online with fast evaluating kernels. In COLT’01.

[9] J. Kivinen, D. P. Helmbold, and M. Warmuth. Relative loss bounds for single neurons. IEEE Transactions on Neural Networks, 10(6):1291–1304, 1999.

[10] J. Kivinen, A. J. Smola, and R. C. Williamson. Online learning with kernels. In NIPS’02.

[11] J. Kivinen and M. K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1–64, January 1997.

[12] J. Kivinen and M. K. Warmuth. Relative loss bounds for multidimensional regression problems. Machine Learning, 45(3):301–329, July 2001.

[13] N. Klasner and H. U. Simon. From noise-free to noise-tolerant and from on-line to batch learning. In COLT’95.

[14] Y. Li and P. M. Long. The relaxed online maximum margin algorithm. Machine Learning, 46(1–3):361–387, 2002.

[15] V. N. Vapnik. Statistical Learning Theory. Wiley, 1998.

[16] E. Xing, A. Y. Ng, M. Jordan, and S. Russell. Distance metric learning, with application to clustering with side-information. In NIPS’03.

[17] M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In ICML’03.