nips2010-182
Source: pdf
Author: Francesco Orabona, Koby Crammer
Abstract: We propose a general framework for online learning in classification problems with time-varying potential functions in the adversarial setting. The framework makes it possible to design and prove relative mistake bounds for any generic loss function. These bounds can be specialized to the hinge loss, recovering and improving the bounds of known online classification algorithms. By optimizing the general bound we derive a new online classification algorithm, called NAROW, which combines adaptive and fixed second-order information. We analyze the properties of the algorithm and illustrate its performance on synthetic datasets.
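Since the abstract only names NAROW, a minimal sketch may help fix the idea of mixing adaptive and fixed second-order information. The code below implements an AROW-style mean/covariance update (see [5]) in which the per-round regularizer r_t stays adaptive (r_t = v_t, with v_t = x_t^T Sigma x_t the current variance) while v_t is still large, and is clamped to a fixed floor 1/b once v_t shrinks. This particular switching rule, the class name NarowSketch, and the parameter b are illustrative assumptions, not the paper's exact derivation.

import numpy as np

class NarowSketch:
    """AROW-style online learner with a hybrid adaptive/fixed second-order
    regularizer, in the spirit of NAROW. The switching rule for r_t below
    is an illustrative assumption, not the paper's exact formula."""

    def __init__(self, dim, b=1.0):
        self.mu = np.zeros(dim)    # mean weight vector
        self.sigma = np.eye(dim)   # covariance: per-direction uncertainty
        self.b = b                 # fixed part of the regularizer (assumed)

    def predict(self, x):
        return 1.0 if self.mu @ x >= 0 else -1.0

    def update(self, x, y):
        m = y * (self.mu @ x)              # signed margin
        if m >= 1.0:                       # no hinge loss: no update
            return
        v = x @ self.sigma @ x             # adaptive variance x^T Sigma x
        # Hybrid regularizer (assumed rule): adaptive (r = v) while the
        # variance v is large, clamped to the fixed floor 1/b once v shrinks.
        r = max(v, 1.0 / self.b)
        beta = 1.0 / (v + r)               # beta < 1/v keeps Sigma PSD
        alpha = (1.0 - m) * beta           # AROW-style step size
        sigma_x = self.sigma @ x
        self.mu += alpha * y * sigma_x
        self.sigma -= beta * np.outer(sigma_x, sigma_x)

# Toy usage: count mistakes while learning a random linear concept online.
rng = np.random.default_rng(0)
w_star = rng.normal(size=5)
clf = NarowSketch(dim=5, b=1.0)
mistakes = 0
for t in range(1000):
    x = rng.normal(size=5)
    y = 1.0 if w_star @ x >= 0 else -1.0
    mistakes += clf.predict(x) != y
    clf.update(x, y)
print("online mistakes:", mistakes)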
[1] N. Cesa-Bianchi, A. Conconi, and C. Gentile. A second-order Perceptron algorithm. SIAM Journal on Computing, 34(3):640–668, 2005.
[2] N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
[3] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. Journal of Machine Learning Research, 7:551–585, 2006.
[4] K. Crammer, M. Dredze, and F. Pereira. Exact Convex Confidence-Weighted learning. In Advances in Neural Information Processing Systems 21, 2008.
[5] K. Crammer, A. Kulesza, and M. Dredze. Adaptive regularization of weight vectors. In Advances in Neural Information Processing Systems 22, 2009.
[6] K. Crammer and Y. Singer. Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research, 3:951–991, 2003.
[7] N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, 2000.
[8] M. Dredze, K. Crammer, and F. Pereira. Confidence-Weighted linear classification. In Proceedings of the 25th International Conference on Machine Learning, 2008.
[9] J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Technical Report 2010-24, UC Berkeley Electrical Engineering and Computer Science, 2010. Available at http://cs.berkeley.edu/~jduchi/projects/DuchiHaSi10.pdf.
[10] Y. Freund and R. E. Schapire. Large margin classification using the Perceptron algorithm. Machine Learning, 37(3):277–296, 1999.
[11] C. Gentile. The robustness of the p-norm algorithms. Machine Learning, 53(3):265–299, 2003.
[12] E. Hazan and S. Kale. Extracting certainty from uncertainty: Regret bounded by variation in costs. In Proceedings of the 21st Conference on Learning Theory, 2008.
[13] S. Kakade, S. Shalev-Shwartz, and A. Tewari. On the duality of strong convexity and strong smoothness: Learning applications and matrix regularization. Technical report, TTI, 2009. Available at http://www.cs.huji.ac.il/~shais/papers/KakadeShalevTewari09.pdf.
[14] J. Kivinen, A. Smola, and R. Williamson. Online learning with kernels. IEEE Transactions on Signal Processing, 52(8):2165–2176, 2004.
[15] A. Rakhlin and A. Tewari. Lecture notes on online learning. Technical report, 2008. Available at http://www-stat.wharton.upenn.edu/~rakhlin/papers/online_learning.pdf.
[16] S. Shalev-Shwartz. Online learning: Theory, algorithms, and applications. PhD thesis, The Hebrew University, 2007.
[17] S. Shalev-Shwartz and Y. Singer. A primal-dual perspective of online learning algorithms. Machine Learning, 69(2–3):115–142, 2007.
[18] L. Xiao. Dual averaging method for regularized stochastic learning and online optimization. In Advances in Neural Information Processing Systems 22, pages 2116–2124, 2009.