nips nips2010 nips2010-158 nips2010-158-reference knowledge-graph by maker-knowledge-mining

158 nips-2010-Learning via Gaussian Herding


Source: pdf

Author: Koby Crammer, Daniel D. Lee

Abstract: We introduce a new family of online learning algorithms based upon constraining the velocity flow over a distribution of weight vectors. In particular, we show how to effectively herd a Gaussian weight vector distribution by trading off velocity constraints with a loss function. By uniformly bounding this loss function, we demonstrate how to solve the resulting optimization analytically. We compare the resulting algorithms on a variety of real world datasets, and demonstrate how these algorithms achieve state-of-the-art robust performance, especially with high label noise in the training data.
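The abstract describes a second-order online learner: a Gaussian distribution N(mu, Sigma) over weight vectors whose mean and covariance are both adjusted on each example. As a rough illustration of this style of update (a minimal sketch in the spirit of the AROW baseline of reference [8], not the paper's exact herding rule), the Python fragment below maintains a diagonal-covariance Gaussian and performs a hinge-loss-driven update; the class name GaussianOnlineLearner, the trade-off parameter r, and the restriction to a diagonal covariance are assumptions made here for brevity.

import numpy as np

class GaussianOnlineLearner:
    """Illustrative second-order online learner keeping a Gaussian N(mu, Sigma)
    over weight vectors (AROW-style sketch, cf. [8]); not the paper's exact
    herding update."""

    def __init__(self, dim, r=1.0):
        self.mu = np.zeros(dim)      # mean weight vector
        self.sigma = np.ones(dim)    # diagonal covariance (simplifying assumption)
        self.r = r                   # loss/confidence trade-off parameter (assumption)

    def predict(self, x):
        return np.sign(self.mu @ x)

    def update(self, x, y):
        # Margin under the mean and variance of the margin under the Gaussian.
        margin = y * (self.mu @ x)
        confidence = (self.sigma * x) @ x
        if margin >= 1.0:            # no hinge loss: leave the distribution unchanged
            return
        beta = 1.0 / (confidence + self.r)
        alpha = (1.0 - margin) * beta    # loss-driven step size
        # Shift the mean toward classifying (x, y) correctly, scaled by the variance.
        self.mu += alpha * y * self.sigma * x
        # Shrink the variance along the observed direction.
        self.sigma -= beta * (self.sigma * x) ** 2

Feeding a stream of (x, y) pairs with y in {-1, +1} through predict and update reproduces the standard online protocol; the paper's algorithms instead derive the mean and covariance updates from velocity constraints on the Gaussian, as stated in the abstract.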


reference text

[1] Nicolò Cesa-Bianchi, Alex Conconi, and Claudio Gentile. A second-order perceptron algorithm. SIAM Journal on Computing, 34(3):640–668, 2005.

[2] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, New York, NY, USA, 2006.

[3] G. Chechik, V. Sharma, U. Shalit, and S. Bengio. An online algorithm for large scale image similarity learning. In NIPS, 2009.

[4] Michael Collins. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In EMNLP, 2002.

[5] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. JMLR, 7:551–585, 2006.

[6] K. Crammer, M. Dredze, and A. Kulesza. Multi-class confidence weighted algorithms. In EMNLP, 2009.

[7] K. Crammer, M. Dredze, and F. Pereira. Exact convex confidence-weighted learning. In NIPS 22, 2008.

[8] K. Crammer, A. Kulesza, and M. Dredze. Adaptive regularization of weight vectors. In Advances in Neural Information Processing Systems 23, 2009.

[9] M. Dredze, K. Crammer, and F. Pereira. Confidence-weighted linear classification. In ICML, 2008.

[10] A. Gunawardana, M. Mahajan, A. Acero, and J. C. Platt. Hidden conditional random fields for phone classification. In Proceedings of ICSCT, 2005.

[11] J. Mercer. Functions of positive and negative type and their connection with the theory of integral equations. Philos. Trans. Roy. Soc. London A, 209:415–446, 1909.

[12] K. B. Petersen and M. S. Pedersen. The Matrix Cookbook, 2007.

[13] F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65:386–407, 1958.

[14] Bernhard Schölkopf and Alexander J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA, USA, 2001.