Erratum: SGDQN is Less Careful than Expected (JMLR, 2010)

Author: Antoine Bordes, Léon Bottou, Patrick Gallinari, Jonathan Chang, S. Alex Smith

Abstract: The SGD-QN algorithm described in Bordes et al. (2009) contains a subtle flaw that prevents it from reaching its design goals. Yet the flawed SGD-QN algorithm has worked well enough to be a winner of the first Pascal Large Scale Learning Challenge (Sonnenburg et al., 2008). This document clarifies the situation, proposes a corrected algorithm, and evaluates its performance.

Keywords: stochastic gradient descent, support vector machine, conditional random fields

reference text

A. Bordes, L. Bottou, and P. Gallinari. SGD-QN: Careful quasi-Newton stochastic gradient descent. Journal of Machine Learning Research, 10:1737–1754, July 2009.

T. Kudo. CRF++: Yet another CRF toolkit, 2007. http://crfpp.sourceforge.net.

J. D. Lafferty, A. McCallum, and F. C. N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Carla E. Brodley and Andrea Pohoreckyj Danyluk, editors, Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), pages 282–289, Williams College, Williamstown, 2001. Morgan Kaufmann.

B. T. Polyak and A. B. Juditsky. Acceleration of stochastic approximation by averaging. SIAM J. Control Optim., 30(4):838–855, 1992.

E. F. Tjong Kim Sang and S. Buchholz. Introduction to the CoNLL-2000 shared task: Chunking. In Claire Cardie, Walter Daelemans, Claire Nedellec, and Erik Tjong Kim Sang, editors, Proceedings of CoNLL-2000 and LLL-2000, pages 127–132. Lisbon, Portugal, 2000.

N. Schraudolph, J. Yu, and S. Günter. A stochastic quasi-Newton method for online convex optimization. In Proc. 11th Intl. Conf. on Artificial Intelligence and Statistics (AIstats), pages 433–440. Soc. for Artificial Intelligence and Statistics, 2007.

S. Sonnenburg, V. Franc, E. Yom-Tov, and M. Sebag. Pascal large scale learning challenge. ICML 2008 Workshop, 2008. http://largescale.first.fraunhofer.de.

W. Xu. Towards optimal one pass large scale learning with averaged stochastic gradient descent. Submitted to JMLR, 2010.