jmlr jmlr2010 jmlr2010-34 jmlr2010-34-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Antoine Bordes, Léon Bottou, Patrick Gallinari, Jonathan Chang, S. Alex Smith
Abstract: The SGD-QN algorithm described in Bordes et al. (2009) contains a subtle flaw that prevents it from reaching its design goals. Yet the flawed SGD-QN algorithm has worked well enough to be a winner of the first Pascal Large Scale Learning Challenge (Sonnenburg et al., 2008). This document clarifies the situation, proposes a corrected algorithm, and evaluates its performance.
Keywords: stochastic gradient descent, support vector machine, conditional random fields
A. Bordes, L. Bottou, and P. Gallinari. SGD-QN: Careful quasi-Newton stochastic gradient descent. Journal of Machine Learning Research, 10:1737–1754, July 2009.
T. Kudo. CRF++: Yet another CRF toolkit, 2007. http://crfpp.sourceforge.net.
J. D. Lafferty, A. McCallum, and F. C. N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Carla E. Brodley and Andrea Pohoreckyj Danyluk, editors, Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), pages 282–289, Williams College, Williamstown, 2001. Morgan Kaufmann.
B. T. Polyak and A. B. Juditsky. Acceleration of stochastic approximation by averaging. SIAM J. Control Optim., 30(4):838–855, 1992.
E. F. Tjong Kim Sang and S. Buchholz. Introduction to the CoNLL-2000 shared task: Chunking. In Claire Cardie, Walter Daelemans, Claire Nedellec, and Erik Tjong Kim Sang, editors, Proceedings of CoNLL-2000 and LLL-2000, pages 127–132, Lisbon, Portugal, 2000.
N. Schraudolph, J. Yu, and S. Günter. A stochastic quasi-Newton method for online convex optimization. In Proc. 11th Intl. Conf. on Artificial Intelligence and Statistics (AIstats), pages 433–440. Soc. for Artificial Intelligence and Statistics, 2007.
S. Sonnenburg, V. Franc, E. Yom-Tov, and M. Sebag. Pascal large scale learning challenge. ICML 2008 Workshop, 2008. http://largescale.first.fraunhofer.de.
W. Xu. Towards optimal one pass large scale learning with averaged stochastic gradient descent. Submitted to JMLR, 2010.