
SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent


Source: pdf

Author: Antoine Bordes, Léon Bottou, Patrick Gallinari

Abstract: The SGD-QN algorithm is a stochastic gradient descent algorithm that makes careful use of second-order information and splits the parameter update into independently scheduled components. Thanks to this design, SGD-QN iterates nearly as fast as a first-order stochastic gradient descent but requires fewer iterations to achieve the same accuracy. This algorithm won the “Wild Track” of the first PASCAL Large Scale Learning Challenge (Sonnenburg et al., 2008).

Keywords: support vector machine, stochastic gradient descent
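To make the idea in the abstract concrete, here is a minimal sketch of a diagonally rescaled stochastic gradient update for a linear SVM, where the diagonal scaling is refreshed on its own, slower schedule so that an ordinary step costs little more than first-order SGD. This is an illustration of the idea only, not the algorithm from the paper: the hinge-loss objective, the function name sgdqn_style_svm, the constants t0 and skip, and the crude curvature estimate are all assumptions made for brevity.

# Sketch of an SGD-QN-style update: first-order SGD rescaled coordinate-wise
# by a diagonal vector B that is updated only every `skip` examples.
# Not the exact SGD-QN algorithm; a simplified stand-in for illustration.
import numpy as np


def sgdqn_style_svm(X, y, lam=1e-4, epochs=5, t0=1e4, skip=16):
    """Train a linear SVM with hinge loss using diagonally rescaled SGD."""
    n, d = X.shape
    w = np.zeros(d)
    B = np.ones(d) / lam          # initial scaling: inverse curvature of the regularizer
    t = 0
    count = skip
    for _ in range(epochs):
        for i in np.random.permutation(n):
            xi, yi = X[i], y[i]
            eta = 1.0 / (t + t0)
            margin = yi * np.dot(w, xi)
            # Gradient of lam/2 * ||w||^2 + max(0, 1 - y w.x) at this example.
            g = lam * w - (yi * xi if margin < 1 else 0.0)
            w -= eta * B * g      # coordinate-wise rescaled update
            count -= 1
            if count <= 0:
                # Refresh the scaling on its own schedule (assumption: a crude
                # diagonal curvature proxy, clipped below by lam for safety).
                curvature = lam + (xi * xi if margin < 1 else 0.0)
                B = 0.9 * B + 0.1 / np.maximum(curvature, lam)
                count = skip
            t += 1
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 20))
    true_w = rng.normal(size=20)
    y = np.sign(X @ true_w)
    w = sgdqn_style_svm(X, y)
    print("training accuracy:", np.mean(np.sign(X @ w) == y))

The point of the split schedule is visible in the inner loop: the common-case update touches w once per example, while the more expensive scaling estimate runs only every skip examples, which is how the abstract's "independently scheduled components" keep the per-iteration cost close to first-order SGD.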


reference text

S.-I. Amari, H. Park, and K. Fukumizu. Adaptive method of realizing natural gradient learning for multilayer perceptrons. Neural Computation, 12:1409, 2000.

S. Becker and Y. Le Cun. Improving the convergence of back-propagation learning with second-order methods. In Proc. 1988 Connectionist Models Summer School, pages 29–37. Morgan Kaufmann, 1989.

A. Bordes, S. Ertekin, J. Weston, and L. Bottou. Fast kernel classifiers with online and active learning. J. Machine Learning Research, 6:1579–1619, September 2005.

L. Bottou. Online algorithms and stochastic approximations. In David Saad, editor, Online Learning and Neural Networks. Cambridge University Press, Cambridge, UK, 1998.

L. Bottou. Stochastic gradient descent on toy problems, 2007. http://leon.bottou.org/projects/sgd.

L. Bottou and O. Bousquet. The tradeoffs of large scale learning. In Adv. in Neural Information Processing Systems, volume 20. MIT Press, 2008.

L. Bottou and Y. LeCun. On-line learning for very large datasets. Applied Stochastic Models in Business and Industry, 21(2):137–151, 2005.

X. Driancourt. Optimisation par descente de gradient stochastique de systèmes modulaires combinant réseaux de neurones et programmation dynamique. PhD thesis, Université Paris XI, Orsay, France, 1994.

V. Fabian. Asymptotically efficient stochastic approximation; the RM case. Annals of Statistics, 1(3):486–495, 1973.

C.-J. Hsieh, K.-W. Chang, C.-J. Lin, S. Keerthi, and S. Sundararajan. A dual coordinate descent method for large-scale linear SVM. In Proc. 25th Intl. Conf. on Machine Learning (ICML’08), pages 408–415. Omnipress, 2008.

Y. Le Cun, L. Bottou, G. Orr, and K.-R. Müller. Efficient backprop. In Neural Networks, Tricks of the Trade, Lecture Notes in Computer Science LNCS 1524. Springer Verlag, 1998.

D. D. Lewis, Y. Yang, T. G. Rose, and F. Li. RCV1: A new benchmark collection for text categorization research. J. Machine Learning Research, 5:361–397, 2004.

N. Murata and S.-I. Amari. Statistical analysis of learning dynamics. Signal Processing, 74(1):3–28, 1999.

J. Nocedal. Updating quasi-Newton matrices with limited storage. Mathematics of Computation, 35:773–782, 1980.

N. Schraudolph. Local gain adaptation in stochastic gradient descent. In Proc. of the 9th Intl. Conf. on Artificial Neural Networks, pages 569–574, 1999.

N. Schraudolph, J. Yu, and S. Günter. A stochastic quasi-Newton method for online convex optimization. In Proc. 11th Intl. Conf. on Artificial Intelligence and Statistics (AIstats), pages 433–440. Soc. for Artificial Intelligence and Statistics, 2007.

S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated subgradient solver for SVM. In Proc. 24th Intl. Conf. on Machine Learning (ICML’07), pages 807–814. ACM, 2007.

S. Sonnenburg, V. Franc, E. Yom-Tov, and M. Sebag. Pascal large scale learning challenge. ICML’08 Workshop, 2008. http://largescale.first.fraunhofer.de.