Authors: Rie Johnson, Tong Zhang
Abstract: Stochastic gradient descent is popular for large-scale optimization, but its asymptotic convergence is slow due to the inherent variance of its stochastic gradient estimates. To remedy this problem, we introduce an explicit variance reduction method for stochastic gradient descent which we call stochastic variance reduced gradient (SVRG). For smooth and strongly convex functions, we prove that this method enjoys the same fast convergence rate as stochastic dual coordinate ascent (SDCA) and stochastic average gradient (SAG). However, our analysis is significantly simpler and more intuitive. Moreover, unlike SDCA or SAG, our method does not require the storage of gradients, and is therefore more easily applicable to complex problems such as some structured prediction problems and neural network learning.
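To make the abstract's idea concrete, below is a minimal sketch of the variance-reduced update SVRG is built around: a periodically recomputed full-gradient snapshot serves as a control variate that corrects each stochastic gradient step. The toy least-squares objective, the step size eta, the inner-loop length m, and the synthetic data are illustrative assumptions for this sketch, not values from the paper.

```python
import numpy as np

# Toy problem: minimize f(w) = (1/n) * sum_i 0.5 * (x_i^T w - y_i)^2.
# All problem sizes and hyperparameters here are illustrative assumptions.
rng = np.random.default_rng(0)
n, d = 1000, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)

def grad_i(w, i):
    # Gradient of the i-th component function 0.5 * (x_i^T w - y_i)^2
    return (X[i] @ w - y[i]) * X[i]

def full_grad(w):
    # Average gradient over all n examples (the snapshot computation)
    return X.T @ (X @ w - y) / n

eta, m = 0.05, 2 * n           # step size and inner-loop length (assumed)
w_tilde = np.zeros(d)          # snapshot iterate
for s in range(20):            # outer loop: refresh the full-gradient snapshot
    mu = full_grad(w_tilde)    # control variate: full gradient at the snapshot
    w = w_tilde.copy()
    for _ in range(m):         # inner loop: variance-reduced stochastic steps
        i = rng.integers(n)
        # Variance-reduced direction: stochastic gradient at w, corrected by
        # the same example's gradient at the snapshot plus the snapshot mean.
        v = grad_i(w, i) - grad_i(w_tilde, i) + mu
        w -= eta * v
    w_tilde = w                # keep the last inner iterate as the new snapshot

print("final objective:", 0.5 * np.mean((X @ w_tilde - y) ** 2))
```

The correction term has expectation zero, so v is an unbiased gradient estimate; as w and the snapshot both approach the optimum, the variance of v vanishes, which is what lets the method use a constant step size and achieve the fast rate claimed in the abstract. Note that only the snapshot vector and its mean gradient are stored, not per-example gradients as in SDCA or SAG.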
References:
C.-J. Hsieh, K.-W. Chang, C.-J. Lin, S. S. Keerthi, and S. Sundararajan. A dual coordinate descent method for large-scale linear SVM. In ICML, pages 408–415, 2008.
Nicolas Le Roux, Mark Schmidt, and Francis Bach. A stochastic gradient method with an exponential convergence rate for strongly-convex optimization with finite training sets. arXiv preprint arXiv:1202.6258, 2012.
Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Boston, 2004.
Mark Schmidt, Nicolas Le Roux, and Francis Bach. Minimizing finite sums with the stochastic average gradient. arXiv preprint arXiv:1309.2388, 2013.
S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: primal estimated sub-gradient solver for SVM. In ICML, pages 807–814, 2007.
Shai Shalev-Shwartz and Tong Zhang. Stochastic dual coordinate ascent methods for regularized loss minimization. arXiv preprint arXiv:1209.1873, 2012.
T. Zhang. Solving large scale linear prediction problems using stochastic gradient descent algorithms. In ICML, 2004.