
324 nips-2012-Stochastic Gradient Descent with Only One Projection


Source: pdf

Author: Mehrdad Mahdavi, Tianbao Yang, Rong Jin, Shenghuo Zhu, Jinfeng Yi

Abstract: Although many variants of stochastic gradient descent have been proposed for large-scale convex optimization, most of them require projecting the solution at each iteration to ensure that the obtained solution stays within the feasible domain. For complex domains (e.g., the positive semidefinite cone), the projection step can be computationally expensive, making stochastic gradient descent unattractive for large-scale optimization problems. We address this limitation by developing novel stochastic optimization algorithms that do not need intermediate projections. Instead, only one projection at the last iteration is needed to obtain a feasible solution in the given domain. Our theoretical analysis shows that, with high probability, the proposed algorithms achieve an O(1/√T) convergence rate for general convex optimization, and an O(ln T/T) rate for strongly convex optimization under mild conditions on the domain and the objective function.
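To illustrate the contrast described in the abstract, below is a minimal, hypothetical Python sketch (not the paper's actual update rule): classical projected SGD projects onto the feasible domain after every step, whereas the one-projection idea runs unconstrained stochastic gradient steps and projects the averaged iterate only once at the end. The Euclidean-ball domain, the least-squares loss, and all function names here are illustrative assumptions.

import numpy as np

def project_ball(x, radius):
    """Euclidean projection onto the ball {x : ||x|| <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def sgd_one_projection(grad_fn, dim, T, eta, radius, rng):
    """Unconstrained SGD iterates; project the averaged iterate once at the end."""
    x = np.zeros(dim)
    x_sum = np.zeros(dim)
    for _ in range(T):
        x = x - eta * grad_fn(x, rng)        # no per-step projection
        x_sum += x
    return project_ball(x_sum / T, radius)   # single projection for feasibility

def sgd_projected(grad_fn, dim, T, eta, radius, rng):
    """Classical projected SGD: project back onto the domain after every step."""
    x = np.zeros(dim)
    x_sum = np.zeros(dim)
    for _ in range(T):
        x = project_ball(x - eta * grad_fn(x, rng), radius)
        x_sum += x
    return x_sum / T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, b = rng.normal(size=(100, 10)), rng.normal(size=100)

    def grad_fn(x, rng):
        i = rng.integers(len(b))             # sample one data point
        return (A[i] @ x - b[i]) * A[i]      # stochastic gradient of 0.5*(a_i^T x - b_i)^2

    x1 = sgd_one_projection(grad_fn, 10, T=1000, eta=0.01, radius=1.0, rng=rng)
    x2 = sgd_projected(grad_fn, 10, T=1000, eta=0.01, radius=1.0, rng=rng)
    print("one-projection solution norm:", np.linalg.norm(x1))
    print("projected-SGD solution norm:", np.linalg.norm(x2))

Both returned solutions are feasible, but the one-projection variant invokes project_ball only once; when the domain is expensive to project onto (e.g., the positive semidefinite cone), that is where the computational savings would come from.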


reference text

[1] A. Agarwal, P. L. Bartlett, P. D. Ravikumar, and M. J. Wainwright. Information-theoretic lower bounds on the oracle complexity of stochastic convex optimization. IEEE Transactions on Information Theory, 58(5):3235–3249, 2012.

[2] F. Bach and E. Moulines. Non-asymptotic analysis of stochastic approximation algorithms for machine learning. In NIPS, pages 451–459, 2011.

[3] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, 2nd edition, 1999.

[4] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[5] K. L. Clarkson. Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm. ACM Transactions on Algorithms, 6(4), 2010.

[6] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the l1-ball for learning in high dimensions. In ICML, pages 272–279, 2008.

[7] M. Frank and P. Wolfe. An algorithm for quadratic programming. Naval Research Logistics, 3, 1956.

[8] E. Hazan. Sparse approximate solutions to semidefinite programs. In LATIN, pages 306–316, 2008.

[9] E. Hazan and S. Kale. Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization. Journal of Machine Learning Research - Proceedings Track, 19:421–436, 2011.

[10] E. Hazan and S. Kale. Projection-free online learning. In ICML, 2012.

[11] M. Jaggi. Sparse Convex Optimization Methods for Machine Learning. PhD thesis, ETH Zurich, Oct. 2011.

[12] M. Jaggi and M. Sulovský. A simple algorithm for nuclear norm regularized problems. In ICML, pages 471–478, 2010.

[13] G. Lan. An optimal method for stochastic composite optimization. Math. Program., 133(1-2):365–397, 2012.

[14] J. Liu and J. Ye. Efficient Euclidean projections in linear time. In ICML, page 83, 2009.

[15] M. Mahdavi, R. Jin, and T. Yang. Trading regret for efficiency: online convex optimization with long term constraints. JMLR, 13:2465–2490, 2012.

[16] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM J. on Optimization, 19:1574–1609, 2009.

[17] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, 2004.

[18] Y. Nesterov. Smooth minimization of non-smooth functions. Math. Program., 103(1):127–152, 2005.

[19] S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-gradient solver for SVM. In ICML, pages 807–814, 2007.

[20] Y. Ying and P. Li. Distance metric learning with eigenvalue optimization. JMLR, 13:1–26, 2012.

[21] T. Zhang. Sequential greedy approximation for certain convex optimization problems. IEEE Transactions on Information Theory, 49:682–691, 2003.

[22] M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In ICML, pages 928–936, 2003.