nips nips2011 nips2011-46 nips2011-46-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Andrew Cotter, Ohad Shamir, Nati Srebro, Karthik Sridharan
Abstract: Mini-batch algorithms have been proposed as a way to speed up stochastic convex optimization problems. We study how such algorithms can be improved using accelerated gradient methods. We provide a novel analysis which shows how standard gradient methods may sometimes be insufficient to obtain a significant speed-up, and we propose a novel accelerated gradient algorithm that addresses this deficiency, enjoys a uniformly superior guarantee, and works well in practice.
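The abstract combines two standard ingredients: mini-batching and Nesterov-style acceleration. The Python sketch below is only an illustrative combination of those two ideas under simple assumptions (a smooth least-squares objective, a fixed step size and momentum parameter); it is not the accelerated mini-batch algorithm analyzed in the paper, and the function name minibatch_accelerated_sgd and all hyperparameters are hypothetical.

# Illustrative sketch only: a generic mini-batch SGD step with Nesterov-style
# "look-ahead" acceleration, NOT the paper's algorithm.
import numpy as np

def minibatch_accelerated_sgd(grad_fn, w0, data, batch_size=32, lr=0.05,
                              momentum=0.9, epochs=5, seed=0):
    """grad_fn(w, batch) returns the average gradient over the mini-batch."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    v = np.zeros_like(w0)              # momentum / acceleration buffer
    n = len(data)
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = data[idx[start:start + batch_size]]
            # Evaluate the gradient at the extrapolated (look-ahead) point.
            g = grad_fn(w + momentum * v, batch)
            v = momentum * v - lr * g
            w = w + v
    return w

# Toy usage: least-squares regression, where the per-example gradient of
# 0.5*(x.w - y)^2 is x*(x.w - y).
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 5))
    y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.01 * rng.normal(size=1000)
    data = np.hstack([X, y[:, None]])
    def grad_fn(w, batch):
        Xb, yb = batch[:, :-1], batch[:, -1]
        return Xb.T @ (Xb @ w - yb) / len(batch)
    w_hat = minibatch_accelerated_sgd(grad_fn, np.zeros(5), data)
    print(np.round(w_hat, 2))          # should lie close to the true weights

Larger mini-batches reduce the variance of each stochastic gradient, which is what makes combining them with acceleration attractive in the first place; the paper's analysis concerns when such a speed-up actually materializes.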
[1] A. Agarwal and J. Duchi. Distributed delayed stochastic optimization. Technical report, arXiv, 2011.
[2] A. Beck and M. Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167–175, 2003.
[3] L. Bottou and O. Bousquet. The tradeoffs of large scale learning. In NIPS, 2007.
[4] O. Dekel, R. Gilad-Bachrach, O. Shamir, and L. Xiao. Optimal distributed online prediction using mini-batches. Technical report, arXiv, 2010.
[5] G. Lan. An optimal method for stochastic convex optimization. Technical report, Georgia Institute of Technology, 2009.
[6] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19(4):1574–1609, 2009.
[7] Y. Nesterov. A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2). Doklady AN SSSR, 269:543–547, 1983.
[8] Y. Nesterov. Smooth minimization of non-smooth functions. Math. Program., 103(1):127–152, 2005.
[9] Y. Nesterov. Primal-dual subgradient methods for convex problems. Mathematical Programming, 120(1):221–259, 2009.
[10] O. Dekel, R. Gilad-Bachrach, O. Shamir, and L. Xiao. Optimal distributed online prediction. In ICML, 2011.
[11] S. Shalev-Shwartz, Y. Singer, N. Srebro, and A. Cotter. Pegasos: primal estimated sub-gradient solver for SVM. Math. Program., 127(1):3–30, 2011.
[12] S. Shalev-Shwartz and N. Srebro. SVM optimization: inverse dependence on training set size. In ICML, 2008.
[13] N. Srebro, K. Sridharan, and A. Tewari. Smoothness, low noise and fast rates. In NIPS, 2010.
[14] S. Shalev-Shwartz. Online Learning: Theory, Algorithms, and Applications. PhD thesis, Hebrew University of Jerusalem, 2007.
[15] P. Tseng. On accelerated proximal gradient methods for convex-concave optimization. Submitted to SIAM Journal on Optimization, 2008.
[16] L. Xiao. Dual averaging methods for regularized stochastic learning and online optimization. Journal of Machine Learning Research, 11:2543–2596, 2010.