
72 nips-2011-Distributed Delayed Stochastic Optimization


Source: pdf

Author: Alekh Agarwal, John C. Duchi

Abstract: We analyze the convergence of gradient-based optimization algorithms whose updates depend on delayed stochastic gradient information. The main application of our results is to the development of distributed minimization algorithms where a master node performs parameter updates while worker nodes compute stochastic gradients based on local information in parallel, which may give rise to delays due to asynchrony. Our main contribution is to show that for smooth stochastic problems, the delays are asymptotically negligible. In application to distributed optimization, we show n-node architectures whose optimization error in stochastic problems, in spite of asynchronous delays, scales asymptotically as O(1/√(nT)), which is known to be optimal even in the absence of delays.
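The master-worker scheme described in the abstract is easy to picture with a small simulation. The following is a minimal sketch, not the authors' algorithm: it runs plain stochastic gradient descent in which every update applies a gradient evaluated at an iterate from a fixed number of steps in the past, standing in for the staleness that asynchrony creates. The function name delayed_sgd, the fixed delay tau, the quadratic toy objective, and the 1/√t step size are all illustrative assumptions.

```python
import numpy as np

def delayed_sgd(grad, x0, T, tau, eta0):
    """SGD where each update uses a gradient computed at an iterate
    from tau steps earlier, mimicking asynchronous delay."""
    x = x0.copy()
    stale = [x0.copy()] * (tau + 1)       # iterates the (simulated) workers saw
    avg = np.zeros_like(x0)
    for t in range(1, T + 1):
        g = grad(stale[0])                # delayed stochastic gradient
        x = x - (eta0 / np.sqrt(t)) * g   # master update, 1/sqrt(t) step size
        stale = stale[1:] + [x.copy()]
        avg += (x - avg) / t              # running average of master iterates
    return avg

# Toy objective: f(x) = E[0.5 * ||x - Z||^2] with Z ~ N(0, I); minimizer is 0.
rng = np.random.default_rng(0)
noisy_grad = lambda x: x - rng.normal(size=x.shape)
x_hat = delayed_sgd(noisy_grad, x0=np.ones(5), T=20000, tau=10, eta0=0.5)
print(np.linalg.norm(x_hat))  # small despite every gradient being 10 steps stale
```

Returning a running average of the iterates is the standard device in stochastic approximation for obtaining O(1/√T)-type error guarantees; in this toy run the fixed delay visibly does not prevent convergence.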


References

[1] A. Agarwal, P. Bartlett, P. Ravikumar, and M. Wainwright. Information-theoretic lower bounds on the oracle complexity of convex optimization. In Advances in Neural Information Processing Systems 23, 2009.

[2] A. Agarwal and J. C. Duchi. Distributed delayed stochastic optimization. URL http://arxiv.org/abs/1104.5525, 2011.

[3] D. P. Bertsekas. Distributed asynchronous computation of fixed points. Mathematical Programming, 27:107–120, 1983.

[4] D. P. Bertsekas and J. N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, Inc., 1989.

[5] O. Dekel, R. Gilad-Bachrach, O. Shamir, and L. Xiao. Optimal distributed online prediction using mini-batches. URL http://arxiv.org/abs/1012.1367, 2010.

[6] O. Dekel, R. Gilad-Bachrach, O. Shamir, and L. Xiao. Robust distributed online prediction. URL http://arxiv.org/abs/1012.1370, 2010.

[7] J. Duchi, A. Agarwal, and M. Wainwright. Dual averaging for distributed optimization: convergence analysis and network scaling. IEEE Transactions on Automatic Control, to appear, 2011.

[8] A. Juditsky, A. Nemirovski, and C. Tauvel. Solving variational inequalities with the stochastic mirror-prox algorithm. URL http://arxiv.org/abs/0809.0815, 2008.

[9] G. Lan. An optimal method for stochastic composite optimization. Mathematical Programming Series A, 2010. Online first, to appear. URL http://www.ise.ufl.edu/glan/papers/OPT_SA4.pdf.

[10] J. Langford, A. Smola, and M. Zinkevich. Slow learners are fast. In Advances in Neural Information Processing Systems 22, pages 2331–2339, 2009.

[11] D. Lewis, Y. Yang, T. Rose, and F. Li. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361–397, 2004.

[12] A. Nedić, D. P. Bertsekas, and V. S. Borkar. Distributed asynchronous incremental subgradient methods. In D. Butnariu, Y. Censor, and S. Reich, editors, Inherently Parallel Algorithms in Feasibility and Optimization and their Applications, volume 8 of Studies in Computational Mathematics, pages 381–407. Elsevier, 2001.

[13] A. Nedić and A. Ozdaglar. Distributed subgradient methods for multi-agent optimization. IEEE Transactions on Automatic Control, 54:48–61, 2009.

[14] A. Nemirovski and D. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley, New York, 1983.

[15] Y. Nesterov. Primal-dual subgradient methods for convex problems. Mathematical Programming A, 120(1):261–283, 2009.

[16] B. T. Polyak. Introduction to Optimization. Optimization Software, Inc., 1987.

[17] S. S. Ram, A. Nedić, and V. V. Veeravalli. Distributed stochastic subgradient projection algorithms for convex optimization. Journal of Optimization Theory and Applications, 147(3):516–545, 2010.

[18] H. Robbins and S. Monro. A stochastic approximation method. Annals of Mathematical Statistics, 22:400–407, 1951.

[19] J. Tsitsiklis. Problems in decentralized decision making and computation. PhD thesis, Massachusetts Institute of Technology, 1984.

[20] L. Xiao. Dual averaging methods for regularized stochastic learning and online optimization. Journal of Machine Learning Research, 11:2543–2596, 2010.