nips nips2010 nips2010-63 nips2010-63-reference knowledge-graph by maker-knowledge-mining

63 nips-2010-Distributed Dual Averaging In Networks

Source: pdf

Author: Alekh Agarwal, Martin J. Wainwright, John C. Duchi

Abstract: The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. We develop and analyze distributed algorithms based on dual averaging of subgradients, and provide sharp bounds on their convergence rates as a function of the network size and topology. Our analysis clearly separates the convergence of the optimization algorithm itself from the effects of communication constraints arising from the network structure. We show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network. The sharpness of this prediction is conﬁrmed both by theoretical lower bounds and simulations for various networks. 1

reference text

[1] D. P. Bertsekas and J. N. Tsitsiklis. Parallel and distributed computation: numerical methods. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1989.

[2] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah. Randomized gossip algorithms. IEEE Transactions on Information Theory, 52(6):2508–2530, 2006.

[3] F.R.K. Chung. Spectral Graph Theory. AMS, 1998.

[4] J. Duchi, A. Agarwal, and M. Wainwright. Dual averaging for distributed optimization: convergence analysis and network scaling. URL http://arxiv.org/abs/1005.2012, 2010.

[5] J. Friedman, J. Kahn, and E. Szemer´ di. On the second eigenvalue of random regular graphs. e In Proceedings of the Twenty First Annual ACM Symposium on Theory of Computing, pages 587–598, New York, NY, USA, 1989. ACM.

[6] R. Gray. Toeplitz and circulant matrices: A review. Foundations and Trends in Communications and Information Theory, 2(3):155–239, 2006.

[7] P. Gupta and P. R. Kumar. The capacity of wireless networks. IEEE Transactions on Information Theory, 46(2):388–404, 2000.

[8] J. Hiriart-Urruty and C. Lemar´ chal. Convex Analysis and Minimization Algorithms I. e Springer, 1996.

[9] J. Hiriart-Urruty and C. Lemar´ chal. Convex Analysis and Minimization Algorithms II. e Springer, 1996.

[10] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1985.

[11] B. Johansson, M. Rabi, and M. Johansson. A randomized incremental subgradient method for distributed optimization in networked systems. SIAM Journal on Optimization, 20(3):1157– 1170, 2009.

[12] A. Kalai and S. Vempala. Efﬁcient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3):291–307, 2005.

[13] V. Lesser, C. Ortiz, and M. Tambe, editors. Distributed Sensor Networks: A Multiagent Perspective, volume 9. Kluwer Academic Publishers, May 2003.

[14] D. Levin, Y. Peres, and E. Wilmer. Markov Chains and Mixing Times. American Mathematical Society, 2008.

[15] I. Lobel and A. Ozdaglar. Distributed subgradient methods over random networks. Technical Report 2800, MIT LIDS, 2008.

[16] R. McDonald, K. Hall, and G. Mann. Distributed training strategies for the structured perceptron. In North American Chapter of the Association for Computational Linguistics (NAACL), 2010.

[17] A. Nedic and D. P. Bertsekas. Incremental subgradient methods for nondifferentiable optimization. SIAM Journal on Optimization, 12(1):109–138, 2001.

[18] A. Nedic and A. Ozdaglar. Distributed subgradient methods for multi-agent optimization. IEEE Transactions on Automatic Control, 54:48–61, 2009.

[19] A. Nemirovski and D. Yudin. Problem Complexity and Method Efﬁciency in Optimization. Wiley, New York, 1983.

[20] Y. Nesterov. Primal-dual subgradient methods for convex problems. Mathematical Programming A, 120(1):261–283, 2009.

[21] S. Sundhar Ram, A. Nedic, and V. V. Veeravalli. Distributed subgradient projection algorithm for convex optimization. In IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 3653–3656, 2009.

[22] J. Tsitsiklis. Problems in decentralized decision making and computation. PhD thesis, Massachusetts Institute of Technology, 1984.

[23] U. von Luxburg, A. Radl, and M. Hein. Hitting times, commute distances, and the spectral gap for large random geometric graphs. URL http://arxiv.org/abs/1003.1266, 2010.

[24] L. Xiao, S. Boyd, and S. J. Kim. Distributed average consensus with least-mean-square deviation. Journal of Parallel and Distributed Computing, 67(1):33–46, 2007. 9