nips nips2009 nips2009-220 nips2009-220-reference knowledge-graph by maker-knowledge-mining

220 nips-2009-Slow Learners are Fast

Source: pdf

Author: Martin Zinkevich, John Langford, Alex J. Smola

Abstract: Online learning algorithms have impressive convergence properties when it comes to risk minimization and convex games on very large problems. However, they are inherently sequential in their design which prevents them from taking advantage of modern multi-core architectures. In this paper we prove that online learning with delayed updates converges well, thereby facilitating parallel online learning. 1

reference text

[1] Peter L. Bartlett, Elad Hazan, and Alexander Rakhlin. Adaptive online gradient descent. In J. C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, Cambridge, MA, 2008. MIT Press.

[2] L. Bottou and O. Bousquet. The tradeoffs of large scale learning. In J. C. Platt, D. Koller, Y. Singer, and S.T. Roweis, editors, NIPS. MIT Press, 2007.

[3] L´ on Bottou and Yann LeCun. Large scale online learning. In S. Thrun, L. Saul, and e B. Sch¨ lkopf, editors, Advances in Neural Information Processing Systems 16, pages 217– o 224, Cambridge, MA, 2004. MIT Press.

[4] C.T. Chu, S.K. Kim, Y. A. Lin, Y. Y. Yu, G. Bradski, A. Ng, and K. Olukotun. Map-reduce for machine learning on multicore. In B. Sch¨ lkopf, J. Platt, and T. Hofmann, editors, Advances o in Neural Information Processing Systems 19, 2007.

[5] G. Cormack. TREC 2007 spam track overview. In The Sixteenth Text REtrieval Conference (TREC 2007) Proceedings, 2007.

[6] O. Delalleau and Y. Bengio. Parallel stochastic gradient descent, 2007. CIAR Summer School, Toronto.

[7] C.B. Do, Q.V. Le, and C.-S. Foo. Proximal regularization for online and batch learning. In A.P. Danyluk, L. Bottou, and M. L. Littman, editors, Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, Montreal, Quebec, Canada, June 14-18, 2009, volume 382, page 33. ACM, 2009.

[8] Elad Hazan, Amit Agarwal, and Satyen Kale. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2-3):169–192, 2007.

[9] P. J. Huber. Robust Statistics. John Wiley and Sons, New York, 1981.

[10] J. Langford, L. Li, and A. Strehl. Vowpal wabbit online learning project, 2007. http://hunch.net/?p=309.

[11] J. Langford, A.J. Smola, and M. Zinkevich. Slow learners are fast. arXiv:0911.0491.

[12] N. Murata, S. Yoshizawa, and S. Amari. Network information criterion — determining the number of hidden units for artiﬁcial neural network models. IEEE Transactions on Neural Networks, 5:865–872, 1994.

[13] Y. Nesterov and J.-P. Vial. Conﬁdence level solutions for stochastic programming. Technical Report 2000/13, Universit´ Catholique de Louvain - Center for Operations Research and e Economics, 2000.

[14] N. Ratliff, J. Bagnell, and M. Zinkevich. Maximum margin planning. In International Conference on Machine Learning, July 2006.

[15] N. Ratliff, J. Bagnell, and M. Zinkevich. (online) subgradient methods for structured prediction. In Eleventh International Conference on Artiﬁcial Intelligence and Statistics (AIStats), March 2007.

[16] Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: Primal estimated subgradient solver for SVM. In Proc. Intl. Conf. Machine Learning, 2007.

[17] Choon Hui Teo, S. V. N. Vishwanthan, Alex J. Smola, and Quoc V. Le. Bundle methods for regularized risk minimization. J. Mach. Learn. Res., 2009. Submitted in February 2009.

[18] S. V. N. Vishwanathan, Nicol N. Schraudolph, Mark Schmidt, and Kevin Murphy. Accelerated training conditional random ﬁelds with stochastic gradient methods. In Proc. Intl. Conf. Machine Learning, pages 969–976, New York, NY, USA, 2006. ACM Press.

[19] M. Weimer, A. Karatzoglou, Q. Le, and A. Smola. Coﬁ rank - maximum margin matrix factorization for collaborative ranking. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20. MIT Press, Cambridge, MA, 2008.

[20] K. Weinberger, A. Dasgupta, J. Attenberg, J. Langford, and A.J. Smola. Feature hashing for large scale multitask learning. In L. Bottou and M. Littman, editors, International Conference on Machine Learning, 2009.

[21] M. Zinkevich. Online convex programming and generalised inﬁnitesimal gradient ascent. In Proc. Intl. Conf. Machine Learning, pages 928–936, 2003. 9