
45 nips-2011-Beating SGD: Learning SVMs in Sublinear Time



Author: Elad Hazan, Tomer Koren, Nati Srebro

Abstract: We present an optimization method for linear SVMs based on a stochastic primal-dual approach, where the primal step is akin to an importance-weighted SGD step and the dual step is a stochastic update on the importance weights. This yields an optimization method with sublinear dependence on the training set size, and the first method for learning linear SVMs whose runtime is less than the size of the training set required for learning!
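To make the primal-dual scheme described in the abstract concrete, here is a minimal Python sketch of the idea: a hinge-loss subgradient step on an example drawn from a dual importance distribution, followed by a multiplicative-weights update of that distribution. The function name, step sizes, and update rules are illustrative assumptions, not the authors' exact algorithm; in particular, the paper attains sublinear runtime by estimating margins from sampled coordinates, whereas this sketch computes full margins for clarity and is therefore not itself sublinear.

import numpy as np

def primal_dual_svm(X, y, T=1000, eta_primal=0.1, eta_dual=0.01, seed=0):
    # Illustrative sketch of a stochastic primal-dual hinge-loss solver.
    # Primal: importance-weighted SGD step on an example drawn from p.
    # Dual: multiplicative-weights update shifting mass to low-margin examples.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)              # primal iterate, kept in the unit ball
    p = np.full(n, 1.0 / n)      # dual distribution over training examples

    for _ in range(T):
        # Primal step: sample an example with probability proportional to p.
        i = rng.choice(n, p=p)
        if y[i] * (X[i] @ w) < 1.0:          # hinge loss is active
            w += eta_primal * y[i] * X[i]    # subgradient step on that example
        w /= max(1.0, np.linalg.norm(w))     # project back onto the unit ball

        # Dual step: upweight examples whose margins are small or violated.
        # (The paper estimates these margins from sampled coordinates to stay
        # sublinear; computing them exactly here costs O(nd) per iteration.)
        p *= np.exp(-eta_dual * (y * (X @ w)))
        p /= p.sum()
    return w

# Example usage on a toy linearly separable problem (illustrative):
# X = np.vstack([np.random.randn(50, 2) + 2, np.random.randn(50, 2) - 2])
# y = np.concatenate([np.ones(50), -np.ones(50)])
# w = primal_dual_svm(X, y)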


Reference text

[1] S. Arora, E. Hazan, and S. Kale. The multiplicative weights update method: a meta algorithm and applications. Manuscript, 2005.

[2] L. Bottou and O. Bousquet. The tradeoffs of large scale learning. Advances in Neural Information Processing Systems, 20:161–168, 2008.

[3] N. Cesa-Bianchi, A. Conconi, and C. Gentile. On the generalization ability of on-line learning algorithms. IEEE Transactions on Information Theory, 50(9):2050–2057, 2004.

[4] K.L. Clarkson, E. Hazan, and D.P. Woodruff. Sublinear optimization for machine learning. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 449–457. IEEE, 2010.

[5] K. Deng, C. Bourke, S. Scott, J. Sunderman, and Y. Zheng. Bandit-based algorithms for budgeted learning. In Proceedings of the Seventh IEEE International Conference on Data Mining (ICDM 2007), pages 463–468. IEEE, 2007.

[6] P. Felzenszwalb, D. McAllester, and D. Ramanan. A discriminatively trained, multiscale, deformable part model. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008.

[7] C. Gentile. The robustness of the p-norm algorithms. Machine Learning, 53(3):265–299, 2003.

[8] A. Kapoor and R. Greiner. Learning and classifying under hard budgets. Machine Learning: ECML 2005, pages 170–181, 2005.

[9] S.S. Keerthi and D. DeCoste. A modified finite Newton method for fast solution of large scale linear SVMs. Journal of Machine Learning Research, 6:341–361, 2005.

[10] S.A. Plotkin, D.B. Shmoys, and É. Tardos. Fast approximation algorithms for fractional packing and covering problems. In Proceedings of the 32nd Annual Symposium on Foundations of Computer Science, pages 495–504. IEEE Computer Society, 1991.

[11] A. Rahimi and B. Recht. Random features for large-scale kernel machines. Advances in Neural Information Processing Systems, 20:1177–1184, 2008.

[12] S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-gradient solver for SVM. In Proceedings of the 24th International Conference on Machine Learning, pages 807–814. ACM, 2007.

[13] S. Shalev-Shwartz and N. Srebro. SVM optimization: inverse dependence on training set size. In Proceedings of the 25th International Conference on Machine Learning, pages 928–935, 2008.

[14] N. Srebro, K. Sridharan, and A. Tewari. Smoothness, low noise and fast rates. In Advances in Neural Information Processing Systems 23, pages 2199–2207, 2010.