214 nips-2008-Sparse Online Learning via Truncated Gradient

Source: pdf

Author: John Langford, Lihong Li, Tong Zhang

Abstract: We propose a general method called truncated gradient to induce sparsity in the weights of online-learning algorithms with convex loss. This method has several essential properties. First, the degree of sparsity is continuous: a parameter controls the rate of sparsification, from no sparsification to total sparsification. Second, the approach is theoretically motivated, and an instance of it can be regarded as an online counterpart of the popular L1-regularization method in the batch setting. We prove that small rates of sparsification result in only small additional regret with respect to typical online-learning guarantees. Finally, the approach works well empirically. We apply it to several datasets and find that, for datasets with large numbers of features, substantial sparsity is discoverable.
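The abstract does not spell out the update rule. The following is a minimal sketch, assuming the truncation operator as we understand it: every K-th step, weights whose magnitude is at most a threshold theta are shrunk toward zero by eta * K * gravity and clamped at zero, while larger weights are left untouched. The helper names `truncate` and `truncated_gradient_sgd`, the parameter names `gravity`, `theta` and `K`, and the choice of squared loss are all our own illustrative assumptions, not the authors' code.

```python
import numpy as np


def truncate(v, alpha, theta):
    """Elementwise truncation operator (hypothetical helper name).

    Entries with 0 <= v_j <= theta are pulled down by alpha and clamped at 0;
    entries with -theta <= v_j < 0 are pulled up by alpha and clamped at 0;
    entries with |v_j| > theta are left unchanged.
    """
    out = v.copy()
    pos = (v >= 0) & (v <= theta)
    neg = (v < 0) & (v >= -theta)
    out[pos] = np.maximum(0.0, v[pos] - alpha)
    out[neg] = np.minimum(0.0, v[neg] + alpha)
    return out


def truncated_gradient_sgd(X, y, eta=0.01, gravity=0.05, theta=np.inf, K=1, epochs=1):
    """Online least squares with truncation every K steps (illustrative sketch).

    Assumed per-example loss: 0.5 * (w . x - y)^2, whose gradient is (w . x - y) * x.
    gravity = 0 reduces to plain SGD; larger values tend to zero out more weights.
    Every K-th step the weights are truncated by eta * K * gravity.
    """
    n, d = X.shape
    w = np.zeros(d)
    step = 0
    for _ in range(epochs):
        for i in range(n):
            step += 1
            residual = X[i] @ w - y[i]
            w = w - eta * residual * X[i]  # ordinary online gradient step
            if step % K == 0:
                w = truncate(w, eta * K * gravity, theta)  # sparsifying step
    return w


if __name__ == "__main__":
    # Synthetic, noise-free data (an assumption): only features 0 and 1 matter.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((2000, 20))
    y = 2.0 * X[:, 0] - 3.0 * X[:, 1]
    w = truncated_gradient_sgd(X, y, eta=0.01, gravity=0.05, theta=1.0, K=1, epochs=5)
    print("nonzero weights:", np.count_nonzero(w), "of", w.size)
    print("w[0], w[1] =", w[0], w[1])
```

With theta = np.inf and K = 1, each truncation reduces to soft-thresholding, sign(v) * max(|v| - eta * gravity, 0), which is the proximal step for an L1 penalty of weight gravity; this is one way to see the L1 connection the abstract mentions. A finite theta leaves weights larger than theta in magnitude unshrunk, and K > 1 truncates less often but by a proportionally larger amount.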

reference text

[1] A. Asuncion and D.J. Newman. UCI machine learning repository, 2007. UC Irvine.

[2] N. Cesa-Bianchi, P.M. Long, and M. Warmuth. Worst-case quadratic loss bounds for prediction using linear functions and gradient descent. IEEE Transactions on Neural Networks, 7(3):604–619, 1996.

[3] C.-T. Chu, S.K. Kim, Y.-A. Lin, Y. Yu, G. Bradski, A.Y. Ng, and K. Olukotun. Map-reduce for machine learning on multicore. In Advances in Neural Information Processing Systems 20, pages 281–288, 2008.

[4] O. Dekel, S. Shalev-Shwartz, and Y. Singer. The Forgetron: A kernel-based perceptron on a fixed budget. In Advances in Neural Information Processing Systems 18, pages 259–266, 2006.

[5] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the ℓ1-ball for learning in high dimensions. In Proceedings of ICML-08, pages 272–279, 2008.

[6] J. Kivinen and M.K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1–63, 1997.

[7] J. Langford, L. Li, and A.L. Strehl. Vowpal Wabbit (fast online learning), 2007. http://hunch.net/~vw/.

[8] H. Lee, A. Battle, R. Raina, and A.Y. Ng. Efficient sparse coding algorithms. In Advances in Neural Information Processing Systems 19, 2007.

[9] D.D. Lewis, Y. Yang, T.G. Rose, and F. Li. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361–397, 2004.

[10] S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. In Proceedings of ICML-07, pages 807–814, 2007.

[11] K. Sjöstrand. Matlab implementation of LASSO, LARS, the elastic net and SPCA, June 2005. Version 2.0, http://www2.imm.dtu.dk/pubdb/p.php?3897.

[12] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1):267–288, 1996.

[13] T. Zhang. Solving large scale linear prediction problems using stochastic gradient descent algorithms. In Proceedings of ICML-04, pages 919–926, 2004.

[14] M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of ICML-03, pages 928–936, 2003.