nips2009-76-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yoram Singer, John C. Duchi
Abstract: We describe, analyze, and experiment with a new framework for empirical loss minimization with regularization. Our algorithmic framework alternates between two phases. On each iteration we first perform an unconstrained gradient descent step. We then cast and solve an instantaneous optimization problem that trades off minimization of a regularization term while keeping close proximity to the result of the first phase. This yields a simple yet effective algorithm for both batch penalized risk minimization and online learning. Furthermore, the two-phase approach enables sparse solutions when used in conjunction with regularization functions that promote sparsity, such as ℓ1. We derive concrete and very simple algorithms for minimization of loss functions with ℓ1, ℓ2, ℓ2², and ℓ∞ regularization. We also show how to construct efficient algorithms for mixed-norm ℓ1/ℓq regularization. We further extend the algorithms and give efficient implementations for very high-dimensional data with sparsity. We demonstrate the potential of the proposed framework in experiments with synthetic and natural datasets.
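The two-phase scheme described in the abstract has a particularly simple instantiation for ℓ1 regularization, where the second phase reduces to coordinate-wise soft thresholding. Below is a minimal sketch under assumed choices (squared loss, a fixed step size, and the illustrative function name fobos_l1); it shows the alternation between the gradient phase and the regularization phase rather than reproducing the paper's exact algorithms or step-size schedule.

```python
import numpy as np

def fobos_l1(X, y, lam=0.1, eta=0.01, n_iters=200):
    """Sketch of the two-phase iteration for squared loss with l1 regularization.
    The constant step size and squared loss are illustrative assumptions."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        # Phase 1: unconstrained gradient step on the loss alone.
        grad = X.T @ (X @ w - y) / n
        v = w - eta * grad
        # Phase 2: trade off the regularizer against proximity to v.
        # For l1 this instantaneous problem has a closed-form
        # soft-thresholding solution, which yields sparse iterates.
        w = np.sign(v) * np.maximum(np.abs(v) - eta * lam, 0.0)
    return w

# Usage: recover a sparse weight vector from noisy linear measurements.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
w_true = np.zeros(50)
w_true[:5] = 1.0
y = X @ w_true + 0.01 * rng.standard_normal(200)
w_hat = fobos_l1(X, y)
```

For other regularizers (ℓ2, squared ℓ2, ℓ∞, mixed ℓ1/ℓq) only the second phase changes, replacing the soft-thresholding step with the corresponding closed-form or efficiently computable update.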
[1] D.P. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.
[2] P. Zhao and B. Yu. On model selection consistency of Lasso. Journal of Machine Learning Research, 7:2541–2567, 2006.
[3] N. Meinshausen and P. Bühlmann. High-dimensional graphs and variable selection with the Lasso. Annals of Statistics, 34:1436–1462, 2006.
[4] S. Wright, R. Nowak, and M. Figueiredo. Sparse reconstruction by separable approximation. In IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 3373–3376, 2008.
[5] P. Combettes and V. Wajs. Signal recovery by proximal forward-backward splitting. Multiscale Modeling and Simulation, 4(4):1168–1200, 2005.
[6] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the ℓ1 ball for learning in high dimensions. In Proceedings of the 25th International Conference on Machine Learning, 2008.
[7] M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the Twentieth International Conference on Machine Learning, 2003.
[8] E. Hazan, A. Kalai, S. Kale, and A. Agarwal. Logarithmic regret algorithms for online convex optimization. In Proceedings of the Nineteenth Annual Conference on Computational Learning Theory, 2006.
[9] G. Obozinski, M. Wainwright, and M. Jordan. High-dimensional union support recovery in multivariate regression. In Advances in Neural Information Processing Systems 22, 2008.
[10] J. Langford, L. Li, and T. Zhang. Sparse online learning via truncated gradient. In Advances in Neural Information Processing Systems 22, 2008.
[11] S. Shalev-Shwartz and A. Tewari. Stochastic methods for ℓ1 -regularized loss minimization. In Proceedings of the 26th International Conference on Machine Learning, 2009.
[12] Y. Nesterov. Introductory Lectures on Convex Optimization. Kluwer Academic Publishers, 2004.
[13] J. Duchi and Y. Singer. Efficient online and batch learning using forward-backward splitting. Journal of Machine Learning Research, 10:In Press, 2009.
[14] S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-gradient solver for SVM. In Proceedings of the 24th International Conference on Machine Learning, 2007.
[15] K. Koh, S.J. Kim, and S. Boyd. An interior-point method for large-scale ℓ1 -regularized logistic regression. Journal of Machine Learning Research, 8:1519–1555, 2007.
[16] D. Spiegelhalter and C. Taylor. Machine Learning, Neural and Statistical Classification. Ellis Horwood, 1994.
[17] R.T. Rockafellar. Convex Analysis. Princeton University Press, 1970.