Title: An Improved GLMNET for L1-regularized Logistic Regression
Source: JMLR 2012 (pdf)
Author: Guo-Xun Yuan, Chia-Hua Ho, Chih-Jen Lin
Abstract: Recently, Yuan et al. (2010) conducted a comprehensive comparison of software for L1-regularized classification. They concluded that a carefully designed coordinate descent implementation, CDN, is the fastest among state-of-the-art solvers. In this paper, we point out that CDN is less competitive on loss functions that are expensive to compute. In particular, CDN for logistic regression is much slower than CDN for SVM because the logistic loss involves expensive exp/log operations. In optimization, Newton methods are known to need fewer iterations, although each iteration costs more. Because solving the Newton sub-problem is independent of the loss calculation, this type of method may surpass CDN under some circumstances. In L1-regularized classification, GLMNET by Friedman et al. (2010) is already a Newton-type method, but the experiments in Yuan et al. (2010) indicated that the existing GLMNET implementation may face difficulties on some large-scale problems. In this paper, we propose an improved GLMNET to address some theoretical and implementation issues. In particular, as a Newton-type method, GLMNET achieves fast local convergence but may fail to quickly obtain a useful solution. By carefully adjusting the effort spent on each iteration, our method is efficient whether the optimization problem is solved loosely or strictly. Experiments demonstrate that our improved GLMNET is more efficient than CDN for L1-regularized logistic regression.
Keywords: L1 regularization, linear classification, optimization methods, logistic regression, support vector machines
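For context, the optimization problem and the Newton sub-problem mentioned in the abstract can be written out as follows. This is a minimal sketch in notation commonly used for this problem (e.g., in Yuan et al., 2010), where C > 0 is the regularization parameter and (x_i, y_i), i = 1, ..., l, are the training instances; the paper's own notation may differ slightly:

\[
\min_{w} \; f(w) = \|w\|_1 + L(w),
\qquad
L(w) \equiv C \sum_{i=1}^{l} \log\bigl(1 + e^{-y_i w^T x_i}\bigr),
\]

so every evaluation of L(w) costs l exp/log operations. A Newton-type method such as GLMNET instead minimizes, at the current iterate w^k, a quadratic model of L while keeping the L1 term exact:

\[
d^k = \arg\min_{d} \; \|w^k + d\|_1
      + \nabla L(w^k)^T d
      + \tfrac{1}{2}\, d^T \nabla^2 L(w^k)\, d .
\]

Once the gradient and Hessian are formed, solving this sub-problem (GLMNET does so by coordinate descent) requires no further exp/log operations, which is the sense in which the sub-problem is independent of the loss calculation.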
References:
Galen Andrew and Jianfeng Gao. Scalable training of L1-regularized log-linear models. In Proceedings of the Twenty-Fourth International Conference on Machine Learning (ICML), 2007.
Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Kai-Wei Chang, Cho-Jui Hsieh, and Chih-Jen Lin. Coordinate descent method for large-scale L2-loss linear SVM. Journal of Machine Learning Research, 9:1369–1398, 2008. URL http://www.csie.ntu.edu.tw/~cjlin/papers/cdl2.pdf.
Ingrid Daubechies, Michel Defrise, and Christine De Mol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics, 57:1413–1457, 2004.
Laurent El Ghaoui, Vivian Viallon, and Tarek Rabbani. Safe feature elimination in sparse supervised learning. Technical report, EECS Department, University of California, Berkeley, 2010.
Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008. URL http://www.csie.ntu.edu.tw/~cjlin/papers/liblinear.pdf.
Jerome H. Friedman, Trevor Hastie, and Robert Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1–22, 2010.
Alexander Genkin, David D. Lewis, and David Madigan. Large-scale Bayesian logistic regression for text categorization. Technometrics, 49(3):291–304, 2007.
Cho-Jui Hsieh, Kai-Wei Chang, Chih-Jen Lin, S. Sathiya Keerthi, and Sellamanickam Sundararajan. A dual coordinate descent method for large-scale linear SVM. In Proceedings of the Twenty-Fifth International Conference on Machine Learning (ICML), 2008. URL http://www.csie.ntu.edu.tw/~cjlin/papers/cddual.pdf.
Cho-Jui Hsieh, Matyas A. Sustik, Pradeep Ravikumar, and Inderjit S. Dhillon. Sparse inverse covariance matrix estimation using quadratic approximation. In Advances in Neural Information Processing Systems (NIPS) 24, 2011.
Fang-Lan Huang, Cho-Jui Hsieh, Kai-Wei Chang, and Chih-Jen Lin. Iterative scaling and coordinate descent methods for maximum entropy. Journal of Machine Learning Research, 11:815–848, 2010. URL http://www.csie.ntu.edu.tw/~cjlin/papers/maxent_journal.pdf.
Thorsten Joachims. Making large-scale SVM learning practical. In Bernhard Schölkopf, Christopher J. C. Burges, and Alexander J. Smola, editors, Advances in Kernel Methods – Support Vector Learning, pages 169–184, Cambridge, MA, 1998. MIT Press.
Kwangmoo Koh, Seung-Jean Kim, and Stephen Boyd. An interior-point method for large-scale L1-regularized logistic regression. Journal of Machine Learning Research, 8:1519–1555, 2007. URL http://www.stanford.edu/~boyd/l1_logistic_reg.html.
Su-In Lee, Honglak Lee, Pieter Abbeel, and Andrew Y. Ng. Efficient L1 regularized logistic regression. In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-06), pages 1–9, Boston, MA, USA, July 2006.
Chih-Jen Lin and Jorge J. Moré. Newton's method for large-scale bound constrained problems. SIAM Journal on Optimization, 9:1100–1127, 1999.
Jun Liu, Jianhui Chen, and Jieping Ye. Large-scale sparse logistic regression. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 547–556, 2009.
Olvi L. Mangasarian. A finite Newton method for classification. Optimization Methods and Software, 17(5):913–929, 2002.
Mark Schmidt, Glenn Fung, and Romer Rosales. Optimization methods for L1-regularization. Technical Report TR-2009-19, University of British Columbia, 2009.
Nicol N. Schraudolph. A fast, compact approximation of the exponential function. Neural Computation, 11:853–862, 1999.
Shai Shalev-Shwartz and Ambuj Tewari. Stochastic methods for L1 regularized loss minimization. In Proceedings of the Twenty-Sixth International Conference on Machine Learning (ICML), 2009.
Shirish Krishnaj Shevade and S. Sathiya Keerthi. A simple and efficient algorithm for gene selection using sparse logistic regression. Bioinformatics, 19(17):2246–2253, 2003.
Jianing Shi, Wotao Yin, Stanley Osher, and Paul Sajda. A fast hybrid algorithm for large-scale L1-regularized logistic regression. Journal of Machine Learning Research, 11:713–741, 2010.
Choon Hui Teo, S.V.N. Vishwanathan, Alex Smola, and Quoc V. Le. Bundle methods for regularized risk minimization. Journal of Machine Learning Research, 11:311–365, 2010.
Robert Tibshirani, Jacob Bien, Jerome Friedman, Trevor Hastie, Noah Simon, Jonathan Taylor, and Ryan J. Tibshirani. Strong rules for discarding predictors in lasso-type problems. Journal of the Royal Statistical Society: Series B, 2011.
Paul Tseng and Sangwoon Yun. A coordinate gradient descent method for nonsmooth separable minimization. Mathematical Programming, 117:387–423, 2009.
Guo-Xun Yuan, Kai-Wei Chang, Cho-Jui Hsieh, and Chih-Jen Lin. A comparison of optimization methods and software for large-scale L1-regularized linear classification. Journal of Machine Learning Research, 11:3183–3234, 2010. URL http://www.csie.ntu.edu.tw/~cjlin/papers/l1.pdf.
Guo-Xun Yuan, Chia-Hua Ho, and Chih-Jen Lin. An improved GLMNET for L1-regularized logistic regression. In Proceedings of the Seventeenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 33–41, 2011.