jmlr jmlr2012 jmlr2012-54 jmlr2012-54-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Chia-Hua Ho, Chih-Jen Lin
Abstract: Support vector regression (SVR) and support vector classification (SVC) are popular learning techniques, but their use with kernels is often time consuming. Recently, linear SVC without kernels has been shown to give competitive accuracy for some applications while enjoying much faster training and testing. However, few studies have focused on linear SVR. In this paper, we extend state-of-the-art training methods for linear SVC to linear SVR. We show that the extension is straightforward for some methods, but is not trivial for others. Our experiments demonstrate that for some problems, the proposed linear-SVR training methods can very efficiently produce models that are as good as kernel SVR. Keywords: support vector regression, Newton methods, coordinate descent methods
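To make the abstract's linear-versus-kernel contrast concrete, below is a minimal sketch comparing the two on synthetic data. It assumes Python with scikit-learn, whose LinearSVR and SVR estimators wrap liblinear-style and libsvm-style solvers respectively; the data sizes and parameter values are illustrative only, and this is not the paper's own implementation.

# Minimal sketch: linear SVR vs. RBF-kernel SVR (illustrative, not the paper's code).
import time

from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR, LinearSVR

# Synthetic regression data; sizes chosen only so both models finish quickly.
X, y = make_regression(n_samples=20000, n_features=300, noise=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for name, model in [
    ("linear SVR", LinearSVR(C=1.0, epsilon=0.1, max_iter=10000)),
    ("RBF-kernel SVR", SVR(kernel="rbf", C=1.0, epsilon=0.1)),
]:
    start = time.time()
    model.fit(X_tr, y_tr)
    elapsed = time.time() - start
    mse = mean_squared_error(y_te, model.predict(X_te))
    print(f"{name}: train time {elapsed:.1f}s, test MSE {mse:.3f}")

On data sets with many instances, the linear model typically trains far faster; whether its accuracy matches the kernel model depends on the problem, which is exactly the trade-off the paper studies.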
Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The million song dataset. In Proceedings of the Twelfth International Society for Music Information Retrieval Conference (ISMIR 2011), 2011.
Bernhard E. Boser, Isabelle Guyon, and Vladimir Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pages 144–152. ACM Press, 1992.
Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Olivier Chapelle. Training a support vector machine in the primal. Neural Computation, 19(5):1155–1178, 2007.
Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20:273–297, 1995.
Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008. URL http://www.csie.ntu.edu.tw/~cjlin/papers/liblinear.pdf.
Andrew Frank and Arthur Asuncion. UCI machine learning repository, 2010. URL http://archive.ics.uci.edu/ml.
Arthur E. Hoerl and Robert W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1):55–67, 1970.
Cho-Jui Hsieh, Kai-Wei Chang, Chih-Jen Lin, S. Sathiya Keerthi, and Sellamanickam Sundararajan. A dual coordinate descent method for large-scale linear SVM. In Proceedings of the Twenty Fifth International Conference on Machine Learning (ICML), 2008. URL http://www.csie.ntu.edu.tw/~cjlin/papers/cddual.pdf.
Thorsten Joachims. Making large-scale SVM learning practical. In Bernhard Schölkopf, Christopher J. C. Burges, and Alexander J. Smola, editors, Advances in Kernel Methods – Support Vector Learning, pages 169–184, Cambridge, MA, 1998. MIT Press.
Thorsten Joachims. Training linear SVMs in linear time. In Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006.
S. Sathiya Keerthi and Dennis DeCoste. A modified finite Newton method for fast solution of large scale linear SVMs. Journal of Machine Learning Research, 6:341–361, 2005.
Shimon Kogan, Dimitry Levin, Bryan R. Routledge, Jacob S. Sagi, and Noah A. Smith. Predicting risk from financial reports with regression. In Proceedings of the North American Association for Computational Linguistics Human Language Technologies Conference, pages 272–280, 2009.
Shuo-Peng Liao, Hsuan-Tien Lin, and Chih-Jen Lin. A note on the decomposition methods for support vector regression. Neural Computation, 14:1267–1281, 2002.
Chih-Jen Lin and Jorge J. Moré. Newton's method for large-scale bound constrained problems. SIAM Journal on Optimization, 9:1100–1127, 1999.
Chih-Jen Lin, Ruby C. Weng, and S. Sathiya Keerthi. Trust region Newton method for large-scale logistic regression. Journal of Machine Learning Research, 9:627–650, 2008. URL http://www.csie.ntu.edu.tw/~cjlin/papers/logistic.pdf.
Olvi L. Mangasarian. A finite Newton method for classification. Optimization Methods and Software, 17(5):913–929, 2002.
Yurii E. Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. Technical report, CORE Discussion Paper, Université Catholique de Louvain, Louvain-la-Neuve, Belgium, 2010. URL http://www.ucl.be/cps/ucl/doc/core/documents/coredp2010_2web.pdf.
John C. Platt. Fast training of support vector machines using sequential minimal optimization. In Bernhard Schölkopf, Christopher J. C. Burges, and Alexander J. Smola, editors, Advances in Kernel Methods – Support Vector Learning, Cambridge, MA, 1998. MIT Press.
Peter Richtárik and Martin Takáč. Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Technical report, School of Mathematics, University of Edinburgh, 2011.
Shai Shalev-Shwartz and Ambuj Tewari. Stochastic methods for l1-regularized loss minimization. Journal of Machine Learning Research, 12:1865–1892, 2011.
Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: primal estimated sub-gradient solver for SVM. In Proceedings of the Twenty Fourth International Conference on Machine Learning (ICML), 2007.
Paul Tseng and Sangwoon Yun. A coordinate gradient descent method for nonsmooth separable minimization. Mathematical Programming, 117:387–423, 2009.
Vladimir Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York, NY, 1995.
Guo-Xun Yuan, Kai-Wei Chang, Cho-Jui Hsieh, and Chih-Jen Lin. A comparison of optimization methods and software for large-scale l1-regularized linear classification. Journal of Machine Learning Research, 11:3183–3234, 2010. URL http://www.csie.ntu.edu.tw/~cjlin/papers/l1.pdf.