
107 jmlr-2012-Smoothing Multivariate Performance Measures


Source: pdf

Author: Xinhua Zhang, Ankan Saha, S.V.N. Vishwanathan

Abstract: Optimizing multivariate performance measures is an important task in machine learning. Joachims (2005) introduced a Support Vector Method whose underlying optimization problem is commonly solved by cutting plane methods (CPMs) such as SVM-Perf and BMRM. It can be shown that CPMs converge to an ε accurate solution in O(1/(λε)) iterations, where λ is the trade-off parameter between the regularizer and the loss function. Motivated by the impressive convergence rate of CPMs on a number of practical problems, it was conjectured that these rates can be further improved. We disprove this conjecture in this paper by constructing counterexamples. Surprisingly, however, we further discover that these problems are not inherently hard, and we develop a novel smoothing strategy which, in conjunction with Nesterov's accelerated gradient method, can find an ε accurate solution in O*(min{1/ε, 1/√(λε)}) iterations. Computationally, our smoothing technique is also particularly advantageous for optimizing multivariate performance scores such as precision/recall break-even point and ROCArea; the cost per iteration remains the same as that of CPMs. Empirical evaluation on some of the largest publicly available data sets shows that our method converges significantly faster than CPMs without sacrificing generalization ability.

Keywords: non-smooth optimization, max-margin methods, multivariate performance measures, Support Vector Machines, smoothing
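The recipe the abstract describes — replace the non-smooth max-margin loss with a uniformly close smooth surrogate, then run Nesterov's accelerated gradient method on the smoothed regularized risk — can be illustrated on the plain binary hinge loss. The sketch below is not the paper's multivariate construction; it is a minimal illustration using the standard Nesterov smoothing of the hinge, and the function names, the smoothing parameter mu, and the crude Lipschitz bound L are our own assumptions:

```python
import numpy as np

def smoothed_hinge(w, X, y, mu):
    """Nesterov-smoothed hinge loss and gradient (a sketch).
    l_mu(w) = max_{a in [0,1]} a*(1 - y<w,x>) - (mu/2)*a^2.
    The maximizer a* = clip((1 - y<w,x>)/mu, 0, 1) is unique, so l_mu is
    differentiable with a (1/mu)-Lipschitz gradient."""
    margin = 1.0 - y * (X @ w)                   # (n,) signed margins
    a = np.clip(margin / mu, 0.0, 1.0)           # optimal dual variables
    loss = np.mean(a * margin - 0.5 * mu * a**2)
    grad = -(X.T @ (a * y)) / len(y)             # Danskin: a held fixed
    return loss, grad

def nesterov_svm(X, y, lam=1e-3, mu=1e-2, iters=200):
    """Minimize lam/2 * ||w||^2 + mean smoothed hinge with Nesterov's
    accelerated gradient (FISTA-style momentum)."""
    n, d = X.shape
    # Crude upper bound on the gradient's Lipschitz constant.
    L = lam + np.max(np.sum(X**2, axis=1)) / mu
    w = np.zeros(d)
    v = np.zeros(d)
    t = 1.0
    for _ in range(iters):
        _, g = smoothed_hinge(v, X, y, mu)
        w_next = v - (lam * v + g) / L           # gradient step from v
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t**2))
        v = w_next + ((t - 1.0) / t_next) * (w_next - w)   # momentum
        w, t = w_next, t_next
    return w
```

With labels y in {-1, +1}, `w = nesterov_svm(X, y)` runs the accelerated iteration on the smoothed objective; shrinking mu tightens the approximation to the original hinge at the cost of a larger L and hence smaller steps, which is exactly the trade-off the paper's rate analysis balances.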


Reference text

Alekh Agarwal, Peter Bartlett, Pradeep Ravikumar, and Martin Wainwright. Information-theoretic lower bounds on the oracle complexity of convex optimization. In Neural Information Processing Systems, 2009.

Satish Balay, Jed Brown, Kris Buschelman, William Gropp, Dinesh Kaushik, Matthew G. Knepley, Lois Curfman McInnes, Barry F. Smith, and Hong Zhang. PETSc web page, 2011. http://www.mcs.anl.gov/petsc.

Peter Bartlett, Michael Jordan, and Jon McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473):138–156, 2006.

Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167–175, 2003.

Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.

Steve Benson, Lois Curfman McInnes, Jorge Moré, Todd Munson, and Jason Sarich. TAO user manual (revision 1.10.1). Technical Report ANL/MCS-TM-242, Mathematics and Computer Science Division, Argonne National Laboratory, 2010. http://www.mcs.anl.gov/tao.

Léon Bottou. Stochastic gradient SVMs. http://leon.bottou.org/projects/sgd, 2008.

Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, England, 2004.

Olivier Chapelle. Training a support vector machine in the primal. Neural Computation, 19(5):1155–1178, 2007.

Chuong Do, Quoc Le, and Chuan-Sheng Foo. Proximal regularization for online and batch learning. In International Conference on Machine Learning (ICML), 2009.

Michael Ferris and Todd Munson. Interior-point methods for massive support vector machines. SIAM Journal on Optimization, 13(3):783–804, 2002.

Vojtěch Franc and Sören Sonnenburg. Optimized cutting plane algorithm for support vector machines. In Andrew McCallum and Sam Roweis, editors, ICML, pages 320–327. Omnipress, 2008.

Andrew Frank and Arthur Asuncion. UCI machine learning repository, 2010. URL http://archive.ics.uci.edu/ml.

Jean-Baptiste Hiriart-Urruty and Claude Lemaréchal. Convex Analysis and Minimization Algorithms, I and II, volume 305 and 306. Springer-Verlag, 1996.

Cho-Jui Hsieh, Kai-Wei Chang, Chih-Jen Lin, Sathiya Keerthi, and Sundararajan Sellamanickam. A dual coordinate descent method for large-scale linear SVM. In William Cohen, Andrew McCallum, and Sam Roweis, editors, ICML, pages 408–415. ACM, 2008.

Thorsten Joachims. A support vector method for multivariate performance measures. In Proc. Intl. Conf. Machine Learning, pages 377–384, San Francisco, California, 2005. Morgan Kaufmann Publishers.

Thorsten Joachims. Training linear SVMs in linear time. In Proc. ACM Conf. Knowledge Discovery and Data Mining (KDD). ACM, 2006.

Thorsten Joachims, Thomas Finley, and Chun-Nam John Yu. Cutting-plane training of structural SVMs. Machine Learning, 77(1):27–59, 2009.

Nikolas List and Hans Ulrich Simon. SVM-optimization and steepest-descent line search. In Sanjoy Dasgupta and Adam Klivans, editors, Proc. Annual Conf. Computational Learning Theory, LNCS. Springer, 2009.

Arkadi Nemirovski and David Yudin. Problem Complexity and Method Efficiency in Optimization. John Wiley and Sons, 1983.

Yurii Nesterov. A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2). Soviet Math. Dokl., 269:543–547, 1983.

Yurii Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Springer, 2003.

Yurii Nesterov. Smooth minimization of non-smooth functions. Math. Program., 103(1):127–152, 2005.

Yurii Nesterov. Gradient methods for minimizing composite objective function. Technical Report 76, CORE Discussion Paper, UCL, 2007.

Jorge Nocedal and Stephen Wright. Numerical Optimization. Springer Series in Operations Research. Springer, 1999.

Jorge Nocedal and Stephen Wright. Numerical Optimization. Springer Series in Operations Research. Springer, 2nd edition, 2006.

John Platt. Sequential minimal optimization: A fast algorithm for training support vector machines. Technical Report MSR-TR-98-14, Microsoft Research, 1998.

Bernhard Schölkopf and Alexander Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.

Shai Shalev-Shwartz, Yoram Singer, and Nathan Srebro. Pegasos: Primal estimated sub-gradient solver for SVM. In Proc. Intl. Conf. Machine Learning, 2007.

Sören Sonnenburg, Vojtěch Franc, Elad Yom-Tov, and Michele Sebag. PASCAL large scale learning challenge, 2008. URL http://largescale.ml.tu-berlin.de/workshop/.

Sören Sonnenburg and Vojtěch Franc. COFFIN: A computational framework for linear SVMs. In Proceedings of the International Conference on Machine Learning, Haifa, 2010.

Choon Hui Teo, S. V. N. Vishwanathan, Alexander Smola, and Quoc Le. Bundle methods for regularized risk minimization. Journal of Machine Learning Research, 11:311–365, January 2010.

Tong Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Ann. Statist., 32(1):56–85, 2004.

Xinhua Zhang, Ankan Saha, and S. V. N. Vishwanathan. Regularized risk minimization by Nesterov's accelerated gradient methods: Algorithmic extensions and empirical studies. Technical report arXiv:1011.0472, 2010. URL http://arxiv.org/abs/1011.0472.

Xinhua Zhang, Ankan Saha, and S. V. N. Vishwanathan. Lower bounds on rate of convergence of cutting plane methods. In Advances in Neural Information Processing Systems 23, 2011a.

Xinhua Zhang, Ankan Saha, and S. V. N. Vishwanathan. Smoothing multivariate performance measures. In Proceedings of UAI, 2011b.

Xinhua Zhang, Ankan Saha, and S. V. N. Vishwanathan. SmoothOPT, 2012. URL http://webdocs.cs.ualberta.ca/~xinhua2/SmoothOPT.

Tianyi Zhou, Dacheng Tao, and Xindong Wu. NESVM: a fast gradient method for support vector machines. In Proc. Intl. Conf. Data Mining, 2010.