
33 nips-2009-Analysis of SVM with Indefinite Kernels



Author: Yiming Ying, Colin Campbell, Mark Girolami

Abstract: The recent introduction of indefinite SVM by Luss and d’Aspremont [15] has effectively demonstrated SVM classification with a non-positive semi-definite kernel (indefinite kernel). This paper studies the properties of the objective function introduced there. In particular, we show that the objective function is continuously differentiable and that its gradient can be computed explicitly. Indeed, we further show that its gradient is Lipschitz continuous. The main idea behind our analysis is that, in its saddle (min-max) representation, the objective function is smoothed by the penalty term, which measures the distance between the indefinite kernel matrix and its positive semi-definite proxy. Our elementary result greatly facilitates the application of gradient-based algorithms. Based on our analysis, we further develop Nesterov’s smooth optimization approach [17, 18] for indefinite SVM, which has an optimal convergence rate for smooth problems. Experiments on various benchmark datasets validate our analysis and demonstrate the efficiency of our proposed algorithms.
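The positive semi-definite proxy in the abstract can be illustrated by the standard Frobenius-norm projection onto the PSD cone: symmetrize the kernel matrix and clip its negative eigenvalues to zero. This is a minimal sketch of that well-known construction, not the paper's full saddle-point objective; the function name `nearest_psd` is illustrative, not from the paper.

```python
import numpy as np

def nearest_psd(K):
    """Project a symmetric (possibly indefinite) matrix onto the PSD cone
    in Frobenius norm by clipping negative eigenvalues to zero."""
    K = (K + K.T) / 2.0           # symmetrize first
    w, V = np.linalg.eigh(K)      # eigendecomposition of a symmetric matrix
    w = np.maximum(w, 0.0)        # clip negative eigenvalues
    return (V * w) @ V.T          # reassemble: V diag(w) V^T

# Example: a 2x2 indefinite "kernel" with eigenvalues 3 and -1.
K = np.array([[1.0, 2.0],
              [2.0, 1.0]])
K_psd = nearest_psd(K)
print(np.linalg.eigvalsh(K_psd))  # all eigenvalues are now >= 0
```

In the paper's setting, the proxy kernel is not fixed at this projection; it is optimized jointly with the SVM dual variables, with the Frobenius distance to the indefinite kernel appearing as the penalty term that smooths the objective.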


reference text

[1] R. Bhatia. Matrix analysis. Graduate texts in Mathematics. Springer, 1997.

[2] S. Boyd and L. Vandenberghe. Convex optimization. Cambridge University Press, 2004.

[3] J. F. Bonnans and A. Shapiro. Optimization problems with perturbations: A guided tour. SIAM Review, 40: 202–227, 1998.

[4] J. Chen and J. Ye. Training SVM with Indefinite Kernels. ICML, 2008.

[5] N. Cristianini and J. Shawe-Taylor. An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, 2000.

[6] A. d’Aspremont, O. Banerjee and L. El Ghaoui. First-order methods for sparse covariance selection. SIAM Journal on Matrix Analysis and its Applications, 30: 56–66, 2007.

[7] J.M. Danskin. The theory of max-min and its applications to weapons allocation problems, Springer-Verlag, New York, 1967.

[8] T. Graepel, R. Herbrich, P. Bollmann-Sdorra, and K. Obermayer. Classification on pairwise proximity data. NIPS, 1998.

[9] B. Haasdonk. Feature space interpretation of SVMs with indefinite kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27: 482–492, 2005.

[10] R. A. Horn and C. R. Johnson. Topics in Matrix Analysis. Cambridge University Press, 1991.

[11] R. I. Kondor and J. Lafferty. Diffusion kernels on graphs and other discrete input spaces. ICML, 2002.

[12] C. Lemaréchal and C. Sagastizábal. Practical aspects of the Moreau-Yosida regularization: theoretical preliminaries. SIAM Journal on Optimization, 7: 367–385, 1997.

[13] G. R. G. Lanckriet, N. Cristianini, P. L. Bartlett, L. El Ghaoui, and M. I. Jordan. Learning the kernel matrix with semidefinite programming. J. of Machine Learning Research, 5: 27–72, 2004.

[14] H.-T. Lin and C. J. Lin. A study on sigmoid kernels for SVM and the training of non-psd kernels by smo-type methods. Technical Report, National Taiwan University, 2003.

[15] R. Luss and A. d’Aspremont. Support vector machine classification with indefinite kernels. NIPS, 2007.

[16] A. Nemirovski. Efficient methods in convex programming. Lecture Notes, 1994.

[17] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Springer, 2003.

[18] Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103: 127–152, 2005.

[19] D. Newman, S. Hettich, C. Blake, and C. Merz. UCI repository of machine learning databases. 1998.

[20] C. S. Ong, X. Mary, S. Canu, and A. J. Smola. Learning with non-positive kernels. ICML, 2004.

[21] E. Pekalska, P. Paclik, and R. P. W. Duin. A generalized kernel approach to dissimilarity-based classification. J. of Machine Learning Research, 2: 175–211, 2002.

[22] V. Roth, J. Laub, M. Kawanabe, and J. M. Buhmann. Optimal cluster preserving embedding of nonmetric proximity data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25: 1540–1551, 2003.

[23] H. Saigo, J.-P. Vert, N. Ueda, and T. Akutsu. Protein homology detection using string alignment kernels. Bioinformatics, 20: 1682–1689, 2004.

[24] B. Schölkopf and A. J. Smola. Learning with kernels: Support vector machines, regularization, optimization, and beyond. The MIT Press, 2001.

[25] A. J. Smola, Z. L. Óvári, and R. C. Williamson. Regularization with dot-product kernels. NIPS, 2000.

[26] G. Wu, Z. Zhang, and E. Y. Chang. An analysis of transformation on non-positive semidefinite similarity matrix for kernel machines. Technical Report, UCSB, 2005.