
33 nips-2009-Analysis of SVM with Indefinite Kernels



Author: Yiming Ying, Colin Campbell, Mark Girolami

Abstract: The recent introduction of indefinite SVM by Luss and d’Aspremont [15] has effectively demonstrated SVM classification with a non-positive semi-definite kernel (indefinite kernel). This paper studies the properties of the objective function introduced there. In particular, we show that the objective function is continuously differentiable and that its gradient can be computed explicitly. Indeed, we further show that its gradient is Lipschitz continuous. The main idea behind our analysis is that, in its saddle (min-max) representation, the objective function is smoothed by the penalty term, which measures the distance between the indefinite kernel matrix and its positive semi-definite proxy. Our elementary result greatly facilitates the application of gradient-based algorithms. Based on our analysis, we further develop Nesterov’s smooth optimization approach [17, 18] for indefinite SVM, which has an optimal convergence rate for smooth problems. Experiments on various benchmark datasets validate our analysis and demonstrate the efficiency of our proposed algorithms.
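The positive semi-definite proxy in the abstract can be illustrated by the standard Frobenius-norm projection onto the PSD cone: symmetrize the kernel matrix and clip its negative eigenvalues to zero. This is a minimal sketch of that well-known construction, not the paper's full saddle-point objective; the function name `nearest_psd` is illustrative, not from the paper.

```python
import numpy as np

def nearest_psd(K):
    """Project a symmetric (possibly indefinite) matrix onto the PSD cone
    in Frobenius norm by clipping negative eigenvalues to zero."""
    K = (K + K.T) / 2.0           # symmetrize first
    w, V = np.linalg.eigh(K)      # eigendecomposition of a symmetric matrix
    w = np.maximum(w, 0.0)        # clip negative eigenvalues
    return (V * w) @ V.T          # reassemble: V diag(w) V^T

# Example: a 2x2 indefinite "kernel" with eigenvalues 3 and -1.
K = np.array([[1.0, 2.0],
              [2.0, 1.0]])
K_psd = nearest_psd(K)
print(np.linalg.eigvalsh(K_psd))  # all eigenvalues are now >= 0
```

In the paper's setting, the proxy kernel is not fixed at this projection; it is optimized jointly with the SVM dual variables, with the Frobenius distance to the indefinite kernel appearing as the penalty term that smooths the objective.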


reference text

[1] R. Bhatia. Matrix analysis. Graduate texts in Mathematics. Springer, 1997.

[2] S. Boyd and L. Vandenberghe. Convex optimization. Cambridge University Press, 2004.

[3] J. F. Bonnans and A. Shapiro. Optimization problems with perturbations: A guided tour. SIAM Review, 40: 202–227, 1998.

[4] J. Chen and J. Ye. Training SVM with Indefinite Kernels. ICML, 2008.

[5] N. Cristianini and J. Shawe-Taylor. An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, 2000.

[6] A. d’Aspremont, O. Banerjee and L. El Ghaoui. First-order methods for sparse covariance selection. SIAM Journal on Matrix Analysis and its Applications, 30: 56–66, 2007.

[7] J.M. Danskin. The theory of max-min and its applications to weapons allocation problems, Springer-Verlag, New York, 1967.

[8] T. Graepel, R. Herbrich, P. Bollmann-Sdorra, and K. Obermayer. Classification on pairwise proximity data. NIPS, 1998.

[9] B. Haasdonk. Feature space interpretation of SVMs with indefinite kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27: 482–492, 2005.

[10] R. A. Horn and C. R. Johnson. Topics in Matrix Analysis. Cambridge University Press, 1991.

[11] R. I. Kondor and J. Lafferty. Diffusion kernels on graphs and other discrete input spaces. ICML, 2002.

[12] C. Lemaréchal and C. Sagastizábal. Practical aspects of the Moreau-Yosida regularization: theoretical preliminaries. SIAM Journal on Optimization, 7: 367–385, 1997.

[13] G. R. G. Lanckriet, N. Cristianini, P. L. Bartlett, L. El Ghaoui, and M. I. Jordan. Learning the kernel matrix with semidefinite programming. J. of Machine Learning Research, 5: 27–72, 2004.

[14] H.-T. Lin and C. J. Lin. A study on sigmoid kernels for SVM and the training of non-psd kernels by smo-type methods. Technical Report, National Taiwan University, 2003.

[15] R. Luss and A. d’Aspremont. Support vector machine classification with indefinite kernels. NIPS, 2007.

[16] A. Nemirovski. Efficient methods in convex programming. Lecture Notes, 1994.

[17] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Springer, 2003.

[18] Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103: 127–152, 2005.

[19] D. Newman, S. Hettich, C. Blake, and C. Merz. UCI repository of machine learning databases. 1998.

[20] C. S. Ong, X. Mary, S. Canu, and A. J. Smola. Learning with non-positive kernels. ICML, 2004.

[21] E. Pekalska, P. Paclik, and R. P. W. Duin. A generalized kernel approach to dissimilarity-based classification. J. of Machine Learning Research, 2: 175–211, 2002.

[22] V. Roth, J. Laub, M. Kawanabe, and J. M. Buhmann. Optimal cluster preserving embedding of nonmetric proximity data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25: 1540–1551, 2003.

[23] H. Saigo, J.-P. Vert, N. Ueda, and T. Akutsu. Protein homology detection using string alignment kernels. Bioinformatics, 20: 1682–1689, 2004.

[24] B. Schölkopf and A. J. Smola. Learning with kernels: Support vector machines, regularization, optimization, and beyond. The MIT Press, 2001.

[25] A. J. Smola, Z. L. Óvári, and R. C. Williamson. Regularization with dot-product kernels. NIPS, 2000.

[26] G. Wu, Z. Zhang, and E. Y. Chang. An analysis of transformation on non-positive semidefinite similarity matrix for kernel machines. Technical Report, UCSB, 2005.