NIPS 2003, paper 178: Sparse Greedy Minimax Probability Machine Classification
Authors: Thomas R. Strohmann, Andrei Belitski, Gregory Z. Grudic, Dennis DeCoste
Abstract: The Minimax Probability Machine Classification (MPMC) framework [Lanckriet et al., 2002] builds classifiers by minimizing the maximum probability of misclassification, and gives direct estimates of the probabilistic accuracy bound Ω. The only assumption MPMC makes is that good estimates of the means and covariance matrices of the classes exist. However, as with Support Vector Machines, MPMC is computationally expensive and requires extensive cross-validation experiments to choose kernels and kernel parameters that give good performance. In this paper we address the computational cost of MPMC by proposing an algorithm that constructs nonlinear sparse MPMC (SMPMC) models by incrementally adding basis functions (i.e. kernels) one at a time, greedily selecting the next one that maximizes the accuracy bound Ω. SMPMC automatically chooses both kernel parameters and feature weights without using computationally expensive cross-validation. Therefore the SMPMC algorithm simultaneously addresses the problems of kernel selection and feature selection (i.e. feature weighting), based solely on maximizing the accuracy bound Ω. Experimental results indicate that we can obtain reliable bounds Ω, as well as test set accuracies that are comparable to those of state-of-the-art classification algorithms.
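The greedy loop the abstract describes can be made concrete with a short sketch. Everything below is our own illustration, not the paper's implementation: we assume Gaussian basis functions centered on training points, the standard MPM relations m = sqrt(a'Σx a) + sqrt(a'Σy a) subject to a'(x̄ − ȳ) = 1 and Ω = 1/(1 + m²) from [3, 4], and we solve each two-dimensional MPMC by a bounded line search along the constraint. The helper names (`two_dim_mpmc`, `smpmc_fit`) are hypothetical.

```python
# A minimal sketch of the greedy SMPMC loop, under the assumptions stated above.
import numpy as np
from scipy.optimize import minimize_scalar

def two_dim_mpmc(Fx, Fy):
    """Two-dimensional MPMC on class features Fx, Fy (n x 2 arrays).
    Returns the weight vector a, offset b, and the accuracy bound Omega.
    Assumes the class means differ (d != 0)."""
    mx, my = Fx.mean(axis=0), Fy.mean(axis=0)
    Sx, Sy = np.cov(Fx, rowvar=False), np.cov(Fy, rowvar=False)
    d = mx - my
    a0 = d / (d @ d)                                  # one point with a'd = 1
    v = np.array([-d[1], d[0]]) / np.linalg.norm(d)   # direction with v'd = 0
    def m_of(t):
        a = a0 + t * v
        return np.sqrt(a @ Sx @ a) + np.sqrt(a @ Sy @ a)
    t = minimize_scalar(m_of, bounds=(-100.0, 100.0), method="bounded").x
    a, m = a0 + t * v, m_of(t)
    b = a @ mx - np.sqrt(a @ Sx @ a) / m              # decision boundary a'z = b
    return a, b, 1.0 / (1.0 + m * m)

def smpmc_fit(X, Y, n_basis=10, widths=(0.5, 1.0, 2.0), n_centers=5, seed=0):
    """Greedily add Gaussian basis functions, each time keeping the candidate
    (center, width) whose two-dimensional MPMC maximizes the bound Omega."""
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    out_x, out_y = np.zeros(len(X)), np.zeros(len(Y))  # current model outputs
    model, Omega = [], 0.0
    for _ in range(n_basis):
        best = None
        for center in Z[rng.choice(len(Z), n_centers, replace=False)]:
            for w in widths:
                kx = np.exp(-((X - center) ** 2).sum(axis=1) / (2 * w * w))
                ky = np.exp(-((Y - center) ** 2).sum(axis=1) / (2 * w * w))
                a, b, Om = two_dim_mpmc(np.column_stack([out_x, kx]),
                                        np.column_stack([out_y, ky]))
                if best is None or Om > best[0]:
                    best = (Om, a, b, center, w, kx, ky)
        Omega, a, b, center, w, kx, ky = best
        out_x = a[0] * out_x + a[1] * kx - b   # fold the 2-D solution into the model
        out_y = a[0] * out_y + a[1] * ky - b
        model.append((center, w, a, b))
    return model, Omega                        # classify a new point by the output's sign
```

Note that each greedy step only re-solves a two-dimensional problem. The point of Theorem 1 (proved in the appendix below) is that this is not an approximation: the bound Ω reported by the two-dimensional MPMC equals that of the full (k+1)-dimensional MPMC over all basis functions added so far.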
[1] A. W. Marshall and I. Olkin. Multivariate Chebyshev inequalities. Annals of Mathematical Statistics, 31(4):1001–1014, 1960.
[2] I. Popescu and D. Bertsimas. Optimal inequalities in probability theory: A convex optimization approach. Technical Report TM62, INSEAD, Dept. Math. O.R., Cambridge, MA, 2001.
[3] G. R. G. Lanckriet, L. E. Ghaoui, C. Bhattacharyya, and M. I. Jordan. Minimax probability machine. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, Cambridge, MA, 2002. MIT Press.
[4] G. R. G. Lanckriet, L. E. Ghaoui, C. Bhattacharyya, and M. I. Jordan. A robust minimax approach to classification. Journal of Machine Learning Research, 3:555–582, 2002.
[5] B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
[6] W. H. Beyer. CRC Standard Mathematical Tables, page 12. CRC Press Inc., Boca Raton, FL, 1987.
[7] K. Crammer, J. Keshet, and Y. Singer. Kernel design using boosting. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 15, Cambridge, MA, 2003. MIT Press.

Appendix: Proof of Theorem 1

We have to show that the values of $m$ are equal for the two-dimensional MPMC and the $(k+1)$-dimensional MPMC. We show the equivalence for the first term $\sqrt{a^T \Sigma_x a}$; an analogous argument holds for the second term. For the two-dimensional MPMC, the term under the square root is

$$
\begin{bmatrix} a_1^{(k+1)} & a_2^{(k+1)} \end{bmatrix}
\begin{bmatrix}
\sigma^2_{(k)} & \sigma_{(k)K_x^{(k+1)}} \\
\sigma_{K_x^{(k+1)}(k)} & \sigma^2_{K_x^{(k+1)}}
\end{bmatrix}
\begin{bmatrix} a_1^{(k+1)} \\ a_2^{(k+1)} \end{bmatrix}
= \big[a_1^{(k+1)}\big]^2 \sigma^2_{(k)}
+ 2\, a_1^{(k+1)} a_2^{(k+1)} \sigma_{(k)K_x^{(k+1)}}
+ \big[a_2^{(k+1)}\big]^2 \sigma^2_{K_x^{(k+1)}}
\qquad (19)
$$

Note that we can rewrite

$$
\begin{aligned}
\sigma^2_{(k)} &= \mathrm{Cov}\big(c_0 + c_1 K_x^{(1)} + \dots + c_k K_x^{(k)},\; c_0 + c_1 K_x^{(1)} + \dots + c_k K_x^{(k)}\big)
= \sum_{i=1}^{k} \sum_{j=1}^{k} c_i c_j \,\mathrm{Cov}\big(K_x^{(i)}, K_x^{(j)}\big) \\
\sigma_{(k)K_x^{(k+1)}} &= \mathrm{Cov}\big(c_0 + c_1 K_x^{(1)} + \dots + c_k K_x^{(k)},\; K_x^{(k+1)}\big)
= \sum_{i=1}^{k} c_i \,\mathrm{Cov}\big(K_x^{(i)}, K_x^{(k+1)}\big)
\end{aligned}
\qquad (20)
$$

by using properties of the sample covariance (linearity, and $\mathrm{Cov}(\text{const}, X) = 0$).

For the $(k+1)$-dimensional MPMC, let us first determine the $k+1$ coefficients. The decision function is

$$
a_1^{(k+1)} \big(c_0 + c_1 K_x^{(1)} + \dots + c_k K_x^{(k)}\big) + a_2^{(k+1)} K_x^{(k+1)} - b^{(k+1)}
= a_1^{(k+1)} c_1 K_x^{(1)} + \dots + a_1^{(k+1)} c_k K_x^{(k)} + a_2^{(k+1)} K_x^{(k+1)} + a_1^{(k+1)} c_0 - b^{(k+1)},
$$

so the coefficients are $\big(a_1^{(k+1)} c_1, \dots, a_1^{(k+1)} c_k, a_2^{(k+1)}\big)$. The term under the square root then looks like

$$
\begin{bmatrix} a_1^{(k+1)} c_1 \\ \vdots \\ a_1^{(k+1)} c_k \\ a_2^{(k+1)} \end{bmatrix}^{T}
\begin{bmatrix}
\sigma^2_{K_x^{(1)}} & \cdots & \sigma_{K_x^{(1)} K_x^{(k)}} & \sigma_{K_x^{(1)} K_x^{(k+1)}} \\
\vdots & \ddots & \vdots & \vdots \\
\sigma_{K_x^{(k)} K_x^{(1)}} & \cdots & \sigma^2_{K_x^{(k)}} & \sigma_{K_x^{(k)} K_x^{(k+1)}} \\
\sigma_{K_x^{(k+1)} K_x^{(1)}} & \cdots & \sigma_{K_x^{(k+1)} K_x^{(k)}} & \sigma^2_{K_x^{(k+1)}}
\end{bmatrix}
\begin{bmatrix} a_1^{(k+1)} c_1 \\ \vdots \\ a_1^{(k+1)} c_k \\ a_2^{(k+1)} \end{bmatrix}
\qquad (21)
$$

Multiplying out (21) and substituting according to the equations in (20) yields exactly expression (19), which is the $a^T \Sigma_x a$ term of the two-dimensional MPMC. Since this equivalence holds likewise for the $a^T \Sigma_y a$ term in $m$, we have shown that $m$ (and therefore $\Omega$) is equal for the two-dimensional and the $(k+1)$-dimensional MPMC.
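The identity above is easy to verify numerically, since it holds exactly for sample covariances. The following check is our own sketch (not from the paper): it draws random columns standing in for the kernel evaluations, then compares the variance term of the two-dimensional MPMC built on (model output, new kernel column) against the variance term of the (k+1)-dimensional MPMC with coefficients (a₁c₁, ..., a₁cₖ, a₂).

```python
# Numerical check of the Theorem 1 identity: expression (19) equals expression (21).
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 4                        # n samples, k existing kernel columns
K = rng.standard_normal((n, k + 1))  # columns stand in for K_x^(1), ..., K_x^(k+1)
c = rng.standard_normal(k)           # c_1, ..., c_k (c_0 drops out of all covariances)
a1, a2 = rng.standard_normal(2)      # a^(k+1) of the two-dimensional MPMC

# Two-dimensional view: model output z = sum_i c_i K_x^(i), paired with K_x^(k+1).
z = K[:, :k] @ c
S2 = np.cov(np.column_stack([z, K[:, k]]), rowvar=False)  # 2 x 2 sample covariance
term_2d = np.array([a1, a2]) @ S2 @ np.array([a1, a2])    # expression (19)

# (k+1)-dimensional view: coefficients (a1*c_1, ..., a1*c_k, a2) on all columns.
w = np.append(a1 * c, a2)
S = np.cov(K, rowvar=False)                               # (k+1) x (k+1) covariance
term_kp1 = w @ S @ w                                      # expression (21)

print(np.isclose(term_2d, term_kp1))                      # True
```

The agreement is exact up to floating-point error because both sides expand, by bilinearity of the sample covariance, into the same sum of pairwise covariance terms, which is precisely the substitution step from (20) into (21).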