nips nips2008 nips2008-178 nips2008-178-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jooseuk Kim, Clayton Scott
Abstract: We provide statistical performance guarantees for a recently introduced kernel classifier that optimizes the L2 or integrated squared error (ISE) of a difference of densities. The classifier is similar to a support vector machine (SVM) in that it is the solution of a quadratic program and yields a sparse classifier. Unlike SVMs, however, the L2 kernel classifier does not involve a regularization parameter. We prove a distribution-free concentration inequality for a cross-validation based estimate of the ISE, and apply this result to deduce an oracle inequality and consistency of the classifier in the sense of both ISE and probability of error. Our results also specialize to give performance guarantees for an existing method of L2 kernel density estimation.
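The abstract refers to the ISE criterion and its quadratic-program formulation; the display below is a minimal sketch of that criterion, written with generic notation (a difference of densities d = f_+ - f_-, a kernel estimate built from a kernel k_sigma, and weights alpha_i) introduced here for illustration rather than taken from the paper.

\[
  \hat{d}_{\alpha}(x) \;=\; \sum_{i=1}^{n} \alpha_i \, k_{\sigma}(x, x_i),
  \qquad
  \mathrm{ISE}(\alpha)
  \;=\; \int \bigl(\hat{d}_{\alpha}(x) - d(x)\bigr)^{2}\, dx
  \;=\; \int \hat{d}_{\alpha}^{2}\, dx \;-\; 2\int \hat{d}_{\alpha}\, d \, dx \;+\; \int d^{2}\, dx .
\]

In this sketch the first term is a quadratic form in the weights, the middle term is the quantity that a cross-validation based estimate (as mentioned in the abstract) stands in for, and the last term does not depend on alpha, so minimizing the estimated ISE over the weights is a quadratic program, consistent with the description above.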
[1] B. Schölkopf and A. J. Smola, Learning with Kernels, MIT Press, Cambridge, MA, 2002.
[2] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
[3] J. Kim and C. Scott, “Kernel classification via integrated squared error,” IEEE Workshop on Statistical Signal Processing, August 2007.
[4] D. Kim, Least Squares Mixture Decomposition Estimation, unpublished doctoral dissertation, Dept. of Statistics, Virginia Polytechnic Inst. and State Univ., 1995.
[5] Mark Girolami and Chao He, “Probability density estimation from optimally condensed data samples,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 10, pp. 1253–1264, Oct. 2003.
[6] B.A. Turlach, “Bandwidth selection in kernel density estimation: A review,” Technical Report 9317, C.O.R.E. and Institut de Statistique, Université Catholique de Louvain, 1993.
[7] David W. Scott, “Parametric statistical modeling by minimum integrated square error,” Technometrics, vol. 43, pp. 274–285, 2001.
[8] F. Bunea, A.B. Tsybakov, and M.H. Wegkamp, “Sparse density estimation with l1 penalties,” in Proceedings of the 20th Annual Conference on Learning Theory (COLT 2007), Lecture Notes in Artificial Intelligence, vol. 4539, pp. 530–543, 2007.
[9] Ph. Rigollet and A.B. Tsybakov, “Linear and convex aggregation of density estimators,” https://hal.ccsd.cnrs.fr/ccsd-00068216, 2004.
[10] Robert Jenssen, Deniz Erdogmus, Jose C. Principe, and Torbjørn Eltoft, “Towards a unification of information theoretic learning and kernel methods,” in Proc. IEEE Workshop on Machine Learning for Signal Processing (MLSP 2004), São Luís, Brazil, 2004.
[11] Peter Hall and Matthew P. Wand, “On nonparametric discrimination using density differences,” Biometrika, vol. 75, no. 3, pp. 541–547, Sept. 1988.
[12] P. Meinicke, T. Twellmann, and H. Ritter, “Discriminative densities from maximum contrast estimation,” in Advances in Neural Information Processing Systems 15, Vancouver, Canada, 2002, pp. 985–992.
[13] M. Di Marzio and C.C. Taylor, “Kernel density classification and boosting: an L2 analysis,” Statistics and Computing, vol. 15, pp. 113–123, April 2005.
[14] E. Lehmann, Testing Statistical Hypotheses, Wiley, New York, 1986.
[15] M.P. Wand and M.C. Jones, Kernel Smoothing, Chapman & Hall, 1995.
[16] L. Devroye and G. Lugosi, Combinatorial Methods in Density Estimation, Springer, New York, 2001.
[17] Charles T. Wolverton and Terry J. Wagner, “Asymptotically optimal discriminant functions for pattern classification,” IEEE Trans. Info. Theory, vol. 15, no. 2, pp. 258–265, Mar. 1969.