nips2005-47 reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Régis Vert, Jean-Philippe Vert
Abstract: We determine the asymptotic limit of the function computed by support vector machines (SVM) and related algorithms that minimize a regularized empirical convex loss function in the reproducing kernel Hilbert space of the Gaussian RBF kernel, in the situation where the number of examples tends to infinity, the bandwidth of the Gaussian kernel tends to 0, and the regularization parameter is held fixed. Non-asymptotic convergence bounds to this limit in the L2 sense are provided, together with upper bounds on the classification error, which is shown to converge to the Bayes risk, thereby proving the Bayes-consistency of a variety of methods even though the regularization term does not vanish. These results are particularly relevant to the one-class SVM, for which the regularization cannot vanish by construction, and which is shown for the first time to be a consistent density level set estimator.
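As a rough illustration of the regime described in the abstract (Gaussian kernel bandwidth shrinking as the sample size grows, regularization held fixed), the following Python sketch fits a one-class SVM and reads its thresholded decision function as an estimate of a density level set. It uses scikit-learn's OneClassSVM; the two-component mixture data, the bandwidth schedule sigma = n**(-0.2), and the use of a fixed nu parameter as a stand-in for the fixed regularization are illustrative assumptions, not the construction analyzed in the paper.

# Sketch: one-class SVM with a shrinking Gaussian bandwidth and fixed regularization,
# read as a density level set estimator. Assumptions (mixture data, bandwidth schedule,
# nu as the fixed regularization surrogate) are illustrative, not the paper's setup.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

def level_set_estimate(n, sigma_exponent=0.2, nu=0.5):
    """Fit a one-class SVM on n samples from a 2-D Gaussian mixture.

    The kernel bandwidth sigma shrinks like n**(-sigma_exponent), mimicking the
    small-bandwidth asymptotics, while nu (standing in for the regularization
    parameter) is held fixed.
    """
    # Two-component Gaussian mixture in the plane.
    X = np.vstack([
        rng.normal(loc=(-1.5, 0.0), scale=0.5, size=(n // 2, 2)),
        rng.normal(loc=(+1.5, 0.0), scale=0.5, size=(n - n // 2, 2)),
    ])

    sigma = n ** (-sigma_exponent)      # bandwidth -> 0 as n -> infinity
    gamma = 1.0 / (2.0 * sigma ** 2)    # scikit-learn's RBF parameterization

    return OneClassSVM(kernel="rbf", gamma=gamma, nu=nu).fit(X)

# Grid points where the decision function is nonnegative form the estimated level set.
model = level_set_estimate(n=2000)
grid = np.mgrid[-3:3:0.1, -3:3:0.1].reshape(2, -1).T
inside = grid[model.decision_function(grid) >= 0]
print(f"{len(inside)} grid points fall inside the estimated level set")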
[1] B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the 5th annual ACM workshop on Computational Learning Theory, pages 144–152. ACM Press, 1992.
[2] I. Steinwart. Support vector machines are universally consistent. J. Complexity, 18:768–791, 2002.
[3] T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Ann. Stat., 32:56–134, 2004.
[4] I. Steinwart and C. Scovel. Fast rates for support vector machines using Gaussian kernels. Technical report, Los Alamos National Laboratory, 2004. Submitted to Annals of Statistics.
[5] I. Steinwart. Sparseness of support vector machines. J. Mach. Learn. Res., 4:1071–1105, 2003.
[6] P. L. Bartlett and A. Tewari. Sparseness vs estimating conditional probabilities: Some asymptotic results. In Lecture Notes in Computer Science, volume 3120, pages 564–578. Springer, 2004.
[7] A.N. Tikhonov and V.Y. Arsenin. Solutions of ill-posed problems. W.H. Winston, Washington, D.C., 1977.
[8] B. W. Silverman. On the estimation of a probability density function by the maximum penalized likelihood method. Ann. Stat., 10:795–810, 1982.
[9] B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson. Estimating the support of a high-dimensional distribution. Neural Comput., 13:1443–1471, 2001.
[10] J. A. Hartigan. Estimation of a convex density contour in two dimensions. J. Amer. Statist. Assoc., 82(397):267–270, 1987.
[11] R. Vert and J.-P. Vert. Consistency and convergence rates of one-class SVM and related algorithms. J. Mach. Learn. Res., 2006. To appear.
[12] R. A. DeVore and G. G. Lorentz. Constructive Approximation. Springer Grundlehren der Mathematischen Wissenschaften. Springer Verlag, 1993.
[13] P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe. Convexity, classification, and risk bounds. Technical Report 638, UC Berkeley Statistics, 2003.
[14] A. B. Tsybakov. On nonparametric estimation of density level sets. Ann. Stat., 25:948–969, June 1997.
[15] E. Mammen and A. Tsybakov. Smooth discrimination analysis. Ann. Stat., 27(6):1808–1829, 1999.
[16] P. Massart. Some applications of concentration inequalities to statistics. Ann. Fac. Sc. Toulouse, IX(2):245–303, 2000.
[17] P. L. Bartlett, O. Bousquet, and S. Mendelson. Local Rademacher complexities. Annals of Statistics, 2005. To appear.
[18] V. Koltchinskii. Localized Rademacher complexities. Manuscript, September 2003.