
Infinite-σ Limits For Tikhonov Regularization


Source: pdf

Author: Ross A. Lippert, Ryan M. Rifkin

Abstract: We consider the problem of Tikhonov regularization with a general convex loss function: this formalism includes support vector machines and regularized least squares. For a family of kernels that includes the Gaussian, parameterized by a “bandwidth” parameter σ, we characterize the limiting solution as σ → ∞. In particular, we show that if we set the regularization parameter λ = λ̃σ^{−2p}, the regularization term of the Tikhonov problem tends to an indicator function on polynomials of degree ⌊p⌋ (with residual regularization in the case where p ∈ ℤ). The proof rests on two key ideas: epi-convergence, a notion of functional convergence under which limits of minimizers converge to minimizers of limits, and a value-based formulation of learning, where we work with regularization on the function output values (y) as opposed to the function expansion coefficients in the RKHS. Our result generalizes and unifies previous results in this area.

Keywords: Tikhonov regularization, Gaussian kernel, theory, kernel machines
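
For concreteness, the Tikhonov problem the abstract refers to can be written as follows. This is a minimal sketch in standard RKHS notation (V for the convex loss, ℋ_{K_σ} for the RKHS of the Gaussian kernel, and (x_i, y_i) for the n training pairs); the paper's exact normalization of the objective may differ:

    \min_{f \in \mathcal{H}_{K_\sigma}} \; \sum_{i=1}^{n} V\bigl(y_i, f(x_i)\bigr) + \lambda \, \|f\|_{K_\sigma}^{2},
    \qquad
    K_\sigma(x, x') = \exp\!\Bigl( -\tfrac{\|x - x'\|^{2}}{2\sigma^{2}} \Bigr).

Under the scaling λ = λ̃σ^{−2p}, the abstract's claim is that as σ → ∞ the penalty λ‖f‖²_{K_σ} epi-converges to the indicator function of polynomials of degree ⌊p⌋ (zero on that set, +∞ off it), so limiting minimizers are empirical-loss minimizers over polynomials of degree at most ⌊p⌋, with an extra residual regularization term when p is an integer.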


Reference text

N. Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68:337–404, 1950.
C. J. C. Burges and D. J. Crisp. Uniqueness of the SVM solution. In Neural Information Processing Systems, 1999.
Theodoros Evgeniou, Massimiliano Pontil, and Tomaso Poggio. Regularization networks and support vector machines. Advances in Computational Mathematics, 13(1):1–50, 2000.
Federico Girosi, Michael Jones, and Tomaso Poggio. Regularization theory and neural networks architectures. Neural Computation, 7(2):219–269, 1995.
S. Sathiya Keerthi and Chih-Jen Lin. Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Computation, 15(7):1667–1689, 2003.
Ross A. Lippert and Ryan M. Rifkin. Asymptotics of Gaussian regularized least squares. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18. MIT Press, Cambridge, MA, 2006.
R. Tyrrell Rockafellar and Roger J. B. Wets. Variational Analysis. Springer, Berlin, 2004.
Bernhard Schölkopf, Ralf Herbrich, and Alex J. Smola. A generalized representer theorem. In 14th Annual Conference on Computational Learning Theory, pages 416–426, 2001.
Grace Wahba. Spline Models for Observational Data, volume 59 of CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics, 1990.
Grace Wahba, Yi Lin, Yoonkyung Lee, and Hao Zhang. On the relation between the GACV and Joachims’ ξα method for tuning support vector machines, with extensions to the nonstandard case. Technical Report 1039, University of Wisconsin Department of Statistics, 2001. URL citeseer.ist.psu.edu/wahba01relation.html.
Changjiang Yang, Ramani Duraiswami, and Larry Davis. Efficient kernel machines using the improved fast Gauss transform. In Lawrence K. Saul, Yair Weiss, and Léon Bottou, editors, Advances in Neural Information Processing Systems 17, pages 1561–1568, Cambridge, MA, 2005. MIT Press.
D.-X. Zhou. The covering number in learning theory. Journal of Complexity, 18:739–767, 2002.