
90 jmlr-2007-Value Regularization and Fenchel Duality


Source: pdf

Author: Ryan M. Rifkin, Ross A. Lippert

Abstract: Regularization is an approach to function learning that balances fit and smoothness. In practice, we search for a function f with a finite representation f = ∑_i c_i φ_i(·). In most treatments, the c_i are the primary objects of study. We consider value regularization, constructing optimization problems in which the predicted values at the training points are the primary variables, and therefore the central objects of study. Although this is a simple change, it has profound consequences. From convex conjugacy and the theory of Fenchel duality, we derive separate optimality conditions for the regularization and loss portions of the learning problem; this technique yields clean and short derivations of standard algorithms. This framework is ideally suited to studying many other phenomena at the intersection of learning theory and optimization. We obtain a value-based variant of the representer theorem, which underscores the transductive nature of regularization in reproducing kernel Hilbert spaces. We unify and extend previous results on learning kernel functions, with very simple proofs. We analyze the use of unregularized bias terms in optimization problems, and low-rank approximations to kernel matrices, obtaining new results in these areas. In summary, the combination of value regularization and Fenchel duality is a valuable tool for studying optimization problems in machine learning.

Keywords: kernel machines, duality, optimization, convex analysis, kernel learning
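
To make the value-regularization viewpoint concrete, here is a minimal NumPy sketch for the squared-loss case; the Gaussian kernel, the regularization weight lam, and all function and variable names are illustrative choices, not taken from the paper. Rather than solving for the expansion coefficients c in f = ∑_i c_i k(x_i, ·), the objective is written over the vector y of predicted values at the training points, min_y ||y − t||² + λ yᵀK⁻¹y, whose minimizer is y = K(K + λI)⁻¹t.

    import numpy as np

    def gaussian_kernel(X, Z, sigma=1.0):
        # Pairwise Gaussian (RBF) kernel matrix between the rows of X and Z.
        sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dists / (2.0 * sigma ** 2))

    def value_regularized_rls(K, t, lam):
        # Value regularization with squared loss: the predicted values y
        # at the training points are the optimization variables,
        #     min_y ||y - t||^2 + lam * y' K^{-1} y,
        # with minimizer y = K (K + lam*I)^{-1} t. The same linear solve
        # also recovers the usual expansion coefficients c, since y = K c.
        n = K.shape[0]
        c = np.linalg.solve(K + lam * np.eye(n), t)
        return K @ c, c  # predicted values y, coefficients c

    # Usage on synthetic 1-D data (illustrative only).
    rng = np.random.default_rng(0)
    X = rng.uniform(-3.0, 3.0, size=(40, 1))
    t = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)
    K = gaussian_kernel(X, X)
    y, c = value_regularized_rls(K, t, lam=0.1)

Note that K⁻¹ never needs to be formed explicitly: a single solve against K + λI yields both the values y and the coefficients c, which is one way the value-based and coefficient-based views coincide for kernel methods.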


reference text

Yasemin Altun and Alex Smola. Unifying divergence minimization and statistical inference via convex duality. In Proceedings of the 19th Annual Conference on Learning Theory, 2006.

Yasemin Altun, David McAllester, and Mikhail Belkin. Maximum margin semi-supervised learning for structured variables. In Neural Information Processing Systems, 2005.

Andreas Argyriou, Charles A. Micchelli, and Massimiliano Pontil. Learning convex combinations of continuously parametrized basic kernels. In Proceedings of the 18th Annual Conference on Learning Theory, 2005.

Nachman Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68:337–404, 1950.

Christopher T. H. Baker. The Numerical Treatment of Integral Equations. Clarendon Press, 1977.

Mokhtar S. Bazaraa, Hanif D. Sherali, and C. M. Shetty. Nonlinear Programming: Theory and Algorithms. Wiley-Interscience, 2nd edition, 1993.

Mikhail Belkin and Partha Niyogi. Semi-supervised learning on Riemannian manifolds. Machine Learning, 56:209–239, 2003.

Mikhail Belkin, Partha Niyogi, and Vikas Sindhwani. Manifold regularization: A geometric framework for learning from examples. Technical Report TR-2004-06, University of Chicago, 2004.

Yoshua Bengio, Jean-François Paiement, Pascal Vincent, Olivier Delalleau, Nicolas Le Roux, and Marie Ouimet. Out-of-sample extensions for LLE, Isomap, MDS, eigenmaps and spectral clustering. In Neural Information Processing Systems, 2003.

Jonathan M. Borwein and Adrian S. Lewis. Convex Analysis and Nonlinear Optimization. CMS Books in Mathematics. Springer, 2000.

Scott Shaobing Chen, David L. Donoho, and Michael A. Saunders. Atomic decomposition by basis pursuit. Technical Report 479, Stanford University Department of Statistics, 1995.

Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20:273–297, 1995.

Nello Cristianini and John Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, 2000.

Miroslav Dudík and Robert E. Schapire. Maximum entropy distribution estimation with generalized regularization. In Proceedings of the 19th Annual Conference on Learning Theory, pages 123–138, 2006.

Theodoros Evgeniou, Massimiliano Pontil, and Tomaso Poggio. Regularization networks and support vector machines. Advances in Computational Mathematics, 13(1):1–50, 2000.

Federico Girosi. An equivalence between sparse approximation and support vector machines. Neural Computation, 10:1455–1480, 1998.

Tommi S. Jaakkola and David Haussler. Probabilistic kernel regression models. In Proceedings of the 1999 Conference on AI and Statistics. Morgan Kaufmann, 1999.

Gert R. G. Lanckriet, Nello Cristianini, Peter Bartlett, Laurent El Ghaoui, and Michael I. Jordan. Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5:27–72, 2004.

Zhen Luo and Grace Wahba. Hybrid adaptive splines. Journal of the American Statistical Association, 92:107–116, 1997.

Charles A. Micchelli. Interpolation of scattered data: distance matrices and conditionally positive definite functions. Constructive Approximation, 2(1):11–22, 1986.

Tomaso Poggio and Federico Girosi. Networks for approximation and learning. Proceedings of the IEEE, 78(9):1481–1497, September 1990.

Tomaso Poggio, Sayan Mukherjee, Ryan M. Rifkin, Alex Rakhlin, and Alessandro Verri. b. In Proceedings of the Conference on Uncertainty in Geometric Computations, 2001.

Ali Rahimi, Benjamin Recht, and Trevor Darrell. Learning appearance manifolds from video. In Computer Vision and Pattern Recognition, 2005.

Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.

Ben Recht. Unpublished PhD thesis, 2006.

Ryan M. Rifkin. Everything Old Is New Again: A Fresh Look at Historical Approaches to Machine Learning. PhD thesis, Massachusetts Institute of Technology, 2002.

R. Tyrrell Rockafellar and Roger J. B. Wets. Variational Analysis. Springer, Berlin, 2004.

Bernhard Schölkopf, Ralf Herbrich, and Alex J. Smola. A generalized representer theorem. In 14th Annual Conference on Computational Learning Theory, pages 416–426, 2001.

Vikas Sindhwani, Partha Niyogi, and Mikhail Belkin. Beyond the point cloud: from transductive to semi-supervised learning. In Proceedings of the 22nd International Conference on Machine Learning, 2005.

Johan A. K. Suykens, Tony Van Gestel, Jos De Brabanter, Bart De Moor, and Joos Vandewalle. Least Squares Support Vector Machines. World Scientific, 2002.

Andrei N. Tikhonov and Vasilii Y. Arsenin. Solutions of Ill-Posed Problems. W. H. Winston, Washington, D.C., 1977.

Vladimir Vapnik. Statistical Learning Theory. John Wiley & Sons, 1998.

Grace Wahba. Spline Models for Observational Data, volume 59 of CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial & Applied Mathematics, 1990.

Christopher K. I. Williams and Matthias Seeger. Using the Nyström method to speed up kernel machines. In Advances in Neural Information Processing Systems, 2000.

Ji Zhu, Saharon Rosset, Trevor Hastie, and Rob Tibshirani. 1-norm support vector machines. In Neural Information Processing Systems, 2003.