
47 jmlr-2005-Learning from Examples as an Inverse Problem


Source: pdf

Author: Ernesto De Vito, Lorenzo Rosasco, Andrea Caponnetto, Umberto De Giovannini, Francesca Odone

Abstract: Many works have related learning from examples to regularization techniques for inverse problems, emphasizing the strong algorithmic and conceptual analogy between certain learning algorithms and regularization algorithms. In particular, it is well known that regularization schemes such as Tikhonov regularization can be effectively used in the context of learning and are closely related to algorithms such as support vector machines. Nevertheless, the connection with inverse problems was previously considered only for the discrete (finite sample) problem, and the probabilistic aspects of learning from examples were not taken into account. In this paper we provide a natural extension of such analysis to the continuous (population) case and study the interplay between the discrete and continuous problems. From a theoretical point of view, this allows us to draw a clear connection between the consistency approach in learning theory and the stability convergence property in ill-posed inverse problems. The main mathematical result of the paper is a new probabilistic bound for the regularized least-squares algorithm. By means of standard results on the approximation term, the consistency of the algorithm easily follows.

Keywords: statistical learning, inverse problems, regularization theory, consistency

© 2005 Ernesto De Vito, Lorenzo Rosasco, Andrea Caponnetto, Umberto De Giovannini and Francesca Odone.
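The regularized least-squares algorithm discussed in the abstract is Tikhonov regularization applied in a reproducing kernel Hilbert space: given examples (x_i, y_i), i = 1, ..., n, one minimizes (1/n) sum_i (f(x_i) - y_i)^2 + lambda ||f||_K^2 over the RKHS, and the representer theorem reduces this to solving a finite linear system in the kernel matrix. The Python sketch below is only an illustration of this standard estimator, not the authors' code; the Gaussian kernel, the toy sine data, and the parameter values are assumptions made for the example.

import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of X1 and X2.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def rls_fit(K, y, lam):
    # Regularized least-squares (Tikhonov) in an RKHS: the minimizer is
    # f(x) = sum_i c_i k(x, x_i), where the coefficients solve
    # (K + n * lam * I) c = y.
    n = K.shape[0]
    return np.linalg.solve(K + n * lam * np.eye(n), y)

# Toy usage: noisy samples of a sine wave (illustrative data, not from the paper).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)

lam = 1e-2                       # regularization parameter lambda
c = rls_fit(gaussian_kernel(X, X), y, lam)

X_test = np.linspace(-3, 3, 200)[:, None]
f_test = gaussian_kernel(X_test, X) @ c   # estimator evaluated on a grid

Note the factor n multiplying lambda in the linear system: it reflects the 1/n normalization of the empirical risk in the functional above, so lambda plays the role of the regularization parameter in the paper's population/sample analysis.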


reference text

N. Alon, S. Ben-David, N. Cesa-Bianchi, and D. Haussler. Scale-sensitive dimensions, uniform convergence, and learnability. Journal of the ACM, 44:615–631, 1997.

M. A. Arbib. The Handbook of Brain Theory and Neural Networks. The MIT Press, Cambridge, MA, 1995.

N. Aronszajn. Theory of reproducing kernels. Trans. Amer. Math. Soc., 68:337–404, 1950.

P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3:463–482, 2002.

M. Bertero, C. De Mol, and E. R. Pike. Linear inverse problems with discrete data. I. General formulation and singular system analysis. Inverse Problems, 1(4):301–330, 1985.

M. Bertero, C. De Mol, and E. R. Pike. Linear inverse problems with discrete data. II. Stability and regularisation. Inverse Problems, 4(3):573–594, 1988.

O. Bousquet and A. Elisseeff. Stability and generalization. Journal of Machine Learning Research, 2:499–526, 2002.

D. Chen, Q. Wu, Y. Ying, and D. Zhou. Support vector machine soft margin classifiers: Error analysis. Journal of Machine Learning Research, 5:1143–1175, 2004.

F. Cucker and S. Smale. Best choices for regularization parameters in learning theory: On the bias-variance problem. Foundations of Computational Mathematics, 2:413–428, 2002a.

F. Cucker and S. Smale. On the mathematical foundations of learning. Bull. Amer. Math. Soc. (N.S.), 39(1):1–49 (electronic), 2002b.

E. De Vito, A. Caponnetto, and L. Rosasco. Model selection for regularized least-squares algorithm in learning theory. To be published in Foundations of Computational Mathematics, 2004.

L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Number 31 in Applications of Mathematics. Springer, New York, 1996.

H. W. Engl, M. Hanke, and A. Neubauer. Regularization of Inverse Problems, volume 375 of Mathematics and its Applications. Kluwer Academic Publishers Group, Dordrecht, 1996.

T. Evgeniou, M. Pontil, and T. Poggio. Regularization networks and support vector machines. Adv. Comp. Math., 13:1–50, 2000.

T. L. Fine. Feedforward Neural Network Methodology. Springer-Verlag, 1999.

C. W. Groetsch. The Theory of Tikhonov Regularization for Fredholm Equations of the First Kind, volume 105 of Research Notes in Mathematics. Pitman (Advanced Publishing Program), Boston, MA, 1984.

L. Györfi, M. Kohler, A. Krzyzak, and H. Walk. A Distribution-Free Theory of Nonparametric Regression. Springer Series in Statistics. Springer, New York, 2002.

T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, New York, 2001.

J. Kaipio and E. Somersalo. Statistical and Computational Inverse Problems. Springer, 2005.

V. Kecman. Learning and Soft Computing. The MIT Press, Cambridge, MA, 2001.

V. Kurkova. Learning from data as an inverse problem. In J. Antoch, editor, COMPSTAT 2004, pages 1377–1384. Springer-Verlag, 2004.

S. Lang. Real and Functional Analysis. Springer, New York, 1993.

S. Mukherjee, P. Niyogi, T. Poggio, and R. Rifkin. Statistical learning: Stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Technical Report CBCL Paper 223, Massachusetts Institute of Technology, January 2004 revision.

S. Mukherjee, R. Rifkin, and T. Poggio. Regression and classification with regularization. Lecture Notes in Statistics: Nonlinear Estimation and Classification, Proceedings from MSRI Workshop, 171:107–124, 2002.

P. Niyogi and F. Girosi. Generalization bounds for function approximation from scattered noisy data. Adv. Comput. Math., 10:51–80, 1999.

C. S. Ong and S. Canu. Regularization by early stopping. Technical report, Computer Sciences Laboratory, RSISE, ANU, 2004.

I. F. Pinelis and A. I. Sakhanenko. Remarks on inequalities for probabilities of large deviations. Theory Probab. Appl., 30(1):143–148, 1985.

T. Poggio and F. Girosi. A theory of networks for approximation and learning. In C. Lau, editor, Foundation of Neural Networks, pages 91–106. IEEE Press, Piscataway, N.J., 1992.

C. Rudin. A different type of convergence for statistical learning algorithms. Technical report, Program in Applied and Computational Mathematics, Princeton University, 2004.

B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002. URL http://www.learning-with-kernels.org.

L. Schwartz. Sous-espaces hilbertiens d'espaces vectoriels topologiques et noyaux associés (noyaux reproduisants). J. Analyse Math., 13:115–256, 1964.

C. Scovel and I. Steinwart. Fast rates for support vector machines. Submitted to the Annals of Statistics, 2003.

S. Smale and Y. Yao. Online learning algorithms. Technical report, Toyota Technological Institute, Chicago, 2004.

S. Smale and D. Zhou. Estimating the approximation error in learning theory. Analysis and Applications, 1(1):1–25, 2003.

S. Smale and D. Zhou. Shannon sampling and function reconstruction from point values. Bull. Amer. Math. Soc. (N.S.), 41(3):279–305 (electronic), 2004a.

S. Smale and D. Zhou. Shannon sampling II: Connections to learning theory. Preprint, 2004b.

I. Steinwart. Sparseness of support vector machines. Journal of Machine Learning Research, 4:1071–1105, 2003.

I. Steinwart. Consistency of support vector machines and other regularized kernel machines. Accepted for publication in IEEE Transactions on Information Theory, 2004.

A. N. Tikhonov, A. V. Goncharsky, V. V. Stepanov, and A. G. Yagola. Numerical Methods for the Solution of Ill-Posed Problems, volume 328 of Mathematics and its Applications. Kluwer Academic Publishers Group, Dordrecht, 1995. Translated from the 1990 Russian original by R. A. M. Hoksbergen and revised by the authors.

A. N. Tikhonov and V. Y. Arsenin. Solutions of Ill-Posed Problems. W. H. Winston, Washington, D.C., 1977.

V. N. Vapnik. Statistical Learning Theory. Adaptive and Learning Systems for Signal Processing, Communications, and Control. John Wiley & Sons Inc., New York, 1998. A Wiley-Interscience Publication.

G. Wahba. Spline Models for Observational Data, volume 59 of CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1990.

V. Yurinsky. Sums and Gaussian Vectors, volume 1617 of Lecture Notes in Mathematics. Springer-Verlag, Berlin, 1995.

T. Zhang. Leave-one-out bounds for kernel methods. Neural Computation, 13:1397–1437, 2003.