jmlr jmlr2010 jmlr2010-82 jmlr2010-82-reference knowledge-graph by maker-knowledge-mining

82 jmlr-2010-On Learning with Integral Operators

Source: pdf

Author: Lorenzo Rosasco, Mikhail Belkin, Ernesto De Vito

Abstract: A large number of learning algorithms, for example, spectral clustering, kernel Principal Components Analysis and many manifold methods are based on estimating eigenvalues and eigenfunctions of operators deﬁned by a similarity function or a kernel, given empirical data. Thus for the analysis of algorithms, it is an important problem to be able to assess the quality of such approximations. The contribution of our paper is two-fold: 1. We use a technique based on a concentration inequality for Hilbert spaces to provide new much simpliﬁed proofs for a number of results in spectral approximation. 2. Using these methods we provide several new results for estimating spectral properties of the graph Laplacian operator extending and strengthening results from von Luxburg et al. (2008). Keywords: spectral convergence, empirical operators, learning integral operators, perturbation methods

reference text

P. M. Anselone. Collectively Compact Operator Approximation Theory and Applications to Integral Equations. Prentice-Hall Inc., Englewood Cliffs, N. J., 1971. With an appendix by Joel Davis, Prentice-Hall Series in Automatic Computation. N. Aronszajn. Theory of reproducing kernels. Trans. Amer. Math. Soc., 68:337–404, 1950. 932 O N L EARNING WITH I NTEGRAL O PERATORS F. Bauer, S. Pereverzev, and L. Rosasco. On regularization algorithms in learning theory. J. Complexity, 23(1):52–72, 2007. M. Belkin. Problems of Learning on Manifolds. PhD thesis, University of Chicago, USA, 2003. M. Belkin and P. Niyogi. Towards a theoretical foundation for Laplacian-based manifold methods. In Proceedings of the 18th Conference on Learning Theory (COLT), pages 486–500, 2005. M. Belkin and P. Niyogi. Convergence of Laplacian eigenmaps. In B. Sch¨ lkopf, J. Platt, and o T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 129–136. MIT Press, Cambridge, MA, 2007. V.I. Burenkov. Sobolev Spaces on Domains. B. G. Teubuer, Stuttgart-Leipzig, 1998. R.R. Coifman and S. Lafon. Geometric harmonics: a novel tool for multiscale out-of-sample extension of empirical functions. Appl. Comput. Harmon. Anal., 21(1):31–52, 2006. ISSN 1063-5203. E. De Vito, A. Caponnetto, and L. Rosasco. Model selection for regularized least-squares algorithm in learning theory. Foundations of Computational Mathematics, 5(1):59–85, February 2005a. E. De Vito, L. Rosasco, A. Caponnetto, U. De Giovannini, and F. Odone. Learning from examples as an inverse problem. Journal of Machine Learning Research, 6:883–904, May 2005b. E. De Vito, L. Rosasco, and A. Caponnetto. Discretization error analysis for Tikhonov regularization. Anal. Appl. (Singap.), 4(1):81–99, 2006. E. Gin´ and V. Koltchinskii. Empirical graph Laplacian approximation of Laplace-Beltrami operae tors: Large sample results. High Dimensional Probability, 51:238–259, 2006. M. Hein. Uniform convergence of adaptive graph-based regularization. In Proceedings of the 19th Annual Conference on Learning Theory (COLT), pages 50–64, New York, 2006. Springer. M. Hein, J-Y. Audibert, and U. von Luxburg. From graphs to manifolds - weak and strong pointwise consistency of graph Laplacians. In Proceedings of the 18th Conference on Learning Theory (COLT), pages 470–485, 2005. Student Paper Award. T. Kato. Perturbation Theory for Linear Operators. Springer, Berlin, 1966. T. Kato. Variation of discrete spectra. Commun. Math. Phys., III:501–504, 1987. V. Koltchinskii. Asymptotics of spectral projections of some random matrices approximating integral operators. Progress in Probabilty, 43, 1998. V. Koltchinskii and E. Gin´ . Random matrix approximation of spectra of integral operators. e Bernoulli, 6:113–167, 2000. S. Lafon. Diffusion Maps and Geometric Harmonics. PhD thesis, Yale University, USA, 2004. S. Lang. Real and Functional Analysis. Springer, New York, 1993. S. Mendelson and A. Pajor. Ellipsoid approximation with random vectors. In Proceedings of the 18th Annual Conference on Learning Theory (COLT), pages 429–433, New York, 2005. Springer. 933 ROSASCO , B ELKIN AND D E V ITO S. Mendelson and A. Pajor. On singular values of matrices with independent rows. Bernoulli, 12(5): 761–773, 2006. I. Pinelis. An approach to inequalities for the distributions of inﬁnite-dimensional martingales. Probability in Banach Spaces, 8, Proceedings of the 8th International Conference, pages 128– 134, 1992. L. Schwartz. Sous-espaces hilbertiens d’espaces vectoriels topologiques et noyaux associ´ s (noyaux e reproduisants). J. Analyse Math., 13:115–256, 1964. ISSN 0021-7670. J. Shawe-Taylor, N. Cristianini, and J. Kandola. On the concentration of spectral properties. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 511–517, Cambridge, MA, 2002. MIT Press. J. Shawe-Taylor, C. Williams, N. Cristianini, and J. Kandola. On the eigenspectrum of the Gram matrix and the generalization error of kernel-PCA. IEEE Trans. Inform. Theory, 51(7):2510– 2522, 2005. ISSN 0018-9448. A. Singer. From graph to manifold Laplacian: The convergence rate. Appl. Comput. Harmon. Anal.,, 21:128–134, 2006. S. Smale and D.X. Zhou. Learning theory estimates via integral operators and their approximations. Constr. Approx., 26(2):153–172, 2007. ISSN 0176-4276. S. Smale and D.X. Zhou. Geometry of probability spaces. Constr. Approx., 30(3):311–323, 2009. U. Von Luxburg, O. Bousquet, and M. Belkin. On the convergence of spectral clustering on random samples: the normalized case. In Proceedings of the 17th Annual Conference on Learning Theory (COLT 2004, pages 457–471. Springer, 2004. U. von Luxburg, M. Belkin, and O. Bousquet. Consistency of spectral clustering. Ann. Statist., 36 (2):555–586, 2008. Y. Yao, L. Rosasco, and A. Caponnetto. On early stopping in gradient descent learning. Constr. Approx., 26(2):289–315, 2007. L. Zwald and G. Blanchard. On the convergence of eigenspaces in kernel principal component analysis. In Y. Weiss, B. Sch¨ lkopf, and J. Platt, editors, Advances in Neural Information Processing o Systems 18, pages 1649–1656. MIT Press, Cambridge, MA, 2006. L. Zwald, O. Bousquet, and G. Blanchard. Statistical properties of kernel principal component analysis. In Learning theory, volume 3120 of Lecture Notes in Comput. Sci., pages 594–608. Springer, Berlin, 2004. 934